Stupid Question 63: ’Regular Expression’ syntax, are there variations?
regular expression syntax, does it vary?
I haven’t used regular expressions much, and when I have, I have relied on help from others. Although I feel slightly uncomfortable with symbols that my brain seems unable to box into some sort of logic, I find regular expressions rather neat. They have not been on my to-do list, basically because I haven’t considered knowing them a necessity. But while I was a bit bored waiting to fall asleep last night, I decided to google.
I knew that a regular expression is a sort of search language (this is the best way I can describe it, though I know it’s not a 100% accurate description; read my answer below), and if that is the case, then there must be variations, right?
Here is what I learned:
.NET supports regular expressions whose syntax comes from Perl 5. They are provided through a library (as in Java and Python), while Perl and Ruby integrate regular expressions into the language itself. There is usually an engine, a regular expression processor, that identifies the patterns and does the parsing. These two actions are the minimum; in .NET we can replace as well as search. And yes, there are differences, based on the engine used, the type of integration, and the ‘language origin’. While I’m not yet aware of what the differences are, there is a book called Mastering Regular Expressions that describes the Java, PHP, Perl and .NET implementations. I’m going to get this book, read it, and update this blog post once I know :)
It would be totally awesome if some regex expert would like to share their wisdom :D Comments are super-welcome!
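To make the "library plus engine" idea concrete, here is a minimal sketch in Python, one of the languages mentioned above as offering regular expressions through a library. The sample text and the date pattern are made up for illustration; the same search-then-replace flow should look broadly similar in .NET, though the exact calls differ.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Order 66 shipped on 2012-10-19"

# The library hands the pattern to its engine, which parses it once here.
date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# Search: find the first place in the text where the pattern matches.
match = date_pattern.search(text)
found = match.group(0) if match else None

# Replace: rewrite every match, reusing the three captured groups.
replaced = date_pattern.sub(r"\3/\2/\1", text)

print(found)
print(replaced)
```

Note that the pattern is an ordinary string passed to a library function, which is what "provided through a library" means in practice.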
Comments
The best feature compared to the PCRE implementation of RegEx is, for me, the BRACE BALANCING. See this article -> http://blogs.msdn.com/b/bclteam/archive/2005/03/15/396452.aspx It should be mentioned too that RegEx is not just a language; it's one of the most prominent examples of finite automata in theoretical computer science: written in a formal language, transformed with a parser generator... Greetings from Germany, robin
Jeffrey Friedl, the author of that book blogs at regex.info/blog and is @jfriedl on Twitter. I tweeted him saying you'd appreciate his input.
Hi Iris, Your question isn't stupid at all. It's important to understand that Regex syntax and semantics vary between different engines. For example, there's a popular library called Perl Compatible Regular Expressions (PCRE) used in Apache HTTP Server and PHP. Despite the name, it isn't fully compatible with the regular expressions in Perl. Here's a blog post I wrote about different outcomes for the same regex in Ruby and Java: http://blog.staffannoteberg.com/2011/05/04/regex-syntax-and-semantics-varies In my opinion .NET has excellent support for Regex, one of the best. An example: as far as I know it's the only engine that supports named conditionals, which check whether the named capturing group 'name' matched. It's a shame though that C# doesn't support Regex literals, like e.g. Ruby. Verbatim strings make it easier than in Java, but Regex literals are an important feature in a programming language. Hope you don't mind if I add a shameless plug for my Regex book prototype. :-) It's an upcoming book that will be published in 2013, and it explains Regex in a new way with applied mathematics made approachable, TDD stories and loads of illustrations. Right now, I'm looking for readers' feedback: http://www.staffannoteberg.com/regexbook Best // Staffan --- Twitter: @staffannoteberg
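As a rough illustration of what a conditional on a named capturing group does, here is a small Python sketch. Python's re module documents a (?(name)yes|no) construct; the exact .NET syntax the comment has in mind is not shown above, so the pattern, the group name "lt" and the helper name are all assumptions made for this example.

```python
import re

# Assumed pattern: an address that may be wrapped in angle brackets.
# (?P<lt><)?   optionally captures "<" into the group named "lt".
# (?(lt)>|$)   is the conditional: if "lt" matched, require ">";
#              otherwise require the end of the string.
pattern = re.compile(r"(?P<lt><)?(\w+@\w+(?:\.\w+)+)(?(lt)>|$)")

def brackets_are_consistent(candidate: str) -> bool:
    """Hypothetical helper: True when the brackets are both present or both absent."""
    return pattern.match(candidate) is not None

print(brackets_are_consistent("<user@host.com>"))
print(brackets_are_consistent("user@host.com"))
print(brackets_are_consistent("<user@host.com"))
```

The conditional lets a single pattern say "close the bracket only if one was opened", which would otherwise need two alternatives.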
I think a better question is, "What are the similarities in the millions of RegEx variations?" Almost every platform/application which supports RegExes uses a slightly different syntax. The only 100% commonalities I've seen are dot (match any one character), star (match the previous character zero or more times), plus (match the previous character one or more times), and brackets (match any one of the given characters). Caret (match start of line) and dollar sign (match end of line) show up in about 80% of regex syntaxes.
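The common building blocks listed in this comment can be tried out in a few lines. This is a sketch in Python with made-up sample words; other engines may behave differently, which is rather the point of the post.

```python
import re

# Hypothetical sample: four short "words", one per line.
sample = "cat\ncot\ncoat\nct"

dot = re.findall(r"c.t", sample)        # dot: any one character between c and t
star = re.findall(r"co*t", sample)      # star: zero or more "o"
plus = re.findall(r"co+t", sample)      # plus: one or more "o"
brackets = re.findall(r"c[ao]t", sample)  # brackets: "a" or "o"

# In Python, caret and dollar only match at every line when MULTILINE is set.
anchored_default = re.findall(r"^c.t$", sample)
anchored_multiline = re.findall(r"^c.t$", sample, flags=re.MULTILINE)

for name, result in [("dot", dot), ("star", star), ("plus", plus),
                     ("brackets", brackets),
                     ("anchors (default)", anchored_default),
                     ("anchors (MULTILINE)", anchored_multiline)]:
    print(name, result)
```

The last two lines show a small variation of the kind discussed here: whether caret and dollar mean "line" or "whole string" depends on the engine and its flags.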
Last modified on 2012-10-19