Jon Aquino's Mental Garden

Engineering beautiful software

Thursday, December 13, 2007

PHP Regular Expressions: Recursion, Named Capture

Here's a PHP regular expression for matching *nested* parentheses (e.g. blocks of code):

    ((?:[^()]++|\((?1)\))*)

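The (?1) is a recursive reference to the subpattern in capture group 1, which here is the outermost pair of parentheses. Recursion is a feature of PCRE, the regex library behind PHP's preg_* functions. The possessive ++ keeps the engine from backtracking into runs of non-parenthesis characters.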

See Jeffrey Friedl's Mastering Regular Expressions, 3rd ed., p. 476, "Recursive reference to a set of capturing parentheses".
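As a rough sketch of how the recursive pattern above might be used, the snippet below wraps it in a helper that checks whether a whole string has balanced parentheses. The function name is_balanced and the sample strings are assumptions for illustration, not part of the original post. The pattern is anchored with ^ and $ because, unanchored, it can always succeed by matching an empty string.

```php
<?php
// Hypothetical helper: true when every "(" in $text has a matching ")".
function is_balanced(string $text): bool
{
    // The post's pattern, anchored with ^ and $ so the whole string must match.
    // (?1) re-enters capture group 1 each time an opening parenthesis is seen.
    $pattern = '/^((?:[^()]++|\((?1)\))*)$/';
    return preg_match($pattern, $text) === 1;
}

foreach (['f(a, g(b))', 'f(a, g(b)', 'f)('] as $sample) {
    echo $sample, ' => ', is_balanced($sample) ? 'balanced' : 'unbalanced', "\n";
}
// f(a, g(b)) => balanced
// f(a, g(b) => unbalanced
// f)( => unbalanced
```

Anchoring is a design choice for validation; to pull a balanced block out of a longer string, you would instead wrap the pattern in literal \( and \) and adjust the group number in the recursive reference accordingly.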

Another potentially useful regex technique is "named capture":

    ^(?P<protocol>https?)://(?P<host>[^/:]+)(?::(?P<port>\d+))?

Here you can use either $matches[1], $matches[2], $matches[3] or $matches['protocol'], $matches['host'], $matches['port']. ($matches[0] holds the entire match.)
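A minimal sketch of the named-capture pattern in use with preg_match follows. The helper name parse_url_prefix and the example URL are assumptions made for illustration. Because the port group is optional, it may be missing from $matches when the URL has no port, so the sketch falls back to null with the ?? operator (PHP 7 or later).

```php
<?php
// Hypothetical helper: splits the start of a URL into named parts, or returns null.
function parse_url_prefix(string $url): ?array
{
    // "~" is the delimiter so the slashes in the pattern need no escaping.
    $pattern = '~^(?P<protocol>https?)://(?P<host>[^/:]+)(?::(?P<port>\d+))?~';
    if (preg_match($pattern, $url, $matches) !== 1) {
        return null;
    }
    return [
        'protocol' => $matches['protocol'],   // same value as $matches[1]
        'host'     => $matches['host'],       // same value as $matches[2]
        // The port group is optional, so default to null when it did not take part.
        'port'     => $matches['port'] ?? null,
    ];
}

$parts = parse_url_prefix('https://example.com:8443/path');
echo $parts['protocol'], ' ', $parts['host'], ' ', $parts['port'], "\n";
// https example.com 8443
```

Named keys keep the calling code readable if the pattern later gains or loses a group, since the numeric indexes shift but the names do not.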

6 Comments:

  • I didn't know about ?1. Is it also in other languages like Perl or Ruby?

    With things like ?1, perhaps regular expressions should now be called context free expressions instead.

    -K

    By Blogger Kaushik, at 12/14/2007 11:45 p.m.  

  • Hi Kaushik - Some regex engines have recursive matching. There are a couple of entries in Friedl's book about recursive matching -

    Perl has a "dynamic regex": (??{perl code})

    Java had ?1 as an undocumented feature until 1.4.2, after which it was removed.

    .NET uses (?<DEPTH>) to achieve something similar.

    By Blogger Jonathan, at 12/15/2007 9:54 a.m.  

  • Interesting! I guess I just didn't venture beyond bare-bones regexes.

    Thanks for your reply!

    By Blogger Kaushik, at 12/15/2007 11:22 p.m.  

  • Regular expressions are really wonderful for parsing HTML or matching patterns. I use them a lot when I code. In fact, whenever I learn a new language, the first thing I check is whether it supports regex. I feel at ease when I find that it does.

    http://icfun.blogspot.com/2008/04/ruby-regular-expression-handling.html

    Here is a post about Ruby regex. I wrote it when I was first learning Ruby regex, so it should be helpful for new coders.

    By Blogger Demon, at 3/29/2009 11:56 a.m.  

  • Good to know - thanks Wolf!

    By Blogger Jonathan, at 3/29/2009 1:12 p.m.  

  • Will recursive regex ever take off? They take the potential of regex for confusion and errors to another dimension. There's an example in this recursive regex tutorial that shows a subtle problem in a recursive expression because of atomic grouping. Anyhow, thanks for spreading the regex "Gospel".

    By Anonymous Anonymous, at 12/15/2011 10:16 a.m.  
