Jon Aquino's Mental Garden

Engineering beautiful software

Thursday, December 13, 2007

PHP Regular Expressions: Recursion, Named Capture

Here's a PHP regular expression for matching *nested* parentheses (e.g. blocks of code):


The (?1) is a recursive reference to the subpattern inside the first set of capturing parentheses (here, the outermost ones). It is a feature of PCRE, the regex library behind PHP's preg_* functions.
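A minimal sketch of such a pattern, assuming we only care about round parentheses and want every top-level balanced group in a string. The helper name `balancedGroups` is hypothetical, and this is not necessarily the exact regex from the post.

```php
<?php
// Hypothetical helper: returns every top-level balanced "(...)" group in $text.
// Group 1 wraps the whole parenthesized expression, so (?1) re-enters that
// group wherever a nested "(" appears.
function balancedGroups(string $text): array
{
    // [^()]++ is possessive: it eats runs of non-parenthesis characters
    // without backtracking into them, which keeps failed matches cheap.
    $pattern = '/(\((?:[^()]++|(?1))*\))/';
    preg_match_all($pattern, $text, $matches);
    return $matches[0]; // full matches (identical to group 1 here)
}

// Expected: "(a && (b || c))" and "(x, (y))"
print_r(balancedGroups('if (a && (b || c)) { f(x, (y)); }'));
```

Because group 1 spans the entire pattern in this sketch, (?R) (recurse into the whole pattern) would behave the same way; (?1) becomes necessary once the pattern has anchors or other text outside that group. On unbalanced input such as "((a)", the match simply starts later and returns "(a)".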

See Jeffrey Friedl's Mastering Regular Expressions, 3rd ed., p. 476, "Recursive reference to a set of capturing parentheses".

Another potentially useful regex technique is "named capture":


Here you can use either $matches[1], $matches[2], $matches[3] or $matches['protocol'], $matches['host'], $matches['port']. ($matches[0] always holds the entire match.)
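A sketch of a named-capture pattern, assuming a simple protocol://host:port URL. The pattern is illustrative and not a complete URL grammar. PCRE accepts the (?P<name>...) syntax used here.

```php
<?php
// Illustrative pattern: protocol://host[:port]; not a complete URL grammar.
$pattern = '~^(?P<protocol>[a-z][a-z0-9+.-]*)://(?P<host>[^/:]+)(?::(?P<port>\d+))?~i';

if (preg_match($pattern, 'http://example.com:8080/index.php', $matches)) {
    // Each named group appears twice in $matches: under its name and under
    // its position. $matches[0] is the entire match.
    echo $matches['protocol'], "\n"; // http
    echo $matches[1], "\n";          // http
    echo $matches['host'], "\n";     // example.com
    echo $matches['port'], "\n";     // 8080
}
```

If the optional port is absent, the trailing `port` entry may be left out of `$matches` entirely, so check it with `isset()` (or pass the `PREG_UNMATCHED_AS_NULL` flag, available since PHP 7.2) before reading it.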


  • I didn't know about ?1. Is it also in other languages like Perl or Ruby?

    With things like ?1, perhaps regular expressions should now be called context-free expressions instead.


    By Blogger Kaushik, at 12/14/2007 11:45 p.m.  

  • Hi Kaushik, some regex engines support recursive matching. There are a couple of entries about it in Friedl's book:

    Perl has a "dynamic regex": (??{perl code})

    Java had ?1 as an undocumented feature until 1.4.2, after which it was removed.

    .NET uses balancing groups, such as (?<DEPTH>...) paired with (?<-DEPTH>...), to achieve something similar.

    By Blogger Jonathan, at 12/15/2007 9:54 a.m.  

  • Interesting! I guess I just didn't venture beyond bare-bones regexes.

    Thanks for your reply!

    By Blogger Kaushik, at 12/15/2007 11:22 p.m.  

  • Regular expressions are really wonderful for parsing HTML or matching patterns. I use them a lot when I code. Whenever I learn a new language, the first thing I check is whether it supports regexes. I feel at ease when I find that it does.

    Here is a post about Ruby regexes that I wrote when I first learned them. It should be helpful for new coders.

    By Blogger Demon, at 3/29/2009 11:56 a.m.  

  • Good to know - thanks Wolf!

    By Blogger Jonathan, at 3/29/2009 1:12 p.m.  

  • Will recursive regexes ever take off? They take the regex potential for confusion and errors to another dimension. There's an example in this recursive regex tutorial that shows a subtle problem in a recursive expression caused by atomic grouping. Anyway, thanks for spreading the regex "Gospel".

    By Anonymous Anonymous, at 12/15/2011 10:16 a.m.  
