Posted to user@forrest.apache.org by Torsten Stolpmann <To...@verit.de> on 2005/12/29 19:14:52 UTC

Fun with Regular expressions. Was: OutOfMemoryException with customized project sitemap

>> There are examples of regexp matchers in the core sitemap. I'm pretty 
>> poor with regular expressions; if you don't know what to put in the 
>> pattern, ask here. I'm sure there will be someone who can tell you how 
>> to match
>>
>> **.html but not (**/menu-*.html or **/body-*.html or **/tabs-*.html)
>>
>> (I think they are the only ones you need to avoid).
>>
> 
> So this would be something like ^(?!tab-|menu-|body-).*\.html$ and 
> ^.*/(?!tab-|menu-|body-)[^/]*\.html$ for top-level and nested pages 
> respectively.
> 
> Unfortunately jakarta-regexp (which is used inside Cocoon) doesn't seem 
> to support the negative lookahead construct (?!...) and gives me a 
> 'RESyntaxException: Syntax error: Missing operand to closure'.
> 
> This has already been reported on the regexp mailing list (see: 
> http://permalink.gmane.org/gmane.comp.jakarta.regexp.user/168).
> 
> Too bad: jakarta-oro supports Perl 5 regexps.
> 
> I'll go hunting for a regexp that jakarta-regexp does support and will 
> report back later.
> 
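For comparison, an engine that does support lookahead accepts both expressions. The sketch below assumes java.util.regex (JDK 1.4 and later), which is not the jakarta-regexp engine behind Cocoon's matcher; the class and method names are hypothetical. It escapes the dot before html, uses [^/]* after the lookahead so the check applies only to the last path segment, and picks the first pattern for paths without a '/' and the second for nested paths.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Cocoon or Forrest.
class LookaheadDemo {
    // Top-level pages: the name must not start with tab-, menu- or body-.
    static final Pattern TOP_LEVEL =
            Pattern.compile("^(?!tab-|menu-|body-).*\\.html$");

    // Nested pages: the segment after the last '/' must not start with a prefix.
    // [^/]* stops the lookahead from being satisfied by an earlier directory name.
    static final Pattern NESTED =
            Pattern.compile("^.*/(?!tab-|menu-|body-)[^/]*\\.html$");

    static boolean isPage(String path) {
        Pattern p = path.indexOf('/') < 0 ? TOP_LEVEL : NESTED;
        return p.matcher(path).matches();
    }

    public static void main(String[] args) {
        String[] paths = {"index.html", "menu-index.html", "docs/faq.html", "docs/tab-docs.html"};
        for (String path : paths) {
            System.out.println(path + " -> " + isPage(path));
        }
    }
}
```

With a plain .* in the nested pattern, a path such as a/b/tab-x.html would slip through, because the lookahead could be tested after the first '/' instead of the last.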

Since I promised an update:

A working regular expression (without negative lookahead) is the following:

^(([^tmb].*)|((t[^a].*)|(ta[^b].*)|(tab[^\-].*))|((m[^e].*)|(me[^n].*)|(men[^u].*)|(menu[^\-].*))|((b[^o].*)|(bo[^d].*)|(bod[^y].*)|(body[^\-].*)))\.html$
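To see which names the lookahead-free expression accepts, it can be run through java.util.regex, which accepts this syntax (an assumption about the runtime; this is not the engine Cocoon uses). The class name is hypothetical. The character class is written [^tmb]: inside brackets, a ^ after the first position is a literal caret, so [^t^m^b] would also reject names starting with '^'. Because the expression is anchored at the start of the string, it suits bare file names rather than paths. Names that are themselves a prefix of an excluded one, such as menu.html or t.html, are rejected too, since each branch consumes one more character before \.html.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Cocoon or Forrest.
class NoLookaheadDemo {
    // Each branch matches a name that follows tab-, menu- or body- for a few
    // characters and then differs; the differing character must come before
    // the .html suffix.
    static final Pattern PAGE = Pattern.compile(
            "^(([^tmb].*)"
            + "|((t[^a].*)|(ta[^b].*)|(tab[^\\-].*))"
            + "|((m[^e].*)|(me[^n].*)|(men[^u].*)|(menu[^\\-].*))"
            + "|((b[^o].*)|(bo[^d].*)|(bod[^y].*)|(body[^\\-].*)))\\.html$");

    static boolean isPage(String name) {
        return PAGE.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] names = {"index.html", "tabby.html", "tab-a.html", "menu-b.html", "body-c.html", "menu.html"};
        for (String name : names) {
            System.out.println(name + " -> " + isPage(name));
        }
    }
}
```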

But then again, jakarta-regexp leaves me out in the cold with:

java.lang.StackOverflowError
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
...
at org.apache.cocoon.matching.AbstractRegexpMatcher.preparedMatch(AbstractRegexpMatcher.java:86)

Again, jakarta-oro matches this without problems.
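The trace shows RE.matchNodes calling itself repeatedly, which suggests that jakarta-regexp walks the pattern recursively and that a long alternation like the one above can exhaust the thread stack. Where a regular expression is not strictly required, the same test can be written as plain string checks. This is a hypothetical sketch: the class name and prefix array are illustrative, and using it from a sitemap would need a custom Cocoon matcher, which is not shown.

```java
// Hypothetical helper; not part of Cocoon or Forrest.
class PageFilter {
    // Prefixes the thread says should not be matched.
    private static final String[] EXCLUDED = {"tab-", "menu-", "body-"};

    static boolean isPage(String path) {
        if (!path.endsWith(".html")) {
            return false;
        }
        // Only the last path segment is compared, so this covers both
        // top-level and nested pages.
        String name = path.substring(path.lastIndexOf('/') + 1);
        for (String prefix : EXCLUDED) {
            if (name.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPage("docs/menu-docs.html"));
        System.out.println(isPage("docs/menu.html"));
    }
}
```

No backtracking is involved, so the check runs in time linear in the path length regardless of how the prefixes are arranged.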

*sigh*

Torsten