Posted to user@forrest.apache.org by Torsten Stolpmann <To...@verit.de> on 2005/12/29 19:14:52 UTC
Fun with Regular expressions. Was: OutOfMemoryException with customized
project sitemap
>> There are examples of regexp matchers in the core sitemap. I'm pretty
>> poor with regular expressions, if you don't know what to put in the
>> pattern ask here, I'm sure there will be someone who can tell you how
>> to match
>>
>> **.html but not (**/menu-*.html or **/body-*.html or **/tabs-*.html)
>>
>> (I think they are the only ones you need to avoid).
>>
>
> So this would be something like ^(?!tab-|menu-|body-).*\.html$ and
> ^.*/(?!tab-|menu-|body-).*\.html$ respectively.
>
> Unfortunately jakarta-regexp (which is used inside cocoon) doesn't seem
> to support the negative lookahead (?!...) and gives me a
> 'RESyntaxException: Syntax error: Missing operand to closure'.
>
> This has already been reported on the regexp mailing list (see:
> http://permalink.gmane.org/gmane.comp.jakarta.regexp.user/168).
>
> Too bad - jakarta-oro supports perl5 regexps.
>
> I'll go hunting for a supported regexp and will report in later.
>
Since I promised an update:
A working regular expression (without negative lookahead) is the following:
^(([^t^m^b].*)|((t[^a].*)|(ta[^b].*)|(tab[^\-].*))|((m[^e].*)|(me[^n].*)|(men[^u].*)|(menu[^\-].*))|((b[^o].*)|(bo[^d].*)|(bod[^y].*)|(body[^\-].*)))\.html$
But then again jakarta-regexp leaves me out in the cold with:
java.lang.StackOverflowError
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
...
at org.apache.cocoon.matching.AbstractRegexpMatcher.preparedMatch(AbstractRegexpMatcher.java:86)
Again jakarta-oro matches this without problems.
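For anyone who wants to cross-check the lookahead-free expression against the lookahead form, here is a minimal sketch. It assumes java.util.regex (JDK 1.4 and later) as a stand-in engine, not jakarta-regexp or Cocoon's matcher, so it says nothing about the StackOverflowError above. The class and method names are made up for illustration, and the sample file names are hypothetical.

```java
import java.util.regex.Pattern;

public class LookaheadFreeCheck {
    // Lookahead form for a bare file name (needs an engine that supports (?!...)).
    static final Pattern WITH_LOOKAHEAD =
        Pattern.compile("^(?!tab-|menu-|body-).*\\.html$");

    // The lookahead-free expression from this post, copied unchanged;
    // only the backslashes are doubled for the Java string literal.
    static final Pattern WITHOUT_LOOKAHEAD = Pattern.compile(
        "^(([^t^m^b].*)"
        + "|((t[^a].*)|(ta[^b].*)|(tab[^\\-].*))"
        + "|((m[^e].*)|(me[^n].*)|(men[^u].*)|(menu[^\\-].*))"
        + "|((b[^o].*)|(bo[^d].*)|(bod[^y].*)|(body[^\\-].*)))\\.html$");

    static boolean withLookahead(String name) {
        return WITH_LOOKAHEAD.matcher(name).matches();
    }

    static boolean withoutLookahead(String name) {
        return WITHOUT_LOOKAHEAD.matcher(name).matches();
    }

    public static void main(String[] args) {
        // Hypothetical file names: three that should pass, three that should not.
        String[] names = {"index.html", "table.html", "menus.html",
                          "tab-index.html", "menu-index.html", "body-index.html"};
        for (String name : names) {
            System.out.println(name + " lookahead=" + withLookahead(name)
                + " lookahead-free=" + withoutLookahead(name));
        }
    }
}
```

On these names both forms agree. They are not strictly equivalent, though: a name such as tab.html passes the lookahead form but is rejected by the lookahead-free one, because the character after the prefix is consumed before \.html is tried.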
*sigh*
Torsten