Posted to oro-dev@jakarta.apache.org by Ed Chidester <ec...@textwise.com> on 2001/06/08 17:19:06 UTC
End Anchor bug on non-Unix platforms
Hi,
I'm posting this here before entering a bug in bugzilla just to make sure
that it's not related to bug #125...
My results were the same in both oro-dev-2.0.2-dev-2 and oro-dev-2.0.3.
MULTILINE_MASK patterns that use the end anchor '$' do not match lines in
files with non-UNIX line endings.
(I don't have a Macintosh to test with,
but I've confirmed this on Windows NT.)
So, given the regular expression
/<matching pattern>$/m
and the following steps:
-Slurp a file into a string (using the system's line.separator between lines)
-Try to match the file string (which contains a line that matches)
the results are:
+ On Solaris, the pattern matches without any problems.
+ On WinNT, the pattern doesn't match.
The workaround I'm using now is to write the regular expression as
/<matching pattern>([\r\n]|$)/sm
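To make the difference concrete, here is a minimal sketch. It does not use
the ORO API; it uses java.util.regex with the UNIX_LINES flag as an assumed
stand-in for an end anchor that only recognizes '\n'. The class name
EndAnchorDemo, the helper found(), and the sample strings are made up for
illustration, and the /s flag from the workaround is left out because the
sample pattern contains no '.'.

```java
import java.util.regex.Pattern;

// Illustration only: java.util.regex, not the ORO API.
// UNIX_LINES makes '$' recognize only '\n' as a line terminator, which is
// assumed here to mimic the behavior described in this message.
class EndAnchorDemo {
    // Returns true if regex matches anywhere in text (multiline mode).
    static boolean found(String regex, String text) {
        int flags = Pattern.MULTILINE | Pattern.UNIX_LINES;
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        String unixText = "first line\nsecond line\n";
        String dosText = "first line\r\nsecond line\r\n";

        // '$' sits directly before '\n', so this matches.
        System.out.println(found("first line$", unixText));
        // '\r' sits between the text and '\n', so '$' cannot match here.
        System.out.println(found("first line$", dosText));
        // The workaround accepts '\r' or '\n' in place of the bare anchor.
        System.out.println(found("first line([\\r\\n]|$)", dosText));
    }
}
```

One side effect of the workaround is that the matched text now includes the
line-ending character, so code that reads the match offsets may need to
account for it.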
I see that there are many checks in the oro code that test whether a
character is equal to '\n'. My suggested fix is to create a helper class
(I don't know which package it belongs in) with a static method
"boolean isLineEnding( char )" or similar that could replace all the
current "<char> == '\n'" code. I'd be happy to implement this (with a little
guidance as to where it belongs).
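A rough sketch of what such a helper might look like. The class name
LineEndings is hypothetical, and treating both '\r' and '\n' as line endings
is my assumption about the desired behavior, not something taken from the
oro code.

```java
// Hypothetical helper; the class name and its package are assumptions.
final class LineEndings {
    private LineEndings() {
        // Static utility class, not meant to be instantiated.
    }

    // True for '\n' (UNIX) and '\r' (Macintosh, and the first half of the
    // Windows "\r\n" pair); false for every other character.
    static boolean isLineEnding(char c) {
        return c == '\n' || c == '\r';
    }
}
```

A per-character check like this would let '$' match before the '\r' of a
"\r\n" pair, but it would also treat the position between '\r' and '\n' as a
line boundary, so the callers might need extra care there.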
Notes on test results using the attached TestEndAnchor code:
On a Solaris 2.6 machine, there were two failures when trying to match the
first pattern. On a WinNT 4.0 machine, there were three failures when trying
to match the same pattern. The added failure was from the string that uses
the System.getProperty( "line.separator" ) for its line ending.
Thanks,
Ed.
P.S.
Daniel, I would have liked to give you the regular expressions I used in my
timing test code (from mid-May), but I was advised against doing that because
the REs were very application-specific.