Posted to j-users@xerces.apache.org by Eddie Robertsson <er...@allette.com.au> on 2002/04/11 07:27:02 UTC

Bug in RegExp

Hi all,

I posted the bug below to Bugzilla a couple of weeks ago but haven't
received a response, so I thought I'd check whether anyone here has a
fix:

Cheers,
/Eddie
-------------------------------------8<---------------------------------------------

Hi,

I've discovered a bug in the RegularExpression package in Xerces 2.0.1
when using the '\s' escape sequence for whitespace.

Examples:

1) If I create a RegularExpression like this:

RegularExpression regExpChecker = new RegularExpression("d\\s*", "uw");
or
RegularExpression regExpChecker = new RegularExpression("d\\s+", "uw");

I get the following exception:

Exception occurred during event dispatching:
java.lang.NullPointerException
        at org.apache.xerces.impl.xpath.regex.RegularExpression.compile(RegularExpression.java:610)
        at org.apache.xerces.impl.xpath.regex.RegularExpression.compile(RegularExpression.java:565)
        at org.apache.xerces.impl.xpath.regex.RegularExpression.compile(RegularExpression.java:531)
        at org.apache.xerces.impl.xpath.regex.RegularExpression.prepare(RegularExpression.java:2832)
        at org.apache.xerces.impl.xpath.regex.RegularExpression.matches(RegularExpression.java:1444)

However if I use the expanded character class instead:

RegularExpression regExpChecker = new RegularExpression("d[ \\f\\n\\r\\t]*", "uw");
or
RegularExpression regExpChecker = new RegularExpression("d[ \\f\\n\\r\\t]", "uw");

it works fine so it's probably an easy error to fix if you know where to
look.
This exception occurs when you have '\s' followed by either '*' or '+'.
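
Since the expanded character class works, one possible stopgap is to
rewrite '\s' into that class before constructing the
RegularExpression. The following is only a minimal sketch: the class
and method names are hypothetical (not part of Xerces), and it assumes
'\s' never appears inside an existing character class such as
"[a\s]", where this naive rewrite would produce a nested bracket.

```java
public class WhitespaceWorkaround {

    // Hypothetical helper, not a Xerces API: replaces every "\s" escape
    // in a pattern with the expanded class from the examples above.
    // Escapes are consumed in pairs, so an escaped backslash followed by
    // a literal 's' is left untouched.
    public static String expandWhitespaceEscape(String pattern) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                char next = pattern.charAt(i + 1);
                if (next == 's') {
                    out.append("[ \\f\\n\\r\\t]");
                } else {
                    out.append(c).append(next);
                }
                i++; // skip the character that followed the backslash
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints d[ \f\n\r\t]*
        System.out.println(expandWhitespaceEscape("d\\s*"));
    }
}
```

The rewritten string could then be passed to the RegularExpression
constructor in place of the original pattern.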

2) Even without a trailing '*' or '+', the '\s' escape sequence
doesn't work properly. For example:

RegularExpression regExpChecker = new RegularExpression("d\\s", "uw");
Match match = new Match();
String test = "hgdjh";
if (regExpChecker.matches(test, 0, test.length(), match)) {
    System.out.println("Error: This shouldn't be a match!!!");
}

This prints the error text, which shouldn't happen. The regular
expression should match the character 'd' followed by a whitespace
character, yet it matches the 'd' in the string "hgdjh", where 'd' is
followed by 'j'.
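
For comparison, the expected behaviour can be sketched with the JDK's
own java.util.regex package (Java 1.4 and later). This is a different
engine from the Xerces RegularExpression class and its '\s' covers a
slightly different set of characters, so it only illustrates what the
patterns above should do. The class and method names are made up for
the illustration, and find() is used on the assumption that the Xerces
matches() call above searches for the pattern anywhere in the text.

```java
import java.util.regex.Pattern;

public class ExpectedWhitespaceMatch {

    // Returns true if the regex matches anywhere inside the input
    // (a substring search, not a whole-string match).
    public static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String test = "hgdjh";
        // 'd' is followed by 'j', not whitespace: prints false
        System.out.println(found("d\\s", test));
        // 'd' is followed by a space: prints true
        System.out.println(found("d\\s", "hgd jh"));
        // zero whitespace characters are allowed after 'd': prints true
        System.out.println(found("d\\s*", test));
    }
}
```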

Does anyone have a fix for this yet?

Cheers,
/Eddie

