Posted to j-dev@xerces.apache.org by "Michael Glavassevich (JIRA)" <xe...@xml.apache.org> on 2006/06/20 21:59:33 UTC

[jira] Resolved: (XERCESJ-1126) Regular Expressions starting with "." in non-XML Schema mode use wrong first character optimization

     [ http://issues.apache.org/jira/browse/XERCESJ-1126?page=all ]
     
Michael Glavassevich resolved XERCESJ-1126:
-------------------------------------------

    Resolution: Fixed

Thanks very much for this patch.  I've applied it to SVN.

> Regular Expressions starting with "." in non-XML Schema mode use wrong first character optimization
> ---------------------------------------------------------------------------------------------------
>
>          Key: XERCESJ-1126
>          URL: http://issues.apache.org/jira/browse/XERCESJ-1126
>      Project: Xerces2-J
>         Type: Bug
>     Versions: 2.7.1
>     Reporter: Martin Probst
>     Assignee: Michael Glavassevich
>
> The following Java snippet prints "not matched", but should print "matched".
>     RegularExpression regex = new RegularExpression(".oo", "");
>     if (regex.matches("foo")) System.out.println("matched");
>     else System.out.println("not matched");
> It uses the class org.apache.xerces.impl.xpath.regex.RegularExpression. I believe this happens because the first character optimization kicks in: it checks for a first character of 'o', fails to match the 'f', and consequently returns false. This may be caused by this code snippet from Token.java:493:
>     case DOT: // ****
>       if (isSet(options, RegularExpression.SINGLE_LINE)) {
>         return FC_CONTINUE; // **** We can not optimize.
>       } else {
>         return FC_CONTINUE;
>         /*
>          * result.addRange(0, RegularExpression.LINE_FEED-1);
>          * result.addRange(RegularExpression.LINE_FEED+1,
>          * RegularExpression.CARRIAGE_RETURN-1);
>          * result.addRange(RegularExpression.CARRIAGE_RETURN+1,
>          * RegularExpression.LINE_SEPARATOR-1);
>          * result.addRange(RegularExpression.PARAGRAPH_SEPARATOR+1, UTF16_MAX);
>          * return 1;
>          */
>       }
> I think it should unconditionally return FC_ANY for DOT, at least in the case of a starting '.'.
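
To illustrate the reporter's hypothesis, here is a minimal, self-contained sketch of a first character prefilter. It is a toy model and not the Xerces implementation: the class FirstCharSketch, its methods, the FC_TERMINAL constant, and the numeric values of all three constants are assumptions made for illustration. Only the names FC_CONTINUE and FC_ANY come from the report. The toy pattern language supports just literal characters and '.', and here '.' matches any character, including line terminators.

```java
public class FirstCharSketch {
    // Hypothetical result codes; the values are arbitrary.
    static final int FC_CONTINUE = 0; // token tells us nothing, look at the next token
    static final int FC_TERMINAL = 1; // token fixes a concrete first character
    static final int FC_ANY = 2;      // any character may come first, so do not prefilter

    // Returns the character the input must start with, or -1 if no
    // prefilter can be applied. dotResult is what a '.' token reports.
    static int requiredFirstChar(String pattern, int dotResult) {
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            int result = (c == '.') ? dotResult : FC_TERMINAL;
            if (result == FC_TERMINAL) {
                return c;
            }
            if (result == FC_ANY) {
                return -1;
            }
            // FC_CONTINUE: skip this token and inspect the next one.
        }
        return -1;
    }

    // Full-string match for the toy pattern language, with the prefilter applied first.
    static boolean matches(String pattern, String input, int dotResult) {
        int first = requiredFirstChar(pattern, dotResult);
        if (first >= 0 && (input.isEmpty() || input.charAt(0) != first)) {
            return false; // rejected by the prefilter before any real matching
        }
        if (input.length() != pattern.length()) {
            return false;
        }
        for (int i = 0; i < pattern.length(); i++) {
            char p = pattern.charAt(i);
            if (p != '.' && p != input.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // DOT reports FC_CONTINUE: the prefilter demands 'o' and rejects "foo".
        System.out.println(matches(".oo", "foo", FC_CONTINUE) ? "matched" : "not matched");
        // DOT reports FC_ANY: the prefilter is skipped and "foo" matches.
        System.out.println(matches(".oo", "foo", FC_ANY) ? "matched" : "not matched");
    }
}
```

In this sketch, FC_CONTINUE treats the leading '.' as if it consumed no input, so the 'o' that follows is taken as the required first character. Reporting FC_ANY instead turns the prefilter off for such patterns, which is the behaviour the report asks for.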
