Posted to j-dev@xerces.apache.org by "Martin Probst (JIRA)" <xe...@xml.apache.org> on 2006/01/05 15:37:01 UTC

[jira] Created: (XERCESJ-1126) Regular Expressions starting with "." in non-XML Schema mode use wrong first character optimization

Regular Expressions starting with "." in non-XML Schema mode use wrong first character optimization
---------------------------------------------------------------------------------------------------

         Key: XERCESJ-1126
         URL: http://issues.apache.org/jira/browse/XERCESJ-1126
     Project: Xerces2-J
        Type: Bug
    Versions: 2.7.1    
    Reporter: Martin Probst


The following Java snippet prints "not matched", but should print "matched".

    RegularExpression regex = new RegularExpression(".oo", "");
    if (regex.matches("foo")) System.out.println("matched");
    else System.out.println("not matched");

It uses the class org.apache.xerces.impl.xpath.regex.RegularExpression. I believe this happens because the first-character optimization kicks in: it checks for a first character of "o" (skipping the leading "."), fails to match the 'f', and consequently returns false. This may be caused by the following code from Token.java:493:

    case DOT: // ****
      if (isSet(options, RegularExpression.SINGLE_LINE)) {
        return FC_CONTINUE; // **** We can not optimize.
      } else {
        return FC_CONTINUE;
        /*
         * result.addRange(0, RegularExpression.LINE_FEED-1);
         * result.addRange(RegularExpression.LINE_FEED+1,
         * RegularExpression.CARRIAGE_RETURN-1);
         * result.addRange(RegularExpression.CARRIAGE_RETURN+1,
         * RegularExpression.LINE_SEPARATOR-1);
         * result.addRange(RegularExpression.PARAGRAPH_SEPARATOR+1, UTF16_MAX);
         * return 1;
         */
      }

I think it should unconditionally return FC_ANY for DOT, at least in the case of a starting '.'
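
To illustrate the suspected mechanism, here is a minimal, self-contained sketch of a first-character optimization. It is a toy model, not the Xerces implementation: the class FirstCharSketch, the dotIsAny switch, and the plainMatch helper are hypothetical, patterns are limited to literal characters and '.', and the numeric values of the FC_* constants are arbitrary (only their names are taken from Token.java).

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of a first-character optimization; not the Xerces implementation. */
final class FirstCharSketch {
    static final int FC_CONTINUE = 0; // token may match empty: inspect the next token
    static final int FC_TERMINAL = 1; // the first-character set is complete
    static final int FC_ANY = 2;      // any character may start a match: no filtering

    /** Analyzes a pattern made only of literal characters and '.'. */
    static int analyze(String pattern, Set<Character> firstChars, boolean dotIsAny) {
        for (char token : pattern.toCharArray()) {
            int ret;
            if (token == '.') {
                // FC_CONTINUE models the reported behaviour, FC_ANY the proposed one.
                ret = dotIsAny ? FC_ANY : FC_CONTINUE;
            } else {
                firstChars.add(token);
                ret = FC_TERMINAL;
            }
            if (ret != FC_CONTINUE) return ret;
        }
        return FC_CONTINUE;
    }

    /** Full match without any optimization; here '.' matches any single character. */
    static boolean plainMatch(String pattern, String input) {
        if (pattern.length() != input.length()) return false;
        for (int i = 0; i < pattern.length(); i++) {
            char p = pattern.charAt(i);
            if (p != '.' && p != input.charAt(i)) return false;
        }
        return true;
    }

    /** Applies the first-character pre-filter, then falls back to the full match. */
    static boolean matches(String pattern, String input, boolean dotIsAny) {
        Set<Character> firstChars = new HashSet<>();
        int fc = analyze(pattern, firstChars, dotIsAny);
        if (fc == FC_TERMINAL && !input.isEmpty() && !firstChars.contains(input.charAt(0))) {
            return false; // rejected before the real match runs
        }
        return plainMatch(pattern, input);
    }

    public static void main(String[] args) {
        System.out.println(matches(".oo", "foo", false) ? "matched" : "not matched"); // prints "not matched"
        System.out.println(matches(".oo", "foo", true) ? "matched" : "not matched");  // prints "matched"
    }
}
```

With FC_CONTINUE for '.', the analysis moves on to the next token and records 'o' as the only allowed first character, so "foo" is rejected by the pre-filter. With FC_ANY the pre-filter is skipped and the full match decides.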

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: j-dev-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-dev-help@xerces.apache.org


[jira] Assigned: (XERCESJ-1126) Regular Expressions starting with "." in non-XML Schema mode use wrong first character optimization

Posted by "Michael Glavassevich (JIRA)" <xe...@xml.apache.org>.
     [ http://issues.apache.org/jira/browse/XERCESJ-1126?page=all ]

Michael Glavassevich reassigned XERCESJ-1126:
---------------------------------------------

    Assign To: Michael Glavassevich




[jira] Resolved: (XERCESJ-1126) Regular Expressions starting with "." in non-XML Schema mode use wrong first character optimization

Posted by "Michael Glavassevich (JIRA)" <xe...@xml.apache.org>.
     [ http://issues.apache.org/jira/browse/XERCESJ-1126?page=all ]
     
Michael Glavassevich resolved XERCESJ-1126:
-------------------------------------------

    Resolution: Fixed

Thanks very much for this patch.  I've applied it to SVN.




[jira] Updated: (XERCESJ-1126) Regular Expressions starting with "." in non-XML Schema mode use wrong first character optimization

Posted by "Michael Glavassevich (JIRA)" <xe...@xml.apache.org>.
     [ http://issues.apache.org/jira/browse/XERCESJ-1126?page=all ]

Michael Glavassevich updated XERCESJ-1126:
------------------------------------------

      Component/s: XML Schema datatypes
    Fix Version/s: 2.8.1




[jira] Commented: (XERCESJ-1126) Regular Expressions starting with "." in non-XML Schema mode use wrong first character optimization

Posted by "TAMURA, Kent (JIRA)" <xe...@xml.apache.org>.
    [ http://issues.apache.org/jira/browse/XERCESJ-1126?page=comments#action_12368312 ] 

TAMURA, Kent commented on XERCESJ-1126:
---------------------------------------

> I think it should unconditionally return FC_ANY for DOT

That's correct.  We should apply the following:

Index: src/org/apache/xerces/impl/xpath/regex/Token.java
===================================================================
--- src/org/apache/xerces/impl/xpath/regex/Token.java	(revision 382072)
+++ src/org/apache/xerces/impl/xpath/regex/Token.java	(working copy)
@@ -437,20 +437,8 @@
             }
             return FC_TERMINAL;
 
-          case DOT:                             // ****
-            if (isSet(options, RegularExpression.SINGLE_LINE)) {
-                return FC_CONTINUE;             // **** We can not optimize.
-            } else {
-                return FC_CONTINUE;
-                /*
-                result.addRange(0, RegularExpression.LINE_FEED-1);
-                result.addRange(RegularExpression.LINE_FEED+1, RegularExpression.CARRIAGE_RETURN-1);
-                result.addRange(RegularExpression.CARRIAGE_RETURN+1,
-                                RegularExpression.LINE_SEPARATOR-1);
-                result.addRange(RegularExpression.PARAGRAPH_SEPARATOR+1, UTF16_MAX);
-                return 1;
-                */
-            }
+          case DOT:
+            return FC_ANY;
 
           case RANGE:
             if (isSet(options, RegularExpression.IGNORE_CASE)) {
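

When writing a regression test for this change, the expected results can be cross-checked against another engine. The sketch below uses java.util.regex rather than Xerces; the class name DotReferenceCheck and the fullMatch helper are hypothetical, and treating Pattern.DOTALL as roughly analogous to RegularExpression.SINGLE_LINE is an assumption, not something stated in this issue.

```java
import java.util.regex.Pattern;

/** Reference behaviour of '.' in java.util.regex; not a test of Xerces itself. */
final class DotReferenceCheck {
    /** Returns true if the whole input matches the regex under the given flags. */
    static boolean fullMatch(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch(".oo", "foo", 0));               // true: leading '.' consumes 'f'
        System.out.println(fullMatch(".oo", "\noo", 0));              // false: '.' excludes line terminators by default
        System.out.println(fullMatch(".oo", "\noo", Pattern.DOTALL)); // true: DOTALL lets '.' match '\n'
    }
}
```

The first case corresponds to the snippet in the report. The other two show why a conservative first-character answer for '.' is harmless: the pre-filter is only an optimization, and the full match still decides whether a line terminator is accepted.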


