Posted to j-dev@xerces.apache.org by "TAMURA, Kent (JIRA)" <xe...@xml.apache.org> on 2006/03/01 17:53:38 UTC
[jira] Commented: (XERCESJ-1126) Regular Expressions starting with
"." in non-XML Schema mode use wrong first character optimization
[ http://issues.apache.org/jira/browse/XERCESJ-1126?page=comments#action_12368312 ]
TAMURA, Kent commented on XERCESJ-1126:
---------------------------------------
> I think it should unconditionally return FC_ANY for DOT
That's correct. We should apply the following:
Index: src/org/apache/xerces/impl/xpath/regex/Token.java
===================================================================
--- src/org/apache/xerces/impl/xpath/regex/Token.java (revision 382072)
+++ src/org/apache/xerces/impl/xpath/regex/Token.java (working copy)
@@ -437,20 +437,8 @@
}
return FC_TERMINAL;
- case DOT: // ****
- if (isSet(options, RegularExpression.SINGLE_LINE)) {
- return FC_CONTINUE; // **** We can not optimize.
- } else {
- return FC_CONTINUE;
- /*
- result.addRange(0, RegularExpression.LINE_FEED-1);
- result.addRange(RegularExpression.LINE_FEED+1, RegularExpression.CARRIAGE_RETURN-1);
- result.addRange(RegularExpression.CARRIAGE_RETURN+1,
- RegularExpression.LINE_SEPARATOR-1);
- result.addRange(RegularExpression.PARAGRAPH_SEPARATOR+1, UTF16_MAX);
- return 1;
- */
- }
+ case DOT:
+ return FC_ANY;
case RANGE:
if (isSet(options, RegularExpression.IGNORE_CASE)) {
> Regular Expressions starting with "." in non-XML Schema mode use wrong first character optimization
> ---------------------------------------------------------------------------------------------------
>
> Key: XERCESJ-1126
> URL: http://issues.apache.org/jira/browse/XERCESJ-1126
> Project: Xerces2-J
> Type: Bug
> Versions: 2.7.1
> Reporter: Martin Probst
>
> The following Java snippet prints "not matched", but should print "matched".
> RegularExpression regex = new RegularExpression(".oo", "");
> if (regex.matches("foo")) System.out.println("matched");
> else System.out.println("not matched");
> It uses the class org.apache.xerces.impl.xpath.regex.RegularExpression. I believe this happens because the first character optimization kicks in: it checks for a first character of 'o', fails to match the 'f', and consequently returns false. This may be caused by this code snippet from Token.java:493
> case DOT: // ****
> if (isSet(options, RegularExpression.SINGLE_LINE)) {
> return FC_CONTINUE; // **** We can not optimize.
> } else {
> return FC_CONTINUE;
> /*
> * result.addRange(0, RegularExpression.LINE_FEED-1);
> * result.addRange(RegularExpression.LINE_FEED+1,
> * RegularExpression.CARRIAGE_RETURN-1);
> * result.addRange(RegularExpression.CARRIAGE_RETURN+1,
> * RegularExpression.LINE_SEPARATOR-1);
> * result.addRange(RegularExpression.PARAGRAPH_SEPARATOR+1, UTF16_MAX);
> * return 1;
> */
> }
> I think it should unconditionally return FC_ANY for DOT, at least in the case of a starting '.'
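
To illustrate the mechanism described above, here is a minimal, self-contained sketch of a first character analysis. It is not the Xerces code: the class FirstCharSketch and the methods analyze and canStartWith are hypothetical, the pattern language is reduced to literal characters and '.', and only the constant names FC_CONTINUE, FC_TERMINAL and FC_ANY are borrowed from the patch. It assumes, as the report suggests, that FC_CONTINUE makes the analysis move on to the next token.

```java
import java.util.HashSet;
import java.util.Set;

public class FirstCharSketch {
    // Names borrowed from the patch; the numeric values are arbitrary here.
    static final int FC_CONTINUE = 0; // token says nothing, look at the next one
    static final int FC_TERMINAL = 1; // token added its first characters, stop
    static final int FC_ANY = 2;      // any character may come first, stop

    // Walks a toy pattern made only of literal characters and '.'.
    // dotResult is what the DOT case returns (FC_CONTINUE before the patch,
    // FC_ANY after it).
    static int analyze(String pattern, Set<Character> result, int dotResult) {
        for (char c : pattern.toCharArray()) {
            int ret;
            if (c == '.') {
                ret = dotResult;
            } else {
                result.add(c);
                ret = FC_TERMINAL;
            }
            if (ret != FC_CONTINUE) {
                return ret;
            }
        }
        return FC_CONTINUE;
    }

    // Returns false only when the analysis produced a usable set of first
    // characters and 'first' is not in it, i.e. the matcher would give up early.
    static boolean canStartWith(String pattern, char first, int dotResult) {
        Set<Character> firstChars = new HashSet<>();
        int ret = analyze(pattern, firstChars, dotResult);
        if (ret != FC_TERMINAL) {
            return true; // no usable set, the optimization is skipped
        }
        return firstChars.contains(first);
    }

    public static void main(String[] args) {
        // DOT returns FC_CONTINUE: the set becomes {'o'}, so "foo" is rejected.
        System.out.println(canStartWith(".oo", 'f', FC_CONTINUE)); // prints false
        // DOT returns FC_ANY: the optimization is disabled, "foo" is tried.
        System.out.println(canStartWith(".oo", 'f', FC_ANY));      // prints true
    }
}
```

In this sketch, returning FC_CONTINUE for a leading '.' lets the following 'o' define the first character set, which is the wrong behaviour reported for ".oo" against "foo". Returning FC_ANY stops the analysis before that can happen.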