Posted to j-dev@xerces.apache.org by "Michael Glavassevich (JIRA)" <xe...@xml.apache.org> on 2006/11/20 23:24:06 UTC

[jira] Updated: (XERCESJ-1061) Regex "$" and "^" characters treated as special chars in conflict with XML Schema spec

     [ http://issues.apache.org/jira/browse/XERCESJ-1061?page=all ]

Michael Glavassevich updated XERCESJ-1061:
------------------------------------------

    Fix Version/s: 2.9.0

> Regex "$" and "^" characters treated as special chars in conflict with XML Schema spec
> --------------------------------------------------------------------------------------
>
>                 Key: XERCESJ-1061
>                 URL: http://issues.apache.org/jira/browse/XERCESJ-1061
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: XML Schema datatypes
>    Affects Versions: 2.6.2
>         Environment: Test Environment: Win XP SP1, JDK v1.5.0_02, Xerces v2.6.2 (manually used; overrides any other, if packaged with the JDK)
>            Reporter: Darien Kindlund
>         Assigned To: Michael Glavassevich
>            Priority: Minor
>             Fix For: 2.9.0
>
>         Attachments: RegexParser.diff, regexparser.java
>
>
> Xerces rejects the following schema:
> <xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
>  <xs:element name="test">
>   <xs:simpleType>
>    <xs:restriction base="xs:string">
>     <xs:pattern value="$?[0-9]+\.[0-9]{2}" />
>    </xs:restriction>
>   </xs:simpleType>
>  </xs:element>
> </xs:schema>
> The code within org.apache.xerces.impl.xpath.regex.RegexParser throws a parser exception over the use of the "$?" characters, unless the "$" character is escaped. For example, this works:
>     <xs:pattern value="\$?[0-9]+\.[0-9]{2}" />
> The fundamental problem is that the Xerces RegexParser code does NOT follow the XML Schema specification, as defined by this URL:
> http://www.w3.org/TR/2000/WD-xmlschema-2-20000407/#dt-metac
> Specifically, the XML Schema specification gives no special meaning to the "$" and "^" characters, whereas the RegexParser code appears to treat them as the standard UNIX "end-of-line" and "start-of-line" anchors, respectively.
> Regards,
> --
> Darien Kindlund
> The MITRE Corporation
> InfoSec Engr / Scientist, Sr.
> kindlund@mitre.org
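
For readers who want to see the difference in isolation, here is a minimal sketch that uses java.util.regex rather than the Xerces RegexParser. It assumes the XML Schema reading described in the report, where "$" and "^" are ordinary characters outside a character class. The class XsdDollarDemo and its methods escapeAnchors and matchesXsdStyle are hypothetical names invented for this illustration; they are not part of Xerces, and this is not the fix attached to the issue.

```java
import java.util.regex.Pattern;

class XsdDollarDemo {

    // Hypothetical helper: escapes '$' and '^' that appear outside a
    // character class and are not already escaped, so that java.util.regex
    // treats them as literals, as the report says XML Schema does.
    // It does not handle XML Schema character class subtraction.
    static String escapeAnchors(String xsdPattern) {
        StringBuilder out = new StringBuilder();
        int classDepth = 0; // greater than 0 while inside [...]
        for (int i = 0; i < xsdPattern.length(); i++) {
            char c = xsdPattern.charAt(i);
            if (c == '\\' && i + 1 < xsdPattern.length()) {
                // Copy an existing escape sequence through unchanged.
                out.append(c).append(xsdPattern.charAt(++i));
                continue;
            }
            if (c == '[') {
                classDepth++;
            } else if (c == ']' && classDepth > 0) {
                classDepth--;
            }
            if (classDepth == 0 && (c == '$' || c == '^')) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    // Pattern.matches requires the whole input to match, which mirrors the
    // implicit anchoring of xs:pattern facets.
    static boolean matchesXsdStyle(String xsdPattern, String value) {
        return Pattern.matches(escapeAnchors(xsdPattern), value);
    }

    public static void main(String[] args) {
        String pattern = "$?[0-9]+\\.[0-9]{2}"; // the facet value from the report
        System.out.println(escapeAnchors(pattern));
        System.out.println(matchesXsdStyle(pattern, "$12.50"));
        System.out.println(matchesXsdStyle(pattern, "12.50"));
        System.out.println(matchesXsdStyle(pattern, "12.5"));
    }
}
```

With the helper applied, the unescaped pattern from the schema behaves the same as the escaped workaround shown in the report: the leading "$" is an optional literal dollar sign.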
