Posted to c-dev@xerces.apache.org by bu...@apache.org on 2002/07/19 13:04:31 UTC

DO NOT REPLY [Bug 10981] New: - Regular Expressions : \w incorrectly matching punctuation characters

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=10981>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=10981

           Summary: Regular Expressions : \w incorrectly matching
                    punctuation characters
           Product: Xerces-C++
           Version: 1.7.0
          Platform: PC
        OS/Version: Other
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: Validating Parser (Schema) (Xerces 1.5 or up only)
        AssignedTo: xerces-c-dev@xml.apache.org
        ReportedBy: richard_schofield@uk.ibm.com


The XML Schema Spec Part 2 (Appendix F) defines the multi-character escapes 
which can be used in regular expression matching.

\w should match all characters EXCEPT the set of "punctuation" (P), "separator" 
(Z) and "other" (C) characters as defined by the Unicode specification.

However, in Xerces-C++ \w sets up a range which matches all characters between 
x0020 and xD7FF (gXMLChars). Because that range includes them, punctuation, 
separator and other characters are matched incorrectly.
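
The mismatch can be illustrated with a small standalone sketch. It does not 
use the Xerces-C++ API, and the function names are hypothetical. It covers 
ASCII only, with the punctuation list written out by hand from the Unicode 
general categories, so it is an approximation of the Appendix F definition 
rather than a full implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Xerces-C++): true if the ASCII code
// point c is in Unicode general category P, Z or C.
bool isExcludedFromWord(char32_t c) {
    if (c < 0x20 || c == 0x7F) return true;   // Cc: control characters
    if (c == 0x20) return true;               // Zs: space
    // ASCII characters in the P categories. The symbols $ + < = > ^ ` | ~
    // are category S and are therefore NOT excluded from \w.
    static const std::u32string punct = U"!\"#%&'()*,-./:;?@[\\]_{}";
    return punct.find(c) != std::u32string::npos;
}

// \w as described by XML Schema Part 2, Appendix F, restricted to ASCII
// input (non-ASCII code points are outside the scope of this sketch).
bool matchesWordSpec(char32_t c) {
    return c < 0x80 && !isExcludedFromWord(c);
}

// \w as described in this report: a single range x0020..xD7FF.
bool matchesWordAsReported(char32_t c) {
    return c >= 0x20 && c <= 0xD7FF;
}

// A few characters on which the two definitions disagree.
void demonstrateMismatch() {
    const char32_t samples[] = { U'.', U'_', U' ', U'-' };
    for (char32_t c : samples) {
        assert(matchesWordAsReported(c));   // matched by the range
        assert(!matchesWordSpec(c));        // but excluded by the spec
    }
    assert(matchesWordSpec(U'a') && matchesWordAsReported(U'a'));
}
```

Note that under the schema definition the underscore is connector 
punctuation (Pc), so \w should not match it, unlike \w in many other regular 
expression dialects.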
