Posted to c-dev@xerces.apache.org by bu...@apache.org on 2002/07/19 13:04:31 UTC
DO NOT REPLY [Bug 10981] New: -
Regular Expressions : \w incorrectly matching punctuation characters
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=10981>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.
Summary: Regular Expressions : \w incorrectly matching
punctuation characters
Product: Xerces-C++
Version: 1.7.0
Platform: PC
OS/Version: Other
Status: NEW
Severity: Normal
Priority: Other
Component: Validating Parser (Schema) (Xerces 1.5 or up only)
AssignedTo: xerces-c-dev@xml.apache.org
ReportedBy: richard_schofield@uk.ibm.com
The XML Schema Spec Part 2 (Appendix F) defines the multi-character escapes
that can be used in regular expression matching.
\w should match all characters EXCEPT those in the "punctuation", "separator"
and "other" categories as defined by the Unicode specification.
However, \w currently sets up a range that matches every character between
x0020 and xD7FF (gXMLChars). As a result, punctuation, separator and other
characters in that range are matched when they should be rejected.
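
To make the difference concrete, here is a minimal sketch that compares the range described above with what \w is expected to accept. It is not Xerces-C++ code: both function names are hypothetical, the range check simply restates the x0020 to xD7FF range from this report, and the second function only covers ASCII (the punctuation list is the set of ASCII characters in the Unicode "P" categories). A real fix would need the full Unicode category tables.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: the range this report says \w currently matches.
bool inReportedRange(std::uint32_t c) {
    return c >= 0x0020 && c <= 0xD7FF;
}

// Hypothetical helper: ASCII-only approximation of the schema \w escape,
// i.e. any character that is NOT punctuation (P), separator (Z) or other (C).
// Characters above x7F are outside the scope of this sketch and are rejected.
bool isSchemaWordCharAscii(std::uint32_t c) {
    if (c > 0x7F) {
        return false;  // not handled by this ASCII-only sketch
    }
    if (c <= 0x20 || c == 0x7F) {
        return false;  // control characters (Cc) and space (Zs)
    }
    // ASCII characters whose Unicode general category starts with "P".
    static const std::string punctuation = "!\"#%&'()*,-./:;?@[\\]_{}";
    return punctuation.find(static_cast<char>(c)) == std::string::npos;
}
```

With these helpers, ',' and ' ' both fall inside the reported range but are rejected by the ASCII approximation, while letters, digits and symbol characters such as '+' are accepted by both. Note that '_' is punctuation in Unicode terms, so the schema \w is not expected to match it.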