Posted to c-dev@xerces.apache.org by "Alexey Miroshnichenko (JIRA)" <xe...@xml.apache.org> on 2007/05/17 23:06:17 UTC
[jira] Created: (XERCESC-1705) Regular Expression: not standard processing for '-' in character range
Regular Expression: not standard processing for '-' in character range
----------------------------------------------------------------------
Key: XERCESC-1705
URL: https://issues.apache.org/jira/browse/XERCESC-1705
Project: Xerces-C++
Issue Type: Bug
Components: Validating Parser (Schema) (Xerces 1.5 or up only)
Reporter: Alexey Miroshnichenko
According to http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/datatypes.html#dt-regex:
============================================================================
"A single XML character is a character range that identifies the set of
characters containing only itself. All XML characters are valid character
ranges, except as follows:
* The [, ], - and \ characters are not valid character ranges;
* The ^ character is only valid at the beginning of a positive character
group if it is part of a negative character group
* The - character is a valid character range only at the beginning or end
of a positive character group.
Note: The grammar for character range as given above is ambiguous, but the
second and third bullets above together remove the ambiguity."
============================================================================
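To make the third bullet concrete, here is a minimal sketch in C++ of the position check it implies. The helper name isLiteralHyphen is hypothetical and is not part of Xerces-C++; it assumes the caller passes only the body of a positive character group (the text between '[' and ']', without a leading '^'), and it ignores escapes and character class subtraction.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not taken from the Xerces-C++ sources.
// groupBody is assumed to be the text between '[' and ']' of a positive
// character group. Returns true when the '-' at index pos stands for
// itself under the quoted rule, i.e. it is the first or the last
// character of the group. Escapes and subtraction are not handled.
bool isLiteralHyphen(const std::string& groupBody, std::size_t pos)
{
    if (pos >= groupBody.size() || groupBody[pos] != '-')
        return false;                       // not a hyphen at all
    return pos == 0 || pos + 1 == groupBody.size();
}
```

Under this reading, the '-' in [-a] and in [a-] is an ordinary character, while the '-' in [a-z] is not and acts as the range separator.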
At the same time, the comment at line 268 of
..\xerces-c-src_2_6_0\src\xercesc\util\regx\ParserForXMLSchema.cpp says:
// '[', ']', '-' not allowed and should be esacaped
The following note also seems to apply to Xerces, even though it comes from the Jakarta Regexp documentation at http://jakarta.apache.org/regexp/apidocs/ :
============================================================================
NOTE: Incomplete ranges will be interpreted as "starts from zero" or "ends with
last character".
I.e. [-a] is the same as [\\u0000-a], and [a-] is the same as [a-\\uFFFF], [-]
means "all characters".
============================================================================
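For comparison, the interpretation described in the Jakarta note can be sketched as below. This is only an illustration of the quoted sentence: jakartaRange is a hypothetical name, the function handles a group body containing at most one range, and it is not taken from either the Jakarta Regexp or the Xerces-C++ sources.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical illustration of the Jakarta note, not real library code.
// body is assumed to be the text between '[' and ']' and to hold a single
// character or a single (possibly incomplete) range.
// Returns the (first, last) code units of the resulting range.
std::pair<char16_t, char16_t> jakartaRange(const std::u16string& body)
{
    const char16_t kMin = 0x0000;   // "starts from zero"
    const char16_t kMax = 0xFFFF;   // "ends with last character"

    if (body == u"-")
        return {kMin, kMax};                // [-]   : all characters
    if (body.size() == 2 && body[0] == u'-')
        return {kMin, body[1]};             // [-a]  : \u0000 through a
    if (body.size() == 2 && body[1] == u'-')
        return {body[0], kMax};             // [a-]  : a through \uFFFF
    if (body.size() == 3 && body[1] == u'-')
        return {body[0], body[2]};          // [a-z] : complete range
    if (body.size() == 1)
        return {body[0], body[0]};          // [a]   : single character
    throw std::invalid_argument("unsupported group body in this sketch");
}
```

Under the XML Schema rule quoted earlier, [-a] would instead match just the two characters '-' and 'a', which is the difference this report is about.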
PS: What is the reason for this non-standard behavior?
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: c-dev-unsubscribe@xerces.apache.org
For additional commands, e-mail: c-dev-help@xerces.apache.org
[jira] Resolved: (XERCESC-1705) Regular Expression: not standard processing for '-' in character range
Posted by "Alberto Massari (JIRA)" <xe...@xml.apache.org>.
[ https://issues.apache.org/jira/browse/XERCESC-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alberto Massari resolved XERCESC-1705.
--------------------------------------
Resolution: Duplicate
Already fixed in 2.7