Posted to c-dev@xerces.apache.org by "Alexey Miroshnichenko (JIRA)" <xe...@xml.apache.org> on 2007/05/17 23:06:17 UTC

[jira] Created: (XERCESC-1705) Regular Expression: not standard processing for '-' in character range

Regular Expression: not standard processing for '-' in character range
----------------------------------------------------------------------

                 Key: XERCESC-1705
                 URL: https://issues.apache.org/jira/browse/XERCESC-1705
             Project: Xerces-C++
          Issue Type: Bug
          Components: Validating Parser (Schema) (Xerces 1.5 or up only)
            Reporter: Alexey Miroshnichenko


According to http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/datatypes.html#dt-regex:
============================================================================
"A single XML character is a ·character range· that identifies the set of
characters containing only itself. All XML characters are valid character
ranges, except as follows:

    * The [, ], - and \ characters are not valid character ranges;
    * The ^ character is only valid at the beginning of a ·positive character
group· if it is part of a ·negative character group·
    * The - character is a valid character range only at the beginning or end
of a ·positive character group·.

Note: The grammar for ·character range· as given above is ambiguous, but the
second and third bullets above together remove the ambiguity."
============================================================================
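
To make the quoted rule concrete, here is a minimal C++ sketch of how the body of a positive character group could be split into ranges under that reading: a '-' at the very beginning or end is a literal, and any other unescaped '-' must sit between the two endpoints of a range. The function name parsePositiveGroupBody and the CharRange typedef are hypothetical and are not taken from Xerces-C++. The sketch assumes single-byte characters and ignores escapes, a leading '^', and character class subtraction.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Inclusive range of characters; a single character c is stored as (c, c).
typedef std::pair<char, char> CharRange;

// Hypothetical helper, not part of Xerces-C++.
// `body` is the text between '[' and ']' of a positive character group.
// Returns false if an unescaped '-' appears where the quoted rule forbids it;
// in that case `out` may hold a partial result and should be ignored.
bool parsePositiveGroupBody(const std::string& body, std::vector<CharRange>& out) {
    out.clear();
    std::size_t i = 0;
    std::size_t end = body.size();
    bool trailingHyphen = false;

    // A '-' at the beginning of the group is a literal hyphen.
    if (i < end && body[i] == '-') {
        out.push_back(CharRange('-', '-'));
        ++i;
    }
    // A '-' at the end of the group is a literal hyphen as well.
    if (i < end && body[end - 1] == '-') {
        trailingHyphen = true;
        --end;
    }

    while (i < end) {
        const char lo = body[i];
        if (lo == '-') return false;          // stray '-' in the middle
        if (i + 1 < end && body[i + 1] == '-') {
            if (i + 2 >= end) return false;   // range operator without an upper bound
            const char hi = body[i + 2];
            if (hi == '-' || hi < lo) return false;
            out.push_back(CharRange(lo, hi));
            i += 3;
        } else {
            out.push_back(CharRange(lo, lo));
            ++i;
        }
    }
    if (trailingHyphen) out.push_back(CharRange('-', '-'));
    return true;
}
```

With this sketch, "-a" and "a-" each yield a literal hyphen plus the single character 'a', "a-z-" yields the range a..z plus a literal hyphen, and "a-b-c" is rejected because the second '-' is neither at an edge nor a range operator.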

At the same time, the comment at line 268 of
..\xerces-c-src_2_6_0\src\xercesc\util\regx\ParserForXMLSchema.cpp says:
 // '[', ']', '-' not allowed and should be esacaped

The following note also appears to apply to Xerces, even though it comes from the Jakarta Regexp documentation at http://jakarta.apache.org/regexp/apidocs/ :
============================================================================
NOTE: Incomplete ranges will be interpreted as "starts from zero" or "ends with
last character". 
I.e. [-a] is the same as [\\u0000-a], and [a-] is the same as [a-\\uFFFF], [-]
means "all characters".
============================================================================
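
For contrast, the interpretation described in that note can be sketched as follows. This is only an illustration of the quoted text, not code from Jakarta Regexp or Xerces-C++. The function name expandJakartaStyle is hypothetical, and the sketch assumes each token is one range written with UTF-16 code units and no escapes.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper illustrating the quoted note.
// Accepts one token of the form "x", "x-y", "-y", "x-" or "-" and returns
// the inclusive range it denotes when incomplete ranges are filled in with
// U+0000 as the lower bound and U+FFFF as the upper bound.
std::pair<char16_t, char16_t> expandJakartaStyle(const std::u16string& token) {
    const char16_t kMin = 0x0000;
    const char16_t kMax = 0xFFFF;

    if (token == u"-") {
        return std::pair<char16_t, char16_t>(kMin, kMax);      // "all characters"
    }
    if (token.size() == 1) {
        return std::pair<char16_t, char16_t>(token[0], token[0]);
    }
    if (token.size() == 2 && token[0] == u'-') {
        return std::pair<char16_t, char16_t>(kMin, token[1]);  // "starts from zero"
    }
    if (token.size() == 2 && token[1] == u'-') {
        return std::pair<char16_t, char16_t>(token[0], kMax);  // "ends with last character"
    }
    if (token.size() == 3 && token[1] == u'-') {
        return std::pair<char16_t, char16_t>(token[0], token[2]);
    }
    throw std::invalid_argument("unsupported range token");
}
```

Under this reading "-a" denotes U+0000 through 'a', whereas the XML Schema rule quoted earlier treats the same text as the two literal characters '-' and 'a'.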

PS: What is the reason for this non-standard behavior?


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: c-dev-unsubscribe@xerces.apache.org
For additional commands, e-mail: c-dev-help@xerces.apache.org

[jira] Resolved: (XERCESC-1705) Regular Expression: not standard processing for '-' in character range

Posted by "Alberto Massari (JIRA)" <xe...@xml.apache.org>.

     [ https://issues.apache.org/jira/browse/XERCESC-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alberto Massari resolved XERCESC-1705.
--------------------------------------

    Resolution: Duplicate

Already fixed in 2.7.

