Posted to j-dev@xerces.apache.org by "Michael Glavassevich (Jira)" <xe...@xml.apache.org> on 2021/05/12 19:11:00 UTC

[jira] [Commented] (XERCESJ-1592) schema validation incorrectly treating single character outside of BMP as two characters

    [ https://issues.apache.org/jira/browse/XERCESJ-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343495#comment-17343495 ] 

Michael Glavassevich commented on XERCESJ-1592:
-----------------------------------------------

You should ask your question in a forum for OpenJDK. We have no awareness of what downstream projects are doing.

> schema validation incorrectly treating single character outside of BMP as two characters
> ----------------------------------------------------------------------------------------
>
>                 Key: XERCESJ-1592
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1592
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: XML Schema 1.0 Datatypes
>    Affects Versions: 2.11.0
>         Environment: Windows 7, Oracle Java JRE 1.7
>            Reporter: Martin Honnen
>            Priority: Major
>             Fix For: 2.12.0
>
>
> When validating the instance document http://home.arcor.de/martin.honnen/xml/oneCharInstance1.xml against the schema http://home.arcor.de/martin.honnen/xml/oneCharSchema1.xsd Xerces reports the following validation error(s):
> "[Error] oneCharInstance1.xml:3:25: cvc-length-valid: Value '?' with length = '2'
>  is not facet-valid with respect to length '1' for type 'one-char'.
> [Error] oneCharInstance1.xml:3:25: cvc-type.3.1.3: The value '?' of element 'test' is not valid."
> The "test" element, however, contains a single character (<test>&#x10300;</test>), albeit one that is outside the BMP. In terms of the XML specification http://www.w3.org/TR/xml/#dt-character and the schema data type specification http://www.w3.org/TR/xmlschema-2/#string, there is no difference between characters inside the BMP and outside of it: each one counts as a single character.
> So the sample XML is valid against the sample schema and Xerces should not report any error.
> Other validating parsers like Saxon 9.4 EE and XSV (http://www.w3.org/2001/03/webdata/xsv?docAddrs=http%3A%2F%2Fhome.arcor.de%2Fmartin.honnen%2Fxml%2FoneCharInstance1.xml+http%3A%2F%2Fhome.arcor.de%2Fmartin.honnen%2Fxml%2FoneCharSchema1.xsd&warnings=on&keepGoing=on&style=xsl#) don't report any validation error for the samples named above.
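
For readers unfamiliar with the distinction the report relies on, here is a minimal Java sketch of how U+10300 can be measured as either two units or one character. It only illustrates the difference between UTF-16 code units and Unicode code points in java.lang.String; it is not taken from the Xerces source, and the class name SupplementaryLength and helper codePointLength are hypothetical names chosen for this example.

```java
public class SupplementaryLength {
    // Hypothetical helper: counts Unicode code points, not UTF-16 code units.
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+10300 lies outside the BMP, so Java stores it as a surrogate pair.
        String value = new String(Character.toChars(0x10300));

        // Number of UTF-16 code units (char values) in the string.
        System.out.println("length():        " + value.length());          // prints 2

        // Number of Unicode characters (code points) in the string.
        System.out.println("codePointCount:  " + codePointLength(value));  // prints 1
    }
}
```

A length facet evaluated with String.length() would see 2 for this value, while one evaluated per code point would see 1, which matches the "length = '2'" figure in the error message quoted above.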



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
