Posted to c-dev@xerces.apache.org by "Benjamin Fritz (Jira)" <xe...@xml.apache.org> on 2022/09/08 22:05:00 UTC

[jira] [Updated] (XERCESC-2240) Junk characters (including null) allowed in XML declaration

     [ https://issues.apache.org/jira/browse/XERCESC-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Fritz updated XERCESC-2240:
------------------------------------
    Attachment: basic_bad_bytes.xml
                basic_bad_bytes2.xml

> Junk characters (including null) allowed in XML declaration
> -----------------------------------------------------------
>
>                 Key: XERCESC-2240
>                 URL: https://issues.apache.org/jira/browse/XERCESC-2240
>             Project: Xerces-C++
>          Issue Type: Bug
>    Affects Versions: 3.2.3
>         Environment: Linux
>            Reporter: Benjamin Fritz
>            Priority: Minor
>         Attachments: basic_bad_bytes.xml, basic_bad_bytes2.xml
>
>
> In a library we've written using Xerces-C++ to validate XML files against a given XSD, we have discovered that the XercesDOMParser::parse() function does not record any errors when the XML declaration at the beginning of an XML document contains "junk" characters, including control characters (^K) or null bytes (^@). The null character in particular is invalid anywhere in an XML document. For example, the following XML file (attached as basic_bad_bytes.xml) parses without error, but it should not:
> <?xml version="1.0" encoding^@^@^@^@^@="UTF-8" ?>
> <root_elem>
>   <child_elem some_attr="abc" />
>   <child_elem some_attr="def" />
> </root_elem>
> The following XML file (attached as basic_bad_bytes2.xml) correctly reports an error:
> <?xml version="1.0" encoding="UTF-8" ?>
> <root_elem^@^@^@^@^@>
>   <child_elem some_attr="abc" />
>   <child_elem some_attr="def" />
> </root_elem>
> This is similar to XERCESC-1701, in which "junk" characters at the end of the document, after the root element, were found to be accepted during parsing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
