Posted to c-dev@xerces.apache.org by "Benjamin Fritz (Jira)" <xe...@xml.apache.org> on 2022/09/08 22:04:00 UTC
[jira] [Created] (XERCESC-2240) Junk characters (including null) allowed in XML declaration
Benjamin Fritz created XERCESC-2240:
---------------------------------------
Summary: Junk characters (including null) allowed in XML declaration
Key: XERCESC-2240
URL: https://issues.apache.org/jira/browse/XERCESC-2240
Project: Xerces-C++
Issue Type: Bug
Affects Versions: 3.2.3
Environment: Linux
Reporter: Benjamin Fritz
In a library we've written that uses Xerces-C++ to validate XML files against a given XSD, we discovered that XercesDOMParser::parse() does not record any errors when the XML declaration at the beginning of an XML document contains "junk" characters, including control characters (such as ^K) or null bytes. The null character in particular is never valid anywhere in an XML document. For example, the following XML file (attaching as basic_bad_bytes.xml), where ^@ denotes a null byte, parses without error, but it should not:
<?xml version="1.0" encoding^@^@^@^@^@="UTF-8" ?>
<root_elem>
<child_elem some_attr="abc" />
<child_elem some_attr="def" />
</root_elem>
The following XML (attaching as basic_bad_bytes2.xml) correctly reports an error:
<?xml version="1.0" encoding="UTF-8" ?>
<root_elem^@^@^@^@^@>
<child_elem some_attr="abc" />
<child_elem some_attr="def" />
</root_elem>
This is similar to XERCESC-1701, where the end of the document after the root element was found to allow "junk" characters during parsing.
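Until the parser rejects such input itself, one possible stopgap is to scan the raw document bytes before parsing. The following is only a minimal sketch under stated assumptions: findForbiddenControlByte is a hypothetical helper, not part of the Xerces-C++ API, and it assumes the document is in an ASCII-compatible encoding such as UTF-8. It looks for control bytes excluded by the XML 1.0 Char production (anything below 0x20 other than tab, LF and CR), which covers both the null bytes and the ^K mentioned above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (NOT a Xerces-C++ API): returns the offset of the first
// byte in `doc` that is a control character forbidden by the XML 1.0 Char
// production, i.e. any byte below 0x20 other than tab (0x09), LF (0x0A) and
// CR (0x0D). Returns std::string::npos if no such byte is found.
//
// Assumption: `doc` holds the raw document in an ASCII-compatible encoding
// such as UTF-8, where bytes below 0x80 always stand for themselves.
// UTF-16 or UTF-32 input would have to be decoded before this check.
std::size_t findForbiddenControlByte(const std::string& doc) {
    for (std::size_t i = 0; i < doc.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(doc[i]);
        if (c < 0x20 && c != 0x09 && c != 0x0A && c != 0x0D) {
            return i;  // offset of the first forbidden control byte
        }
    }
    return std::string::npos;
}
```

A caller could read the file into a std::string (keeping embedded null bytes), run this check, and treat any result other than std::string::npos as an error before handing the document to XercesDOMParser::parse(). This does not replace a proper fix in the parser; it only catches this one class of invalid characters.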
--
This message was sent by Atlassian Jira
(v8.20.10#820010)