Posted to c-dev@xerces.apache.org by "Benjamin Fritz (Jira)" <xe...@xml.apache.org> on 2022/09/08 22:04:00 UTC

[jira] [Created] (XERCESC-2240) Junk characters (including null) allowed in XML declaration

Benjamin Fritz created XERCESC-2240:
---------------------------------------

             Summary: Junk characters (including null) allowed in XML declaration
                 Key: XERCESC-2240
                 URL: https://issues.apache.org/jira/browse/XERCESC-2240
             Project: Xerces-C++
          Issue Type: Bug
    Affects Versions: 3.2.3
         Environment: Linux
            Reporter: Benjamin Fritz


In a library we've written that uses Xerces-C++ to validate XML files against a given XSD, we have discovered that XercesDOMParser::parse() does not record any errors when the XML declaration at the beginning of a document contains "junk" characters, including control characters (^K) or null bytes. The null character in particular is not a legal character anywhere in an XML document. For example, the following XML file (attached as basic_bad_bytes.xml) parses without error, but it should not:

<?xml version="1.0" encoding^@^@^@^@^@="UTF-8" ?>
<root_elem>
  <child_elem some_attr="abc" />
  <child_elem some_attr="def" />
</root_elem>

The following XML (attached as basic_bad_bytes2.xml), which has the same null bytes in the root element's start tag instead, correctly reports an error:

<?xml version="1.0" encoding="UTF-8" ?>
<root_elem^@^@^@^@^@>
  <child_elem some_attr="abc" />
  <child_elem some_attr="def" />
</root_elem>

This is similar to XERCESC-1701, where "junk" characters following the end of the root element were found to be accepted during parsing.
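
Until the parser rejects such input itself, one possible workaround is to scan the raw bytes before handing them to XercesDOMParser::parse(). The following is only a minimal sketch under stated assumptions: it assumes the document is in an ASCII-compatible encoding such as UTF-8 (in UTF-16 or UTF-32, zero bytes are legitimate, so this check would not apply), and the function names are hypothetical helpers, not part of Xerces-C++. It flags every byte below 0x20 other than tab, line feed, and carriage return, which are the only control characters in that range permitted by the XML 1.0 Char production.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Xerces-C++.
// True for control bytes below 0x20 that XML 1.0 does not allow:
// everything except tab (0x09), line feed (0x0A), and carriage return (0x0D).
bool isForbiddenControlByte(unsigned char b) {
    return b < 0x20 && b != 0x09 && b != 0x0A && b != 0x0D;
}

// Hypothetical helper, not part of Xerces-C++.
// Returns the offset of the first forbidden control byte in `doc`,
// or std::string::npos if none is found.
// Assumption: `doc` holds the raw bytes of a document in an
// ASCII-compatible encoding such as UTF-8.
std::size_t findForbiddenControlByte(const std::string& doc) {
    for (std::size_t i = 0; i < doc.size(); ++i) {
        if (isForbiddenControlByte(static_cast<unsigned char>(doc[i]))) {
            return i;
        }
    }
    return std::string::npos;
}
```

A caller could read the file into a std::string, treat any result other than std::string::npos as a validation error, and only then pass the buffer on to the parser. This does not replace a fix in the scanner; it only keeps documents like basic_bad_bytes.xml from being accepted silently.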



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
