Posted to c-users@xerces.apache.org by BEEK Graham <gr...@capgemini.com.INVALID> on 2021/09/15 12:51:37 UTC

Xerces validating XML with extraneous characters

Hi,

I'm passing a buffer to a Xerces parser. The buffer contains valid XML followed by characters from the start of another XML message. These extra characters are:

\x00\x03a'<?xml version='1.0' encoding='utf-8'?>\n<ses:myMessage  xmlns:cm="http:/

The backslash escapes come from the way the logging is displayed to me. Essentially, since there is no closing tag for "myMessage" (nor any closing quote for the URL), this is not well-formed XML. Yet the parser reports that it is valid against the XSD.

Is this known behaviour? I would have expected the parser to report it as invalid.

Thanks,
Graham
This message contains information that may be privileged or confidential and is the property of the Capgemini Group. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorized to read, print, retain, copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message.

RE: Xerces validating XML with extraneous characters

Posted by BEEK Graham <gr...@capgemini.com.INVALID>.
Hi,

I just thought I'd follow up on this. After a bit of digging, it looks like the message is accepted as valid because the first "extra" character is 0x00, i.e. NUL. I'm calling Xerces from within Ada, so the buffer is converted to a C string, and that string is truncated at the NUL because a null character terminates a C string. Xerces therefore only sees the valid XML that precedes it.

Cheers,
Graham

-----Original Message-----
From: Roger Leigh <rl...@codelibre.net> 
Sent: 19 September 2021 14:30
To: c-users@xerces.apache.org
Subject: Re: Xerces validating XML with extraneous characters


Hi Graham,

Please could you attach a complete self-contained example which reproduces this behaviour?

Thanks,
Roger

Re: Xerces validating XML with extraneous characters

Posted by Roger Leigh <rl...@codelibre.net>.
Hi Graham,

Please could you attach a complete self-contained example which reproduces this behaviour?

Thanks,
Roger