Posted to c-users@xerces.apache.org by Nedim Srndic <ne...@uni-tuebingen.de> on 2012/01/03 15:01:03 UTC

No Error Reported When Serializing Invalid Character References in XML 1.0

Hello, 

&#x1; is an invalid character reference in XML 1.0. If I write the byte
value "\x01" to a Xerces-C TextNode and serialize the entire DOMDocument
using UTF-8 and StdOutFormatTarget with XML version set to "1.0", then
Xerces-C writes the resulting XML document (without substituting the
character with the corresponding character reference) and doesn't report
any errors. Of course, the resulting XML is not well-formed and so I
cannot use it in other programs. 

In XML 1.1 this character reference is allowed and Xerces-C correctly
performs the character substitution, but the software I am using these
documents with sadly still does not support XML 1.1. 

Is there a Xerces-C function that I can call that will check if a string
that I want to put in an XML document satisfies the rules of
well-formedness for the given XML version? Is there something else I can
do about this problem? Why doesn't Xerces-C report the error?

Thank you,
Nedim Srndic


Re: No Error Reported When Serializing Invalid Character References in XML 1.0

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Alberto Massari <Al...@progress.com> wrote on 01/03/2012 12:42:39 PM:

> Hi Michael,
> in this case the error is not caused by a character that the target
> encoding doesn't support (Xerces-C would handle that). It's because a
> node contains a character that XML is not supposed to accept.

That's something that would be covered by DOM Level 3 Load and Save if it
were fully supported.

Looks like you just fixed that. :-)

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

Re: No Error Reported When Serializing Invalid Character References in XML 1.0

Posted by Alberto Massari <Al...@progress.com>.
Hi Michael,
in this case the error is not caused by a character that the target 
encoding doesn't support (Xerces-C would handle that). It's because a 
node contains a character that XML is not supposed to accept.

Alberto


Re: No Error Reported When Serializing Invalid Character References in XML 1.0

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Does Xerces-C's implementation of LSSerializer [1] support the
"well-formed" parameter? It's a required feature.

Turning that on in Xerces-J would cause an error to be reported for the
invalid character.

Thanks.

[1] http://www.w3.org/TR/DOM-Level-3-LS/load-save.html#LS-LSSerializer

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

Re: No Error Reported When Serializing Invalid Character References in XML 1.0

Posted by Alberto Massari <Al...@progress.com>.
Hi Nedim,
it's a known limitation of the current codebase: see 
https://issues.apache.org/jira/browse/XERCESC-1854
You can check whether a character is valid according to XML 1.0 by using 
XMLChar1_0::isXMLChar. For XML 1.1, use XMLChar1_1::isXMLChar.

Alberto
