Posted to c-dev@xerces.apache.org by Bryan Wilcox <az...@yahoo.com> on 2004/07/20 18:47:41 UTC

Encoding Question

I have observed this behavior, but haven't been able to find anything in the documentation to support what I believe to be true.  Could somebody clarify this for me?  If I have a text node that contains embedded "non-safe" characters such as < and >, the parser will correctly read, validate, and serialize these characters.  For example, suppose I have an XML instance document that contains the following line:
<error> I have an <error> in my document</error>
 
This line seems to validate correctly when validation is turned on.  Could anyone suggest where I could look for further information on this behavior?  At first glance, one would think that you would need to canonicalize the text to keep the parser from getting confused, but it seems to handle this correctly.
 
Any help would be appreciated.

Thanks,
Bryan

		

Re: Encoding Question

Posted by Alberto Massari <am...@progress.com>.
At 10.04 20/07/2004 -0700, Bryan Wilcox wrote:
>That is exactly what I mean.  I have created a text node named error.  It 
>has the content I have an <error> in my document.  The file is created by 
>serializing data out using a DOMWriter.  It is read in using a validating 
>DOMBuilder which validates against our schema.  We just weren't sure that 
>it did the textual replacement that you alluded to below, where < becomes 
>&lt; and > becomes &gt;.  Could you point out to me, in either the code or 
>the documentation, where these "conversions" or "canonicalizations" are 
>done when you are using DOMWriter?  I would like to see which characters 
>are converted by the code.

The conversion is done by the DOMWriter through an object called 
XMLFormatter (framework\XMLFormatter.cpp).
Which characters are escaped depends on where they occur: inside 
attribute values [& < " \n] or inside text nodes [& < >].
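
To make the two character sets concrete, here is a minimal standalone sketch of the rule described above. It is not the XMLFormatter code; the helper names escapeText and escapeAttr are hypothetical, and writing the newline as the character reference &#xA; is an assumption rather than something confirmed from the Xerces-C source.

```cpp
#include <string>

// Illustrative only: NOT the Xerces-C XMLFormatter implementation.
// Both helper names are hypothetical.

// Escape character data destined for a text node: & < >
std::string escapeText(const std::string& in) {
    std::string out;
    for (char c : in) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:  out += c;       break;
        }
    }
    return out;
}

// Escape an attribute value: & < " and newline.
// Assumption: the newline is emitted as the character reference &#xA;
std::string escapeAttr(const std::string& in) {
    std::string out;
    for (char c : in) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '"':  out += "&quot;"; break;
            case '\n': out += "&#xA;";  break;
            default:   out += c;        break;
        }
    }
    return out;
}
```

With these helpers, escapeText("I have an <error> in my document") returns "I have an &lt;error&gt; in my document", while escapeAttr leaves > untouched and escapes the double quote instead.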

Alberto



---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org


Re: Encoding Question

Posted by Bryan Wilcox <az...@yahoo.com>.
That is exactly what I mean.  I have created a text node named error.  It has the content I have an <error> in my document.  The file is created by serializing data out using a DOMWriter.  It is read in using a validating DOMBuilder which validates against our schema.  We just weren't sure that it did the textual replacement that you alluded to below, where < becomes &lt; and > becomes &gt;.  Could you point out to me, in either the code or the documentation, where these "conversions" or "canonicalizations" are done when you are using DOMWriter?  I would like to see which characters are converted by the code.
 
Thanks,
Bryan

Alberto Massari <am...@progress.com> wrote:
Hi Bryan,

At 09.47 20/07/2004 -0700, Bryan Wilcox wrote:
>I have observed this behavior, but haven't been able to find anything in 
>the documentation to support what I believe to be true. Could somebody 
>clarify this for me? If I have a text node that contains embedded 
>"non-safe" characters such as < and >, the parser will correctly read, 
>validate, and serialize these characters. For example, suppose I have an 
>XML instance document that contains the following line:
><error> I have an <error> in my document</error>
>
>This line seems to validate correctly when validation is turned on.

This shouldn't even parse; unless you are saying that you have created a 
DOMText node with the content " I have an <error> in my document", then you 
serialized using DOMWriter, then parsed it with validation turned on. In 
this case, the serialization has done the right thing, and the saved XML 
file really contains

<error> I have an &lt;error&gt; in my document</error>

Alberto

>Could anyone suggest where I could look for further information on this 
>behavior? At first glance, one would think that you would need to 
>canonicalize the text to keep the parser from getting confused, but it 
>seems to handle this correctly.
>
>Any help would be appreciated.
>
>Thanks,
>Bryan




Re: Encoding Question

Posted by Alberto Massari <am...@progress.com>.
Hi Bryan,

At 09.47 20/07/2004 -0700, Bryan Wilcox wrote:
>I have observed this behavior, but haven't been able to find anything in 
>the documentation to support what I believe to be true.  Could somebody 
>clarify this for me?  If I have a text node that contains embedded 
>"non-safe" characters such as < and >, the parser will correctly read, 
>validate, and serialize these characters.  For example, suppose I have an 
>XML instance document that contains the following line:
><error> I have an <error> in my document</error>
>
>This line seems to validate correctly when validation is turned on.

This shouldn't even parse; unless you are saying that you have created a 
DOMText node with the content " I have an <error> in my document", then you 
serialized using DOMWriter, then parsed it with validation turned on. In 
this case, the serialization has done the right thing, and the saved XML 
file really contains

<error> I have an &lt;error&gt; in my document</error>
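
The reason the saved file parses is that the parser resolves the predefined entity references back into the original characters, so the DOMText node again holds the literal <error>. A minimal standalone sketch of that step follows; it is not Xerces-C code, and the helper name unescapePredefined is hypothetical. It only handles the five predefined XML entities, not numeric character references or entities declared in a DTD.

```cpp
#include <cstddef>
#include <string>

// Illustrative only: NOT how Xerces-C is implemented.
// Resolves the five predefined XML entity references in a single pass.
std::string unescapePredefined(const std::string& in) {
    static const struct { const char* ref; char ch; } table[] = {
        {"&lt;", '<'}, {"&gt;", '>'}, {"&amp;", '&'},
        {"&quot;", '"'}, {"&apos;", '\''}
    };
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        bool matched = false;
        if (in[i] == '&') {
            for (const auto& e : table) {
                const std::string ref(e.ref);
                // compare() clamps the length, so a truncated tail never matches
                if (in.compare(i, ref.size(), ref) == 0) {
                    out += e.ch;
                    i += ref.size();
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) {
            out += in[i++];  // ordinary character, or a bare '&' left as-is
        }
    }
    return out;
}
```

Applied to the element content above, unescapePredefined("I have an &lt;error&gt; in my document") returns "I have an <error> in my document", which matches the text originally placed in the node.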

Alberto

>Could anyone suggest where I could look for further information on this 
>behavior?  At first glance, one would think that you would need to 
>canonicalize the text to keep the parser from getting confused, but it 
>seems to handle this correctly.
>
>Any help would be appreciated.
>
>Thanks,
>Bryan


