Posted to j-users@xerces.apache.org by Chris Bowditch <bo...@hotmail.com> on 2013/05/28 18:41:33 UTC

Problem with serializing Text Data

Hi All,

I've been searching JIRA for any issue with serializing text data that
contains the CDATA keyword (but is not a fully formed CDATA section). I
couldn't find one, so I'm posting here before I start debugging the
Serializer code, to see if anyone has seen this issue.

In the input XML we have the following text node:

&lt;value&gt;&lt;![CDATA[-1]]&gt;&lt;/value&gt;

Our application is using Xerces to parse this XML and it's correctly
recognized as a character event. If I try to serialize this same
character event, the resulting XML ends up like:

<value><![CDATA[-1]]]]><![CDATA[></value>

This looks wrong to me and results in a malformed XML File.
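
The malformedness can be checked mechanically. The following is a
minimal sketch using only the JAXP SAX parser bundled with the JDK;
the class name WellFormedCheck and the method isWellFormed are
hypothetical names chosen for illustration. It assumes the input
document had literal <value> tags with only the CDATA markers escaped.
In the reported output the second "<![CDATA[" is never closed, so the
parser reaches the end of the document inside a CDATA section.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Xerces or Xalan.
final class WellFormedCheck {
    private WellFormedCheck() {}

    /** Returns true if the parser accepts the string as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // DefaultHandler.fatalError rethrows well-formedness errors.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed input: literal element tags, escaped CDATA markers.
        System.out.println(isWellFormed("<value>&lt;![CDATA[-1]]&gt;</value>"));       // true
        // Serializer output as reported in this message.
        System.out.println(isWellFormed("<value><![CDATA[-1]]]]><![CDATA[></value>")); // false
    }
}
```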

Any input would be welcomed.

Thanks,

Chris

---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org


Re: Problem with serializing Text Data

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Hi Chris,

I would generally expect any XML serializer to escape the '<' and '>' 
characters that appear in textual content when writing to an OutputStream. 
If they're not being escaped that is odd.
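
As an illustration of the escaping described above, here is a minimal
sketch of what a serializer would be expected to do with character
data. TextEscaper and escapeText are hypothetical names for this
example only; this is not the code used by Xerces or by the
org.apache.xml.serializer package.

```java
// Hypothetical helper showing the expected escaping of text content.
final class TextEscaper {
    private TextEscaper() {}

    /** Escapes the markup-significant characters '<', '>' and '&' in text content. */
    static String escapeText(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The characters event carries the unescaped text "<![CDATA[-1]]>".
        System.out.println("<value>" + escapeText("<![CDATA[-1]]>") + "</value>");
        // prints <value>&lt;![CDATA[-1]]&gt;</value>
    }
}
```

With this escaping the text can never be mistaken for the start of a
real CDATA section when the output is parsed again.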

Thanks.

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org



Re: Problem with serializing Text Data

Posted by Chris Bowditch <bo...@hotmail.com>.
Hi Michael,

Thanks for your reply. The code to serialize is fairly straightforward:

         Properties serializerProperties = properties;
         serializerProperties =
                 OutputPropertiesFactory.getDefaultMethodProperties(Method.XML);
         Serializer serializer =
                 SerializerFactory.getSerializer(serializerProperties);
         serializer.setOutputStream(m_out);

and then serializer.asContentHandler() is called to get a content
handler, and the SAX events from the parsing chain are tied to that. I've
used a debugger to examine the SAX events, and the text in question is
treated as a characters event all the way through. There are no calls to
startCDATA()/endCDATA() around this text.
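
For comparison, the same parse-then-serialize wiring can be sketched
with the JAXP classes that ship with the JDK. Note that this swaps in
a different technique: it uses a TransformerHandler as the serializing
ContentHandler rather than the org.apache.xml.serializer API shown
above, so it only shows what a SAX round trip of this text would be
expected to look like. SaxRoundTrip and roundTrip are hypothetical
names, the input string assumes literal <value> tags around the
escaped CDATA markers, and the cast assumes the default
TransformerFactory supports SAX (the JDK's built-in one does).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper: parses XML and feeds the SAX events to a serializer.
final class SaxRoundTrip {
    private SaxRoundTrip() {}

    static String roundTrip(String xml) {
        try {
            // Identity handler: a ContentHandler that serializes the events it receives.
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler serializer = factory.newTransformerHandler();
            serializer.getTransformer()
                    .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.setResult(new StreamResult(out));

            // Only the ContentHandler is registered, so the serializer sees
            // plain characters events and no startCDATA()/endCDATA() calls.
            SAXParserFactory parsers = SAXParserFactory.newInstance();
            parsers.setNamespaceAware(true);
            XMLReader reader = parsers.newSAXParser().getXMLReader();
            reader.setContentHandler(serializer);
            reader.parse(new InputSource(new StringReader(xml)));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<value>&lt;![CDATA[-1]]&gt;</value>"));
    }
}
```

If the '<' of the text comes back escaped here but not in the failing
setup, that would point at the serializer configuration or version
rather than at the SAX events themselves.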

The XML itself is provided by a customer of mine. The mixture of escaped
and unescaped characters in the CDATA definition is very unusual,
although still well formed. I don't know why my customer has chosen to
use such a strange sequence.
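
The difference between this escaped sequence and a real CDATA section
can be observed from the SAX side without a debugger. The sketch below
assumes the JDK's bundled parser and the standard SAX lexical-handler
property; CdataProbe and its methods are hypothetical names. It
collects the character data and counts startCDATA() calls.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical probe: records text and CDATA section starts seen by SAX.
final class CdataProbe extends DefaultHandler2 {
    private final StringBuilder text = new StringBuilder();
    private int cdataSections;

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void startCDATA() {
        cdataSections++;
    }

    String text() { return text.toString(); }

    int cdataSections() { return cdataSections; }

    static CdataProbe probe(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            CdataProbe probe = new CdataProbe();
            // Register the probe as LexicalHandler so CDATA boundaries are reported.
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", probe);
            parser.parse(new InputSource(new StringReader(xml)), probe);
            return probe;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Escaped markers: the CDATA keyword is ordinary text.
        CdataProbe escaped = probe("<value>&lt;![CDATA[-1]]&gt;</value>");
        System.out.println(escaped.text() + " / " + escaped.cdataSections());
        // prints <![CDATA[-1]]> / 0

        // A real CDATA section: only "-1" is text.
        CdataProbe real = probe("<value><![CDATA[-1]]></value>");
        System.out.println(real.text() + " / " + real.cdataSections());
        // prints -1 / 1
    }
}
```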

Thanks,

Chris



Re: Problem with serializing Text Data

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Hi Chris,

What XML API are you using for serializing your document?

A code snippet showing what you did might help.

Thanks.

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org
