Posted to c-users@xerces.apache.org by mini thomas <mi...@yahoo.com> on 2009/07/30 17:24:20 UTC
DOMLSSerializer converts white space characters in attributes to xml entities
Hi,
I am using Xerces 3.0.1 and doing the following:
1) Parse a string.
2) Set an attribute "newattr" on the root node. The attribute value is
const char *temp = "\n Hello \t\t testing";
3) Convert the parsed data back to XML:
// Look up a DOM implementation that supports the Load/Save ("LS") feature.
static const XMLCh gLS[] = { chLatin_L, chLatin_S, chNull };
DOMImplementation* impl = DOMImplementationRegistry::getDOMImplementation(gLS);
DOMLSSerializer* myWriter = impl->createLSSerializer();
DOMConfiguration* dc = myWriter->getDomConfig();
dc->setParameter(XMLUni::fgDOMWRTDiscardDefaultContent, true);
// Serialize the DOMNode to a UTF-16 string.
XMLCh* theXMLString_Unicode = myWriter->writeToString(toWrite.GetDOMNodePtr());
4) Convert theXMLString_Unicode to char* and print using cout.
I got the attribute printed this way.
newattr="&#xA; Hello &#x9;&#x9; testing"
Is there any way to get the attribute printed as newattr="
Hello testing"
Thanks,
Mini
RE: DOMLSSerializer converts white space characters in attributes to xml entities
Posted by Jesse Pelton <js...@PKC.com>.
Note that this behavior is required by the XML specification. See http://www.w3.org/TR/2008/REC-xml-20081126/#AVNormalize. It's dense, but in summary: when an attribute value is loaded, each literal tab, carriage return, and linefeed is replaced by a space. If the attribute is declared with a type other than CDATA, leading and trailing spaces are then discarded and each sequence of spaces is converted to a single space.
The second step applies only if a DTD or schema declares the attribute as something other than CDATA, but the safest thing for a serializer to do is assume that the value might not be CDATA (or might not be recognized as such by whatever processor loads the document) and that whitespace should be preserved. The only way to guarantee that is to write the whitespace characters as entities (character references such as &#xA;), which are not touched by the normalization.
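To make the normalization rule concrete, here is a minimal sketch in plain C++ of what a parser does to an attribute value on load. It is not Xerces code: the function name normalizeAttributeValue and its isCData flag are made up for illustration, it only decodes numeric character references below 128, and it assumes line-end normalization has already happened.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Sketch of XML 1.0 attribute-value normalization (section 3.3.3).
// Assumptions: only numeric character references below 128 are decoded,
// general entity references are copied through unchanged, and line ends
// have already been normalized.
std::string normalizeAttributeValue(const std::string& raw, bool isCData)
{
    std::string out;
    for (std::size_t i = 0; i < raw.size(); ++i) {
        const char c = raw[i];
        if (c == '&' && i + 2 < raw.size() && raw[i + 1] == '#') {
            const std::size_t end = raw.find(';', i);
            if (end != std::string::npos) {
                const bool hex = raw[i + 2] == 'x';
                const std::size_t start = i + (hex ? 3 : 2);
                const std::string digits = raw.substr(start, end - start);
                char* stop = nullptr;
                const long code = std::strtol(digits.c_str(), &stop, hex ? 16 : 10);
                if (!digits.empty() && *stop == '\0' && code > 0 && code < 128) {
                    // A referenced character is appended as-is, even if it is whitespace.
                    out += static_cast<char>(code);
                    i = end;
                    continue;
                }
            }
        }
        // Literal whitespace of any kind becomes a single space character.
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
            out += ' ';
        else
            out += c;
    }
    if (isCData)
        return out;

    // Non-CDATA types only: drop leading/trailing spaces, collapse runs of spaces.
    std::string collapsed;
    for (const char c : out) {
        if (c == ' ' && (collapsed.empty() || collapsed.back() == ' '))
            continue;
        collapsed += c;
    }
    if (!collapsed.empty() && collapsed.back() == ' ')
        collapsed.pop_back();
    return collapsed;
}
```

With the value from the original post written literally, the newline and tabs come back as spaces (and are trimmed and collapsed for a non-CDATA attribute). Written as character references, the same value comes back with its newline and tabs intact.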
-----Original Message-----
From: Alberto Massari [mailto:amassari@datadirect.com]
Sent: Thu 7/30/2009 11:37 AM
To: c-users@xerces.apache.org
Subject: Re: DOMLSSerializer converts white space characters in attributes to xml entities
No, if the serialized attribute value has newlines/tab, they are
converted upon loading into spaces. If you want to really store such
characters in an attribute, they have to be encoded into entities.
Alberto
Re: DOMLSSerializer converts white space characters in attributes to xml entities
Posted by Alberto Massari <am...@datadirect.com>.
No, if the serialized attribute value has newlines/tab, they are
converted upon loading into spaces. If you want to really store such
characters in an attribute, they have to be encoded into entities.
Alberto
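As an illustration of the encoding step, here is a small sketch of escaping an attribute value before it is written out. It is plain C++, not the Xerces implementation; escapeAttributeValue is a hypothetical helper, and the hexadecimal form of the character references is an assumption chosen for the example.

```cpp
#include <string>

// Hypothetical helper (not part of Xerces-C): escape an attribute value so
// that tabs, linefeeds, and carriage returns survive a reload by a parser.
std::string escapeAttributeValue(const std::string& value)
{
    std::string out;
    for (const char c : value) {
        switch (c) {
            case '&':  out += "&amp;";  break;  // markup characters
            case '<':  out += "&lt;";   break;
            case '"':  out += "&quot;"; break;  // value is assumed to be double-quoted
            case '\t': out += "&#x9;";  break;  // whitespace that would otherwise
            case '\n': out += "&#xA;";  break;  // be normalized to a space on load
            case '\r': out += "&#xD;";  break;
            default:   out += c;        break;
        }
    }
    return out;
}
```

Plain spaces are left alone here, so for an attribute declared with a non-CDATA type they would still be trimmed and collapsed on load. Only the tab, linefeed, and carriage return are protected.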