Posted to c-users@xerces.apache.org by mini thomas <mi...@yahoo.com> on 2009/07/30 17:24:20 UTC

DOMLSSerializer converts white space characters in attributes to xml entities

Hi,
 
I am using Xerces-C++ 3.0.1 and doing the following:
 

1) Parse a string
 
2) Set an attribute "newattr" on the root node. The attribute value is
const char *temp = "\n Hello \t\t testing";
 
3) Convert the parsed data back to XML:
 
static const XMLCh gLS[] = { chLatin_L, chLatin_S, chNull };
DOMImplementation* impl = DOMImplementationRegistry::getDOMImplementation(gLS);
DOMLSSerializer* myWriter = impl->createLSSerializer();
DOMConfiguration* dc = myWriter->getDomConfig();
dc->setParameter(XMLUni::fgDOMWRTDiscardDefaultContent, true);
// Serialize the DOMNode to a UTF-16 string; the returned string is owned by the caller.
// (GetDOMNodePtr() belongs to the poster's own wrapper class, not to Xerces.)
XMLCh* theXMLString_Unicode = myWriter->writeToString(toWrite.GetDOMNodePtr());

4) Convert theXMLString_Unicode to char* and print it using cout.
 
The attribute was printed like this:
newattr="&#xA; Hello &#x9;&#x9; testing"
 
 
Is there any way to get the attribute printed with the newline and tabs written as literal characters, i.e. as:
newattr="
 Hello  testing"
 
 
Thanks,
Mini


      

RE: DOMLSSerializer converts white space characters in attributes to xml entities

Posted by Jesse Pelton <js...@PKC.com>.
Note that this behavior is required by the XML specification. See http://www.w3.org/TR/2008/REC-xml-20081126/#AVNormalize. It's dense, but in summary: when an attribute value is loaded, every literal tab, carriage return, and line feed in it is replaced by a space character. If the attribute is declared (for example in a DTD) with a type other than CDATA, the parser additionally discards leading and trailing spaces and collapses each run of spaces into a single space.

The collapsing step applies only to attributes declared as something other than CDATA (an undeclared attribute is treated as CDATA), but the replacement step applies to every attribute. So a literal tab or newline in an attribute value cannot survive being reloaded, regardless of what schema or DTD the reading processor sees. The only way for a serializer to preserve these characters is to write them as character references such as &#xA; and &#x9;.

-----Original Message-----
From: Alberto Massari [mailto:amassari@datadirect.com]
Sent: Thu 7/30/2009 11:37 AM
To: c-users@xerces.apache.org
Subject: Re: DOMLSSerializer converts white space characters in attributes to xml entities
 



Re: DOMLSSerializer converts white space characters in attributes to xml entities

Posted by Alberto Massari <am...@datadirect.com>.
No. If the serialized attribute value contains newlines or tabs, they are
converted into spaces when the document is loaded. If you really want to store
such characters in an attribute, they have to be encoded as character references.

Alberto
