Posted to c-dev@xerces.apache.org by nishank837 <ni...@gmail.com> on 2010/04/26 19:36:17 UTC

Problems with White Space and Line Feed Handling

Hi,
I am having problems handling white space etc. while parsing XML documents.
The data I want to save is something like this, which includes line feeds
and white space:
 <info>
       <label>ABB</label>
       <description>Please</description>
 </info>

<question id="TEacher" xsi:type="options">

When I save this XML, parse it, and read it back, I get something like this:
  <info>        <label>ABB</label>       
<description>Please</description></info><question id="TEacher"
xsi:type="options">
with all the line feeds stripped off. Can someone let me know how I can
retain all of these?
-- 
View this message in context: http://old.nabble.com/Problems-with-White-Space-and-Line-Feed-Handling-tp28367855p28367855.html
Sent from the Xerces - C - Dev mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: c-dev-unsubscribe@xerces.apache.org
For additional commands, e-mail: c-dev-help@xerces.apache.org


Re: Problems with White Space and Line Feed Handling

Posted by Gareth Reakes <ga...@we7.com>.
Hey,

On 27 Apr 2010, at 08:36, nishank837 wrote:

> 
> Actually I did put the setNewline attribute for the serializer...

OK. Let's make sure everything is sane. In your code, access the text of the attribute node and take a look at what's in there. Confirm that the new lines are in there. If they are, then it's definitely a serialisation issue.
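
One thing worth keeping in mind when doing this check: under the attribute-value normalization rule of XML 1.0 (section 3.3.3), a conforming parser replaces each literal tab, carriage return, and line feed inside an attribute value with a space, while a character reference such as &#10; survives as a real line feed. So the line breaks may already be gone before any serialisation happens. The sketch below is a simplified, stand-alone illustration of that rule and is not Xerces-C code: normalizeAttributeValue is a hypothetical helper, and it only handles literal white space and numeric character references below 128.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, NOT part of Xerces-C: a simplified model of XML 1.0
// attribute-value normalization for a CDATA-type attribute.
// Assumptions: only numeric character references below 128 are expanded;
// entity references such as &amp; are copied through unchanged.
std::string normalizeAttributeValue(const std::string& raw) {
    std::string out;
    out.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        const char c = raw[i];
        if (c == '\r') {
            // "\r\n" and a lone "\r" each count as one line break,
            // which then becomes a single space.
            if (i + 1 < raw.size() && raw[i + 1] == '\n') ++i;
            out += ' ';
            continue;
        }
        if (c == '\n' || c == '\t') {
            out += ' ';  // literal white space is replaced by a space
            continue;
        }
        if (c == '&' && i + 1 < raw.size() && raw[i + 1] == '#') {
            const std::size_t semi = raw.find(';', i + 2);
            if (semi != std::string::npos) {
                std::size_t start = i + 2;
                unsigned base = 10;
                if (start < semi && raw[start] == 'x') { base = 16; ++start; }
                unsigned long code = 0;
                bool ok = start < semi;
                for (std::size_t k = start; ok && k < semi; ++k) {
                    const char d = raw[k];
                    unsigned digit = 0;
                    if (d >= '0' && d <= '9') digit = static_cast<unsigned>(d - '0');
                    else if (base == 16 && d >= 'a' && d <= 'f') digit = static_cast<unsigned>(d - 'a') + 10;
                    else if (base == 16 && d >= 'A' && d <= 'F') digit = static_cast<unsigned>(d - 'A') + 10;
                    else ok = false;
                    code = code * base + digit;
                    if (code > 127) ok = false;  // outside this sketch's range
                }
                if (ok && code != 0) {
                    // A character reference yields the character itself,
                    // so a referenced line feed is preserved.
                    out += static_cast<char>(code);
                    i = semi;
                    continue;
                }
            }
        }
        out += c;
    }
    return out;
}
```

Feeding it a value that contains literal line feeds gives back the same text with spaces in their place, which looks a lot like the output reported at the start of this thread.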


> As far as the other question is concerned, I have a series of other
> nodes where I have numbers, so I initially started off storing them as
> attributes of the node. But later I realized that sometimes these
> attributes can be XML, so I was looking for a fix.
> And I have one more question: if I store this as a CDATA section, how can I
> access it? For the other elements (DOMElement) I can get the
> element by tag name, but the CDATA section (DOMCDATASection) doesn't have
> something like that. So how do I access it?
> 

If you put it in a CDATA section then it will be the same as an attribute value: the XML itself won't be parsed. Is there a reason you are not putting it as content of a node? Like:

<item>
<info>
      <label>ABB</label>
      <description>Please</description>
</info>

<question id="TEacher" xsi:type="options">
</item>


That way the parser will check and make sure all the elements are closed etc.
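
If the CDATA route is chosen anyway, note that in the DOM a CDATA section is simply a child node of its parent element (node type CDATA_SECTION_NODE), so it is reached through the element that contains it rather than by a tag name of its own. White space inside a CDATA section is character data and is kept by the parser. The one sequence that cannot appear inside a CDATA section is "]]>", which has to be split across two sections. The following is a minimal stand-alone sketch of that splitting; wrapInCdata is a hypothetical helper for text written by hand, not a Xerces-C function.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, NOT part of Xerces-C: wrap arbitrary text in one or
// more CDATA sections. Any "]]>" in the text is split so that "]]" ends one
// section and ">" starts the next.
std::string wrapInCdata(const std::string& text) {
    static const std::string terminator = "]]>";
    std::string out = "<![CDATA[";
    std::size_t pos = 0;
    for (;;) {
        const std::size_t hit = text.find(terminator, pos);
        if (hit == std::string::npos) {
            out.append(text, pos, std::string::npos);  // copy the remainder
            break;
        }
        out.append(text, pos, hit - pos);  // text before the terminator
        out += "]]]]><![CDATA[>";          // "]]" + close + reopen + ">"
        pos = hit + terminator.size();
    }
    out += "]]>";
    return out;
}
```

Line feeds and indentation in the input are copied through untouched, so wrapInCdata("<info>\n</info>") yields a section whose content still contains the line break.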


Gareth


> nishank837 wrote:
>> 
>> Hi,
>> I am having problems handling white space etc. while parsing XML documents.
>> The data I want to save is something like this, which includes line
>> feeds and white space:
>> <info>
>>       <label>ABB</label>
>>       <description>Please</description>
>> </info>
>> 
>> <question id="TEacher" xsi:type="options">
>> 
>> When I save this XML, parse it, and read it back, I get something like this:
>>  <info>        <label>ABB</label>       
>> <description>Please</description></info><question id="TEacher"
>> xsi:type="options">
>> with all the line feeds stripped off. Can someone let me know how I can
>> retain all of these?
>> I am using the XercesC DOM parser
>> 

-- 
Gareth Reakes, CTO         WE7 - Great Music, Free
+44-20-7117-0809                    http://www.we7.com




Re: Problems with White Space and Line Feed Handling

Posted by nishank837 <ni...@gmail.com>.
Actually I did put the setNewline attribute for the serializer...
As far as the other question is concerned, I have a series of other
nodes where I have numbers, so I initially started off storing them as
attributes of the node. But later I realized that sometimes these
attributes can be XML, so I was looking for a fix.
And I have one more question: if I store this as a CDATA section, how can I
access it? For the other elements (DOMElement) I can get the
element by tag name, but the CDATA section (DOMCDATASection) doesn't have
something like that. So how do I access it?



Re: Problems with White Space and Line Feed Handling

Posted by Gareth Reakes <ga...@we7.com>.
Hi,

On 27 Apr 2010, at 07:49, nishank837 wrote:

> 
> Well actually this is how I am doing it
> 
> pImplement =
> DOMImplementationRegistry::getDOMImplementation(XMLString::transcode("LS"));
> Serializer = ((DOMImplementationLS*)pImplement)->createDOMWriter();
> XMLFormatTarget *myFormatTarget = new MemBufFormatTarget();
> Serializer->writeNode(myFormatTarget, *XDOMDocument);
> XMLObject = (char*)((MemBufFormatTarget*)myFormatTarget)->getRawBuffer();
> 
> I totally forgot to mention one thing: I have stored this XML as an
> attribute of another XML tag. Will that work? Is it possible to retrieve
> the attribute with the original formatting?
> 


You might want to make sure you really want to do what you are doing. What you have in the attribute value is a text node. It's not parsed into separate DOM nodes, and it seems a touch strange that you would want to store XML there. The parser will not check that it's well-formed XML (only that it's a valid attribute value). For your problem, take a look at setNewLine on DOMLSSerializer.

http://xerces.apache.org/xerces-c/apiDocs-3/classDOMLSSerializer.html#56882d2fe0b4a0ecb1b3968febbcf4a3

Does this fix it?
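
Separately from setNewLine: if the attribute text is ever assembled by hand rather than written through a serializer, one way to make line breaks survive a later parse is to write them as character references, since literal line feeds in an attribute value are turned into spaces by XML attribute-value normalization. A minimal stand-alone sketch follows; escapeAttributeValue is a hypothetical helper, not a Xerces-C function, and it assumes the value will be placed between double quotes.

```cpp
#include <string>

// Hypothetical helper, NOT part of Xerces-C: escape text for use inside a
// double-quoted XML attribute value. Tabs, carriage returns and line feeds
// are written as character references so that a parser restores them
// instead of normalizing them to spaces.
std::string escapeAttributeValue(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (const char c : text) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '"':  out += "&quot;"; break;
            case '\n': out += "&#10;";  break;
            case '\r': out += "&#13;";  break;
            case '\t': out += "&#9;";   break;
            default:   out += c;        break;
        }
    }
    return out;
}
```

The escaped string is what would go between the quotes of the attribute in the written document; a parser reading it back should then report the original line feeds in the attribute value.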

Gareth








Re: Problems with White Space and Line Feed Handling

Posted by nishank837 <ni...@gmail.com>.
Well actually this is how I am doing it

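// Note: XMLString::transcode() returns a buffer the caller should free with XMLString::release().
pImplement = DOMImplementationRegistry::getDOMImplementation(XMLString::transcode("LS"));
Serializer = ((DOMImplementationLS*)pImplement)->createDOMWriter();
XMLFormatTarget *myFormatTarget = new MemBufFormatTarget();
Serializer->writeNode(myFormatTarget, *XDOMDocument);
// getRawBuffer() points into myFormatTarget, so the target must outlive XMLObject.
XMLObject = (char*)((MemBufFormatTarget*)myFormatTarget)->getRawBuffer();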

I totally forgot to mention one thing: I have stored this XML as an
attribute of another XML tag. Will that work? Is it possible to retrieve
the attribute with the original formatting?



Re: Problems with White Space and Line Feed Handling

Posted by Gareth Reakes <ga...@we7.com>.
Hi,

On 26 Apr 2010, at 18:36, nishank837 wrote:

> 
> Hi,
> I am having problems handling white space etc. while parsing XML documents.
> The data I want to save is something like this, which includes line feeds
> and white space:
> <info>
>       <label>ABB</label>
>       <description>Please</description>
> </info>
> 
> <question id="TEacher" xsi:type="options">
> 
> When I save this XML, parse it, and read it back, I get something like this:
>  <info>        <label>ABB</label>       
> <description>Please</description></info><question id="TEacher"
> xsi:type="options">
> with all the line feeds stripped off. Can someone let me know how I can
> retain all of these?


When you say this happens after you parse the document, how do you tell? How are you serialising to the second document that does not have the new lines in?

Gareth



