Posted to c-dev@xerces.apache.org by Joerg Toellner <to...@oss-gmbh.de> on 2004/06/01 15:21:54 UTC

Removed nodes leave blank lines after serializing to file

Hi Group,

Sorry, I have to bother you with a problem I can't resolve by myself.

Environment:
Visual Studio 6.0
Xerces-C 2.5.0 (Windows Binaries downloaded from xml.apache.org)

What I want my program to do:
I have an XML file with placeholders in the attribute values and text
node values (call it a template). I get some data (not XML, a tag=value
format) from another application. I load the data and parse it. Then I
load my XML file with a DOMBuilder and parse it too.

Then I traverse the DOM tree with a DOMNodeIterator and look for the
special placeholders, replacing each placeholder with the matching data
from the tag/value file.

After I have processed the last node, I want to write the DOM tree to
another XML document with a DOMWriter (no longer a template, but an XML
file with data instead).

Nodes that are not used in this replacement process (no data was
transferred from the other application for their placeholder) are
removed from the tree, as they are not necessary for this document. In
another run with other data they may of course be used. So my template
includes many nodes that are used only sometimes.
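
(A rough sketch of the replace-or-remove decision described above, in
plain C++ without any Xerces-C calls. This is not the actual program:
the names parse_tag_values, is_placeholder, placeholder_key and resolve
are made up for illustration, and it assumes that a placeholder is any
value starting with '#' and that the tag name in the data file is the
placeholder without the '#'.)

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

typedef std::map<std::string, std::string> TagValues;

// Parse "tag=value" lines into a lookup table.
// Lines without '=' or with an empty tag are skipped.
TagValues parse_tag_values(const std::string& text)
{
    TagValues data;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        const std::string::size_type eq = line.find('=');
        if (eq == std::string::npos || eq == 0)
            continue;
        data[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return data;
}

// Assumption: a placeholder is any value that starts with '#'.
bool is_placeholder(const std::string& value)
{
    return !value.empty() && value[0] == '#';
}

// Assumption: the tag name is the placeholder without its leading '#'.
std::string placeholder_key(const std::string& value)
{
    assert(is_placeholder(value));
    return value.substr(1);
}

// Returns true and fills 'out' when the value can stay in the document
// (it is not a placeholder, or data exists for it). Returns false when
// the placeholder has no data, i.e. the caller should remove the node.
bool resolve(const std::string& value, const TagValues& data,
             std::string& out)
{
    if (!is_placeholder(value)) {
        out = value;
        return true;
    }
    const TagValues::const_iterator it = data.find(placeholder_key(value));
    if (it == data.end())
        return false;
    out = it->second;
    return true;
}
```

A caller would store 'out' back into the attribute or text node when
resolve() returns true, and remove the node when it returns false.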

I hope you get the idea!

What my problem is:
Most of it works fine and fast (thanks to the great people here who
create this great Xerces parser). But when I look at my data XML file
after processing and serializing, I notice some blank lines and spaces
outside the nodes (not in the nodes; inside the nodes everything is as I
wish). I noticed further that the number of spaces/lines between two
output nodes somehow depends on how many nodes I removed between them.

Let me show you a little snippet from my "template" and from my output
"datafile" to clear things up:

------------SNIP-TEMPLATE--------------
<?xml version="1.0" encoding="iso-8859-1"?>
<?xml-stylesheet type="text/xsl" href="sci_arztbrief.xsl"?>
<levelone xmlns="urn::hl7-org/cda"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:sciphox="urn::sciphox-org/sciphox"
xsi:schemaLocation="urn::hl7-org/cda sciphox-cda.xsd">
	<clinical_document_header>
		<id RT="#cdh.docid.rt" EX="#cdh.docid.ex"/>
		<set_id RT="#cdh.docsetid.rt" EX="#cdh.docsetid.ex"/>
		<version_nbr V="#cdh.docvers"/>
		<document_type_cd V="#cdh.doctype.code"
S="2.16.840.1.113883.6.1" DN="#cdh.doctype.name"/>
		<service_tmr V="#cdh.servtmr"/>
		<origination_dttm V="#cdh.origdttm"/>
		<confidentiality_cd V="#cdh.conf.code"
S="2.16.840.1.113883.5.10228" DN="#cdh.conf.name" medidel="1"/>
<!-- About 114 Lines with many nodes and childnodes deleted here to
shorten the mail -->
		<intended_receipent>
------------SNIP-TEMPLATE--------------

------------SNIP-OUTPUT--------------
<levelone xmlns="urn::hl7-org/cda"
xmlns:sciphox="urn::sciphox-org/sciphox"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn::hl7-org/cda sciphox-cda.xsd">
	<clinical_document_header>
		  <id EX="179" RT="text-id"/>
		
		  <version_nbr V="1"/>
		  <document_type_cd DN="Befund"
S="2.16.840.1.113883.6.1" V="11542-8"/>
		
		
		
		
		
		
		
		  <intended_receipent>
------------SNIP-OUTPUT--------------

You see, between id and version_nbr there is only one blank line, and if
you look in the template there is only one element node in between (the
set_id element). Lines that are consecutive in the template (no removed
nodes) are consecutive in the output too. If many nodes are removed,
this results in many blank lines.

Please look at my code snippets below:

------------SNIP--------CODE------------
void walk_tree()
{
	DOMNode *nd;
	DOMImplementation *impl;
	DOMDocument *doc;

	// ... Code for initializing, getting the DOMNodeIterator and so on stripped
	dom_RemoveNode(nd);
	// ... Blah blah blah do more other stuff (not with the DOMTree)
	dom_WriteDoc(impl, doc, "./myoutput.xml");
}

void dom_RemoveNode(DOMNode *nd)
{
    nd->getParentNode()->removeChild(nd);
    nd->release();
}

void dom_WriteDoc(DOMImplementation* impl, DOMDocument *doc,
                  const char *filename)
{
    DOMWriter* domwrite;
    XMLFormatTarget* formtarget;

    domwrite = ((DOMImplementationLS *) impl)->createDOMWriter();
    domwrite->setFeature(X("format-pretty-print"), true);
    domwrite->setFeature(X("whitespace-in-element-content"), true);
    formtarget = new LocalFileFormatTarget(filename);
    domwrite->writeNode(formtarget, *doc);

    // Clean up
    domwrite->release();
    delete formtarget;
}
------------SNIP--------CODE------------

I also tried it without setting "whitespace-in-element-content", with no
difference. But I think it can't hurt anyway, as it concerns
"in-node-content" and not "outside-of-node-characters", doesn't it?

I know that my datafile is still a well-formed and maybe a valid
document (or am I wrong here?), but it wastes space and makes it harder
for humans to read when debugging with a normal text editor.

Please, can someone point me in the right direction on how to avoid
these unwanted blank lines? Am I doing something wrong with my
"removeChild" usage?

Thanks in advance for your reply and help, and for the time you are
willing to spend on my problems.

CU
Joerg Toellner


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org


Re: Removed nodes leave blank lines after serializing to file

Posted by "Jeroen N. Witmond" <jn...@xs4all.nl>.
>
> Nodes which are not used in this replacing process (no data transferred
> from the other application for this unique placeholder) are removed from
> the tree as it is not necessary for this document.
>

My guess, but I'm no expert, is that the DOM tree contains text nodes
representing the whitespace surrounding the (unused) nodes you delete.
You will probably also have to delete (some of) these text nodes to
remove the blank lines from the serialization output.
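
A minimal sketch of that idea, using a small stand-in for a DOM child
list instead of the Xerces-C classes. Node, Children,
is_whitespace_only, remove_with_indent and serialize are made-up names
for illustration only. With Xerces-C the equivalent check would
presumably use getPreviousSibling() and getNodeType() on the DOMNode
before calling removeChild(); that part is untested here.

```cpp
#include <cassert>
#include <list>
#include <string>

// Library-free stand-in for one DOM child node; NOT the Xerces-C API.
struct Node {
    bool is_text;       // true: text node, false: element
    std::string data;   // text content, or the element name
};
typedef std::list<Node> Children;

// True for text nodes that hold nothing but spaces, tabs and newlines.
bool is_whitespace_only(const Node& n)
{
    return n.is_text &&
           n.data.find_first_not_of(" \t\r\n") == std::string::npos;
}

// Remove the element at 'pos' together with the whitespace-only text
// node directly before it (the newline and indentation leading up to
// the element). Returns the iterator following the removed element.
Children::iterator remove_with_indent(Children& kids,
                                      Children::iterator pos)
{
    assert(pos != kids.end() && !pos->is_text);
    if (pos != kids.begin()) {
        Children::iterator prev = pos;
        --prev;
        if (is_whitespace_only(*prev))
            kids.erase(prev);   // list iterators: 'pos' stays valid
    }
    return kids.erase(pos);
}

// Write the children back out roughly as they would appear in the file.
std::string serialize(const Children& kids)
{
    std::string out;
    for (Children::const_iterator it = kids.begin(); it != kids.end(); ++it)
        out += it->is_text ? it->data : "<" + it->data + "/>";
    return out;
}
```

With the children modelled as whitespace, id, whitespace, set_id,
whitespace, version_nbr, removing only set_id leaves two whitespace text
nodes back to back, which is the blank line in the output. Removing it
through remove_with_indent() leaves id and version_nbr on consecutive
lines.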

Jeroen.