Posted to j-users@xerces.apache.org by Wai-Yip Tung <wt...@cisco.com> on 2002/11/14 06:36:15 UTC

Transforming XML document

I am trying to make a simple transformation on an XML document, let's say
just changing one attribute value. I want to keep everything else the same,
including white space.

My first task is to parse and output a document identical to the input
document. The sample code sax.Writer seemed like a good starting point.
Unfortunately it altered the document in several ways:

- White space inside a tag is changed, e.g.
  <x  y = "a"> becomes <x y="a">

- An empty element becomes two tags, e.g.
  <x/> becomes <x></x>

Can anyone give me some direction?

Wai yip




---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org


RE: Transforming XML document

Posted by Wai-Yip Tung <wt...@cisco.com>.
Thanks for the comments. Sounds like I'm out of luck. :(

Wai-yip

-----Original Message-----
From: Simon Kitching [mailto:simon@ecnetwork.co.nz]
Sent: Wednesday, November 13, 2002 10:06 PM
To: xerces-j-user@xml.apache.org
Subject: Re: Transforming XML document


Hi,

Unfortunately, you're out of luck. XML parsing just doesn't work that
way.

XML parsers are required to respect the contents of *text nodes*
within an XML document, but whitespace inside the tags themselves (for
example between attributes, or around the "=") is not significant
according to the spec.

You can either treat the input file as a plain text file (e.g. use Perl to
modify it), or you can treat it as XML, in which case the XML parser will
guarantee to preserve the *meaning* of your XML document, but not
necessarily its layout.

For example, <x  y="a"> (two spaces) in XML means *exactly* the same
thing as <x y="a"> (one space).

You do get to choose the "style" in which the output is generated
(indented or not, how much indenting, etc.) but you cannot ask for "the
same as the input", because no existing XML parser bothers to keep that
information around.

Regards,

Simon



Re: Transforming XML document

Posted by Simon Kitching <si...@ecnetwork.co.nz>.
Hi,

Unfortunately, you're out of luck. XML parsing just doesn't work that
way.

XML parsers are required to respect the contents of *text nodes*
within an XML document, but whitespace inside the tags themselves (for
example between attributes, or around the "=") is not significant
according to the spec.

You can either treat the input file as a plain text file (e.g. use Perl to
modify it), or you can treat it as XML, in which case the XML parser will
guarantee to preserve the *meaning* of your XML document, but not
necessarily its layout.

For example, <x  y="a"> (two spaces) in XML means *exactly* the same
thing as <x y="a"> (one space).

You do get to choose the "style" in which the output is generated
(indented or not, how much indenting, etc.) but you cannot ask for "the
same as the input", because no existing XML parser bothers to keep that
information around.
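
The plain-text route mentioned above could look roughly like the
following in Java instead of Perl. This is only a sketch: the class and
method names are made up for illustration, and it assumes the tag is not
inside a comment or CDATA section, that no attribute value in the tag
contains ">", and that the new value needs no escaping.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttributeTextEdit {
    /**
     * Replaces the value of attribute attr on every start tag named element.
     * The file is treated as plain text, so every other character
     * (spacing, quote style, empty-element syntax) is left untouched.
     */
    public static String setAttribute(String xml, String element,
                                      String attr, String newValue) {
        // Group 1: everything from "<element" up to and including "attr =".
        // Group 2: the old value including its surrounding quotes.
        Pattern p = Pattern.compile(
            "(<" + Pattern.quote(element) + "\\s[^>]*?\\b"
                + Pattern.quote(attr) + "\\s*=\\s*)"
                + "(\"[^\"]*\"|'[^']*')");
        Matcher m = p.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Reuse whichever quote character the input used.
            char quote = m.group(2).charAt(0);
            String replacement = m.group(1) + quote + newValue + quote;
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String in = "<doc>\n  <x  y = \"a\">text</x>\n  <z/>\n</doc>";
        System.out.println(setAttribute(in, "x", "y", "b"));
    }
}
```

Because no parser is involved, nothing checks that the result is still
well-formed XML; that is the price of keeping the layout.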

Regards,

Simon



Re: Transforming XML document

Posted by Joseph Kesselman <ke...@us.ibm.com>.
"A difference that makes no difference is no difference". 

The whitespace change is pretty much unavoidable, since XML does not 
consider the number of spaces around an attribute meaningful and none of 
the XML APIs preserve that information.
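
One way to see this for yourself is to parse two spellings of the same
element and compare the resulting trees. The sketch below assumes a JAXP
parser with DOM Level 3 support (Node.isEqualNode), which is newer than
the parsers current when this thread was written; the class and method
names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SameTree {
    /** Parses a string into a DOM document with the default JAXP parser. */
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** True if both strings parse to structurally equal root elements. */
    public static boolean sameTree(String a, String b) throws Exception {
        return parse(a).getDocumentElement()
                       .isEqualNode(parse(b).getDocumentElement());
    }

    public static void main(String[] args) throws Exception {
        // Differs only in spacing inside the tag and in empty-element syntax.
        System.out.println(sameTree("<x  y = \"a\"/>", "<x y=\"a\"></x>"));
        // Differs in the attribute value, which is meaningful.
        System.out.println(sameTree("<x y=\"a\"/>", "<x y=\"b\"/>"));
    }
}
```

The first comparison prints true and the second prints false: by the
time the application sees the tree, the spacing is already gone.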

The <foo/> versus <foo></foo> distinction is also not meaningful, and the 
parser APIs give you no way to tell the two forms apart, though I agree 
that it would be nice to use the shorthand version when possible. Doing so 
would somewhat complicate the design of the serializer -- essentially, it 
would mean you couldn't finish writing out the start tag until you knew 
whether or not this element was going to be empty -- but it's certainly 
doable if someone wants to put the effort into it.
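
A minimal sketch of that idea, written as a SAX handler: the start tag
is written without its closing ">" and only finished once the next event
shows whether the element has content. This is not the sax.Writer sample
or the Xerces serializer; the class name is made up, and it ignores
comments, processing instructions, namespaces, and the XML declaration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EmptyTagWriter extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    // True while a start tag has been written up to, but not including, '>'.
    private boolean startTagOpen = false;

    /** Finishes a pending start tag once we know the element has content. */
    private void closePendingStartTag() {
        if (startTagOpen) {
            out.append('>');
            startTagOpen = false;
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        closePendingStartTag();          // the parent element is not empty
        out.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(' ').append(atts.getQName(i)).append("=\"")
               .append(escape(atts.getValue(i), true)).append('"');
        }
        startTagOpen = true;             // defer the '>' until the next event
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (length == 0) return;
        closePendingStartTag();
        out.append(escape(new String(ch, start, length), false));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (startTagOpen) {
            out.append("/>");            // nothing came in between: shorthand
            startTagOpen = false;
        } else {
            out.append("</").append(qName).append('>');
        }
    }

    private static String escape(String s, boolean inAttribute) {
        String r = s.replace("&", "&amp;").replace("<", "&lt;")
                    .replace(">", "&gt;");
        return inAttribute ? r.replace("\"", "&quot;") : r;
    }

    /** Parses xml with the default JAXP SAX parser and re-serializes it. */
    public static String serialize(String xml) throws Exception {
        EmptyTagWriter handler = new EmptyTagWriter();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize("<a><x  y = \"a\"></x><b>t</b></a>"));
    }
}
```

Note that this writes every empty element as <x/>, including ones that
were <x></x> in the input, so it normalizes the output rather than
reproducing the original spelling.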

______________________________________
Joe Kesselman  / IBM Research
