Posted to j-users@xerces.apache.org by Ian Hummel <hu...@parityinc.net> on 2008/12/11 15:06:50 UTC

How to preserve an empty text node?

Hi everyone,

I need to create XML that looks like this whenever the value of "tag"  
is "" (the empty string):

<root>
	<tag></tag>
</root>

I've tried the following:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document d = db.newDocument();
Element root = d.createElement("root");
Element tag = d.createElement("tag");
d.appendChild(root);
root.appendChild(tag);
Text text = d.createTextNode("\t");
tag.appendChild(text);


but I always end up with XML like this:

<root>
	<tag/>
</root>


Is there a way to force empty text nodes to get "denormalized" ?

Thanks,

Ian.

Re: How to preserve an empty text node?

Posted by Dave Brosius <db...@mebigfatguy.com>.
If you are stuck with a bad downstream processor, then you will have to post-process what Xerces emits with something like:

xmlAsAString = xmlAsAString.replaceAll("<([^>]*)/>", "<$1></$1>");
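One caveat with the one-line pattern above: its capture group also swallows any attributes, so <item id="7"/> would come out as <item id="7"></item id="7">, which is not well-formed. Below is a minimal sketch of a variant that captures the element name and the attributes separately. The class and method names are made up for illustration, and like any regex-based rewrite it assumes no attribute value contains a literal '>' and that no comment or CDATA section contains something that looks like an empty-element tag.

```java
import java.util.regex.Pattern;

final class EmptyTagExpander {
    // Group 1: element name. Group 2: attributes, if any (may be empty).
    // '!' and '?' are excluded from the name so comments and processing
    // instructions are never treated as element start tags.
    private static final Pattern EMPTY_ELEMENT =
            Pattern.compile("<([^\\s/>!?]+)((?:\\s[^>]*?)?)\\s*/>");

    /** Rewrites every empty-element tag as a start tag followed by an end tag. */
    static String expand(String xml) {
        return EMPTY_ELEMENT.matcher(xml).replaceAll("<$1$2></$1>");
    }

    public static void main(String[] args) {
        System.out.println(expand("<root><tag/><item id=\"7\" /></root>"));
    }
}
```

Elements that already have content or separate start and end tags are left untouched, since the pattern only matches tags that end in "/>".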

-----Original Message-----
From: "Ian Hummel" <hu...@parityinc.net>
Sent: Monday, December 15, 2008 3:08pm
To: "j-users@xerces.apache.org" <j-...@xerces.apache.org>
Subject: Re: How to preserve an empty text node?

Hi Michael,

I know <tag></tag> and <tag/> are the same, but unfortunately the buggy-parser-that-cannot-be-changed on the other end doesn't :)

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document d = db.newDocument();
Element root = d.createElement("root");
Element tag = d.createElement("tag");
tag.setTextContent("");
d.appendChild(root);
root.appendChild(tag);
System.out.println(XmlUtils.formatXmlAsString(d));

This always outputs <tag/> and never <tag></tag> like I need it to.

- Ian.

On Dec 12, 2008, at 10:33 PM, Michael Glavassevich wrote:

Hi Ian,
 
 > I need to create XML that looks like this whenever the value of 
 > "tag" is "" (the empty string):
 > 
 > <root>
 > <tag></tag>
 > </root>
 
 Why? <tag/> and <tag></tag> have the same meaning. Whichever form is chosen by the serializer should have no significance.
 
 > I am more concerned in preserving the empty text node when I 
 > serialize to e.g. a file... not so much the parsing.
 > 
 > Any one else have any ideas?
 
 Would help if you showed your code for serializing the document. 
 
 > Are blank text nodes like that invalid XML or something?
 
 In the snippet you posted you created a text node with the '\t' (tab) character in it. That isn't "blank" or empty.
 
 Thanks.
 
 Michael Glavassevich
 XML Parser Development
 IBM Toronto Lab
 E-mail: mrglavas@ca.ibm.com
 E-mail: mrglavas@apache.org
 
 Ian Hummel <hummel@parityinc.net> wrote on 12/12/2008 09:21:47 AM:
 
 > Hi, I didn't really understand how that's going to help.
 > 
 > I am more concerned in preserving the empty text node when I 
 > serialize to e.g. a file... not so much the parsing.
 > 
 > Any one else have any ideas?  Are blank text nodes like that invalid
 > XML or something?
 > 
 > On Dec 11, 2008, at 11:53 AM, ravikanth@gmail.com wrote:
 > 
 > Hi Ian,
 > 
 > I think this can be implemented with the LSParser interface. This link may help you:
 > http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/ls/LSParser.html
 > 
 > Regards,
 > Ravikanth
 
 > On Thu, Dec 11, 2008 at 7:36 PM, Ian Hummel <hummel@parityinc.net> wrote:
 > Hi everyone,
 > 
 > I need to create XML that looks like this whenever the value of 
 > "tag" is "" (the empty string):
 > 
 > <root>
 > <tag></tag>
 > </root>
 > 
 > I've tried the following:
 > 
 > DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
 > DocumentBuilder db = dbf.newDocumentBuilder();
 > Document d = db.newDocument();
 > Element root = d.createElement("root");
 > Element tag = d.createElement("tag");
 > d.appendChild(root);
 > root.appendChild(tag);
 > Text text = d.createTextNode("\t");
 > tag.appendChild(text);
 > 
 > but I always end up with XML like this:
 > 
 > <root>
 > <tag/>
 > </root>
 > 
 > Is there a way to force empty text nodes to get "denormalized" ?
 > 
 > Thanks,
 > 
 > Ian.
 > 
 > -- 
 > Ravikanth

Re: How to preserve an empty text node?

Posted by Ian Hummel <hu...@parityinc.net>.
Thank you all for your suggestions.  I will see what can be done!

Regards,

Ian.




Re: How to preserve an empty text node?

Posted by ke...@us.ibm.com.
The other kluge-around would be a postprocessor that converted the XML 
<foo/> into the SGML <foo></foo>.

But if the problem really is that the next stage is an SGML tool, I'd try 
HTML mode serialization and see if you can get away with it.

______________________________________
"... Three things see no end: A loop with exit code done wrong,
A semaphore untested, And the change that comes along. ..."
  -- "Threes" Rev 1.1 - Duane Elms / Leslie Fish (
http://www.ovff.org/pegasus/songs/threes-rev-11.html)

Re: How to preserve an empty text node?

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
keshlam@us.ibm.com wrote on 12/16/2008 04:48:05 PM:

> As far as XPath/XSLT is concerned, there is no such thing as "an
> empty text node" -- if it's empty, it's absent. So the Xalan
> serializer will almost certainly treat empty text nodes as not
> existing. Any properly written XML application should treat <foo/>
> and <foo></foo> as IDENTICAL. If you really need to draw this
> distinction, your output isn't XML.

It seems that Ian has no choice. Apparently there is a defective parser on the
receiving end that he cannot change. He understands that <foo/> and
<foo></foo> have the same meaning, but needs the latter as a workaround.

> Maybe your output is HTML. If so, setting the serializer to HTML
> mode will generate separate start and end tags rather than an empty-
> element tag. But that has other effects as well, which may or may
> not be things you want if the document is some SGML language other than
> HTML.
>
> ______________________________________
> "... Three things see no end: A loop with exit code done wrong,
> A semaphore untested, And the change that comes along. ..."
>  -- "Threes" Rev 1.1 - Duane Elms / Leslie Fish
> (http://www.ovff.org/pegasus/songs/threes-rev-11.html)

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

Re: How to preserve an empty text node?

Posted by ke...@us.ibm.com.
As far as XPath/XSLT is concerned, there is no such thing as "an empty 
text node" -- if it's empty, it's absent. So the Xalan serializer will 
almost certainly treat empty text nodes as not existing. Any properly 
written XML application should treat <foo/> and <foo></foo> as IDENTICAL. 
If you really need to draw this distinction, your output isn't XML.
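A related illustration at the DOM level (not of the Xalan serializer itself): the DOM specification defines Node.normalize() as leaving no empty Text nodes behind, so an empty text node is fragile even before serialization. The sketch below assumes a DOM Level 3 implementation such as the one bundled with the JDK; the class and method names are invented for the example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class EmptyTextNodeDemo {
    /** Returns the child count of <tag> before and after normalize(). */
    static int[] childCounts() {
        try {
            Document d = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element tag = d.createElement("tag");
            d.appendChild(tag);

            // Attach an empty text node, as in the thread.
            tag.appendChild(d.createTextNode(""));
            int before = tag.getChildNodes().getLength();

            // normalize() merges adjacent text nodes and drops empty ones.
            tag.normalize();
            int after = tag.getChildNodes().getLength();

            return new int[] {before, after};
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = childCounts();
        System.out.println("before normalize: " + counts[0]);
        System.out.println("after normalize:  " + counts[1]);
    }
}
```

So even where a DOM keeps the empty text node, any code path that normalizes the tree will discard it.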

Maybe your output is HTML. If so, setting the serializer to HTML mode will 
generate separate start and end tags rather than an empty-element tag. But 
that has other effects as well, which may or may not be things you want if 
the document is some SGML language other than HTML.
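For reference, HTML mode can be selected through the standard JAXP output property. This is only a sketch, assuming the default JAXP transformer bundled with the JDK; the class and method names are invented for the example. As noted above, HTML mode has other effects (for instance, no XML declaration is written and element names that HTML defines as empty, such as br, get no end tag), so the result needs checking against whatever the receiving tool accepts.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class HtmlMethodDemo {
    /** Builds <root><tag/></root> and serializes it with the "html" output method. */
    static String serializeAsHtml() {
        try {
            Document d = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = d.createElement("root");
            Element tag = d.createElement("tag");
            d.appendChild(root);
            root.appendChild(tag);

            // Identity transform; only the output method is changed.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "html");

            StringWriter out = new StringWriter();
            t.transform(new DOMSource(d), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serializeAsHtml());
    }
}
```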

______________________________________
"... Three things see no end: A loop with exit code done wrong,
A semaphore untested, And the change that comes along. ..."
  -- "Threes" Rev 1.1 - Duane Elms / Leslie Fish (
http://www.ovff.org/pegasus/songs/threes-rev-11.html)


Re: How to preserve an empty text node?

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Seems like Xalan has already optimized that. Perhaps if you try a different
serializer (e.g. DOM Level 3 LSSerializer [in serializer.jar] or Xerces'
deprecated one) it will do what you were hoping for.
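A minimal sketch of what trying the DOM Level 3 LSSerializer might look like, assuming a DOM implementation that supports the "LS" 3.0 feature (the JDK's bundled one does). The class and method names are invented for the example, and whether a given LSSerializer writes <tag></tag> or <tag/> here is implementation-specific, so the output has to be inspected rather than assumed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

final class LsSerializerDemo {
    /** Serializes <root><tag>(empty text)</tag></root> with an LSSerializer. */
    static String serialize() {
        try {
            Document d = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = d.createElement("root");
            Element tag = d.createElement("tag");
            d.appendChild(root);
            root.appendChild(tag);
            tag.appendChild(d.createTextNode(""));

            // Ask the DOM implementation for its Load and Save support.
            Object feature = d.getImplementation().getFeature("LS", "3.0");
            if (!(feature instanceof DOMImplementationLS)) {
                throw new IllegalStateException("DOM Level 3 LS is not supported");
            }
            LSSerializer serializer = ((DOMImplementationLS) feature).createLSSerializer();
            return serializer.writeToString(d);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize());
    }
}
```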

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org


Re: How to preserve an empty text node?

Posted by Ian Hummel <hu...@parityinc.net>.
Hi Michael,

Here is my formatXmlAsString method, formatted for brevity:

     public static String formatXmlAsString(Document doc) throws TransformerException {
         StringWriter out = new StringWriter();
         TransformerFactory factory = TransformerFactory.newInstance();
         // "indent-number" is implementation-specific; a factory that does not
         // recognize it throws IllegalArgumentException.
         factory.setAttribute("indent-number", 2);
         Transformer serializer = factory.newTransformer();
         serializer.setOutputProperty(OutputKeys.INDENT, "yes");
         serializer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
         serializer.transform(new DOMSource(doc), new StreamResult(out));
         return out.toString();
     }


Here I tried appending a blank text node, and I can see in the DOM  
that the text node _is_ there, but it still gets squashed upon output:


		DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
		DocumentBuilder db = dbf.newDocumentBuilder();
		Document d = db.newDocument();
		Element root = d.createElement("root");
		Element tag = d.createElement("tag");
		d.appendChild(root);
		root.appendChild(tag);
		
		System.out.println("Child nodes? " + tag.getChildNodes().item(0));
		System.out.println("Child nodes l? " + tag.getChildNodes().getLength());
		Text text = d.createTextNode("");
		tag.appendChild(text);
		System.out.println("Child nodes? " + tag.getChildNodes().item(0));
		System.out.println("Child nodes l? " + tag.getChildNodes().getLength());
		System.out.println(formatXmlAsString(d));



Child nodes? null
Child nodes l? 0
Child nodes? [#text: ]
Child nodes l? 1
<?xml version="1.0" encoding="UTF-8"?>
<root>
     <tag/>
</root>







Re: How to preserve an empty text node?

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
To add to what I said ...

I think it's likely the case that the Xerces/Xalan serializer will write
<tag></tag> if you attach an empty text node to the element. You should
keep in mind that this is an implementation detail that could change in the
future. Perhaps one day it will write <tag/> instead.

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org


Re: How to preserve an empty text node?

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Hi Ian,

I've never heard of XmlUtils.formatXmlAsString. It's certainly not
distributed with Xerces. Have you tried one of the standard serialization
methods [1] from JAXP or DOM Level 3?

Thanks.

[1] http://xerces.apache.org/xerces2-j/faq-general.html#faq-6
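For reference, the two standard routes mentioned above can be sketched roughly as follows. This is only a minimal sketch: the class and method names (SerializeDemo, buildDocument, viaJaxp, viaLs) are made up for illustration, and whether an empty element comes out as <tag/> or <tag></tag> is left to the serializer implementation, so no particular output is promised here.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical helper class, not part of Xerces or the JDK.
public class SerializeDemo {

    // Builds the same <root><tag> document as in the thread.
    static Document buildDocument() {
        try {
            Document d = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = d.createElement("root");
            Element tag = d.createElement("tag");
            tag.setTextContent("");
            d.appendChild(root);
            root.appendChild(tag);
            return d;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Route 1: JAXP identity transform from a DOMSource to a string.
    static String viaJaxp(Document d) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(d), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Route 2: DOM Level 3 Load and Save.
    // Assumes the DOM implementation supports the "LS" feature;
    // getFeature returns null if it does not.
    static String viaLs(Document d) {
        DOMImplementationLS ls =
                (DOMImplementationLS) d.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        return serializer.writeToString(d);
    }

    public static void main(String[] args) {
        System.out.println(viaJaxp(buildDocument()));
        System.out.println(viaLs(buildDocument()));
    }
}
```

Running both against the same document makes it easy to compare what each serializer does with the empty element.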

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

Ian Hummel <hu...@parityinc.net> wrote on 12/15/2008 03:08:49 PM:

> Hi Michael,
>
> I know <tag></tag> and <tag/> are the same, but unfortunately the
> buggy-parser-that-cannot-be-changed on the other end doesn't :)
>
> DocumentBuilder db = dbf.newDocumentBuilder();
> Document d = db.newDocument();
> Element root = d.createElement("root");
> Element tag = d.createElement("tag");
> tag.setTextContent("");
> d.appendChild(root);
> root.appendChild(tag);
> System.out.println(XmlUtils.formatXmlAsString(d));
>
> This always outputs <tag/> and never <tag></tag> like I need it to.
>
> - Ian.
>
> On Dec 12, 2008, at 10:33 PM, Michael Glavassevich wrote:
>
> Hi Ian,
>
> > I need to create XML that looks like this whenever the value of
> > "tag" is "" (the empty string):
> >
> > <root>
> > <tag></tag>
> > </root>
>
> Why? <tag/> and <tag></tag> have the same meaning. Whichever form is
> chosen by the serializer should have no significance.
>
> > I am more concerned in preserving the empty text node when I
> > serialize to e.g. a file... not so much the parsing.
> >
> > Any one else have any ideas?
>
> Would help if you showed your code for serializing the document.
>
> > Are blank text nodes like that invalid XML or something?
>
> In the snippet you posted you created a text node with the '\t'
> (tab) character in it. That isn't "blank" or empty.
>
> Thanks.
>
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: mrglavas@ca.ibm.com
> E-mail: mrglavas@apache.org
>
> Ian Hummel <hu...@parityinc.net> wrote on 12/12/2008 09:21:47 AM:
>
> > Hi, I didn't really understand how that's going to help.
> >
> > I am more concerned in preserving the empty text node when I
> > serialize to e.g. a file... not so much the parsing.
> >
> > Any one else have any ideas?  Are blank text nodes like that invalid
> > XML or something?
> >
> > On Dec 11, 2008, at 11:53 AM, ravikanth@gmail.com wrote:
> >
> > > Hi Ian,
> >
> > > I think we can Implement by LSParser Interface.
> > > http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/ls/LSParser.html
> > this link may help you.
> >
> > Regards,
> > Ravikanth
>
> > > On Thu, Dec 11, 2008 at 7:36 PM, Ian Hummel <hu...@parityinc.net> wrote:
> > Hi everyone,
> >
> > I need to create XML that looks like this whenever the value of
> > "tag" is "" (the empty string):
> >
> > <root>
> > <tag></tag>
> > </root>
> >
> > I've tried the following:
> >
> > DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
> > DocumentBuilder db = dbf.newDocumentBuilder();
> > Document d = db.newDocument();
> > Element root = d.createElement("root");
> > Element tag = d.createElement("tag");
> > d.appendChild(root);
> > root.appendChild(tag);
> > Text text = d.createTextNode("\t");
> > tag.appendChild(text);
> >
> > but I always end up with XML like this:
> >
> > <root>
> > <tag/>
> > </root>
> >
> > Is there a way to force empty text nodes to get "denormalized" ?
> >
> > Thanks,
> >
> > Ian.
> >
> > --
> > Ravikanth

Re: How to preserve an empty text node?

Posted by Ian Hummel <hu...@parityinc.net>.
Hi Michael,

I know <tag></tag> and <tag/> are the same, but unfortunately the  
buggy-parser-that-cannot-be-changed on the other end doesn't :)


DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document d = db.newDocument();
Element root = d.createElement("root");
Element tag = d.createElement("tag");
tag.setTextContent("");
d.appendChild(root);
root.appendChild(tag);
System.out.println(XmlUtils.formatXmlAsString(d));


This always outputs <tag/> and never <tag></tag> like I need it to.

- Ian.



On Dec 12, 2008, at 10:33 PM, Michael Glavassevich wrote:

> Hi Ian,
>
> > I need to create XML that looks like this whenever the value of
> > "tag" is "" (the empty string):
> >
> > <root>
> > <tag></tag>
> > </root>
>
> Why? <tag/> and <tag></tag> have the same meaning. Whichever form is  
> chosen by the serializer should have no significance.
>
> > I am more concerned in preserving the empty text node when I
> > serialize to e.g. a file... not so much the parsing.
> >
> > Any one else have any ideas?
>
> Would help if you showed your code for serializing the document.
>
> > Are blank text nodes like that invalid XML or something?
>
> In the snippet you posted you created a text node with the  
> '\t' (tab) character in it. That isn't "blank" or empty.
>
> Thanks.
>
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: mrglavas@ca.ibm.com
> E-mail: mrglavas@apache.org
>
> Ian Hummel <hu...@parityinc.net> wrote on 12/12/2008 09:21:47 AM:
>
> > Hi, I didn't really understand how that's going to help.
> >
> > I am more concerned in preserving the empty text node when I
> > serialize to e.g. a file... not so much the parsing.
> >
> > Any one else have any ideas?  Are blank text nodes like that invalid
> > XML or something?
> >
> > On Dec 11, 2008, at 11:53 AM, ravikanth@gmail.com wrote:
> >
> > > Hi Ian,
> >
> > > I think we can Implement by LSParser Interface.
> > > http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/ls/LSParser.html
> > this link may help you.
> >
> > Regards,
> > Ravikanth
>
> > On Thu, Dec 11, 2008 at 7:36 PM, Ian Hummel <hu...@parityinc.net>  
> wrote:
> > Hi everyone,
> >
> > I need to create XML that looks like this whenever the value of
> > "tag" is "" (the empty string):
> >
> > <root>
> > <tag></tag>
> > </root>
> >
> > I've tried the following:
> >
> > DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
> > DocumentBuilder db = dbf.newDocumentBuilder();
> > Document d = db.newDocument();
> > Element root = d.createElement("root");
> > Element tag = d.createElement("tag");
> > d.appendChild(root);
> > root.appendChild(tag);
> > Text text = d.createTextNode("\t");
> > tag.appendChild(text);
> >
> > but I always end up with XML like this:
> >
> > <root>
> > <tag/>
> > </root>
> >
> > Is there a way to force empty text nodes to get "denormalized" ?
> >
> > Thanks,
> >
> > Ian.
> >
> > --
> > Ravikanth
>


Re: How to preserve an empty text node?

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Hi Ian,

> I need to create XML that looks like this whenever the value of
> "tag" is "" (the empty string):
>
> <root>
> <tag></tag>
> </root>

Why? <tag/> and <tag></tag> have the same meaning. Whichever form is chosen
by the serializer should have no significance.
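A quick way to check this claim, as a rough sketch: parse both spellings and compare the resulting trees with Node.isEqualNode. The class and method names (EmptyElementForms, parse, sameTree) are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
public class EmptyElementForms {

    // Parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            DocumentBuilder db =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // True if both strings parse to structurally equal element trees.
    static boolean sameTree(String a, String b) {
        return parse(a).getDocumentElement()
                .isEqualNode(parse(b).getDocumentElement());
    }

    public static void main(String[] args) {
        // Both spellings give an element with no children.
        System.out.println(sameTree("<root><tag/></root>",
                                    "<root><tag></tag></root>"));
        // An element holding a tab character has a text child, so it differs.
        System.out.println(sameTree("<root><tag/></root>",
                                    "<root><tag>\t</tag></root>"));
    }
}
```

A conforming parser sees no difference between the two forms, which is why a consumer that does distinguish them has to be handled outside the DOM.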

> I am more concerned in preserving the empty text node when I
> serialize to e.g. a file... not so much the parsing.
>
> Any one else have any ideas?

Would help if you showed your code for serializing the document.

> Are blank text nodes like that invalid XML or something?

In the snippet you posted you created a text node with the '\t' (tab)
character in it. That isn't "blank" or empty.
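To make the difference concrete, here is a small sketch that builds the element from the original snippet and reports how many characters of text it holds. The names TextNodeCheck and textLength are made up for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class for illustration only.
public class TextNodeCheck {

    // Builds <root><tag>value</tag></root> and returns the length of
    // the text content of <tag>.
    static int textLength(String value) {
        try {
            Document d = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = d.createElement("root");
            Element tag = d.createElement("tag");
            d.appendChild(root);
            root.appendChild(tag);
            tag.appendChild(d.createTextNode(value));
            return tag.getTextContent().length();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textLength("\t")); // one character: the tab
        System.out.println(textLength(""));   // zero characters
    }
}
```

A text node holding "\t" carries one character of real content, so a serializer has to write it out; only the zero-length case is actually empty.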

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

Ian Hummel <hu...@parityinc.net> wrote on 12/12/2008 09:21:47 AM:

> Hi, I didn't really understand how that's going to help.
>
> I am more concerned in preserving the empty text node when I
> serialize to e.g. a file... not so much the parsing.
>
> Any one else have any ideas?  Are blank text nodes like that invalid
> XML or something?
>
> On Dec 11, 2008, at 11:53 AM, ravikanth@gmail.com wrote:
>
> Hi Ian,
>
> I think we can Implement by LSParser Interface.
> http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/ls/LSParser.html
> this link may help you.
>
> Regards,
> Ravikanth

> On Thu, Dec 11, 2008 at 7:36 PM, Ian Hummel <hu...@parityinc.net> wrote:
> Hi everyone,
>
> I need to create XML that looks like this whenever the value of
> "tag" is "" (the empty string):
>
> <root>
> <tag></tag>
> </root>
>
> I've tried the following:
>
> DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
> DocumentBuilder db = dbf.newDocumentBuilder();
> Document d = db.newDocument();
> Element root = d.createElement("root");
> Element tag = d.createElement("tag");
> d.appendChild(root);
> root.appendChild(tag);
> Text text = d.createTextNode("\t");
> tag.appendChild(text);
>
> but I always end up with XML like this:
>
> <root>
> <tag/>
> </root>
>
> Is there a way to force empty text nodes to get "denormalized" ?
>
> Thanks,
>
> Ian.
>
> --
> Ravikanth

Re: How to preserve an empty text node?

Posted by Ian Hummel <hu...@parityinc.net>.
Hi, I didn't really understand how that's going to help.

I am more concerned in preserving the empty text node when I serialize  
to e.g. a file... not so much the parsing.


Any one else have any ideas?  Are blank text nodes like that invalid  
XML or something?



On Dec 11, 2008, at 11:53 AM, ravikanth@gmail.com wrote:

> Hi Ian,
>
> I think we can Implement by LSParser Interface. http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/ls/LSParser.html
> this link may help you.
>
> Regards,
> Ravikanth
>
> On Thu, Dec 11, 2008 at 7:36 PM, Ian Hummel <hu...@parityinc.net>  
> wrote:
> Hi everyone,
>
> I need to create XML that looks like this whenever the value of  
> "tag" is "" (the empty string):
>
> <root>
> 	<tag></tag>
> </root>
>
> I've tried the following:
>
> DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
> DocumentBuilder db = dbf.newDocumentBuilder();
> Document d = db.newDocument();
> Element root = d.createElement("root");
> Element tag = d.createElement("tag");
> d.appendChild(root);
> root.appendChild(tag);
> Text text = d.createTextNode("\t");
> tag.appendChild(text);
>
>
> but I always end up with XML like this:
>
> <root>
> 	<tag/>
> </root>
>
>
> Is there a way to force empty text nodes to get "denormalized" ?
>
> Thanks,
>
> Ian.
>
>
>
> -- 
> Ravikanth


Re: How to preserve an empty text node?

Posted by ra...@gmail.com.
Hi Ian,

I think this can be implemented with the LSParser interface.
http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/ls/LSParser.html
This link may help you.

Regards,
Ravikanth

On Thu, Dec 11, 2008 at 7:36 PM, Ian Hummel <hu...@parityinc.net> wrote:

> Hi everyone,
> I need to create XML that looks like this whenever the value of "tag" is ""
> (the empty string):
>
> <root>
> <tag></tag>
> </root>
>
> I've tried the following:
>
> DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
> DocumentBuilder db = dbf.newDocumentBuilder();
> Document d = db.newDocument();
> Element root = d.createElement("root");
> Element tag = d.createElement("tag");
> d.appendChild(root);
> root.appendChild(tag);
> Text text = d.createTextNode("\t");
> tag.appendChild(text);
>
>
> but I always end up with XML like this:
>
> <root>
> <tag/>
> </root>
>
>
> Is there a way to force empty text nodes to get "denormalized" ?
>
> Thanks,
>
> Ian.
>



-- 
Ravikanth