Posted to j-users@xalan.apache.org by Dick Deneer <di...@donkeydevelopment.com> on 2007/02/19 13:40:08 UTC
Formatting question serializing DOM with pretty-print
I build a DOM from the following XML:
<root><child1>text</child1><child2>text</child2></root>
After serializing it with the Xalan serializer and the pretty-print option, I get:
<?xml version="1.0" encoding="UTF-16"?><root>
<child1>text</child1>
<child2>text</child2>
</root>
So the root start tag is also on the first line.
Second, when I put in carriage returns or spaces, this seriously affects the
formatting.
For instance, when I build a DOM from:
<root>\n\n\n<child1>text</child1> <child2>text</child2></root>
After serializing I get:
<?xml version="1.0" encoding="UTF-16"?><root>
<child1>text</child1> <child2>text</child2>
</root>
In all cases the Xerces serializer returns:
<?xml version="1.0" encoding="UTF-16"?>
<root>
<child1>text</child1>
<child2>text</child2>
</root>
Can you tell me if this behaviour is right?
http://www.nabble.com/file/6633/TestSerializer.java TestSerializer.java
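The attached TestSerializer.java is not reproduced here. As a rough stand-in, the sketch below parses both inputs and serializes them with the DOM Level 3 Load and Save parameter format-pretty-print. It assumes the LSSerializer bundled with the JDK, which to my knowledge is derived from the Xalan serializer. The class and method names are hypothetical, and the exact whitespace in the output may differ between releases.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

public class PrettyPrintDemo {

    /** Parses an XML string into a DOM Document with the default JAXP parser. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Serializes a DOM with the DOM Level 3 "format-pretty-print" parameter enabled. */
    static String prettyPrint(Document doc) {
        DOMImplementationLS ls =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        serializer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
        // writeToString produces a Java (UTF-16) string, hence encoding="UTF-16"
        // in the XML declaration of the result.
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) {
        // The two inputs from the question: without and with extra whitespace.
        String[] inputs = {
            "<root><child1>text</child1><child2>text</child2></root>",
            "<root>\n\n\n<child1>text</child1> <child2>text</child2></root>"
        };
        for (String xml : inputs) {
            System.out.println(prettyPrint(parse(xml)));
        }
    }
}
```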
--
View this message in context: http://www.nabble.com/Fomatting-question-serializing-DOM-with-pretty-print-tf3252607.html#a9041632
Sent from the Xalan - J - Users mailing list archive at Nabble.com.
Re: Formatting question serializing DOM with pretty-print
Posted by Dick Deneer <di...@donkeydevelopment.com>.
Brian,
Thanks for the explanation.
As already commented in the example test code, it is possible to filter out
whitespace in the calling program.
Working around the issue with the indent (newline) before the root element
may be more complicated.
I hope that in the future the Xalan serializer will offer full backwards
compatibility with the deprecated Xerces one.
Dick Deneer
Re: Formatting question serializing DOM with pretty-print
Posted by Eric Kolotyluk <er...@kodak.com>.
OK, done & done.
Thanks, Eric
Henry Zongaro wrote:
>
> Hi, Eric.
>
> Eric Kolotyluk <er...@kodak.com> wrote on 2007-06-29 10:54:22 AM:
>> The first obvious problem is that the first element of our document does not
>> have a line break before it - it's on the same line as the <?xml ... ?>
>
> This sounds like it qualifies as a bug. Please open a bug report in
> Jira.[1]
>
>> The second obvious problem is that elements with a long list of attributes
>> are not wrapped and indented. They should be wrapped after some reasonable
>> line limit (e.g. 60, 80, 100 characters - pick one). It would be nice if
>> there were a way to specify this through the API. Also, when they are
>> wrapped, they should be indented.
>
> That sounds like a good suggestion. Please open that as an improvement in
> Jira.[1]
>
> Thanks,
>
> Henry
> [1] http://issues.apache.org/jira/secure/CreateIssue!default.jspa
> ------------------------------------------------------------------
> Henry Zongaro XSLT Processors Development
> IBM SWS Toronto Lab T/L 969-6044; Phone +1 905 413-6044
> mailto:zongaro@ca.ibm.com
>
>
Re: Formatting question serializing DOM with pretty-print
Posted by Henry Zongaro <zo...@ca.ibm.com>.
Hi, Eric.
Eric Kolotyluk <er...@kodak.com> wrote on 2007-06-29 10:54:22 AM:
> The first obvious problem is that the first element of our document does not
> have a line break before it - it's on the same line as the <?xml ... ?>
This sounds like it qualifies as a bug. Please open a bug report in
Jira.[1]
> The second obvious problem is that elements with a long list of attributes
> are not wrapped and indented. They should be wrapped after some reasonable
> line limit (e.g. 60, 80, 100 characters - pick one). It would be nice if
> there were a way to specify this through the API. Also, when they are
> wrapped, they should be indented.
That sounds like a good suggestion. Please open that as an improvement in
Jira.[1]
Thanks,
Henry
[1] http://issues.apache.org/jira/secure/CreateIssue!default.jspa
------------------------------------------------------------------
Henry Zongaro XSLT Processors Development
IBM SWS Toronto Lab T/L 969-6044; Phone +1 905 413-6044
mailto:zongaro@ca.ibm.com
Re: Formatting question serializing DOM with pretty-print
Posted by Eric Kolotyluk <er...@kodak.com>.
Here is an example of our traffic log using XMLSerializer:
2007-06-26 13:22:16.066
<?xml version="1.0" encoding="UTF-8"?>
<User clientName="EKolotyluk_380" clientPlatform="Windows XP (5.1)"
clientProtocolVersion="{DB4AEBDF-A4A9-4521-880B-02310D12723B}"
clientType="Admin" clientVersion="0.0.0.0"
cookie="1a741296:11369b28de8:-7fd8" isoLanguageCode="en"
sendCompressed="true" type="checkProtocolVersion"/>
2007-06-26 13:22:16.379
<?xml version="1.0" encoding="UTF-8"?>
<Server cookie="1a741296:11369b28de8:-7fd8" deviceType="Admin"
failureText="Protocol Version not supported" friendlyName="CSMP2610"
ipAddress="10.1.41.70" licenseStatus="0"
macAddress="00-14-22-38-AA-43" result="Failed" serialNumber="09665"
type="checkProtocolVersion">
<VersionInfo>
<AdminServer versionBuild="23" versionMajor="3" versionMinor="1"
versionOther="5" versionPatch="5"/>
</VersionInfo>
<Event eid="85008" hr="0" timeStamp="1182889209"/>
</Server>
(The timestamp lines are added by our logging.) Here is the same XML using
LSSerializer:
2007-06-29 07:42:06.774
<?xml version="1.0" encoding="UTF-8"?><User clientName="EKolotyluk_380"
clientPlatform="Windows XP (5.1)"
clientProtocolVersion="{DB4AEBDF-A4A9-4521-880B-02310D12723B}"
clientType="Admin" clientVersion="0.0.0.0"
cookie="-7dfe3a9b:11377ee20de:-7fdd" isoLanguageCode="en"
sendCompressed="true" type="checkProtocolVersion"/>
2007-06-29 07:42:07.039
<?xml version="1.0" encoding="UTF-8"?><Server
cookie="-7dfe3a9b:11377ee20de:-7fdd" deviceType="Admin"
failureText="Protocol Version not supported" friendlyName="CSMP2610"
ipAddress="10.1.41.70" licenseStatus="0" macAddress="00-14-22-38-AA-43"
result="Failed" serialNumber="09665" type="checkProtocolVersion">
<VersionInfo>
<AdminServer versionBuild="23" versionMajor="3" versionMinor="1"
versionOther="5" versionPatch="5"/>
</VersionInfo>
<Event eid="85008" hr="0" timeStamp="1183127997"/>
</Server>
The first obvious problem is that the first element of our document does not
have a line break before it - it's on the same line as the <?xml ... ?>
The second obvious problem is that elements with a long list of attributes
are not wrapped and indented. They should be wrapped after some reasonable
line limit (e.g. 60, 80, 100 characters - pick one). It would be nice if
there were a way to specify this through the API. Also, when they are
wrapped, they should be indented.
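Neither serializer discussed in this thread offers attribute wrapping (it was suggested as a Jira improvement), so the following is a hand-rolled sketch rather than an LSSerializer feature. It walks the DOM and starts a new line, aligned under the first attribute, whenever the next attribute would push the line past a limit. It assumes a simple document of elements, attributes and text (comments, processing instructions and CDATA sections are skipped, and text is trimmed). The class name, the indent width and the line limit are hypothetical choices.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class WrappingPrinter {

    static final int INDENT = 2; // hypothetical: spaces per nesting level

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    static String spaces(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append(' ');
        return sb.toString();
    }

    /** Escapes characters that may not appear literally in attribute values or text. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Appends e to out, wrapping attributes once a line would exceed lineLimit. */
    static void print(Element e, int depth, int lineLimit, StringBuilder out) {
        String name = e.getTagName();
        String pad = spaces(depth * INDENT);
        StringBuilder line = new StringBuilder(pad).append('<').append(name);
        // Continuation lines line up under the first attribute: pad + "<" + name + " ".
        String attrPad = spaces(pad.length() + name.length() + 2);
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            String attr = a.getName() + "=\"" + escape(a.getValue()) + "\"";
            if (i > 0 && line.length() + 1 + attr.length() > lineLimit) {
                out.append(line).append('\n');          // flush the full line
                line = new StringBuilder(attrPad).append(attr);
            } else {
                line.append(' ').append(attr);
            }
        }
        boolean hasElementChild = false;
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) hasElementChild = true;
        }
        if (!hasElementChild) {
            // Text-only or empty element: keep it on one line (text is trimmed).
            String text = e.getTextContent().trim();
            if (text.isEmpty()) {
                out.append(line).append("/>\n");
            } else {
                out.append(line).append('>').append(escape(text))
                   .append("</").append(name).append(">\n");
            }
            return;
        }
        out.append(line).append(">\n");
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                print((Element) c, depth + 1, lineLimit, out);
            } else if (c.getNodeType() == Node.TEXT_NODE && !c.getNodeValue().trim().isEmpty()) {
                // Non-whitespace text in mixed content goes on its own indented line.
                out.append(spaces(pad.length() + INDENT))
                   .append(escape(c.getNodeValue().trim())).append('\n');
            }
        }
        out.append(pad).append("</").append(name).append(">\n");
    }

    public static void main(String[] args) {
        Document doc = parse("<User clientName=\"EKolotyluk_380\" clientPlatform=\"Windows XP (5.1)\""
                + " clientType=\"Admin\" clientVersion=\"0.0.0.0\" isoLanguageCode=\"en\"/>");
        StringBuilder out = new StringBuilder();
        print(doc.getDocumentElement(), 0, 60, out);
        System.out.print(out);
    }
}
```

Because the output is still well-formed XML, it can be re-parsed to check that no attribute was lost in the wrapping.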
What I was trying to say is that LSSerializer should at least do whatever
XMLSerializer is doing now.
Cheers, Eric
keshlam wrote:
>
>>the pretty-printing is so bad - it's not all that pretty.
>
> If you were specific about what you want done differently, that would be
> helpful.
>
> Note too that if you want *really* pretty, the right answer may be to write
> a stylesheet that expresses precisely the formatting you want rather than
> taking the (relatively simple-minded) default.
>
> ______________________________________
> "... Three things see no end: A loop with exit code done wrong,
> A semaphore untested, And the change that comes along. ..."
> -- "Threes" Rev 1.1 - Duane Elms / Leslie Fish
> (http://www.ovff.org/pegasus/songs/threes-rev-11.html)
>
Re: Formatting question serializing DOM with pretty-print
Posted by ke...@us.ibm.com.
>the pretty-printing is so bad - it's not all that pretty.
If you were specific about what you want done differently, that would be
helpful.
Note too that if you want *really* pretty, the right answer may be to write
a stylesheet that expresses precisely the formatting you want rather than
taking the (relatively simple-minded) default.
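A minimal sketch of the stylesheet route, using the standard JAXP Transformer API: an identity stylesheet whose xsl:strip-space drops whitespace-only text nodes before xsl:output indent="yes" re-indents the tree. The indent-amount output property in the http://xml.apache.org/xslt namespace is a Xalan extension that, as far as I know, the JDK's built-in XSLTC also accepts. The class name is hypothetical. For fully custom layout you would replace the identity template with your own rules.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StylesheetPrettyPrint {

    // Identity transform that strips whitespace-only text and requests indentation.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" indent=\"yes\"/>"
      + "<xsl:strip-space elements=\"*\"/>"
      + "<xsl:template match=\"@*|node()\">"
      + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String prettyPrint(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            // Xalan-specific output property (assumed to be honoured by the JDK's XSLTC).
            t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(prettyPrint("<root>\n\n\n<child1>text</child1> <child2>text</child2></root>"));
    }
}
```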
______________________________________
"... Three things see no end: A loop with exit code done wrong,
A semaphore untested, And the change that comes along. ..."
-- "Threes" Rev 1.1 - Duane Elms / Leslie Fish
(http://www.ovff.org/pegasus/songs/threes-rev-11.html)
Re: Formatting question serializing DOM with pretty-print
Posted by Eric Kolotyluk <er...@kodak.com>.
I think you've hit the nail right on the head. Right now I want to convert my
code that calls the deprecated XMLSerializer to use LSSerializer and I can't
because the pretty-printing is so bad - it's not all that pretty. When using
the pretty-printing option the emphasis should be on maximizing readability.
We use pretty-printing for two main purposes: (1) we log our XML network
traffic and we have to be able to quickly and easily understand what is
going on, and (2) we store some basic data structures in files and during
troubleshooting we need to be able to quickly and easily understand what is
going on.
If XMLSerializer can do a good job, why can't LSSerializer?
- Eric
keshlam wrote:
>
>
>> The DOM spec doesn't specify what pretty printing does. I believe what
>> Xerces is doing is fine.
>
> By definition, pretty-printing changes whitespace and should not be used in
> situations where the whitespace is significant. If you want to be sure
> you're preserving document semantics, use basic DOM serialization
> instead... or set up a much more detailed prettyprint which understands
> exactly where whitespace is and isn't significant in this kind of document.
>
Re: Formatting question serializing DOM with pretty-print
Posted by ke...@us.ibm.com.
> The DOM spec doesn't specify what pretty printing does. I believe what
> Xerces is doing is fine.
By definition, pretty-printing changes whitespace and should not be used in
situations where the whitespace is significant. If you want to be sure
you're preserving document semantics, use basic DOM serialization
instead... or set up a much more detailed prettyprint which understands
exactly where whitespace is and isn't significant in this kind of document.
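To illustrate "basic DOM serialization", the sketch below serializes with format-pretty-print left at false (its default) and checks that the whitespace inside mixed content survives a round trip unchanged. It assumes the JDK's built-in DOM Load and Save implementation; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

public class PlainRoundTrip {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Serializes without pretty-printing, so existing whitespace is written as-is. */
    static String serializePlain(Document doc) {
        DOMImplementationLS ls =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        // "format-pretty-print" already defaults to false; set explicitly for clarity.
        serializer.getDomConfig().setParameter("format-pretty-print", Boolean.FALSE);
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) {
        // Mixed content: the spaces around <b> are part of the sentence.
        String xml = "<p>Hello <b>big</b> world</p>";
        String out = serializePlain(parse(xml));
        String before = parse(xml).getDocumentElement().getTextContent();
        String after = parse(out).getDocumentElement().getTextContent();
        System.out.println(before.equals(after)); // prints true: whitespace preserved
    }
}
```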
Re: Formatting question serializing DOM with pretty-print
Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Hi Brian,
Brian Minchau/Toronto/IBM@IBMCA wrote on 02/19/2007 12:29:32 PM:
<snip/>
> Again Xalan does not inject any whitespace between the xml header and the
> document element, for the same reasons as given before. I'm not sure about
> the other whitespace differences. It looks like Xalan has decided that it
> won't add whitespace to existing whitespace and effectively does no
> indentation. Xerces serializer however rips out the whitespace from the
> document being serialized and replaces it with nicer looking whitespace.
> I'm not sure if that is OK to do that, perhaps someone from Xerces will
> comment on the differences.
> (Michael?)
The DOM spec doesn't specify what pretty printing does. I believe what
Xerces is doing is fine.
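In DOM Level 3 Load and Save, the value true for format-pretty-print is optional, and the exact layout it produces is left to the implementation. Portable code may therefore want to ask before enabling it. A minimal sketch using the standard DOMConfiguration.canSetParameter method; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

public class SafePrettyPrint {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Pretty-prints if the implementation supports it; otherwise serializes plainly. */
    static String serialize(Document doc) {
        DOMImplementationLS ls =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        DOMConfiguration config = serializer.getDomConfig();
        // setParameter throws a DOMException for unsupported values, so check first.
        if (config.canSetParameter("format-pretty-print", Boolean.TRUE)) {
            config.setParameter("format-pretty-print", Boolean.TRUE);
        }
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) {
        System.out.println(serialize(parse("<root><child1>text</child1></root>")));
    }
}
```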
> - Brian
> - - - - - - - - - - - - - - - - - - - -
> Brian Minchau, Ph.D.
> XSLT Development, IBM Toronto
> e-mail: minchau@ca.ibm.com
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org
Re: Formatting question serializing DOM with pretty-print
Posted by Brian Minchau <mi...@ca.ibm.com>.
Hi Dick,
I ran the program that you point to with the URL and got your results.
For this input:
String xml = "<root><child1>text</child1><child2>text</child2></root>";
For the Xalan serializer and then the Xerces serializer I get this:
Program started
Xalan serializer will be used
<?xml version="1.0" encoding="UTF-16"?><root>
<child1>text</child1>
<child2>text</child2>
</root>
Program started
Xerces serializer will be used
<?xml version="1.0" encoding="UTF-16"?>
<root>
<child1>text</child1>
<child2>text</child2>
</root>
The differences here are due to the fact that the Xalan serializer has
historically assumed that the output XML file could be used as an external
general parsed entity and included in yet another XML file. Since we don't
know where it will be included, the extra newline that Xerces inserts after
the XML header could end up next to non-whitespace text and become part of
that text node. Indentation or not, extra whitespace before the document
element is not always correct, so Xalan doesn't add it.
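If the output is known to be a standalone document, and so never included as an external parsed entity (the case described above), one possible workaround is to switch off the serializer's own declaration and write it yourself, followed by a newline. A minimal sketch, assuming the JDK's built-in LSSerializer and the standard DOM LS xml-declaration parameter. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

public class DeclarationNewlineDemo {

    // writeToString produces a UTF-16 Java string, so the declaration must say UTF-16.
    static final String DECLARATION = "<?xml version=\"1.0\" encoding=\"UTF-16\"?>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /**
     * Pretty-prints with the root element on its own line. Only appropriate for
     * standalone documents: if the output were used as an external entity, the
     * newline could become part of the surrounding text.
     */
    static String prettyPrintWithNewline(Document doc) {
        DOMImplementationLS ls =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        serializer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
        // Suppress the serializer's declaration; we emit our own below.
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        String body = serializer.writeToString(doc);
        // Drop any leading whitespace before the root, then join with one newline.
        return DECLARATION + "\n" + body.replaceFirst("^\\s+", "");
    }

    public static void main(String[] args) {
        Document doc = parse("<root><child1>text</child1><child2>text</child2></root>");
        System.out.println(prettyPrintWithNewline(doc));
    }
}
```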
I looked at the code recently and saw that the DOM3 save support does
indeed choose to indent 3 spaces per indentation level. This code was
contributed to Xalan by people on the Xerces team, so I don't know why
Xerces indents by 4 spaces. This difference is not important: there is no
"right" way to do indentation; it depends on the implementation.
Then, adding some whitespace, for this input:
String xml = "<root>\n\n\n<child1>text</child1> <child2>text</child2></root>";
Program started
Xalan serializer will be used
<?xml version="1.0" encoding="UTF-16"?><root>
<child1>text</child1> <child2>text</child2>
</root>
Program started
Xerces serializer will be used
<?xml version="1.0" encoding="UTF-16"?>
<root>
<child1>text</child1>
<child2>text</child2>
</root>
Again, Xalan does not inject any whitespace between the XML header and the
document element, for the same reasons as given before. I'm not sure about
the other whitespace differences. It looks like Xalan has decided that it
won't add whitespace next to existing whitespace, and so effectively does no
indentation. The Xerces serializer, however, strips the existing whitespace
from the document being serialized and replaces it with nicer-looking
whitespace. I'm not sure whether that is OK; perhaps someone from Xerces
will comment on the differences.
(Michael?)
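Filtering whitespace in the calling program, as Dick mentions, can approximate the Xerces behaviour. The idea is to remove whitespace-only text nodes from the DOM before pretty-printing, so the serializer sees no existing whitespace to preserve. A minimal sketch, assuming whitespace-only text is insignificant in the document. That does not hold for mixed content such as "a <b>b</b> c" or for xml:space="preserve" regions, both of which this sketch ignores. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class StripWhitespaceDemo {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Recursively removes text nodes that contain only whitespace. */
    static void stripWhitespaceText(Node parent) {
        Node child = parent.getFirstChild();
        while (child != null) {
            // Remember the next sibling before a possible removal.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                parent.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespaceText(child);
            }
            child = next;
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<root>\n\n\n<child1>text</child1> <child2>text</child2></root>");
        stripWhitespaceText(doc.getDocumentElement());
        // Only <child1> and <child2> remain under <root>.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength()); // prints 2
    }
}
```

After this step, the DOM is the same as the one built from the whitespace-free input. Serializing it with pretty-print enabled should therefore give the same layout as Dick's first case.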
- Brian
- - - - - - - - - - - - - - - - - - - -
Brian Minchau, Ph.D.
XSLT Development, IBM Toronto
e-mail: minchau@ca.ibm.com