Posted to general@xerces.apache.org by Wong Kok Wai <wo...@pacific.net.sg> on 2000/01/06 06:27:30 UTC

Pretty print problem in serializer in 1.0.1

Hi,

Ref: Xerces 1.0.1

I tried out the new org.apache.xml.serialize package and found that
the pretty printing is not as "pretty" as I expected. Attached is a
modified DOMWriter from the samples. Try running it with personal.xml as
the input file, then open the output file testout.dat to see what I mean.




Re: Pretty print problem in serializer in 1.0.1

Posted by Assaf Arkin <ar...@exoffice.com>.
+1 from me :-)

arkin

Andy Clark wrote:
> 
> Assaf Arkin wrote:
> >
> > Ignorable whitespace, not just whitespace.
> 
> Good point.
> 
> > I think the feature should be "include ignorable whitespace as text
> > nodes" and off by default.
> 
> So how about the following ID?
> 
>   http://apache.org/xml/features/dom/include-ignorable-whitespace
> 
> It's getting a little verbose but that's probably erring on
> the side of understandability.
> 
> --
> Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Pretty print problem in serializer in 1.0.1

Posted by Andy Clark <an...@apache.org>.
Assaf Arkin wrote:
> 
> Ignorable whitespace, not just whitespace.

Good point.

> I think the feature should be "include ignorable whitespace as text
> nodes" and off by default.

So how about the following ID?

  http://apache.org/xml/features/dom/include-ignorable-whitespace

It's getting a little verbose but that's probably erring on
the side of understandability.

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Pretty print problem in serializer in 1.0.1

Posted by Assaf Arkin <ar...@exoffice.com>.
Ignorable whitespace, not just whitespace.

I think the feature should be "include ignorable whitespace as text
nodes" and off by default.

arkin

Andy Clark wrote:
> 
> I'm volunteering to put the feature for whitespace nodes
> in the DOM tree. The only things that we need to decide are:
> 
>   1) the feature ID
>   2) the default value
> 
> Should the feature reflect that we're "removing" whitespace
> nodes from the DOM tree? or should it reflect that we're
> "adding" whitespace nodes? Let's say, for argument, that
> we select the latter. Then the feature ID could be something
> like this:
> 
>   http://apache.org/xml/features/dom/whitespace-nodes
> 
> Is this acceptable to everyone?
> 
> And given the fact that we have a nice set of serializers
> now, I would be in favor of supporting the stance that the
> parser does NOT add whitespace nodes, by default.
> 
> Assuming there's quick consensus, I'll make the changes.
> 
> --
> Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Pretty print problem in serializer in 1.0.1

Posted by Andy Clark <an...@apache.org>.
I'm volunteering to put the feature for whitespace nodes
in the DOM tree. The only things that we need to decide are:

  1) the feature ID
  2) the default value

Should the feature reflect that we're "removing" whitespace
nodes from the DOM tree? or should it reflect that we're
"adding" whitespace nodes? Let's say, for argument, that
we select the latter. Then the feature ID could be something
like this: 

  http://apache.org/xml/features/dom/whitespace-nodes

Is this acceptable to everyone?

And given the fact that we have a nice set of serializers
now, I would be in favor of supporting the stance that the
parser does NOT add whitespace nodes, by default.

Assuming there's quick consensus, I'll make the changes.

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Pretty print problem in serializer in 1.0.1

Posted by Assaf Arkin <ar...@exoffice.com>.
Nope.

setPreserveSpace defines how your text content is printed. It
corresponds to xml:space, and I don't think it's implemented. As Tim and
others pointed out, this sort of whitespace handling does break some
applications, so I gave it a rest.

The problem in your case is the whitespace between elements, which is
there in the document only to make it look better. The serializer has no
way of telling that it's ignorable whitespace (I'm thinking of a fix for
that, though :-) ) and prints it. Once it's printed, the serializer
cannot mess with it by adding indentation; that would be a clear
violation of the information model.

Try reading the document with the SAX parser and feeding it directly to
the Serializer and see if it works.

arkin
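
The "SAX parser straight into the serializer" experiment can be
sketched roughly as follows. This is only an illustration under stated
assumptions: it does not use the org.apache.xml.serialize classes
discussed in this thread. Instead it swaps in the standard JAXP identity
TransformerHandler with indentation switched on, so the output may differ
from what the Xerces Serializer would print. The class name
SaxToSerializer and the method name reserialize are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper: pipes SAX events directly into a serializing
// handler, so no DOM tree (and no DOM whitespace text node) is built.
final class SaxToSerializer {

    static String reserialize(String xml) {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("SAX transformer handlers not supported");
        }
        try {
            // Identity handler: writes out the SAX events it receives.
            TransformerHandler handler =
                ((SAXTransformerFactory) tf).newTransformerHandler();
            handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");

            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out));

            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("re-serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(reserialize("<a>\n  <b>x</b>\n</a>"));
    }
}
```

Note that without a DTD a SAX parser has no way to classify the
whitespace between elements as ignorable either, so it is reported as
ordinary character data and may still show up in the output.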


Wong Kok Wai wrote:
> 
> Doesn't using setPreserveSpace(false) in OutputFormat take care of this?
> 
> Assaf Arkin wrote:
> 
> >
> > If you remove all this whitespace from the original personal.xml, or run
> > the test with ProjectX (which does not add these text nodes), you will
> > get the pretty printing you expect.
> >

Re: Pretty print problem in serializer in 1.0.1

Posted by Wong Kok Wai <wo...@pacific.net.sg>.
Doesn't using setPreserveSpace(false) in OutputFormat take care of this?

Assaf Arkin wrote:

>
> If you remove all this whitespace from the original personal.xml, or run
> the test with ProjectX (which does not add these text nodes), you will
> get the pretty printing you expect.
>


Re: Pretty print problem in serializer in 1.0.1

Posted by Assaf Arkin <ar...@exoffice.com>.
The Xerces parser insists on adding the whitespace in the original
document to the DOM as text nodes.

That is the whitespace you see when you use the Serializer, not the
serializer's own pretty printing.

If you remove all this whitespace from the original personal.xml, or run
the test with ProjectX (which does not add these text nodes), you will
get the pretty printing you expect.

arkin
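
Rather than editing personal.xml by hand, the whitespace-only text nodes
could also be removed from the DOM tree before it is handed to the
serializer. The following is a minimal sketch, assuming the standard
JAXP and W3C DOM APIs; the class name WhitespaceStripper and its methods
are hypothetical. It removes every whitespace-only text node, which is
broader than "ignorable" whitespace: without a DTD it cannot tell
element content from mixed content, so a significant space between two
inline elements would be dropped as well.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: strips whitespace-only text nodes from a DOM tree.
final class WhitespaceStripper {

    // Recursively removes text nodes that contain only whitespace.
    static void strip(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            // Remember the sibling first; removeChild detaches 'child'.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                strip(child);
            }
            child = next;
        }
    }

    // Parses a string with the default (non-validating) JAXP parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a>\n  <b>x</b>\n  <c> y </c>\n</a>");
        strip(doc.getDocumentElement());
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

Text that contains anything other than whitespace, such as " y " in the
sample, is left untouched, including its leading and trailing spaces.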


Wong Kok Wai wrote:
> 
> Hi,
> 
> Ref: Xerces 1.0.1
> 
> I tried out the new org.apache.xml.serialize package and found that
> the pretty printing is not as "pretty" as I expected. Attached is a
> modified DOMWriter from the samples. Try running it with personal.xml as
> the input file, then open the output file testout.dat to see what I mean.
> 
>   [Attachment: DOMWriter2.java]

Re: Pretty print problem in serializer in 1.0.1

Posted by Andy Clark <an...@apache.org>.
The feature is in the code base with the default being that
the ignorable whitespace text nodes *are* included in the
tree.

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Pretty print problem in serializer in 1.0.1

Posted by Andy Clark <an...@apache.org>.
+1 For leaving the whitespace in, by default, for reasons stated 
   by Arnaud and agreed upon by Scott and Tim. As well as the fact 
   that the DOM tree should be the same regardless of whether a 
   DTD was present in order to figure out what text nodes can be 
   flagged as "ignorable".

Okay, so it sounds like we have agreement. I will add the
feature as agreed upon.

  feature ID: http://apache.org/xml/features/dom/include-ignorable-whitespace
  default:    true

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org
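
To illustrate what the agreed feature controls, here is a minimal sketch.
It is an assumption-laden stand-in, not the Xerces change itself: it uses
the standard JAXP switch
DocumentBuilderFactory.setIgnoringElementContentWhitespace, which plays
the same role as include-ignorable-whitespace with the sense inverted.
The class name IgnorableWhitespaceDemo, the method countRootChildren, and
the sample document are hypothetical. The sample carries an internal DTD
and the parser is put in validating mode, because the parser can only
flag whitespace as ignorable when it knows the content model.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo: counts the children of the root element with
// ignorable whitespace text nodes included or excluded.
final class IgnorableWhitespaceDemo {

    // Sample document; the DTD declares element-only content models.
    static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE personnel [\n"
        + "  <!ELEMENT personnel (person+)>\n"
        + "  <!ELEMENT person (name)>\n"
        + "  <!ELEMENT name (#PCDATA)>\n"
        + "]>\n"
        + "<personnel>\n"
        + "  <person>\n"
        + "    <name>Wong</name>\n"
        + "  </person>\n"
        + "</personnel>\n";

    static int countRootChildren(boolean includeIgnorableWhitespace) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            // JAXP counterpart of the feature, with inverted meaning.
            // With Xerces itself on the classpath, the equivalent would
            // presumably be DOMParser.setFeature(
            //   "http://apache.org/xml/features/dom/include-ignorable-whitespace",
            //   includeIgnorableWhitespace);
            factory.setIgnoringElementContentWhitespace(!includeIgnorableWhitespace);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            Document doc = builder.parse(new InputSource(new StringReader(XML)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("included: " + countRootChildren(true));
        System.out.println("excluded: " + countRootChildren(false));
    }
}
```

With the whitespace included, the root element has three children (the
person element plus a whitespace text node on each side); with it
excluded, only the person element remains, which is the tree shape the
serializer's indentation expects.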