Posted to j-dev@xerces.apache.org by Tung Mansfield <ma...@orbitcommerce.com> on 2000/06/15 16:16:09 UTC

Improving speed of serializing a Document

I recently ran some tests on Windows NT 4.0 that serialize and deserialize
a Document. I used two different techniques:

1. Convert the Document into a String and serialize the String.
2. Serialize the Document directly.

I found that the first technique is about 10 times faster than the second. I
recommend that somebody confirm my findings and enhance the Document
implementation (e.g. DocumentImpl) so that it serializes and deserializes
faster. To implement the first technique, you will need to make DocumentImpl
implement the Externalizable interface by implementing the following two
methods:

  public void writeExternal(ObjectOutput out) throws IOException
  {
    // convert Document to String
    // (m_doc is the Document held by this object; see readExternal)
    StringWriter sw = new StringWriter();
    DOMWriter writer = new DOMWriter(sw, true);
    writer.print(m_doc);
    String s = sw.toString();

    // serialize the XML text instead of the node tree
    out.writeObject(s);
  }

  public void readExternal(ObjectInput in)
  throws IOException, ClassNotFoundException
  {
    // read back the XML text written by writeExternal
    String s = (String) in.readObject();

    try
    {
      // re-parse the text to rebuild the Document
      DOMParser parser = new DOMParser();
      parser.parse(new InputSource(new StringReader(s)));
      m_doc = parser.getDocument();
    }
    catch (SAXException e)
    {
      // the Externalizable signature only allows IOException here
      throw new IOException(e.getMessage());
    }
  }

In the writeExternal method, the DOMWriter class was taken from a sample
program shipped with Xerces. I made some modifications to turn it into a
standalone utility class rather than a test program. I don't know if this is
the fastest way to convert a Document to a String.

In the readExternal method, m_doc is of type Document. Thus, this code was
designed to deserialize an object containing a Document. Somebody will need
to modify this code to deserialize into the Document itself.
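
For anyone who wants to try the idea without the modified DOMWriter sample,
here is a minimal sketch of the same technique. It is not the code from the
tests above: it assumes the standard JAXP classes (DocumentBuilderFactory and
an identity Transformer) in place of Xerces' DOMParser and DOMWriter, and the
class name DocumentHolder and its helper methods are hypothetical. As in the
snippet above, the object being serialized contains a Document rather than
being one.

```java
import java.io.*;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical wrapper that serializes its Document as XML text. */
class DocumentHolder implements Externalizable {
    private Document doc;

    /** Externalizable requires a public no-argument constructor. */
    public DocumentHolder() { }

    public DocumentHolder(Document doc) { this.doc = doc; }

    public Document getDocument() { return doc; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        try {
            // Convert the Document to a String with an identity transform
            // (assumption: stands in for the modified DOMWriter sample).
            StringWriter sw = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(sw));
            out.writeObject(sw.toString());
        } catch (TransformerException e) {
            throw new IOException(e);
        }
    }

    @Override
    public void readExternal(ObjectInput in)
            throws IOException, ClassNotFoundException {
        // Read the XML text back and re-parse it into a Document.
        doc = parseXml((String) in.readObject());
    }

    private static Document parseXml(String xml) throws IOException {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException e) {
            throw new IOException(e);
        }
    }

    /** Convenience for experiments: parse XML text, failing unchecked. */
    static Document parse(String xml) {
        try {
            return parseXml(xml);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes the holder to a byte array and reads it back. */
    static DocumentHolder roundTrip(DocumentHolder holder) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(holder);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (DocumentHolder) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Wrapping the Document keeps the sketch independent of any particular DOM
implementation. Whether it is faster than serializing the node tree directly
would still need to be measured on the parser and JVM in question.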

Re: Improving speed of serializing a Document

Posted by Andy Clark <an...@apache.org>.
Stefan Rauch wrote:
> These figures are the results of counting the appearances of the element
> 'book' in the ot.xml (old testament from Jon Bosak's rel200.zip)
> I haven't tried out performance while manipulating the dom-Implementation
> but I believe things won't change that much ;-(

I'm always dubious of performance benchmarks when I can't see the
code or reproduce the results myself.

If you are using the default DOM implementation (deferred) the
first traversal of the document is going to be slower, of course,
because the building of the tree is deferred until you need it.
This saves a lot of time parsing the document and scores big time
if you don't end up traversing the entire tree.

However, if you're only parsing the document with the deferred
DOM implementation and only looking at the time required for the
first traversal, you are getting an inaccurate picture of the
DOM performance. You should find that subsequent traversals of
the tree are much faster.

To adequately gauge how fast traversal of the document is, you
should try turning off the deferred DOM implementation so that
the cost of deferred node expansion isn't folded into your
performance results. For example:

  DOMParser parser = new DOMParser();
  parser.setFeature("http://apache.org/xml/features/dom/defer-node-expansion", false);
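
To see the effect of the first traversal versus later ones, a rough timing
harness along the following lines may help. It is only a sketch under some
assumptions: it goes through the JAXP DocumentBuilderFactory rather than
Xerces' DOMParser, it assumes the underlying parser recognizes the
defer-node-expansion feature (if not, the setting is skipped), and the class
name TraversalTiming and the synthetic test document are made up for
illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical harness: times repeated searches of one parsed Document. */
class TraversalTiming {
    static final String DEFER =
            "http://apache.org/xml/features/dom/defer-node-expansion";

    /** Parses XML text, requesting deferred node expansion on or off. */
    static Document parse(String xml, boolean defer) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            try {
                factory.setFeature(DEFER, defer);
            } catch (ParserConfigurationException e) {
                // Assumption failed: this parser does not know the feature,
                // so fall back to its default behaviour.
            }
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Counts 'book' elements anywhere in the document. */
    static int countBooks(Document doc) {
        return doc.getElementsByTagName("book").getLength();
    }

    /** Returns the elapsed nanoseconds of each successive search. */
    static long[] timePasses(Document doc, int passes) {
        long[] nanos = new long[passes];
        for (int i = 0; i < passes; i++) {
            long start = System.nanoTime();
            countBooks(doc);
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) {
        // Synthetic stand-in for a large input file such as ot.xml.
        StringBuilder xml = new StringBuilder("<library>");
        for (int i = 0; i < 1000; i++) {
            xml.append("<book><title>t").append(i).append("</title></book>");
        }
        xml.append("</library>");

        for (boolean defer : new boolean[] {true, false}) {
            long[] nanos = timePasses(parse(xml.toString(), defer), 3);
            System.out.println("defer=" + defer
                    + " first=" + nanos[0] + "ns"
                    + " second=" + nanos[1] + "ns"
                    + " third=" + nanos[2] + "ns");
        }
    }
}
```

Single-shot timings like these are noisy (JIT warm-up, garbage collection),
so treat the numbers as a rough indication and repeat the runs before drawing
conclusions.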

Lastly, depending on *how* you search the DOM tree, you may be
hitting points where one implementation wins over the other.
Depending on how the tree is implemented, NodeList#item can be
faster than Node#getFirstChild/getNextSibling. And vice versa.
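
To make the two search styles concrete, here is a small sketch that counts
elements with a given name both ways, using only the org.w3c.dom interfaces.
The class name TraversalStyles and its methods are hypothetical, and the
sketch makes no claim about which style is faster on a given implementation;
that has to be measured.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper comparing two ways of walking a DOM tree. */
class TraversalStyles {

    /** Walks children by index via NodeList#item. */
    static int countByIndex(Node node, String name) {
        int count = isElementNamed(node, name) ? 1 : 0;
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            count += countByIndex(children.item(i), name);
        }
        return count;
    }

    /** Walks children via Node#getFirstChild / Node#getNextSibling. */
    static int countBySibling(Node node, String name) {
        int count = isElementNamed(node, name) ? 1 : 0;
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            count += countBySibling(child, name);
        }
        return count;
    }

    private static boolean isElementNamed(Node node, String name) {
        return node.getNodeType() == Node.ELEMENT_NODE
                && name.equals(node.getNodeName());
    }

    /** Parses XML text with the default JAXP parser (an assumption). */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Both methods visit every node once and should return the same count; only
the way they step from one child to the next differs, which is where one
tree implementation may win over another.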

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Improving speed of serializing a Document

Posted by Stefan Rauch <sr...@uos.de>.
Thanks Andy!

It was definitely my fault. I was using the deferred DOM implementation. If I
turn off the feature
("http://apache.org/xml/features/dom/defer-node-expansion") as you suggested,
or if I search several times, Xerces' performance is close to that of Oracle's
XML parser for Java (Oracle: ~40 ms, IBM: ~45 ms).

> Lastly, depending on *how* you search the DOM tree, you may be
> hitting points where one implementation wins over the other.
> Depending on how the tree is implemented, NodeList#item can be
> faster than Node#getFirstChild/getNextSibling. And vice versa.

I'm working on this point. Right now I'm using getElementsByTagName() and
NodeList's getLength().

Anyway! Thanks for your help/advice

Stefan.



Re: Improving speed of serializing a Document

Posted by Stefan Rauch <sr...@uos.de>.
Even if this is not the right thread, I would suggest taking a good look at
DocumentImpl.
I'm currently trying to benchmark some parsers, and I found that e.g.
XMLDocument (the org.w3c.dom.Document implementation of the Oracle
parser for XML [V2.0.2.7]) does much better at simple things such as
searching for an element in the DOM.

Here are some figures

xerces1.1.1: ~1000 millis
oracle: ~40 millis
Aelfred2: ~160 millis
jaxp: ~510 millis
tidy: ~25 millis
xerces1.03: ~1250 millis

These figures are the results of counting the appearances of the element
'book' in the ot.xml (old testament from Jon Bosak's rel200.zip).
I haven't tried out performance while manipulating the DOM implementation,
but I believe things won't change that much ;-(