Posted to j-users@xerces.apache.org by Yonghui Chen <ch...@yahoo.com> on 2002/03/10 16:41:31 UTC

About performance

Hi all,

I have just tried three parsers (Xerces Java 1, Xerces Java 2, and Crimson), using both the DOM and SAX parsers to parse a 960 KB XML file 10 times. The times were:

        Xerces 1    Xerces 2    Crimson
DOM     3906        4937        4697
SAX     1322        1832        1332

It seems that my best choice is Xerces 1. So, what are the improvements in Xerces 2?

Re: About performance

Posted by Andy Clark <an...@apache.org>.
Yonghui Chen wrote:
> I have just tried 3 parsers, Xerces Java 1, Xerces Java 2 and Crimson,
> use both DOM and SAX parser parse a 960kb XML file for 10 times, the
> time cost are:

What are you using to achieve your results? Are you only
starting a VM, parsing a single document, showing the time,
and exiting the VM? This is not a fair comparison and does
not match real-world use of the parser.
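
A fairer measurement keeps one VM running, warms the parser up,
and only then times repeated parses. The following is a minimal
sketch of that idea using the standard JAXP SAX API. The class
name ParseTimer, the warm-up count of 20, and the built-in sample
document are assumptions for illustration; none of them come from
the Xerces samples.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Xerces: times repeated parses in one VM.
public class ParseTimer {

    /** Parses the document `runs` times; returns elapsed wall-clock milliseconds. */
    public static long timeParses(SAXParser parser, byte[] doc, int runs) throws Exception {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            // The document is held in memory so disk I/O is not part of the timing.
            parser.parse(new ByteArrayInputStream(doc), new DefaultHandler());
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // Use the file given on the command line, or a tiny stand-in document.
        byte[] doc = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : "<root><item>1</item><item>2</item></root>".getBytes(StandardCharsets.UTF_8);

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();

        timeParses(parser, doc, 20);            // warm-up runs, result discarded
        long ms = timeParses(parser, doc, 10);  // measured runs
        System.out.println("10 parses: " + ms + " ms");
    }
}
```

Which parser this measures depends on the JAXP implementation found
on the classpath, so each parser would need to be run under the same
harness for the numbers to be comparable.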

The sax.Counter, dom.Counter, and xni.Counter samples that
come with Xerces2 are very convenient and can provide you
a "poor man's" performance test. The xni.Counter is the
one I use and I'll explain why.

Xerces2 is designed around the new Xerces Native Interface
(XNI) which allows us to more easily create new types of
parsers and re-use the same code to generate DOM trees,
emit SAX events, etc. The default parser configuration does
everything: full-fledged scanning of XML documents, DTD
validation, namespace binding, XML Schema validation, etc.

Depending on your needs, however, you can play tricks with
the parser configuration. For example, if you know that the
documents are generated and therefore are always well-formed
and valid, then you do not need to perform validation. So
the validation components can be removed from the pipeline
to improve performance.
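
Through the JAXP API, the simplest version of this is to request a
non-validating parser. The sketch below assumes the JAXP factory is
backed by Xerces. The class name NonValidatingParse is hypothetical,
and the load-external-dtd feature is Xerces-specific, so it is set
on a best-effort basis. Note that this only switches features off on
the standard configuration; it does not remove components from the
XNI pipeline.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: counts elements with validation switched off.
public class NonValidatingParse {

    public static int countElements(byte[] doc) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);     // no DTD validation
        factory.setNamespaceAware(true);  // namespace binding stays on

        try {
            // Xerces-specific feature: skip loading the external DTD entirely.
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        } catch (Exception e) {
            // Assumption: a non-Xerces parser may not recognize this feature; ignore.
        }

        final int[] count = {0};
        SAXParser parser = factory.newSAXParser();
        parser.parse(new ByteArrayInputStream(doc), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                count[0]++;
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        byte[] doc = "<root><item>1</item><item>2</item></root>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println("elements: " + countElements(doc));
    }
}
```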

Getting back to my point...

The xni.Counter sample (as well as the other XNI samples)
allows you to set the parser configuration by name so that
you can easily test new parser configurations. There is
an XNI sample included that creates a non-validating
parser configuration. You can use this with the xni.Counter
sample to see how much performance can be gained by not
validating every document.

This is just one way to achieve better performance. If
you *need* validation, however, then you must find another
way to improve performance, so I will say a few words on
that.

First, in some areas Xerces2 will never be as fast as
Xerces 1.x. In particular, we made the decision in the
Xerces2 implementation to always transcode the document
(i.e. changing the bytes of the document into Java chars).
The old parser would defer this work until needed but
this created a situation where we had duplicated code
which introduced the possibility of more bugs. Also,
deferring the conversion of the underlying bytes was an
issue in terms of memory usage.
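
To illustrate what transcoding means here: the encoded bytes of the
document are turned into Java chars before anything else looks at
them. A minimal sketch using java.nio follows, with UTF-8 assumed as
the encoding and EagerTranscode as a hypothetical class name. It is
not the Xerces2 implementation, only the general operation.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical example class: decodes a whole byte buffer into chars up front.
public class EagerTranscode {

    /** Decodes UTF-8 bytes into a char array in one pass. */
    public static char[] decode(byte[] raw) {
        CharBuffer chars = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(raw));
        char[] out = new char[chars.remaining()];
        chars.get(out);
        return out;
    }

    public static void main(String[] args) {
        // The e-acute takes two bytes in UTF-8 but a single Java char.
        byte[] raw = "<a>\u00e9</a>".getBytes(StandardCharsets.UTF_8);
        char[] chars = decode(raw);
        System.out.println(raw.length + " bytes -> " + chars.length + " chars");
    }
}
```

The cost is paid for every byte of input whether or not the
application ever reads the resulting characters, which is why it
shows up more on large documents.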

Also, Xerces2 has much better support for the various
standards and other features than its predecessor. You
can't do more work in less time so this is one reason
why Xerces2 may appear initially slower. However, we
believe that the inherent modularity of the system is
better in the long run for continued maintenance and
extension of the parser to add new features in the
future.

Lastly, we have not done serious performance tuning on
the new Xerces2 codebase. So we know that this is an
area in particular that we can definitely improve in
subsequent releases. We want to make the parser faster
and better but the standard parser configuration may
not match Xerces 1.x for larger documents. Xerces 1.x
was heavily optimized but not very flexible so we are
accepting a slight performance hit in certain areas.

But please hang in there -- it will get better! :)

-- 
Andy Clark * andyc@apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org


Re: About performance

Posted by Yonghui Chen <ch...@yahoo.com>.
Thanks for your tip.
--- Dennis Sosnoski <dm...@sosnoski.com> wrote:
> From what I've seen Xerces2 performance is better for smaller documents,
> while Xerces1 is better for larger documents. Your file is a large one
> by my standards.
>
> Xerces1 appeared to have a lot of initialization overhead for a parse.
> The Xerces2 code avoids that overhead, but long term performance on
> larger documents definitely suffers as shown by your timings. It'd be
> great if the Xerces2 performance on larger documents could be improved
> to match Xerces1.
>
> - Dennis


Re: About performance

Posted by Dennis Sosnoski <dm...@sosnoski.com>.
From what I've seen Xerces2 performance is better for smaller documents,
while Xerces1 is better for larger documents. Your file is a large one
by my standards.

Xerces1 appeared to have a lot of initialization overhead for a parse.
The Xerces2 code avoids that overhead, but long term performance on
larger documents definitely suffers as shown by your timings. It'd be
great if the Xerces2 performance on larger documents could be improved
to match Xerces1.

- Dennis
