Posted to j-users@xerces.apache.org by Gopal Sharma <Go...@Sun.COM> on 2002/05/05 16:18:36 UTC
[Xerces2] Measuring performance and optimization
FYI
Hi,
I have forwarded this mail to _YOU_ (general and xerces-j-user) because you might be
using *Xerces 2* in one way or another and could provide data, details, suggestions,
or comments that would help us in this effort.
Thanks in advance for your valuable suggestions and comments.
- Gopal
------------- Begin Forwarded Message -------------
Date: Fri, 3 May 2002 21:03:00 +0000 (Asia/Calcutta)
From: Rahul Srivastava <Ra...@Sun.COM>
Subject: [xerces2] Measuring performance and optimization
To: xerces-j-dev@xml.apache.org
Hi folks,
Improving the performance of Xerces2 has been talked about for a long time. Some
benchmarking has been done earlier, for instance by Dennis Sosnoski, see:
http://www.sosnoski.com/opensrc/xmlbench/index.html . Those results tell us how fast
or slow Xerces is compared to other parsers, but we still need to identify areas of
improvement within Xerces itself. We need to measure the time taken by each individual
component in the pipeline, figure out how much time each component consumes for various
events, and then concentrate on improving those areas. So, here is what we plan to do
(a rough timing sketch follows the list):
+ sax parsing
  - time taken
+ dom parsing
  - dom construction time
  - dom traversal time
  - memory consumed
  - all of the above with the deferred-dom feature set to true and to false
+ DTD validation
  - one-time parse: time taken
  - multiple parses using the same instance: time taken from the second parse onwards
+ Schema validation
  - one-time parse: time taken
  - multiple parses using the same instance: time taken from the second parse onwards
+ optimising the pipeline
  - calculate pipeline/component initialization time
  - calculate the time each component in the pipeline takes to propagate an event
  - use configurations to set up an optimised pipeline for various cases such as
    no validation, DTD validation only, etc., and calculate the time taken
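As a rough illustration of the kind of measurement harness this implies, here is a
minimal sketch for timing a SAX parse and DOM construction with deferred node
expansion on and off. The feature id is the documented Xerces2 one, but the class
name, heap reading, and overall approach are illustrative only and should be adjusted
for the build under test:

    import org.apache.xerces.parsers.DOMParser;
    import org.apache.xerces.parsers.SAXParser;
    import org.w3c.dom.Document;

    // Minimal timing harness: wall-clock time for a SAX parse and for DOM
    // construction with the deferred DOM enabled/disabled, plus a crude
    // heap-usage reading after each DOM parse.
    public class ParseTimer {
        public static void main(String[] args) throws Exception {
            String uri = args[0];

            // SAX: time a single end-to-end parse (no handlers registered).
            SAXParser sax = new SAXParser();
            long start = System.currentTimeMillis();
            sax.parse(uri);
            System.out.println("SAX parse: " + (System.currentTimeMillis() - start) + " ms");

            // DOM: construction time with deferred node expansion on and off.
            boolean[] defer = { true, false };
            for (int i = 0; i < defer.length; i++) {
                DOMParser dom = new DOMParser();
                dom.setFeature("http://apache.org/xml/features/dom/defer-node-expansion",
                               defer[i]);
                System.gc();
                long before = Runtime.getRuntime().totalMemory()
                            - Runtime.getRuntime().freeMemory();
                start = System.currentTimeMillis();
                dom.parse(uri);
                long elapsed = System.currentTimeMillis() - start;
                // Hold a reference so the tree is not collected before we measure.
                Document doc = dom.getDocument();
                long used = Runtime.getRuntime().totalMemory()
                          - Runtime.getRuntime().freeMemory() - before;
                System.out.println("DOM parse (defer=" + defer[i] + "): " + elapsed
                                   + " ms, ~" + (used / 1024) + " kb retained");
            }
        }
    }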
Apart from this, should we also consider the existing grammar caching framework when
evaluating the performance of the parser?
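For that question, one way to get a first number is to reuse a grammar pool across
parses and compare the first (grammar-compiling) parse with the later ones. A sketch,
assuming the documented Xerces2 grammar-pool property and validation feature ids
(treat the exact ids as assumptions to verify against the build under test):

    import org.apache.xerces.parsers.SAXParser;
    import org.apache.xerces.util.XMLGrammarPoolImpl;

    // Parse the same document several times with one parser instance and a
    // shared grammar pool; the first parse pays for building the DTD/schema
    // grammar, later parses should reuse it.
    public class GrammarCacheTimer {
        public static void main(String[] args) throws Exception {
            String uri = args[0];

            SAXParser parser = new SAXParser();
            parser.setProperty("http://apache.org/xml/properties/internal/grammar-pool",
                               new XMLGrammarPoolImpl());
            parser.setFeature("http://xml.org/sax/features/validation", true);
            parser.setFeature("http://apache.org/xml/features/validation/schema", true);

            for (int i = 1; i <= 3; i++) {
                long start = System.currentTimeMillis();
                parser.parse(uri);
                System.out.println("parse " + i + ": "
                                   + (System.currentTimeMillis() - start) + " ms");
            }
        }
    }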
We have classified the inputs to be used for this testing as follows:
+ instance docs used
  - tag centric (many tags with small content, say 10-50 bytes each)

        Type       Tags
        -----------------
        * small    5-50
        * medium   50-500
        * large    >500

  - content centric (few tags, say 5-10, with huge content)

        Type       Content between a pair of tags
        ------------------------------------------
        * small    <500 kb
        * medium   500-5000 kb
        * large    >5000 kb
We can also use the nesting depth of the tags as a criterion for the above cases.
Actually speaking, there can be an enormous number of combinations and different
figures in the above table that reflect real-world instance docs. I would like to
know the view of the community here. Is this data enough to evaluate the performance
of the parser? Is there any data publicly available that can be used for performance
evaluation?
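To produce inputs in these classes without hunting for real documents, a small
generator could emit synthetic tag-centric and content-centric files. The class name,
element names, and size parameters below are hypothetical, purely for illustration:

    import java.io.FileWriter;
    import java.io.PrintWriter;

    // Generates one tag-centric file (many small elements) and one
    // content-centric file (few elements with large text blocks).
    public class TestDocGenerator {
        public static void main(String[] args) throws Exception {
            writeTagCentric("tag-centric.xml", 500);                    // "large": >500 tags
            writeContentCentric("content-centric.xml", 5, 500 * 1024);  // ~500 kb per element
        }

        static void writeTagCentric(String file, int tags) throws Exception {
            PrintWriter out = new PrintWriter(new FileWriter(file));
            out.println("<root>");
            for (int i = 0; i < tags; i++) {
                out.println("  <item id=\"" + i + "\">short content</item>");
            }
            out.println("</root>");
            out.close();
        }

        static void writeContentCentric(String file, int tags, int bytesPerTag)
                throws Exception {
            // Build one large text chunk and repeat it inside a few elements.
            StringBuffer chunk = new StringBuffer(bytesPerTag);
            for (int i = 0; i < bytesPerTag; i++) {
                chunk.append('x');
            }
            PrintWriter out = new PrintWriter(new FileWriter(file));
            out.println("<root>");
            for (int i = 0; i < tags; i++) {
                out.println("  <blob>" + chunk + "</blob>");
            }
            out.println("</root>");
            out.close();
        }
    }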
+ DTDs used
  - should use different types of entities
+ XML Schemas used
  - should use most of the elements and datatypes
Will this really help in any way?
Any comments or suggestions appreciated.
Thanks,
Rahul.
------------- End Forwarded Message -------------