Posted to c-dev@xerces.apache.org by "Masaoud T. Moonim" <ma...@mailandnews.com> on 2000/12/26 14:57:17 UTC

URGENT: Performance Issues

Hi,

        I would like to know whether any profiling has 
been performed on the Xerces-C code.

1. If it has been done, then are those results published somewhere ?
2. What kind of XML data was used while testing ?
3. Are there any memory leaks in the code ?
4. Are there performance bottlenecks in parts of the code ?
5. Also the Xerces Source is very large and contains a lot of utility 
   and misc classes (ICU, etc). Assume I only want to use the SAX 
   Parser functionality from the Xerces suite. Can I compile the 
   sources so as to reduce the size of the generated libraries, or get 
   better performance ? In short can I get a highly optimized
   SAX Parser from Xerces ?

There are a lot of performance figures on the net (XSLTMark, XSLBench),
but these are mostly related to XSL (Xalan-J, Xalan-C, etc.). Somehow
there is no mention of performance issues related to Xerces-C.

Would it be possible for the Apache Xerces-C group to publish the 
performance related testing performed on Xerces-C ?

Thanks and Regards
Masaoud

P.S. : Merry Christmas and Happy New Year.

Re: URGENT: Performance Issues

Posted by Dean Roddey <dr...@charmedquark.com>.
> 1. If it has been done, then are those results published somewhere ?

It's not published anywhere, but we did significant amounts of profiling
using VTune, so there isn't much in the way of low-hanging fruit left in it
(at least in the core parser parts; I can't speak for the DOM level).

> 2. What kind of XML data was used while testing ?

All kinds. There are lots of different types of file content available in
the various test suites. And other folks have done similar work and given us
their results.

> 3. Are there any memory leaks in the code ?

Not that we know of, of course, or they would have been fixed already.

> 4. Are there performance bottlenecks in parts of the code ?

Any system like an XML parser or a language parser of any sort has a
critical data path via which character data is sucked through a straw. That
straw is necessary in order to centralize important things like transcoding,
keeping up with line/col info, doing newline normalization, etc... But of
course that is the part of the code that was most profiled and tweaked.
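
As a rough illustration of that single choke point, here is a minimal
sketch in C++. It is not the Xerces-C reader; the class name CharReader and
its methods are hypothetical, and transcoding is left out (the input is
assumed to be plain single-byte text). It only shows how newline
normalization and line/col tracking end up on the path every character takes.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical reader, not the Xerces-C class: every character of the
// input is handed out through next(), so per-character work lives here.
class CharReader {
public:
    explicit CharReader(std::string text) : text_(std::move(text)) {}

    // Returns false at end of input. Otherwise stores the next character,
    // after newline normalization, in `out` and updates line/column.
    bool next(char& out) {
        if (pos_ >= text_.size()) return false;
        char c = text_[pos_++];
        if (c == '\r') {
            // Newline normalization: CR LF and a lone CR both become LF.
            if (pos_ < text_.size() && text_[pos_] == '\n') ++pos_;
            c = '\n';
        }
        if (c == '\n') {
            ++line_;
            col_ = 1;
        } else {
            ++col_;
        }
        out = c;
        return true;
    }

    // Position (1-based) of the next character that would be read.
    std::size_t line() const { return line_; }
    std::size_t column() const { return col_; }

private:
    std::string text_;
    std::size_t pos_ = 0;
    std::size_t line_ = 1;
    std::size_t col_ = 1;
};
```

Because every character goes through next(), any extra branch added there
is paid once per input character, which is why that path gets the most
profiling attention.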

> 5. Also the Xerces Source is very large and contains a lot of utility
>    and misc classes (ICU, etc). Assume I only want to use the SAX
>    Parser functionality from the Xerces suite. Can I compile the
>    sources so as to reduce the size of the generated libraries, or get
>    better performance ? In short can I get a highly optimized
>    SAX Parser from Xerces ?
>

If you can live with this set of encodings: ASCII, UTF-8, UTF-16, UCS-4,
8859-1, Win-1252, and a couple of EBCDIC variants, then you can toss ICU
out. Each platform also has a platform-specific transcoding implementation
that uses whatever capabilities are available there. Dropping ICU is easy to
do; it just involves a slight change to the build settings and a rebuild.

> Would it be possible for the Apache Xerces-C group to publish the
> performance related testing performed on Xerces-C ?
>

Not for me to say. I'm not on the official team anymore. But given that they
were never really officially written down, that would probably require that
they be done all over again, which might be more than they would consider
worth it. The answer might be that if you want numbers, just do them
yourself.
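
For anyone who does want to gather numbers, a minimal timing sketch in C++
might look like the following. The helper names timeRuns and median are
hypothetical, and the callable is an assumption: it would wrap whatever is
being measured, for example one complete parse of a test file.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: runs `work` `runs` times and returns the wall-clock
// duration of each run in milliseconds, sorted ascending.
std::vector<double> timeRuns(const std::function<void()>& work,
                             std::size_t runs) {
    std::vector<double> ms;
    ms.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop = std::chrono::steady_clock::now();
        ms.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    std::sort(ms.begin(), ms.end());
    return ms;
}

// Median of an already sorted, non-empty vector.
double median(const std::vector<double>& sorted) {
    std::size_t n = sorted.size();
    return n % 2 ? sorted[n / 2]
                 : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
}
```

Reporting the median of several runs rather than a single run keeps
one-off effects such as a cold file cache from dominating the figure.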

--------------------------
Dean Roddey
The CIDLib C++ Frameworks
Charmed Quark Software
droddey@charmedquark.com
http://www.charmedquark.com

"It takes two buttocks to make friction"
    - African Proverb