Posted to j-users@xerces.apache.org by Mike Brown <mi...@skew.org> on 2001/03/21 02:25:13 UTC

weird XML4J vs Xerces parse times

Hi,

I'm taking a cursory look at optimization options for one of my company's
software products. We've been relying on XML4J 2.0.15 since it was new,
but I'd like to upgrade to either XML4J 3.1.1 or Xerces-J 3.1.1.

It was my understanding that XML4J after 2.0.15 was just wrapper classes
on top of Xerces. I would expect that there would be a little, hopefully
negligible, extra overhead if I were to run XML4J instead of Xerces for
general SAX and DOM parsing.

However, when I started running my own simple benchmarks for SAX parsing,
I found that Xerces-J 3.1.1 only outperforms XML4J 3.1.1 during initial
parser construction. The first parse() is taking the same amount of time
in each, and each subsequent parse() consistently takes about 25% longer
in Xerces than in XML4J. These are not the results I was expecting to see!

                       XML4J  |  XML4J  | Xerces
                      2.0.15  |  3.1.1  |  3.1.1
                      ======  |  =====  | ======
parser construction    0.09s  |  0.42s  |  0.38s
first parse()          0.62s  |  1.20s  |  1.20s
subsequent parse()s    0.09s  |  0.08s  |  0.10s

Averages are shown, rounded up to the nearest 0.01 second. XML4J 2.0.15
results are shown just for fun, though it's interesting to see the amount
of overhead that has crept into the code.

I am really just wondering why there is a consistently higher parse() time
with Xerces (~100ms) versus XML4J (~80ms). If I am recommending a parser
for maximum scalability, these seemingly small differences can be of
great concern.

Now I realize these kinds of informal tests are fraught with variable
factors, but the results were consistent and leave me puzzled. To rule out
some causes of concern, I share the following details:

The file being parsed was a 249K XSL document containing 4,786 lines. The
runtime is the java.exe that comes with JDK 1.2.2. The test machine is a
PIII-450 running NT4 SP6. The test code was very simple, using
System.currentTimeMillis() just like the SAXCount demo. The code has changed
since I compiled the results, but a relatively accurate excerpt follows.
Also note that the same methodology was used for timing the parser
constructor call.

    public void parseDocAtURI( String uri )
    {
        // prepare an input source for the parser
        // the time on this is negligible, always 10ms or less
        InputSource parserinput = new InputSource( uri );

        // parse the document
        totalTime = 0;
        int n = 10; // number of iterations
        if ( verbosity > 0 ) { System.err.println( "Parsing the same input " + n + " times..." ); }

        for ( int i = 0; i < n; i++ ) {
            try {
                beginTime = System.currentTimeMillis();
                parser.parse( parserinput );
            }
            catch ( Exception e )
            {
                System.err.println( "Error during parsing: " + e.getMessage() );
            }
            finally
            {
                endTime = System.currentTimeMillis();
                totalTime = totalTime + endTime - beginTime;
                if ( verbosity > 1 ) { System.err.println( "Time to parse: " + (endTime - beginTime) + "ms." ); }
            }
        }
        if ( verbosity > 0 ) { System.err.println( "Avg time to parse: " + (totalTime / n) + "ms." ); }
    }
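
The excerpt above averages all n iterations together, while the table
reports the first parse() separately from the subsequent ones. A minimal
sketch of how the two could be kept apart follows. It is not the code that
produced the results: the class and method names (ParseTimingSketch,
timeParses, meanFrom) are made up for illustration, it uses the JDK's
default JAXP parser rather than XML4J or Xerces directly, it parses a
small in-memory string instead of the 249K file, and it relies on
System.nanoTime(), which JDK 1.2.2 does not have.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original benchmark.
class ParseTimingSketch {

    // Returns one elapsed time in nanoseconds per parse, in order.
    static long[] timeParses(String xml, int n) throws Exception {
        // Assumption: the JDK's default JAXP parser stands in for XML4J/Xerces.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DefaultHandler handler = new DefaultHandler();
        long[] elapsed = new long[n];
        for (int i = 0; i < n; i++) {
            // A character stream is consumed by parsing, so build a new
            // InputSource for every iteration (outside the timed region).
            InputSource input = new InputSource(new StringReader(xml));
            long begin = System.nanoTime();
            parser.parse(input, handler);
            elapsed[i] = System.nanoTime() - begin;
        }
        return elapsed;
    }

    // Mean of elapsed[from..end], so the first parse can be excluded.
    static double meanFrom(long[] elapsed, int from) {
        long sum = 0;
        for (int i = from; i < elapsed.length; i++) {
            sum += elapsed[i];
        }
        return (double) sum / (elapsed.length - from);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><item>a</item><item>b</item></doc>";
        long[] t = timeParses(xml, 10);
        System.err.println("First parse: " + (t[0] / 1e6) + "ms.");
        System.err.println("Avg of subsequent parses: " + (meanFrom(t, 1) / 1e6) + "ms.");
    }
}
```

Keeping the per-iteration times in an array also makes it easy to look at
the spread, not just the average, which matters when the difference being
chased is on the order of 20ms.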

To test XML4J I imported com.ibm.xml.parsers.SAXParser
and set my CLASSPATH to include the xml4j.jar.

And to test Xerces-J I instead imported
org.apache.xerces.parsers.SAXParser
and set my CLASSPATH to include only the xerces.jar.

Any feedback appreciated. Thanks.

   - Mike
____________________________________________________________________
Mike J. Brown, software engineer at            My XML/XSL resources: 
webb.net in Denver, Colorado, USA              http://skew.org/xml/