Posted to j-users@xerces.apache.org by Pieter van der Spek <pi...@West.NL> on 2004/05/19 14:44:18 UTC

Performance issues between XMLReader and SAXParser

I've written a small validating parser using the Xerces SAX parser.
However, I have found a performance difference between two similar implementations.
The code for both is shown below:

public static void validate(String instance)
  {
    try
    {
      XMLReader parser= null;
      SAXParser parser2= new SAXParser();

      try {
        parser= XMLReaderFactory.createXMLReader(
          "org.apache.xerces.parsers.SAXParser"
        );
      } catch(SAXException se) {
        parser= XMLReaderFactory.createXMLReader();
        System.out.println("oops");
      }

      //Validate the document and report validity errors.
      parser.setFeature("http://xml.org/sax/features/validation", true);
      parser2.setFeature("http://xml.org/sax/features/validation", true);

      //Turn on XML Schema validation by inserting XML Schema
      // validator in the pipeline.
      parser.setFeature(
        "http://apache.org/xml/features/validation/schema", true
      );
      parser2.setFeature(
        "http://apache.org/xml/features/validation/schema", true
      );

      //Set the external schema location.
      parser.setProperty(
        "http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation",
        "file:///myschema.xsd"
      );
      parser2.setProperty(
        "http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation",
        "file:///myschema.xsd"
      );

      // Custom ErrorHandler class (implements org.xml.sax.ErrorHandler
      // and provides errorSeen()).
      ErrorHandler errors = new ErrorHandler();
      parser.setErrorHandler(errors);
      parser2.setErrorHandler(errors);

      //Parse an XML document
      System.out.println("XMLReader:");
      System.out.println("Start: "+System.currentTimeMillis());
      parser.parse(instance);
      System.out.println("End: "+System.currentTimeMillis());
      System.out.println("-------");
      System.out.println("SAXParser:");
      System.out.println("Start: "+System.currentTimeMillis());
      parser2.parse(instance);
      System.out.println("End: "+System.currentTimeMillis());

      if (!errors.errorSeen())
      {
        System.out.println("Successfully validated " + instance);
      }
    }
    catch (Exception e)
    {
      System.out.print("Problem parsing the file:");
      System.out.println(e.getMessage());
      e.printStackTrace();
    }
  }

The output from the four calls to System.currentTimeMillis() is the following
(this can vary from time to time of course):
XMLReader:
Start: 1084970176879
End: 1084970181622
-------
SAXParser:
Start: 1084970181625
End: 1084970182964


This tells me that the SAXParser takes 1339 milliseconds to complete while the
XMLReader takes 4743 milliseconds to complete!
What causes this difference? I expected both implementations to be
approximately as fast.
I hope someone can clear this up for me.
Thanks in advance.

Greetings,
Pieter van der Spek



                                   @ @
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-oOO-(_)-OOo-=-=-=-=-=

Pieter van der Spek
---- West Consulting B.V.        - www.west.nl
---- Tu Delft / Computer Science - www.tudelft.nl

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org


Re: Performance issues between XMLReader and SAXParser

Posted by Pieter van der Spek <pi...@West.NL>.
Andy Clark said:
>
> And what happens when you run the same thing with the SAXParser
> first and the XMLReader second? Then try running a test parse
> *before* timing any parsing runs.
>
> I'm guessing that your performance difference is due to Java
> loading the parser classes the first time you parse. After that,
> the classes are loaded and it's much faster.
>
I tried your suggestion and you're absolutely right. After the first, untimed
parse, both methods are just as fast. So the "problem" here is indeed the
classes being loaded.

Greetings,
Pieter van der Spek




Re: Performance issues between XMLReader and SAXParser

Posted by Andy Clark <an...@cyberneko.net>.
Pieter van der Spek wrote:
>       //Parse an XML document
>       System.out.println("XMLReader:");
>       System.out.println("Start: "+System.currentTimeMillis());
>       parser.parse(instance);
>       System.out.println("End: "+System.currentTimeMillis());
>       System.out.println("-------");
>       System.out.println("SAXParser:");
>       System.out.println("Start: "+System.currentTimeMillis());
>       parser2.parse(instance);
>       System.out.println("End: "+System.currentTimeMillis());
>
> [...]
>
> This tells me that the SAXParser takes 1339 milliseconds to complete while the
> XMLReader takes 4743 milliseconds to complete!
> What causes this difference. I expected both implementations to be
> approximately as fast.

And what happens when you run the same thing with the SAXParser
first and the XMLReader second? Then try running a test parse
*before* timing any parsing runs.

I'm guessing that your performance difference is due to Java
loading the parser classes the first time you parse. After that,
the classes are loaded and it's much faster.
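
The suggestion above can be sketched as follows. This is only a minimal
illustration, not the original poster's code: it assumes the JAXP parser
bundled with the JDK instead of a Xerces jar on the classpath, leaves out
schema validation, and parses a small in-memory document. The class name
WarmUpTiming and the helpers newReader() and timeParse() are hypothetical
names made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class WarmUpTiming {

    // Hypothetical helper: builds a reader from whichever JAXP
    // implementation is available (an assumption; the original post
    // instantiates org.apache.xerces.parsers.SAXParser directly).
    static XMLReader newReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler());
        return reader;
    }

    // Hypothetical helper: parses the XML text once and returns the
    // elapsed time in nanoseconds.
    static long timeParse(XMLReader reader, String xml) throws Exception {
        long start = System.nanoTime();
        reader.parse(new InputSource(new StringReader(xml)));
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><item>1</item><item>2</item></root>";
        XMLReader first = newReader();
        XMLReader second = newReader();

        // Untimed warm-up parses, so class loading and other one-time
        // initialization are not charged to either measurement.
        timeParse(first, xml);
        timeParse(second, xml);

        // Only these runs are reported.
        long firstNanos = timeParse(first, xml);
        long secondNanos = timeParse(second, xml);

        System.out.println("First reader:  " + firstNanos / 1000 + " us");
        System.out.println("Second reader: " + secondNanos / 1000 + " us");
    }
}
```

Swapping the order of the two timed calls, or repeating each measurement
several times and comparing the results, is a simple way to check that
whatever difference remains does not depend on which parser runs first.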

-- 
Andy Clark * andyc@apache.org
