Posted to j-users@xerces.apache.org by Adrian Crum <ad...@yahoo.com> on 2009/04/29 04:56:32 UTC

DOM vs SAX performance problem

Hello all.

I'm trying to convert from DOM parsing to SAX parsing. The basic code I'm using is:

DocumentBuilderFactory factory = new org.apache.xerces.jaxp.DocumentBuilderFactoryImpl();
factory.setValidating(validate);
factory.setNamespaceAware(true);
factory.setAttribute("http://xml.org/sax/features/validation", validate);
factory.setAttribute("http://apache.org/xml/features/validation/schema", validate);
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(inputStream);

for DOM parsing, and

SAXParserFactory spf = new org.apache.xerces.jaxp.SAXParserFactoryImpl();
spf.setValidating(validate);
spf.setNamespaceAware(true);
spf.setFeature("http://apache.org/xml/features/validation/schema", validate);
SAXParser parser = spf.newSAXParser();
parser.parse(inputStream, handler);

for SAX parsing.

Using the same source XML file (that references an XSD, and is about 68 KB), the SAX parsing runs 10 times slower than the DOM parsing. If I disable the schema validation, the SAX parsing runs faster than the DOM parsing - but the objects that are created are missing default attributes specified in the XSD. (By the way, the same Java objects are created in both scenarios - they have two constructors: one for DOM and one for SAX.)

Ideally, I would like to configure the SAX parser to just use the XSD to supply default attribute values, and not use it for validation.

I've Googled and searched the Xerces website for an answer, but I didn't find one. I need the SAX parsing to run as fast as or faster than the DOM parsing with validation turned on.

Can anyone help me?

-Adrian


---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org


Re: DOM vs SAX performance problem

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Hi Adrian,

There's a lot you haven't said about what work you're actually measuring
(e.g. what your SAX ContentHandler does) or how you're measuring it (e.g.
are you running warm-up iterations to let the JIT do its optimizations
before you start timing?). That said, I suspect a good chunk of what
you're seeing is the cost of processing the 68 KB schema, not the actual
validation time with it. You should take a look at the grammar caching
capabilities in Xerces (i.e. load the schema once; use it many times), in
particular the JAXP 1.3 Validation API. See the FAQ here [1] on how to use
the JAXP Validation API, as well as this one [2] on general performance.
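
As a rough illustration of "load the schema once; use it many times", here
is a minimal sketch using the JAXP Validation API. It is not taken from the
original poster's code: the class name CachedSchemaSax, the inline XSD, and
the "status" attribute are hypothetical, and it uses
SAXParserFactory.newInstance() rather than instantiating the Xerces factory
class directly. The idea is that the Schema is compiled a single time and
attached with setSchema(), so each parse reuses it. setValidating() is left
at its default of false, since that flag controls DTD validation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class CachedSchemaSax {

    // Hypothetical schema: <item> has an optional "status" attribute defaulting to "new".
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='items'><xs:complexType><xs:sequence>"
      + "<xs:element name='item' maxOccurs='unbounded'><xs:complexType>"
      + "<xs:attribute name='status' type='xs:string' default='new'/>"
      + "</xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Compiles the schema once and returns a factory whose parsers all reuse it. */
    static SAXParserFactory newFactory(String xsd) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(xsd)));
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        spf.setSchema(schema); // schema is processed here, not on every parse
        return spf;
    }

    /** Parses the document and collects the "status" value seen on each <item>. */
    static List<String> statuses(SAXParserFactory spf, String xml) throws Exception {
        final List<String> found = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) {
                if ("item".equals(localName)) {
                    found.add(atts.getValue("", "status"));
                }
            }
        };
        spf.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return found;
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory spf = newFactory(XSD); // one-time cost
        String xml = "<items><item/><item status='done'/></items>";
        for (int i = 0; i < 3; i++) {
            // Each iteration creates a new parser but reuses the compiled schema.
            System.out.println(statuses(spf, xml));
        }
    }
}
```

In this sketch the first <item> carries no status attribute in the
document, so any value the handler sees for it comes from the schema
default. Validation errors are ignored here because DefaultHandler's
error() does nothing; a real handler would probably want to override it.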

Thanks.

[1] http://xerces.apache.org/xerces2-j/faq-pcfp.html#faq-4
[2] http://xerces.apache.org/xerces2-j/faq-performance.html
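
On the measurement question raised above, a timing loop with warm-up
iterations might look roughly like the sketch below. The class ParseTimer
and the method averageNanos are hypothetical names, and the workload in
main is only a stand-in for the actual parse call. It is a crude
illustration of warming up before timing, not a substitute for a proper
benchmarking harness.

```java
final class ParseTimer {

    /**
     * Runs the task warmupRuns times without timing it, then returns the
     * mean wall-clock nanoseconds per run over timedRuns further runs.
     */
    static long averageNanos(Runnable task, int warmupRuns, int timedRuns) {
        if (timedRuns <= 0) {
            throw new IllegalArgumentException("timedRuns must be positive");
        }
        for (int i = 0; i < warmupRuns; i++) {
            task.run(); // gives the JIT a chance to compile the hot paths
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / timedRuns;
    }

    public static void main(String[] args) {
        // Stand-in workload; in practice this would wrap parser.parse(...).
        final long[] sink = new long[1];
        Runnable work = () -> {
            long sum = 0;
            for (int i = 0; i < 10_000; i++) {
                sum += i;
            }
            sink[0] += sum; // keep the result live so the loop is not discarded
        };
        System.out.println(averageNanos(work, 1_000, 1_000) + " ns per run");
    }
}
```

Timing the DOM and SAX paths with the same warm-up and run counts, on the
same input, makes the two numbers easier to compare.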

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org
