Posted to j-users@xerces.apache.org by Curtiss Howard <cu...@gmail.com> on 2004/12/18 21:40:51 UTC

Slow SAX parse when using schema validation?

Just for unscientific benchmarking purposes, I tried timing a DOM and
SAX parse of the same file with the same settings (grammar pool on,
don't generate PSVI, etc.).  Both parsers use schema validation.

For a small file, SAX only had a 20% performance advantage over DOM. 
I'd expected much more.  Surprisingly enough, as I increased the size
of the file being parsed, the performance advantage _narrowed_, not
widened, as I'd expected.  Obviously the parse is going to be somewhat
slow due to schema validation, but I'm confused as to why SAX is
performing so poorly compared to DOM.  Can anyone shed some light on
this?


Thanks,


Curtiss Howard

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org


Re: Slow SAX parse when using schema validation?

Posted by Elena Litani <el...@ca.ibm.com>.
Hi Curtiss,

> Curtiss Howard wrote:
> > For a small file, SAX only had a 20% performance advantage over DOM. 
> > I'd expected much more.

By default Xerces uses a deferred DOM implementation, so if you don't
access any information from the DOM tree, you are pretty much only
testing the "parsing" part of the implementation. If you do access the
tree, you'll see that the deferred implementation is more efficient than
the regular (non-deferred) Xerces DOM implementation for larger XML
documents and less efficient for smaller ones. It does not look right,
though, that for large documents the difference between SAX and DOM is
less than 20%.
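
To make the point about accessing the tree concrete, here is a minimal timing sketch. It is not the benchmark from the original post: the class name ParseTiming, the synthetic document and the item count are all made up for illustration, and schema validation and the grammar pool are left out for brevity. It uses only the standard JAXP API and times the SAX parse, the DOM parse, and a full walk of the DOM tree separately, so the cost of expanding deferred nodes shows up in its own number.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ParseTiming {

    /** Visits every node, so a deferred DOM has to build the whole tree. */
    static int countNodes(Node node) {
        int count = 1;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countNodes(child);
        }
        return count;
    }

    /** Builds a synthetic document (hypothetical test data, not the poster's file). */
    static String sampleXml(int items) {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < items; i++) {
            sb.append("<item id=\"").append(i).append("\">text</item>");
        }
        return sb.append("</root>").toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = sampleXml(50_000);

        // SAX parse with a handler that ignores every event.
        long t0 = System.nanoTime();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
        long t1 = System.nanoTime();

        // DOM parse only; the tree is not touched yet.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        long t2 = System.nanoTime();

        // Walk the whole tree, which is what a real DOM client would do.
        int nodes = countNodes(doc);
        long t3 = System.nanoTime();

        System.out.printf("SAX parse:       %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("DOM parse only:  %d ms%n", (t2 - t1) / 1_000_000);
        System.out.printf("DOM tree walk:   %d ms (%d nodes)%n", (t3 - t2) / 1_000_000, nodes);
    }
}
```

A single run like this includes JIT warm-up, so the numbers are only a rough indication; repeating each step several times and discarding the first runs would give steadier results.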

You might find this article interesting:
http://www-106.ibm.com/developerworks/xml/library/x-injava/

For more info on Xerces performance:
http://www-106.ibm.com/developerworks/xml/library/x-perfap2.html

Thank you,
-- 
Elena Litani / IBM Toronto

Re: Slow SAX parse when using schema validation?

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
That's not at all how the deferred DOM implementation works. No part of
the document is skipped. Deferred DOM defers only the creation of nodes
[1], building them as the tree is traversed. The scanner, which sits at
the front of the pipeline, parses the whole document and reports an error
if the document isn't well-formed.

[1] http://xml.apache.org/xerces2-j/features.html#dom.defer-node-expansion
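
As a rough illustration of the feature in [1], the sketch below switches deferred node expansion on and off through JAXP and feeds both configurations a document that is not well-formed. The class and method names (DeferredDomToggle, newFactory, rejectedDuringParse) are invented for this example, and it assumes the JAXP implementation on the classpath is Xerces-based; if the feature is not recognized, the sketch just logs that and carries on with the parser's default. In both settings the error should surface from parse() itself, before any node is touched.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DeferredDomToggle {

    // Xerces feature id from the features page referenced in [1].
    static final String DEFER_NODE_EXPANSION =
            "http://apache.org/xml/features/dom/defer-node-expansion";

    /** Returns a factory with deferred node expansion set as requested, if supported. */
    static DocumentBuilderFactory newFactory(boolean deferred) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        try {
            factory.setFeature(DEFER_NODE_EXPANSION, deferred);
        } catch (ParserConfigurationException e) {
            // Assumption: only Xerces-based parsers know this feature.
            System.err.println("Feature not supported, using parser default: " + e.getMessage());
        }
        return factory;
    }

    /** True if parse() itself reports an error, without the tree ever being walked. */
    static boolean rejectedDuringParse(String xml, boolean deferred) {
        try {
            newFactory(deferred).newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return false;
        } catch (SAXException e) {
            return true;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The unclosed <b> near the end makes this document not well-formed.
        String malformed = "<root><a>ok</a><b></root>";
        for (boolean deferred : new boolean[] {true, false}) {
            System.out.println("deferred=" + deferred + ", rejected during parse(): "
                    + rejectedDuringParse(malformed, deferred));
        }
    }
}
```

The default JAXP error handler may also print a "[Fatal Error]" line to standard error for each rejected parse; that is expected here.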

Elliotte Harold <el...@metalab.unc.edu> wrote on 12/19/2004 05:58:56 AM:

> Curtiss Howard wrote:
> 
> 
> > For a small file, SAX only had a 20% performance advantage over DOM. 
> > I'd expected much more.  Surprisingly enough, as I increased the size
> > of the file being parsed, the performance advantage _narrowed_, not
> > widened, as I'd expected.  Obviously the parse is going to be somewhat
> > slow due to schema validation, but I'm confused as to why SAX is
> > performing so poorly compared to DOM.  Can anyone shed some light on
> > this?
> 
> Is it possible you're using the deferred DOM implementation? If so, 
> Xerces DOM parser is not actually parsing part of the file until you 
> actually walk the tree. That would explain why it seems to speed up with
> larger documents: more to skip. Personally I think this behavior is
> nonconformant to the XML and DOM specifications--it fails to detect
> well-formedness errors as early as required--but others disagree with me.
> 
> -- 
> Elliotte Rusty Harold  elharo@metalab.unc.edu
> XML in a Nutshell 3rd Edition Just Published!
> http://www.cafeconleche.org/books/xian3/
> http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org



Re: Slow SAX parse when using schema validation?

Posted by Elliotte Harold <el...@metalab.unc.edu>.
Curtiss Howard wrote:


> For a small file, SAX only had a 20% performance advantage over DOM. 
> I'd expected much more.  Surprisingly enough, as I increased the size
> of the file being parsed, the performance advantage _narrowed_, not
> widened, as I'd expected.  Obviously the parse is going to be somewhat
> slow due to schema validation, but I'm confused as to why SAX is
> performing so poorly compared to DOM.  Can anyone shed some light on
> this?

Is it possible you're using the deferred DOM implementation? If so, 
Xerces DOM parser is not actually parsing part of the file until you 
actually walk the tree. That would explain why it seems to speed up with 
larger documents: more to skip. Personally I think this behavior is 
nonconformant to the XML and DOM specifications--it fails to detect 
well-formedness errors as early as required--but others disagree with me.

-- 
Elliotte Rusty Harold  elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim
