Posted to j-dev@xerces.apache.org by Joseph Kesselman/CAM/Lotus <jo...@us.ibm.com> on 2002/05/01 00:25:51 UTC

Schema validation -- known performance problem?

I've got an ... interesting ... result on my hands.

I've been experimenting with feeding an application (Xalan) directly from
an XNI stream rather than from a SAX stream. The experimental code's
basically just doing a conversion from XNI to SAX and calling my normal SAX
handlers... plus a bit of additional hacking about if we happen to find a
PSVI annotation.

With schema validation turned off, performance of the XNI setup is
comparable to that of the SAX version -- as expected, since my XNI-to-SAX
conversion is probably very similar to what you folks do.

HOWEVER -- when I turn on the schema validator, parser performance falls
through the floor -- even though none of the test documents references a
schema, and only two of them reference a DTD. The parse() operation takes
almost twice as long to complete.
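
For readers who want to reproduce the two configurations, here is a minimal sketch of turning schema validation on and off. It goes through plain JAXP rather than the XNI-based setup described above, and it assumes the parser behind SAXParserFactory is Xerces or a derivative that recognizes the Xerces schema feature id. The class and method names (SchemaFeatureSketch, newParser, countElements) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SchemaFeatureSketch {
    // Xerces feature id for XML Schema validation. Assumption: the
    // underlying parser is Xerces-based and recognizes it.
    static final String SCHEMA_FEATURE =
            "http://apache.org/xml/features/validation/schema";

    // Builds a parser with or without the schema validator switched on.
    static SAXParser newParser(boolean schemaValidation) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        if (schemaValidation) {
            factory.setValidating(true);               // roughly "-v"
            factory.setFeature(SCHEMA_FEATURE, true);  // roughly "-s"
        }
        return factory.newSAXParser();
    }

    // Parses the document and counts start-element events. DefaultHandler
    // ignores non-fatal validation errors, so a document with no DTD and
    // no schema still parses to the end.
    static int countElements(SAXParser parser, String xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        };
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        parser.parse(new ByteArrayInputStream(bytes), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><item/><item/></doc>";
        System.out.println("plain:  " + countElements(newParser(false), xml));
        System.out.println("schema: " + countElements(newParser(true), xml));
    }
}
```

Both configurations should deliver the same events for a document that references neither a DTD nor a schema; only the work done inside the pipeline differs.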

JProbe calls out the following as accounting for most of the difference.
The measurement is how much faster the NON-schema-validated run is than the
schema-validated one, and includes all methods called by the named method. I
haven't attempted to sort them into who-calls-who order, though I suspect
the time sort actually comes pretty close to achieving that. (Apologies in
advance if this doesn't line up nicely on your screen; try a fixed-pitch
font.)

                                                        Cumulative time
StandardParserConfiguration.parse(boolean)
                                                        -169872 (-48.7%)
XMLDocumentFragmentScannerImpl.scanDocument(boolean)
                                                        -167192 (-48.4%)
XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(boolean)
                                                        -167182 (-48.4%)
XMLNamespaceBinder.handleStartElement(QName, XMLAttributes, Augmentations, boolean)
                                                        -112707 (-67.2%)
XMLDocumentFragmentScannerImpl.scanStartElement()
                                                        -112687 (-59.3%)
XMLDTDValidator.startElement(QName, XMLAttributes, Augmentations)
                                                        -112335 (-65.1%)
XMLNamespaceBinder.startElement(QName, XMLAttributes, Augmentations)
                                                        -112305 (-66.8%)

Is this a known issue (possibly already patched)? Or have I botched the
parser configuration somehow?

If the answer is "yes, it's slow and we're working on it" or "we've
improved it in the current CVS code", that's fine... but I thought I should
make sure you were aware of this, and ensure that it wasn't something
particularly stupid in my own code, before I proceeded to work on trying to
optimize my end of things.



---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org


Re: Schema validation -- known performance problem?

Posted by Elena Litani <el...@ca.ibm.com>.
Joe, 

Joseph Kesselman/CAM/Lotus wrote:
> HOWEVER -- when I turn on the schema validator, parser performance falls
> through the floor -- even though none of the test documents references a
> schema, and only two of them reference a DTD. The parse() operation takes
> almost twice as long to complete.

This is a single parse(), correct? I mean, you did not use any warm-up?
Performance drops for a single parse because the cost of setting up the
XML Schema validator is included in the parse() method. To get a correct
performance measurement you should exclude the first parse from your
performance tests. If you do so, the time difference between parsing with
the "-v" and "-v -s" options is insignificant.

I am not sure how, or whether, we can improve the initialization time of
XMLSchemaValidator, but since initialization happens just once, for
parsers used in a run-more-than-once scenario the performance difference
should be minor.
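
The warm-up advice above can be sketched as follows: reuse one parser, time every parse, and report the first parse separately from the mean of the rest. This is only an illustration under assumptions. It uses JAXP instead of the Xerces sample programs, it assumes the underlying parser recognizes the Xerces schema feature id, and the names WarmupTimingSketch, timeParses and meanExcludingFirst are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class WarmupTimingSketch {
    // Assumption: a Xerces-based parser that recognizes this feature id.
    static final String SCHEMA_FEATURE =
            "http://apache.org/xml/features/validation/schema";

    static SAXParser newParser(boolean schemaValidation) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        if (schemaValidation) {
            factory.setFeature(SCHEMA_FEATURE, true);
        }
        return factory.newSAXParser();
    }

    // Parses the same document 'runs' times with the SAME parser instance
    // and returns the elapsed time of each parse in nanoseconds.
    static long[] timeParses(SAXParser parser, String xml, int runs)
            throws Exception {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            // DefaultHandler ignores non-fatal validation errors.
            parser.parse(new ByteArrayInputStream(bytes), new DefaultHandler());
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    // Mean of all timings except the first one, which carries the
    // one-time initialization cost.
    static double meanExcludingFirst(long[] nanos) {
        if (nanos.length < 2) {
            throw new IllegalArgumentException("need at least two timings");
        }
        long sum = 0;
        for (int i = 1; i < nanos.length; i++) {
            sum += nanos[i];
        }
        return (double) sum / (nanos.length - 1);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><item/><item/></doc>";
        for (boolean schema : new boolean[] {false, true}) {
            long[] t = timeParses(newParser(schema), xml, 20);
            System.out.println("schema=" + schema
                    + " first=" + t[0] + "ns"
                    + " mean(rest)=" + meanExcludingFirst(t) + "ns");
        }
    }
}
```

A tiny in-memory document like this one mostly measures setup overhead; a realistic comparison would use the actual test documents and many more runs.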

As Henry noticed, if XML Schema is included in the pipeline we do some
work in startElement(). I am not sure how we can change that unless we
come up with a new feature stating something like: "if DOCTYPE is found,
validate only against DTD" ... (I am not sure we want to do it).

Currently we try to validate against both DTDs and XML Schemas. That is
why we check whether an XML Schema is found on some element.
 
-- 
Elena Litani / IBM Toronto
