Posted to p-dev@xerces.apache.org by Steve Mathias <sm...@unm.edu> on 2003/10/15 18:14:49 UTC
Schema Support Issues

Hi Jason,

>>>>> "Jason" == Jason E Stewart <ja...@openinformatics.com> writes:

Jason> Hmmm.. In all honesty schema support is not a very well
Jason> tested feature of Xerces.pm. If you don't need schemas, this will
Jason> not affect you. If you do, then please test them better than I
Jason> have - and let me know if anything breaks.

Unfortunately, I do need schema support: I'm developing a web service
that returns XML and I want to validate that XML against a schema before
sending it out.  This is what led me to investigate and install
xerces-p to begin with.

It seems that schema support does work, although there are definitely
some weirdnesses involved ;-) I think these are mostly due to issues
related to the SWIG and/or XS frameworks, neither of which I am very
familiar with.  The attached script (xerces.pl) works for me and gives the same
results as the SAX2Count sample program from xerces-c (see below for a
minor exception to this) with both valid and invalid documents.

The biggest problem I had was in figuring out a way to handle parsing
invalid documents gracefully.  This is crucial for me, since I plan to
be calling the parser from a web service server that I obviously do not
want to die.  When given an invalid document, the parser does something
bizarre until it runs out of memory.  I can get around the out of memory
problem by setting the
http://apache.org/xml/features/validation-error-as-fatal feature to
true.  The parser still segfaults, but things are recoverable from the
perspective of the script.  The problem then was that when I tried to
reuse the Perl parser object, it thought it was still in the middle of
a parse.  To get around this, I create a new parser whenever a parsing
error is encountered.

This seems to work with my documents and schema, but I'm not very
comfortable with it because I tried to write a test script for Xerces.pm
based on all of the above and no matter what I do, I can't seem to get
around the out of memory error.  This was using personal-schema.xml and
personal.xsd in the XML-Xerces-2.3.0-1/samples directory.  I don't have
any other schema handy to test with, so I don't know if it's an accident
that mine works or that yours doesn't.  Or maybe something is up with
your schema and/or referencing it from the document.  I'm no XML schema
expert.

Regarding the minor exception referred to above, it seems that when
parsing multiple documents the counts in the content handler are not
reset between documents.  The problem can be demonstrated as follows:

# SAX2Count -v=always -f seqdb_9921.xml
seqdb_9921.xml: 230 ms (44 elems, 2 attrs, 902 spaces, 3070 chars)
# cat > foo.list
seqdb_9921.xml
seqdb_9921.xml
^D
# SAX2Count -v=always -f -l foo.list
==Parsing== seqdb_9921.xml
seqdb_9921.xml: 42 ms (44 elems, 2 attrs, 902 spaces, 3070 chars)
==Parsing== seqdb_9921.xml
seqdb_9921.xml: 37 ms (88 elems, 4 attrs, 1804 spaces, 6140 chars)

I don't know whether or not this is a "feature" of SAX2Count, although I
doubt it since it is not the behaviour displayed by either SAXCount or
DOMCount.  It is certainly not what I expected.  Maybe that's something
for the xerces-c crew?
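
To make the expectation concrete, here is a minimal stand-alone C++
sketch of a counting handler that zeroes its totals at the start of each
document.  It does not use xerces-c at all: the CountHandler class, its
member names and the toy parse() function are made up for illustration
and are not taken from the SAX2Count source, so treat it as a picture of
the behaviour I expected rather than a patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a SAX-style counting handler.
// This is NOT the real handler class from the xerces-c samples.
class CountHandler {
public:
    // Called once at the start of every document: zero the counters so
    // totals from a previous parse do not carry over into this one.
    void startDocument() {
        elems_ = 0;
        attrs_ = 0;
    }

    // Called once per element, with the number of attributes it carries.
    void startElement(std::size_t attrCount) {
        ++elems_;
        attrs_ += attrCount;
    }

    std::size_t elems() const { return elems_; }
    std::size_t attrs() const { return attrs_; }

private:
    std::size_t elems_ = 0;
    std::size_t attrs_ = 0;
};

// Toy "parser" (an assumption for this sketch): a document is modelled
// as a list holding the attribute count of each of its elements.
void parse(const std::vector<std::size_t>& doc, CountHandler& handler) {
    handler.startDocument();
    for (std::size_t attrCount : doc) {
        handler.startElement(attrCount);
    }
}
```

With this reset in place, parsing the same three-element document twice
through one handler reports 3 elems and 2 attrs both times.  Without the
reset in startDocument(), the second pass would report 6 and 4, which is
the doubling seen in the SAX2Count output above.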

Anyway, hope some of this is useful.

Cheers,
Steve