Posted to c-dev@xerces.apache.org by Robert Zimmermann <rz...@webde-ag.de> on 2004/03/16 13:03:18 UTC

validating sax parser does not report dtd or wxs errors on time

Hello,

I am not sure whether this counts as a bug, so I will ask it as a question
first:

Xerces reports some DTD validation failures too late; in my case the failure
is a wrong order of elements.
This behaviour breaks my SAX parsing code: my interface relies on Xerces to
report invalid XML in time, and as a consequence it attempts to create a
piece of information in the wrong place (which causes a segfault on Linux).
I have tested the same DTD/XML with a Python SAX parser, which reports the
DTD error in time.

I think anyone who implements XML parsing with a SAX interface should be able
to rely on DTD failures being reported in time.

Sample Code:
DTD:
---------------------------
<!ELEMENT books (book)*>
<!-- author first, then title followed by price -->
<!ELEMENT book (author, title, price)>
<!ATTLIST book 
    category CDATA #REQUIRED
>
<!ELEMENT author (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT price (#PCDATA)>
---------------------------

XML (not valid):
---------------------------
<!DOCTYPE books SYSTEM "books.dtd">
<books>
  <book category="reference">
    <title>Sayings of the Century</title>
    <author>Nigel Rees</author>
    <price>8.95</price>
  </book>
</books>
---------------------------

In this XML the <author> element is in the wrong position.
Xerces reports this error after <price>, more precisely when book is closed.

Also, the reported line number is that of the closing book element, not, as
I would expect, that of the author element.

Anyway, the wrong line or column numbers in the error report are not too bad,
but the late exception in a SAX handler is fatal. Sure, with the appropriate
knowledge the SAX handler could be made more robust, but then every piece
already declared in the grammar (DTD or WXS) would have to be handled once
again inside the SAX handler.
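
A minimal sketch of such a more robust handler, under stated assumptions: it
uses Python's standard xml.sax module (expat-based and non-validating), so the
order check is written by hand, which is exactly the duplication of the
grammar described above. BufferingBookHandler, EXPECTED_ORDER and parse_books
are hypothetical names, and the input is expected without the DOCTYPE line so
that no external DTD is needed. The handler buffers the children of each book
and builds a record only once the book element has closed.

```python
import xml.sax

# Mirrors <!ELEMENT book (author, title, price)>; duplicated by hand (assumption).
EXPECTED_ORDER = ["author", "title", "price"]


class BufferingBookHandler(xml.sax.ContentHandler):
    """Collects the children of each <book>; commits only when </book> arrives."""

    def __init__(self):
        super().__init__()
        self.books = []        # committed records that passed the order check
        self.errors = []       # problems detected when a <book> was closed
        self._pending = None   # (name, text) pairs of the <book> currently open
        self._text = None      # text buffer of the child currently open

    def startElement(self, name, attrs):
        if name == "book":
            self._pending = []
        elif self._pending is not None:
            self._text = []

    def characters(self, content):
        # Text outside a child element (indentation, newlines) is ignored.
        if self._text is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "book":
            order = [child for child, _ in self._pending]
            if order == EXPECTED_ORDER:
                self.books.append(dict(self._pending))
            else:
                self.errors.append("unexpected child order: " + ", ".join(order))
            self._pending = None
        elif self._pending is not None:
            self._pending.append((name, "".join(self._text)))
            self._text = None


def parse_books(xml_text):
    """Parse a books document given as a string and return the handler."""
    handler = BufferingBookHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler
```

With the invalid sample above (minus the DOCTYPE line), parse_books(...).books
stays empty and one entry is added to errors, so nothing is created in the
wrong place no matter when the validator reports the problem.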

What do you guys think about this?

Error reported by SAXPrint sample of Xerces 2.5:
Error at file books_bad.xml, line 7, char 10
  Message: Element 'title' is not valid for content model:
'(author,title,price)'

Error reported by xmlproc of Python:
xml.sax._exceptions.SAXParseException: books_bad.xml:4:12: Element 'author'
missing before element 'title'


Thanks, 
 Robert

WXS = W3C XML Schema

Robert Zimmermann
Softwaredevelopment
WEB.DE AG
http://ComWin.name/rz@webde-ag.de

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org


Re: validating sax parser does not report dtd or wxs errors on time

Posted by Alberto Massari <am...@progress.com>.
Hi Robert,
Xerces does the validation of an element after the element is closed, so 
that it has a list of child nodes to look at.
You will always get parsing errors as soon as they are encountered; only 
validation errors will be reported after the endElement notification.

Alberto
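
The difference between the two error reports in the original message can be
illustrated with a small sketch. It is not the code of Xerces or of xmlproc;
check_at_close, IncrementalChecker and first_failure_event are hypothetical
names, and only a plain sequence model such as (author,title,price) is
handled. The deferred check needs the complete child list, so it can only run
once the parent has closed; the eager check can reject a child as soon as its
start tag is seen.

```python
# Sequence model taken from <!ELEMENT book (author, title, price)>.
CONTENT_MODEL = ("author", "title", "price")


def check_at_close(children, model=CONTENT_MODEL):
    """Deferred check: inspect the complete child list after the parent closed.

    Returns None if the list matches the model, otherwise the 0-based index
    of the first position that does not match.
    """
    for i, expected in enumerate(model):
        if i >= len(children) or children[i] != expected:
            return i
    return None if len(children) == len(model) else len(model)


class IncrementalChecker:
    """Eager check: fed one child at a time, rejects the first mismatch."""

    def __init__(self, model=CONTENT_MODEL):
        self._model = model
        self._pos = 0

    def start_child(self, name):
        """Return True if `name` is allowed at the current position."""
        if self._pos < len(self._model) and self._model[self._pos] == name:
            self._pos += 1
            return True
        return False

    def close_parent(self):
        """Return True if every child required by the model has been seen."""
        return self._pos == len(self._model)


def first_failure_event(children):
    """Return how many children were accepted before the eager check failed.

    Returns None if the whole sequence is valid.
    """
    checker = IncrementalChecker()
    for seen, name in enumerate(children):
        if not checker.start_child(name):
            return seen
    return None if checker.close_parent() else len(children)
```

For the children title, author, price both functions return 0, that is, both
blame the title element. The eager checker knows this after the first start
tag, whereas check_at_close cannot be called before all three children and the
closing book tag have been read, which matches the later line number in the
Xerces report.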
