Posted to c-users@xerces.apache.org by Erik Wright <er...@wrighttechnologysolutions.com> on 2008/08/12 23:33:03 UTC

erroneous calls to startEntity/endEntity?

Hi,

I think I may have discovered a bug in Xerces-C 2.8.0. It seems that
the LexicalHandler startEntity/endEntity events are not sent
correctly: the startEntity calls are not matched by endEntity calls,
and every entity is reported as [dtd]. For example, I have been
parsing a valid XHTML document. The strict XHTML DTD includes 4 other
files with entity declarations. I see the following events on my
LexicalHandler (ignoring elements, characters, whitespace, external
entity declarations, and comments):

startDocument
...
startDTD: html, -//W3C//DTD XHTML 1.0 Strict//EN, http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
...
startEntity: [dtd]
...
startEntity: [dtd]
...
startEntity: [dtd]
...
startEntity: [dtd]
...
endEntity: [dtd]
...
endDTD
...
endDocument

I expected something more like the following (as generated by the  
standard SAX parser in Java 6):

startDocument
startDTD: 'html', '-//W3C//DTD XHTML 1.0 Strict//EN', 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'
startEntity: '[dtd]'
startEntity: '%HTMLlat1'
endEntity: '%HTMLlat1'
startEntity: '%HTMLsymbol'
endEntity: '%HTMLsymbol'
startEntity: '%HTMLspecial'
endEntity: '%HTMLspecial'
startEntity: '%head.misc'
endEntity: '%head.misc'
startEntity: '%head.misc'
endEntity: '%head.misc'
startEntity: '%head.misc'
endEntity: '%head.misc'
startEntity: '%head.misc'
endEntity: '%head.misc'
startEntity: '%head.misc'
endEntity: '%head.misc'
startEntity: '%block'
endEntity: '%block'
startEntity: '%inline'
endEntity: '%inline'
startEntity: '%misc'
endEntity: '%misc'
startEntity: '%block'
endEntity: '%block'
startEntity: '%misc'
endEntity: '%misc'
startEntity: '%block'
endEntity: '%block'
startEntity: '%inline'
endEntity: '%inline'
startEntity: '%misc'
endEntity: '%misc'
endEntity: '[dtd]'
endDTD
startPrefixMapping: '', 'http://www.w3.org/1999/xhtml'
endPrefixMapping: ''
endDocument

At a minimum, the mismatch of startEntity/endEntity events appears to  
be caused by the following code from DTDScanner::scanExtSubsetDecl  
(notice that the conditions are not the same):

     if (fDocTypeHandler && !inIncludeSect)
         fDocTypeHandler->startExtSubset();

     ...
     ...
     ...

     if (fDocTypeHandler && isDTD)
         fDocTypeHandler->endExtSubset();
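
To illustrate how two guards with different conditions could produce a
trace like the one above, here is a minimal standalone sketch. It does
not use Xerces itself: RecordingHandler, scanExtSubsetDeclSketch, and
countEvents are hypothetical names, and the flag values passed when the
function is re-entered for a nested entity are an assumption made for
illustration, not something taken from the Xerces source.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the handler that receives the subset
// events; it only records what it is told.
struct RecordingHandler {
    std::vector<std::string> events;
    void startExtSubset() { events.push_back("startEntity: [dtd]"); }
    void endExtSubset()   { events.push_back("endEntity: [dtd]"); }
};

// Mimics only the guard structure quoted above. The real function body
// is elided in the excerpt and is not reproduced here.
void scanExtSubsetDeclSketch(RecordingHandler* handler,
                             bool inIncludeSect, bool isDTD,
                             int nestedEntities)
{
    if (handler && !inIncludeSect)
        handler->startExtSubset();

    // Assumption: each nested external entity re-enters the scanner
    // with inIncludeSect == false and isDTD == false.
    for (int i = 0; i < nestedEntities; ++i)
        scanExtSubsetDeclSketch(handler, false, false, 0);

    if (handler && isDTD)
        handler->endExtSubset();
}

// Counts how many recorded events begin with the given prefix.
int countEvents(const RecordingHandler& h, const std::string& prefix)
{
    int n = 0;
    for (const std::string& e : h.events)
        if (e.compare(0, prefix.size(), prefix) == 0)
            ++n;
    return n;
}
```

Under those assumptions, calling scanExtSubsetDeclSketch(&h, false,
true, 3) on an empty RecordingHandler h records four start events and
one end event, which has the same shape as the trace I am seeing.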

I checked Jira and didn't see any related issues. Unless someone  
specifically knows that this has already been fixed in the source  
repository, I will create an issue tomorrow.

Thanks,

Erik