Posted to general@xerces.apache.org by "Urobushkin, Gleb" <Gl...@westgroup.com> on 2000/02/24 03:54:29 UTC

repost: Xerces4J/SAX -- internal subset: issues with entity declarations

Insights from developers would be of great help.
Thank you.
----------------------------------

> I am observing inconsistent behavior in the treatment of internal
> subset declarations by the SAX parser from Xerces300ea3 for Java.
> 
> sax.SAXWriter from the samples jar was used as the parsing application.
> 
> Issues:
> 
> 1. Scanning or buffering error (absent in older ibm4j) when reading a
> large file of character entity declarations. The declarations are referred
> to from the internal subset via a parameter entity.
> 
> ==============================
> <!DOCTYPE dummy [ 
> 	<!ENTITY % entts SYSTEM "allchars.ent">
> 	%entts;
> ]>
> <test>
> text
> </test>
> ==============================
> 
>  <<enttest.ent>> 
> ...
> <!-- Entity set.
>      Public identifier:
>      -//ISO 8879:1986//ENTITIES Added Math Symbols: Relations//EN
> -->
> ...
> <!-- take me out and xerces will break WEIRD -->
> <!ENTITY ape    "&#38;#38;ape;">    <!-- approximate, equals -->
> ...
> (from attachment enttest.ent)
> ==============================  output ==
> 
> [Fatal Error] enttest.ent:577:3: The markup declaration contained  ...
> 
> 
> In the same situation, everything *might* work if the DTD fragment
> contains fewer entity declarations (other things like adding a blank line
> can make it work too). 
> 
> 2. Additional declarations in the internal subset cannot override
> declarations of the same entity previously read from the external
> resource or from the same subset.
> 
> <!DOCTYPE dummy [ 
> 	<!ENTITY % entts SYSTEM "enttest.ent">
> <!-- has aacute mapping to itself -->
> 	%entts;
> 	<!ENTITY aacute "&#38;#38;xaacute;">
> ]>
> <test>
> text &aacute; text
> </test>
> 
> or
> 
> <!DOCTYPE dummy [ 
> 	<!ENTITY aacute "&#38;#38;aacute;">
> 	<!ENTITY aacute "&#38;#38;xaacute;">
> ]>
> <test>
> text &aacute; text
> </test>
> 
> ============================output ===
> <test>
> text &amp;aacute; text
> </test>
> 
> 
> I expected to see the following output:
> <test>
> text &amp;xaacute; text
> </test>
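
The second test case in issue 2 can also be checked outside of SAXWriter. What follows is only a minimal sketch, assuming the JAXP SAX parser bundled with the JDK rather than the Xerces300ea3 build discussed in this message; the class name EntityOverrideDemo and the method expand are made up for illustration and are not part of Xerces or its samples.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; not part of Xerces or its samples.
class EntityOverrideDemo {

    // Second test case from issue 2: aacute is declared twice
    // in the internal subset.
    static final String DOC =
        "<!DOCTYPE dummy [\n"
        + "  <!ENTITY aacute \"&#38;#38;aacute;\">\n"
        + "  <!ENTITY aacute \"&#38;#38;xaacute;\">\n"
        + "]>\n"
        + "<test>\ntext &aacute; text\n</test>\n";

    // Parses the document and returns all character data reported through
    // SAX, with entity references already expanded by the parser.
    static String expand(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver the text in several chunks.
                text.append(ch, start, length);
            }
        });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(expand(DOC));
    }
}
```

For comparison with the expected output above: XML 1.0 (section 4.2) says that when the same entity is declared more than once, the first declaration encountered is binding, so the text collected by this sketch should contain &aacute; rather than &xaacute;.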
> 
>  
> 3. The same applies to the following test, which tries unsuccessfully
> to override one of the predefined XML entities:
> 
> ===============================
> <!DOCTYPE dummy [ 
>       	<!ENTITY amp "&#38;#38;xxamp;">
> ]>
> <test>
> text &amp; text
> </test>
> =============================== output ===
> <test>
> text &amp; text
> </test>
> ===============================
> 
> A DOM parser would behave differently, BTW.
> 
> 
> I think all of these issues point to internal bugs that need to be fixed.
> Thank you,
> 
> Gleb
> 
>