Posted to j-users@xerces.apache.org by "Dr. Stefan Bettner" <St...@gmx.de> on 2021/09/21 17:48:58 UTC

PE Reference splitting in INCLUDE-section

Hi everybody,

I have a question about the Java XML parser. Its handling of the
replacement text of parameter entity references inside a conditional
INCLUDE section baffles me, and I would be really grateful if someone
could explain it to me.

The XML 1.0 specification (Fifth Edition) states:

"Well-formedness constraint: PE Between Declarations
The replacement text of a parameter entity reference in a DeclSep MUST
match the production extSubsetDecl."

This prevents, for example, splitting a markup declaration across the
replacement texts of two separate parameter entities. For instance, the
entity declaration

<!ENTITY copyright '(C)'>

cannot be split into two parameter entity references like this:

<!ENTITY % A "<!ENTITY ">
<!ENTITY % B "copyright '(C)'>">
%A;%B;

because in that case the replacement text of %A; is incomplete and so
does not match the production extSubsetDecl, as required.

The Java XML parser reports the above as a fatal error, in validating
and non-validating mode alike. So far, so good.
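
For anyone who wants to try the top-level case without files on disk, here
is a minimal sketch that feeds the document and the DTD to the parser from
strings and records what gets reported. The class name InMemoryDtdProbe,
the run helper and the dummy system id are illustrative choices of mine and
are not part of the attached files; the DTD text is the one shown above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: parses a fixed test document against a DTD given as
// a string and collects character data, errors and fatal errors.
class InMemoryDtdProbe extends DefaultHandler {
    static final String DOC =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE Root SYSTEM \"splitEntity.dtd\" [\n"
        + "<!ELEMENT Root ANY>\n"
        + "]>\n"
        + "<Root>&copyright;</Root>";

    final String dtd;
    final StringBuilder text = new StringBuilder();
    final List<String> errors = new ArrayList<>();
    final List<String> fatalErrors = new ArrayList<>();

    InMemoryDtdProbe(String dtd) { this.dtd = dtd; }

    // Serve the external subset from memory instead of the file system.
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        return new InputSource(new StringReader(dtd));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void error(SAXParseException e) { errors.add(e.getMessage()); }

    @Override
    public void fatalError(SAXParseException e) { fatalErrors.add(e.getMessage()); }

    static InMemoryDtdProbe run(String dtd, boolean validating)
            throws ParserConfigurationException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validating);
        InMemoryDtdProbe probe = new InMemoryDtdProbe(dtd);
        InputSource doc = new InputSource(new StringReader(DOC));
        doc.setSystemId("file:///probe/splitEntity.xml"); // dummy base URI, never opened
        try {
            factory.newSAXParser().parse(doc, probe);
        } catch (SAXException e) {
            // The parser stops after a fatal error; make sure it is recorded.
            if (probe.fatalErrors.isEmpty()) probe.fatalErrors.add(e.getMessage());
        }
        return probe;
    }

    public static void main(String[] args) throws Exception {
        String split = "<!ENTITY % A \"<!ENTITY \">\n"
                     + "<!ENTITY % B \"copyright '(C)'>\">\n"
                     + "%A;%B;";
        for (boolean validating : new boolean[] {false, true}) {
            InMemoryDtdProbe p = run(split, validating);
            System.out.println("validating=" + validating + " errors=" + p.errors
                + " fatal=" + p.fatalErrors + " text=" + p.text);
        }
    }
}
```

Passing any other DTD text to run makes it easy to compare variants under
the same parser configuration.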

But strangely, the situation changes when the expression %A;%B; is put
into a conditional INCLUDE section, like this:

<!ENTITY % A "<!ENTITY ">
<!ENTITY % B "copyright '(C)'>">
<![INCLUDE[%A;%B;]]>

In that case the Java XML parser has no problem at all with the above in
non-validating mode, and in validating mode it reports only two
validation errors, but no fatal error. In both cases the entity
"copyright" is declared and can be used within the document.
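
One way to check which entities the parser actually declares is to register
a SAX DeclHandler. The following is a minimal sketch, assuming the parser
supports the standard declaration-handler property; the class name
EntityDeclProbe, the declaredEntities helper and the dummy system id are
hypothetical names of mine. Parameter entities are reported with a leading
'%' in their name, general entities without.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper: lists the internal entity declarations the parser
// reports while reading a DTD that is supplied as a string.
class EntityDeclProbe extends DefaultHandler2 {
    static final String DOC =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE Root SYSTEM \"splitEntity.dtd\" [\n"
        + "<!ELEMENT Root ANY>\n"
        + "]>\n"
        + "<Root>&copyright;</Root>";

    final String dtd;
    final List<String> declared = new ArrayList<>();

    EntityDeclProbe(String dtd) { this.dtd = dtd; }

    // Serve the external subset from memory. DefaultHandler2 routes the
    // two-argument resolveEntity to this four-argument variant.
    @Override
    public InputSource resolveEntity(String name, String publicId,
                                     String baseURI, String systemId) {
        return new InputSource(new StringReader(dtd));
    }

    // Called once per internal entity declaration (general or parameter).
    @Override
    public void internalEntityDecl(String name, String value) {
        declared.add(name + "=" + value);
    }

    static List<String> declaredEntities(String dtd) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        EntityDeclProbe probe = new EntityDeclProbe(dtd);
        reader.setContentHandler(probe);
        reader.setErrorHandler(probe);   // default fatalError rethrows the exception
        reader.setEntityResolver(probe);
        reader.setProperty("http://xml.org/sax/properties/declaration-handler", probe);
        InputSource doc = new InputSource(new StringReader(DOC));
        doc.setSystemId("file:///probe/splitEntity.xml"); // dummy base URI, never opened
        reader.parse(doc);
        return probe.declared;
    }

    public static void main(String[] args) throws Exception {
        String include = "<!ENTITY % A \"<!ENTITY \">\n"
                       + "<!ENTITY % B \"copyright '(C)'>\">\n"
                       + "<![INCLUDE[%A;%B;]]>";
        try {
            System.out.println("declared: " + declaredEntities(include));
        } catch (SAXException e) {
            System.out.println("fatal: " + e.getMessage());
        }
    }
}
```

If "copyright" shows up in the printed list for the INCLUDE variant, the
parser has assembled the declaration from the two replacement texts.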

How can that be? I would really be grateful if someone could explain
that to me.

When I look at the grammar production for a conditional include section

[62]       includeSect       ::=       '<![' S? 'INCLUDE' S? '[' extSubsetDecl ']]>'

the inner part should be an extSubsetDecl. Looking at

[31]       extSubsetDecl       ::=       ( markupdecl | conditionalSect | DeclSep)*

in our case, when the INCLUDE section is processed, %A; can only match a
DeclSep. Given the well-formedness constraint mentioned above, I would
have assumed that the replacement text of %A; "MUST match the production
extSubsetDecl". But it does not, since the replacement text of %A; is
incomplete.

I would be grateful for any hint.
Thank you so much for your work.
Apache is such a great project.

Bye everybody.
Stay healthy you all.

Stefan Bettner.


PS: I appended my xml files and Java file, for comparison. I was using
the following Java-version on Windows 10:
java version "12.0.2" 2019-07-16
Java(TM) SE Runtime Environment (build 12.0.2+10)

splitEntity.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Root SYSTEM "splitEntity.dtd" [
<!ELEMENT Root ANY>
] >
<Root>&copyright;</Root>


splitEntity.dtd

<!ENTITY % A "<!ENTITY ">
<!ENTITY % B "copyright '(C)'>">
<![INCLUDE[%A;%B;]]>


XMLSplitEntity.java

import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.File;
import java.io.IOException;

public class XMLSplitEntity
   {
   /*
    * custom DocumentHandler
    */
   public static class MyDocumentHandler extends DefaultHandler
     {
     @Override
     public void characters (char[] ch, int start, int length) throws SAXException
       {
       System.out.println ("characters: " + new String (ch, start, length));
       }

     @Override
     public void warning (SAXParseException e) throws SAXException
       {
       System.out.println ("warning: " + e.getMessage ());
       }

     @Override
     public void error (SAXParseException e) throws SAXException
       {
       System.out.println ("error: " + e.getMessage ());
       }

     @Override
     public void fatalError (SAXParseException e) throws SAXException
       {
       System.out.println ("fatalError: " + e.getMessage ());
       }
     }

   /*
    * parse splitEntity.xml
    * (with external splitEntity.dtd)
    */
   public static void main (String[] args)
     {
     // create parser
     SAXParserFactory factory = SAXParserFactory.newInstance ();
     // factory.setValidating (true);
     SAXParser saxParser;
     try
       {
       saxParser = factory.newSAXParser ();
       }
     catch (ParserConfigurationException | SAXException e)
       {
       e.printStackTrace ();
       return;
       }

     // parse
     File file = new File ("C:\\splitEntity.xml");
     try
       {
       saxParser.parse (file, new MyDocumentHandler ());
       }
     catch (SAXException | IOException e)
       {
       e.printStackTrace ();
       }
     }
   }

