Posted to j-dev@xerces.apache.org by Andy Clark <an...@apache.org> on 2001/02/16 10:28:24 UTC

[XNI] DTD Information Set

[The other threads were getting really deep so I started a
fresh thread on this topic.]

I just checked in the following changes to the XMLDTDHandler:

1) I changed the parser to pass the systemId and encoding of the 
   document via the startDocument method instead of passing the 
   "[xml]" pseudo-entity to the start/endEntity calls.
2) I added a characters method so that the contents of IGNORE
   conditional sections in the DTD are reported. This requires
   people to keep track of whether they are currently in the
   DTD or in the document (IF they use the same handler object
   to implement both XMLDocumentHandler and XMLDTDHandler).
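
A minimal sketch of the bookkeeping that item 2 implies, assuming
cut-down, hypothetical stand-ins for the two interfaces
(SketchDocumentHandler and SketchDTDHandler are not the real XNI
types; SAXException and XMLString are left out to keep the block
self-contained):

```java
// Hypothetical, simplified stand-ins for the XNI handler interfaces
// discussed in this thread. The real interfaces have more methods.
interface SketchDocumentHandler {
    void startDocument(String systemId, String encoding);
    void characters(String text);
    void endDocument();
}

interface SketchDTDHandler {
    void startDTD();
    void characters(String text); // contents of IGNORE sections
    void endDTD();
}

// One object implements both interfaces, so the shared characters()
// method has to consult a flag to know what kind of text it received.
class DualHandler implements SketchDocumentHandler, SketchDTDHandler {
    private boolean inDTD = false;
    final StringBuilder ignoredDtdText = new StringBuilder();
    final StringBuilder documentText = new StringBuilder();

    public void startDocument(String systemId, String encoding) {
        inDTD = false;
    }

    public void startDTD() { inDTD = true; }

    public void endDTD() { inDTD = false; }

    public void endDocument() { }

    public void characters(String text) {
        if (inDTD) {
            ignoredDtdText.append(text);
        } else {
            documentText.append(text);
        }
    }
}
```

A handler that implements only one of the two interfaces would not
need the flag at all.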

Thanks to Ryosuke Nanba for the suggestions.

Later I'll look at making the other changes that Ryosuke
suggested. However, I'd like to come to agreement about what
the DTD handler(s) should look like before I work on it.

Here is a first attempt at putting all of the necessary
information into the XMLDTDHandler interface:

public interface XMLDTDHandler {

    //
    // Constants
    //

    public static final short CONDITIONAL_INCLUDE = 0;
    public static final short CONDITIONAL_IGNORE = 1;

    public static final short SEPARATOR_CHOICE = 0;
    public static final short SEPARATOR_SEQUENCE = 1;

    public static final short OCCURS_ZERO_OR_ONE = 0;
    public static final short OCCURS_ZERO_OR_MORE = 1;
    public static final short OCCURS_ONE_OR_MORE = 2;

    //
    // XMLDTDHandler methods
    //

    public void startDTD() throws SAXException;

    public void startEntity(String name, String publicId,
                            String systemId, String encoding)
        throws SAXException;
    public void textDecl(String version, String encoding)
        throws SAXException;
    public void endEntity(String name) throws SAXException;

    public void startConditional(short type) throws SAXException;
    public void characters(XMLString text) throws SAXException;
    public void endConditional() throws SAXException;

    public void startElementDecl() throws SAXException;
    public void elementName(String name) throws SAXException;
    public void any() throws SAXException;
    public void empty() throws SAXException;
    public void startGroup() throws SAXException;
    //public void elementName(String name) throws SAXException;
    public void separator(int type) throws SAXException;
    public void endGroup() throws SAXException;
    public void occurs(int type) throws SAXException;
    public void endElementDecl() throws SAXException; 

    public void startAttlistDecl() throws SAXException;
    //public void elementName(String name) throws SAXException;
    public void attributeName(String name) throws SAXException;
    public void attributeType(String type) throws SAXException;
    public void startEnumeration() throws SAXException;
    public void enumerationValue(String value) throws SAXException;
    //public void separator(int type) throws SAXException;
    public void endEnumeration() throws SAXException;
    public void implied() throws SAXException;
    public void required() throws SAXException;
    public void fixed() throws SAXException;
    public void defaultValue(String value) throws SAXException;
    public void ndata(String name) throws SAXException;
    public void endAttlistDecl() throws SAXException;

    public void startEntityDecl() throws SAXException;
    public void entityName(String name) throws SAXException;
    public void startLiteral(char quote) throws SAXException;
    //public void characters(XMLString text) throws SAXException;
    public void endLiteral() throws SAXException;
    public void publicId(char quote, String publicId)
        throws SAXException;
    public void systemId(char quote, String systemId)
        throws SAXException;
    //public void ndata(String ndata) throws SAXException;
    public void endEntityDecl() throws SAXException;

    public void startNotationDecl() throws SAXException;
    public void notationName(String name) throws SAXException;
    //public void publicId(char quote, String publicId)
    //    throws SAXException;
    //public void systemId(char quote, String systemId)
    //    throws SAXException;
    public void endNotationDecl() throws SAXException;

    public void endDTD() throws SAXException;

} // interface XMLDTDHandler

There are good and bad points to designing the interface
this way. The good point is that people can detect exactly
which parts of declarations are defined within entities.
The bad points include the following:

1) Lots of methods.
2) Duplicated method names require the implementor to maintain
   state. However, this can be solved by using specific method
   names (e.g. "attlistElementName" vs. "elementName").
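
To illustrate point 2, here is a rough sketch of the state an
implementor would have to keep when one elementName callback is
shared between declarations. FineGrainedDTDHandler is a
hypothetical subset of the interface above with SAXException
omitted, and the event strings are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical subset of the fine-grained callbacks proposed above.
interface FineGrainedDTDHandler {
    void startElementDecl();
    void startGroup();
    void endGroup();
    void endElementDecl();
    void startAttlistDecl();
    void endAttlistDecl();
    void elementName(String name); // meaning depends on context
}

class ContextTrackingHandler implements FineGrainedDTDHandler {
    private enum Context { NONE, ELEMENT_DECL, CONTENT_MODEL, ATTLIST_DECL }

    private Context context = Context.NONE;
    private int groupDepth = 0;
    final List<String> events = new ArrayList<>();

    public void startElementDecl() { context = Context.ELEMENT_DECL; }

    public void startGroup() {
        groupDepth++;
        context = Context.CONTENT_MODEL;
    }

    public void endGroup() {
        // Leave the content model only when the outermost group closes.
        if (--groupDepth == 0) {
            context = Context.ELEMENT_DECL;
        }
    }

    public void endElementDecl() { context = Context.NONE; }

    public void startAttlistDecl() { context = Context.ATTLIST_DECL; }

    public void endAttlistDecl() { context = Context.NONE; }

    public void elementName(String name) {
        switch (context) {
            case ELEMENT_DECL:
                events.add("declared:" + name);
                break;
            case CONTENT_MODEL:
                events.add("child:" + name);
                break;
            case ATTLIST_DECL:
                events.add("attlist-for:" + name);
                break;
            default:
                throw new IllegalStateException(
                    "elementName outside a declaration");
        }
    }
}
```

With distinct method names the Context enum and the switch would
disappear, at the cost of even more methods in the interface.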

Thoughts?

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org

Re: [XNI] DTD Information Set

Posted by Andy Clark <an...@apache.org>.
In my last posting on this discussion, I broke down the
information that is needed by validation, API propagation, and
entity management. This is the core information, so I think that
the methods that satisfy these requirements should become the
XMLDTDHandler API. I propose that any extra information that we
emit from the DTD scanner (due to user requests or for
completeness) be added as an XNI "extension" package.

So, the question is: what do these interfaces look like? Well,
in this pass I will propose an API set. Comments are welcome.

  interface XMLDTDHandler

    SEPARATOR_CHOICE
    SEPARATOR_SEQUENCE

    OCCURS_ZERO_OR_MORE
    OCCURS_ZERO_OR_ONE
    OCCURS_ONE_OR_MORE

    *CONDITIONAL_IGNORE
    *CONDITIONAL_INCLUDE

    startDTD()

     *comment(text)
     *processingInstruction(target, data)

     startEntity(entityName, publicId, systemId, encoding)
      *textDecl(version, encoding)
     endEntity(entityName)

     startElementDecl()
      elementDeclName(elementName)
      contentModelAny()
      contentModelEmpty()
      contentModelStartGroup()
       contentModelPCDATA()
       contentModelElement(elementName)
       contentModelSeparator(separator)
       contentModelOccurs(occurs)
      contentModelEndGroup()
     endElementDecl()
     *elementDecl(elementName, contentModel)

     attributeDecl(elementName, attributeName, type, enumeration,
                   defaultType, defaultValue)

     internalEntityDecl(entityName, value)
     externalEntityDecl(entityName, publicId, systemId)
     unparsedEntityDecl(entityName, publicId, systemId,
                        notationName)

     notationDecl(notationName, publicId, systemId)

     *startConditional(type)
      *ignoredCharacters(text)
     *endConditional()

    endDTD()

NOTE: Methods marked with an asterisk (*) aren't absolutely
needed at this level of API. But their addition provides more
"complete" DTD information coverage.

The extended DTD API would add a bunch of methods to the ones
defined in the XMLDTDHandler interface. The added methods would
mainly be for separating out the other declaration information
so that the application could better detect the use of parameter
entities in the DTD.
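
As a rough illustration of how the contentModel* callbacks in the
proposed API could be consumed, the sketch below rebuilds the
content model string from them. The class name is hypothetical,
the method signatures are guesses based on the outline (which
gives no parameter types), and the constant values are copied
from the first proposal in this thread:

```java
// Hypothetical consumer of the contentModel* callbacks; not part of XNI.
class ContentModelStringBuilder {
    // Values taken from the first proposal; the outline above lists
    // only the names, so treating them as shorts is an assumption.
    static final short SEPARATOR_CHOICE = 0;
    static final short SEPARATOR_SEQUENCE = 1;
    static final short OCCURS_ZERO_OR_ONE = 0;
    static final short OCCURS_ZERO_OR_MORE = 1;
    static final short OCCURS_ONE_OR_MORE = 2;

    private final StringBuilder model = new StringBuilder();

    // Each element declaration starts with an empty model.
    void startElementDecl() { model.setLength(0); }

    void contentModelAny() { model.append("ANY"); }

    void contentModelEmpty() { model.append("EMPTY"); }

    void contentModelStartGroup() { model.append('('); }

    void contentModelPCDATA() { model.append("#PCDATA"); }

    void contentModelElement(String elementName) {
        model.append(elementName);
    }

    void contentModelSeparator(short separator) {
        model.append(separator == SEPARATOR_CHOICE ? '|' : ',');
    }

    void contentModelOccurs(short occurs) {
        switch (occurs) {
            case OCCURS_ZERO_OR_ONE:
                model.append('?');
                break;
            case OCCURS_ZERO_OR_MORE:
                model.append('*');
                break;
            case OCCURS_ONE_OR_MORE:
                model.append('+');
                break;
            default:
                throw new IllegalArgumentException(
                    "unknown occurs value: " + occurs);
        }
    }

    void contentModelEndGroup() { model.append(')'); }

    // The model as it would be passed to elementDecl(name, contentModel).
    String contentModel() { return model.toString(); }
}
```

This assumes the scanner reports an occurrence indicator right
after the element or group it applies to, which the outline does
not state explicitly.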

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org

Re: [XNI] DTD Information Set

Posted by Ted Leung <tw...@sauria.com>.
It's late here, so I don't have time to do the actual work to check that I'm
not hare-brained, but here's my thought:

What about a hierarchy of interfaces:  Make the entity callbacks the
base interface, then derive the validation interface from that, and then
derive the API propagation from it.  That might make things more minimal.
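
One reading of this suggestion, sketched with hypothetical
interface names. The method names are borrowed from Andy's
proposal, the parameter types are assumed, and FakeScanner exists
only to drive the example:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Base level: what entity management needs.
interface EntityCallbacks {
    void internalEntityDecl(String entityName, String value);
    void externalEntityDecl(String entityName, String publicId,
                            String systemId);
}

// Adds what a validator needs on top of the entity callbacks.
interface ValidationCallbacks extends EntityCallbacks {
    void elementDecl(String elementName, String contentModel);
    void attributeDecl(String elementName, String attributeName,
                       String type, String[] enumeration,
                       String defaultType, String defaultValue);
    void notationDecl(String notationName, String publicId,
                      String systemId);
}

// Adds what API propagation needs on top of validation.
interface PropagationCallbacks extends ValidationCallbacks {
    void startDTD();
    void unparsedEntityDecl(String entityName, String publicId,
                            String systemId, String notationName);
    void endDTD();
}

// An entity manager only has to implement the smallest interface.
class EntityTable implements EntityCallbacks {
    final Map<String, String> internalValues = new LinkedHashMap<>();
    final Map<String, String> externalSystemIds = new LinkedHashMap<>();

    public void internalEntityDecl(String entityName, String value) {
        internalValues.put(entityName, value);
    }

    public void externalEntityDecl(String entityName, String publicId,
                                   String systemId) {
        externalSystemIds.put(entityName, systemId);
    }
}

// Stand-in for a scanner: it asks only for the base interface, so a
// handler from any level of the hierarchy can be passed in.
class FakeScanner {
    static void reportEntities(EntityCallbacks handler) {
        handler.internalEntityDecl("copy", "(c) 2001");
        handler.externalEntityDecl("chapter1", null, "chapter1.xml");
    }
}
```

The trade-off is that the layering fixes one order of dependency;
a component that wants validation information without the entity
callbacks would still have to implement both.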

I'll try to look at this a bit more in the AM.

Ted

----- Original Message ----- 
From: "Andy Clark" <an...@apache.org>
To: <xe...@xml.apache.org>
Sent: Tuesday, February 20, 2001 7:02 PM
Subject: Re: [XNI] DTD Information Set




Re: [XNI] DTD Information Set

Posted by Andy Clark <an...@apache.org>.
Ted Leung wrote:
> How about presenting a grouping of the methods so that people can
> see which ones are needed to build validators, etc.

Instead of grouping the methods, I'll list the information
that is required from a DTD scanner for validation, API
propagation, etc. And we can take this information (plus
what users want) and create the grouping of methods.

Validation

  * element declaration
    * name
    * content model
      e.g. EMPTY, ANY, (#PCDATA|...)*, (a|b+|(c,d)*)
  * attribute list declaration
    * element name
    * attribute name
    * type
      e.g. CDATA, ..., NOTATION, (a|b)
    * default type
      e.g. #IMPLIED, #REQUIRED, #FIXED
    * default value
  * notation declaration
    * name
  * entity declaration
    * name 

API Propagation

  * DTD boundaries
    * start DTD
    * external subset boundary
    * end DTD
  * element declaration
    * name
    * content model (as string)
  * attribute declaration
    * element name
    * attribute name
    * type
    * value default (what I call default type)
    * value (what I call default value)
  * internal entity declaration
    * name
    * value
  * external entity declaration
    * name
    * public id
    * system id
  * notation declaration
    * name
    * public id
    * system id
  * unparsed entity declaration
    * name
    * public id
    * system id
    * notation name

Entity Management

  * internal entity declaration
    * name
    * value
  * external entity declaration
    * name
    * public id
    * system id
    * base system id
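
The base system id is listed under entity management presumably
so that a relative system id can be expanded before the entity is
opened. A minimal sketch of that step, assuming system ids are
well-formed URIs; ExternalEntityInfo is a hypothetical holder
class, not an XNI type:

```java
import java.net.URI;

// Hypothetical record of the external entity information listed above.
final class ExternalEntityInfo {
    final String name;
    final String publicId;
    final String systemId;
    final String baseSystemId;

    ExternalEntityInfo(String name, String publicId,
                       String systemId, String baseSystemId) {
        this.name = name;
        this.publicId = publicId;
        this.systemId = systemId;
        this.baseSystemId = baseSystemId;
    }

    // Resolves systemId against baseSystemId. URI.create throws
    // IllegalArgumentException if either string is not a valid URI.
    String expandedSystemId() {
        if (systemId == null) {
            return null;
        }
        if (baseSystemId == null) {
            return systemId;
        }
        return URI.create(baseSystemId).resolve(systemId).toString();
    }
}
```

An absolute system id is returned unchanged by the resolve call,
so the same method works for both relative and absolute ids.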

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org

Re: [XNI] DTD Information Set

Posted by Ted Leung <tw...@sauria.com>.
----- Original Message ----- 
From: "Andy Clark" <an...@apache.org>
To: <xe...@xml.apache.org>
Sent: Sunday, February 18, 2001 3:42 PM
Subject: Re: [XNI] DTD Information Set


> Ted Leung wrote:
> > Ack.
> 
> I'm assuming that the "Ack" is in regard to the proposed 
> interface methods. ;)

Yes.

> > I'm having trouble with all the callbacks.  Can this be designed
> > so that "the common case" is easy, and that it makes the esoteric
> 
> I'm on the fence on this issue because the parser needs more
> information than just the "common" case would suggest
> in order to build content model validators and such. So the
> questions to be answered are: 1) should the DTD information
> be contained in a single interface or multiple interfaces;
> and 2) what are the methods in the interface(s)?

How about presenting a grouping of the methods so that people can
see which ones are needed to build validators, etc.



Re: [XNI] DTD Information Set

Posted by Andy Clark <an...@apache.org>.
Ted Leung wrote:
> Ack.

I'm assuming that the "Ack" is in regard to the proposed 
interface methods. ;)

> I'm having trouble with all the callbacks.  Can this be designed
> so that "the common case" is easy, and that it makes the esoteric

I'm on the fence on this issue because the parser needs more
information than just the "common" case would suggest
in order to build content model validators and such. So the
questions to be answered are: 1) should the DTD information
be contained in a single interface or multiple interfaces;
and 2) what are the methods in the interface(s)?

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org

Re: [XNI] DTD Information Set

Posted by Ted Leung <tw...@sauria.com>.
Ack.

I'm having trouble with all the callbacks.  Can this be designed
so that "the common case" is easy, and that it makes the esoteric
cases possible?  I haven't seen that much need for the conditional
section handling.  I can see the argument for it being there for
completeness, but not at the expense of cluttering up the interface
like this.

Ted

----- Original Message ----- 
From: "Andy Clark" <an...@apache.org>
To: <xe...@xml.apache.org>
Sent: Friday, February 16, 2001 1:28 AM
Subject: [XNI] DTD Information Set

