Posted to j-dev@xerces.apache.org by Ryosuke Nanba <rn...@cyber.email.ne.jp> on 2001/01/08 10:18:16 UTC

problem about parameter entity handling of XNI

Hi all.

I'm playing with XNI in Xerces-2.0.0.alpha.
I have two problems with the handling of parameter entities (PEs).

1. I think DTDParser should call XMLDTDHandler.startEntity() when the
  parser starts expanding a PE, and XMLDTDHandler.endEntity() when
  it has finished expanding the PE. But the following input produces
  an invalid method call sequence.

  input:
    <!ENTITY % X "x">
    <!ELEMENT a (%X;)>

  method call sequence:
   *XMLDTDHandler.startEntity("%X")
    XMLDTDContentModelHandler.startContentModel("a", TYPE_CHILDREN)
    XMLDTDContentModelHandler.startChildrenGroup()
    XMLDTDContentModelHandler.childrenElement("x")
    XMLDTDHandler.endEntity()
    XMLDTDContentModelHandler.endChildrenGroup()
    XMLDTDContentModelHandler.endContentModel()

  a valid sequence would be:
    XMLDTDContentModelHandler.startContentModel("a", TYPE_CHILDREN)
    XMLDTDContentModelHandler.startChildrenGroup()
   *XMLDTDHandler.startEntity("%X")
    XMLDTDContentModelHandler.childrenElement("x")
    XMLDTDHandler.endEntity()
    XMLDTDContentModelHandler.endChildrenGroup()
    XMLDTDContentModelHandler.endContentModel()

2. Nested PEs are expanded, but the following input produces only
  one pair of start/endEntity() method calls.
   
  input:
    <!ENTITY % Y "y">
    <!ENTITY % XY  "x|%Y;">
    <!ELEMENT a (z|%XY;)>

  method call sequence:
    XMLDTDContentModelHandler.startContentModel("a", TYPE_CHILDREN)
    XMLDTDContentModelHandler.startChildrenGroup()
    XMLDTDContentModelHandler.childrenElement("z")
    XMLDTDContentModelHandler.childrenSeparator(SEPARATOR_CHOICE)
   *XMLDTDHandler.startEntity(%XY)
    XMLDTDContentModelHandler.childrenElement("x")
    XMLDTDContentModelHandler.childrenSeparator(SEPARATOR_CHOICE)
    XMLDTDContentModelHandler.childrenElement("y")
   *XMLDTDHandler.endEntity()
    XMLDTDContentModelHandler.endChildrenGroup()
    XMLDTDContentModelHandler.endContentModel()
   
  what I want:
    ...
   *XMLDTDHandler.startEntity(%XY)
    XMLDTDContentModelHandler.childrenElement("x")
    XMLDTDContentModelHandler.childrenSeparator(SEPARATOR_CHOICE)
   *XMLDTDHandler.startEntity(%Y)
    XMLDTDContentModelHandler.childrenElement("y")
   *XMLDTDHandler.endEntity()
   *XMLDTDHandler.endEntity()
    ... 

  My aim is to handle structured DTDs in applications such as
  DTD2RELAX ( http://www.horobi.com/RELAX/Archive/DTD2RELAX.html ).
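
  Both points come down to one expectation: entity callbacks should nest
  with the content model callbacks like balanced brackets. Below is a
  minimal sketch of such a check, assuming the callbacks are recorded as
  plain strings. The class NestingCheck and the string encoding are made
  up for illustration and are not part of XNI.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class NestingCheck {
    // True if every "endX" event closes the most recently opened "startX" event.
    static boolean wellNested(List<String> events) {
        Deque<String> open = new ArrayDeque<>();
        for (String event : events) {
            if (event.startsWith("start")) {
                open.push(event.substring("start".length()));
            } else if (event.startsWith("end")) {
                String kind = event.substring("end".length());
                if (open.isEmpty() || !open.pop().equals(kind)) {
                    return false;
                }
            }
            // Anything else (childrenElement, childrenSeparator) is a leaf event.
        }
        return open.isEmpty();
    }

    public static void main(String[] args) {
        // Sequence reported in problem 1: the entity opens outside the model but closes inside it.
        List<String> reported = Arrays.asList(
            "startEntity", "startContentModel", "startChildrenGroup",
            "childrenElement", "endEntity", "endChildrenGroup", "endContentModel");
        // Sequence proposed as valid in problem 1.
        List<String> proposed = Arrays.asList(
            "startContentModel", "startChildrenGroup", "startEntity",
            "childrenElement", "endEntity", "endChildrenGroup", "endContentModel");
        // Inner part of the sequence wanted in problem 2 (nested PEs).
        List<String> nested = Arrays.asList(
            "startEntity", "childrenElement", "childrenSeparator",
            "startEntity", "childrenElement", "endEntity", "endEntity");

        System.out.println(wellNested(reported)); // prints false
        System.out.println(wellNested(proposed)); // prints true
        System.out.println(wellNested(nested));   // prints true
    }
}
```

  A check like this only looks at bracket structure. It says nothing
  about which entity name each endEntity() belongs to.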
---
	Ryosuke Nanba

Re: problem about parameter entity handling of XNI

Posted by Libor Kramolis <li...@netbeans.com>.

Sorry for the delay ... I was on vacation.

Comments are below.


Andy Clark wrote:

> Libor Kramolis wrote:
> 
>> Level 1: *data* oriented
>> Level 2: *structure* oriented
>> Level 3: *indentation* oriented
> 
> 
> Okay, now I feel like we're getting somewhere! :)
> 

I hope.   :-))

> 
>> [example] <!ENTITY e1 '&#38;amp;'>
>> [1] - level 1 callback
>> [2] - level 2 callback
>> [3] - level 3 callback
>> 
>> [2]  startEntityDecl()
>> [3]    ignorableWhitespaces (" ")
>> [2]    entityName ("e1");
>> [3]    ignorableWhitespaces (" ")
>> [2]    startEntityValue ('\'')
>> [2]      startEntity ("#38")
>> [2]        characters ("&")
>> [2]      endEntity ("#38")
>> [2]      characters ("amp;")
>> [2]    endEntityValue ('\'')
>> [2]  endEntityDecl()
>> [1]  internalEntityDecl ("e1", "&amp;")
> 
> 
> Following your lead, how about the following interfaces. Don't
> mind the duplicated methods -- they're included for illustrative
> purposes so you know what kind of information is received at what
> time.
> 
>   interface XMLDTDHandler (XNI Core)
> 
>     startDTD()
> 
>      startEntity(String,String,String,String)
>       textDecl(String,String)
>      endEntity(String)
> 
>      comment(XMLString)
>      processingInstruction(String,XMLString)
> 
>      elementDecl(String,String)
>      attributeDecl(String,String,String,String[],String,XMLString)
> 
>      internalEntityDecl(String,XMLString)
>      externalEntityDecl(String,String,String)
>      unparsedEntityDecl(String,String,String,String)
> 
>      notationDecl(String,String,String)
> 
>     endDTD()
> 
> Note that this simplifies the basic XMLDTDHandler to return only 
> the basic information needed to communicate the "data" declared in 
> the DTD.


Agree. This could be level *1* information.


> 
>   interface XMLDTD???Handler (XNI extension)
> 
>     short CONDITIONAL_INCLUDE
>     short CONDITIONAL_IGNORE
> 
>     short SEPARATOR_CHOICE
>     short SEPARATOR_SEQUENCE
> 
>     short OCCURS_ZERO_OR_ONE 
>     short OCCURS_ZERO_OR_MORE 
>     short OCCURS_ONE_OR_MORE 
> 
>     startConditional(short)
>      characters(XMLString)
>     endConditional()
> 
>     startEntity(String,String,String,String)
>      textDecl(String,String)
>     endEntity(String)
> 
>     startEntityDecl()
>      entityName(String) // "%e3" would be a parameter entity
>      startLiteral(char)
>       characters(XMLString)
>      endLiteral()
>      publicId(char,String) // ??? Not sure about these two
>      systemId(char,String)
>      ndata(String)
>     endEntityDecl()
> 
>     startElementDecl()
>      elementName(String)
>      any()
>      empty()
>      startGroup()
>       pcdata()
>       element(String)
>       separator(short)
>       occurs(short)
>      endGroup()
>     endElementDecl()
> 
>     startAttlistDecl()
>      elementName(String)
>      attributeName(String)
>      // NEEDS TO BE FILLED IN! -- Note, this part is going to
>      // be large due to the various possibilities
>     endAttlistDecl()
> 
>     startNotationDecl()
>      notationName(String)
>     endNotationDecl()
> 
> This level of DTD information incorporates many of the non-data
> aspects of XMLDTDHandler and all of the XMLDTDContentModelHandler
> interfaces, in a form changed so as to allow the correct
> start/endEntity callbacks.


Yes, this could be the level *2* extension with entity reference boundaries
(start/end-Entity).


> 
>   interface XMLDTD???Handler (XNI extension)
> 
>     whitespace(char[],int,int)
>     ignorableWhitespace(char[],int,int) // ???
> 
> This interface reports the "indentation" information from the DTD. 


Yes, and this could be level *3* -- the indentation level.

I think this view of XML parsing makes sense.


> NOTE: I don't believe that this is a reasonable requirement from 
> the DTD scanner. But I've included it anyway for the purposes of 
> discussion.


I think that the (DTD) scanner could be more componentized. This was discussed in
the '[xerces2] XNI comments' thread and I agree with Petr's proposal.


> 
> Let the discussion begin!


I am very glad that you want to specify an XNI extension, which could be
useful for XML tools. This extension should support levels 2 and 3 (as
you designed above). Thanks.

Well, I would like to continue the discussion, so if you want (as I see)
to specify those extension interfaces, let's start specifying them.


Thanks for any answer.

Regards,
Libor

Re: problem about parameter entity handling of XNI

Posted by Andy Clark <an...@apache.org>.

Libor Kramolis wrote:
> Level 1: *data* oriented
> Level 2: *structure* oriented
> Level 3: *indentation* oriented

Okay, now I feel like we're getting somewhere! :)

> [example] <!ENTITY e1 '&#38;amp;'>
> [1] - level 1 callback
> [2] - level 2 callback
> [3] - level 3 callback
> 
> [2]  startEntityDecl()
> [3]    ignorableWhitespaces (" ")
> [2]    entityName ("e1");
> [3]    ignorableWhitespaces (" ")
> [2]    startEntityValue ('\'')
> [2]      startEntity ("#38")
> [2]        characters ("&")
> [2]      endEntity ("#38")
> [2]      characters ("amp;")
> [2]    endEntityValue ('\'')
> [2]  endEntityDecl()
> [1]  internalEntityDecl ("e1", "&amp;")

Following your lead, how about the following interfaces. Don't
mind the duplicated methods -- they're included for illustrative
purposes so you know what kind of information is received at what
time.

  interface XMLDTDHandler (XNI Core)

    startDTD()

     startEntity(String,String,String,String)
      textDecl(String,String)
     endEntity(String)

     comment(XMLString)
     processingInstruction(String,XMLString)

     elementDecl(String,String)
     attributeDecl(String,String,String,String[],String,XMLString)

     internalEntityDecl(String,XMLString)
     externalEntityDecl(String,String,String)
     unparsedEntityDecl(String,String,String,String)

     notationDecl(String,String,String)

    endDTD()

Note that this simplifies the basic XMLDTDHandler to return only 
the basic information needed to communicate the "data" declared in 
the DTD.

  interface XMLDTD???Handler (XNI extension)

    short CONDITIONAL_INCLUDE
    short CONDITIONAL_IGNORE

    short SEPARATOR_CHOICE
    short SEPARATOR_SEQUENCE

    short OCCURS_ZERO_OR_ONE 
    short OCCURS_ZERO_OR_MORE 
    short OCCURS_ONE_OR_MORE 

    startConditional(short)
     characters(XMLString)
    endConditional()

    startEntity(String,String,String,String)
     textDecl(String,String)
    endEntity(String)

    startEntityDecl()
     entityName(String) // "%e3" would be a parameter entity
     startLiteral(char)
      characters(XMLString)
     endLiteral()
     publicId(char,String) // ??? Not sure about these two
     systemId(char,String)
     ndata(String)
    endEntityDecl()

    startElementDecl()
     elementName(String)
     any()
     empty()
     startGroup()
      pcdata()
      element(String)
      separator(short)
      occurs(short)
     endGroup()
    endElementDecl()

    startAttlistDecl()
     elementName(String)
     attributeName(String)
     // NEEDS TO BE FILLED IN! -- Note, this part is going to
     // be large due to the various possibilities
    endAttlistDecl()

    startNotationDecl()
     notationName(String)
    endNotationDecl()

This level of DTD information incorporates many of the non-data
aspects of XMLDTDHandler and all of the XMLDTDContentModelHandler
interfaces, in a form changed so as to allow the correct
start/endEntity callbacks.

  interface XMLDTD???Handler (XNI extension)

    whitespace(char[],int,int)
    ignorableWhitespace(char[],int,int) // ???

This interface reports the "indentation" information from the DTD. 
NOTE: I don't believe that this is a reasonable requirement from 
the DTD scanner. But I've included it anyway for the purposes of 
discussion.

Let the discussion begin!

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org

Re: problem about parameter entity handling of XNI

Posted by Libor Kramolis <li...@netbeans.com>.

Andy Clark wrote:
 > Libor Kramolis wrote:
 >
 > > I think that when a parser user wants to know about all references in a
 > > document, all occurrences should be reported.
 > >
 > > [...]
 > >
 > > So, I would like to use the parser like a lexical analyzer - all used tokens
 > > (references, including character references) are important to me.
 >
 >
 > There has to be a line drawn somewhere. What information do you
 > expect to receive from the declarations in lines 5-6?

I am glad you wrote this example again. BTW, you created a very nice example. :-)

I would like to divide parsing into three levels.

Level 1: *data* oriented
Level 2: *structure* oriented
Level 3: *indentation* oriented

where each level extends the previous one.

Level *1* means that all (parameter) entity and character references are
resolved and ignorable whitespace is ignored.

Level *2* means that all structure elements (tokens of the XML document) are
published to the appropriate handler, i.e. start- | end- Entity are used around
each resolved reference, and attr.specified = true when the attribute was
specified in the grammar (e.g. DTD). [ Level 1 + structure callbacks. ]

Level *3* means that all ignorable whitespace (indentation whitespace) is
published to the appropriate handler. [ Level 2 + whitespace callbacks. ]

For each level there could be another handler interface.
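
The layering could be sketched with interface inheritance, roughly as
follows. All names here (LevelSketch, DataHandler, StructureHandler,
IndentationHandler, Recorder) are hypothetical and chosen only to mirror
the three levels; they are not existing XNI types, and each interface
carries just one or two sample callbacks.

```java
import java.util.ArrayList;
import java.util.List;

class LevelSketch {
    /** Level 1: data only; references are already resolved. */
    interface DataHandler {
        void internalEntityDecl(String name, String value);
    }

    /** Level 2: level 1 plus structure callbacks such as entity boundaries. */
    interface StructureHandler extends DataHandler {
        void startEntity(String name);
        void endEntity(String name);
    }

    /** Level 3: level 2 plus whitespace callbacks. */
    interface IndentationHandler extends StructureHandler {
        void ignorableWhitespace(String text);
    }

    /** Records every callback it receives, tagged with the level it belongs to. */
    static class Recorder implements IndentationHandler {
        final List<String> log = new ArrayList<>();

        public void internalEntityDecl(String name, String value) {
            log.add("[1] internalEntityDecl " + name + " " + value);
        }
        public void startEntity(String name) { log.add("[2] startEntity " + name); }
        public void endEntity(String name) { log.add("[2] endEntity " + name); }
        public void ignorableWhitespace(String text) { log.add("[3] ignorableWhitespace"); }
    }

    /** A producer that only knows level 1 still accepts a level 3 handler. */
    static void reportData(DataHandler handler) {
        handler.internalEntityDecl("e1", "&amp;");
    }

    public static void main(String[] args) {
        Recorder recorder = new Recorder();
        recorder.ignorableWhitespace(" ");
        recorder.startEntity("#38");
        recorder.endEntity("#38");
        reportData(recorder);
        recorder.log.forEach(System.out::println);
    }
}
```

With this shape a tool that only needs data implements the smallest
interface, while an editor-like tool implements the largest one and
still works with any producer written against a lower level.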

What I need for the tool [sorry] is level 2 (only indentation is lost there).
Let's say level 3 is a bonus: it gives a better round trip, but even without
it no data or structure is lost.

 >
 >  [1] <!ENTITY e1 '&#38;amp;'>
 >  [2] <!ENTITY e2 'M &e1; M'>
 >  [3] <!ENTITY % e3 ''>
 >  [4] <!ENTITY % e4 'EMPTY'>
 >  [5] <!ELEMENT%e3;elem%e3;%e4;%e3;>
 >  [6] <!ATTLIST%e3;elem%e3;attr%e3;CDATA%e3;'&e2;'%e3;>
 >
 > And what about extraneous whitespace? Do you expect to receive
 > callbacks to report all use of whitespace?

Yes. This is very difficult in the current state of XNI.

I know this is not very comfortable, but I would like to split declaration
handling (in level 2) into more than one method (for declarations like
internalEntityDecl).

[example] <!ENTITY e1 '&#38;amp;'>
[1] - level 1 callback
[2] - level 2 callback
[3] - level 3 callback

[2]  startEntityDecl()
[3]    ignorableWhitespaces (" ")
[2]    entityName ("e1");
[3]    ignorableWhitespaces (" ")
[2]    startEntityValue ('\'')
[2]      startEntity ("#38")
[2]        characters ("&")
[2]      endEntity ("#38")
[2]      characters ("amp;")
[2]    endEntityValue ('\'')
[2]  endEntityDecl()
[1]  internalEntityDecl ("e1", "&amp;")
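
The last line of this example shows that the level 1 callback is derivable
from the level 2 events: concatenating the characters() reported inside the
entity value gives the level 1 value. A minimal sketch of that derivation,
assuming a handler with exactly the callback names used above (the class
EntityValueCollector itself is hypothetical):

```java
class EntityValueCollector {
    private final StringBuilder value = new StringBuilder();
    private String name;
    private boolean inValue;

    /** "name=value" of the most recently finished declaration. */
    String declared;

    void startEntityDecl() {
        name = null;
        value.setLength(0);
    }
    void entityName(String entityName) { name = entityName; }
    void startEntityValue(char quote) { inValue = true; }
    void startEntity(String reference) { /* boundary only; the expansion arrives via characters() */ }
    void characters(String text) {
        if (inValue) {
            value.append(text);
        }
    }
    void endEntity(String reference) { }
    void endEntityValue(char quote) { inValue = false; }
    void endEntityDecl() { declared = name + "=" + value; }

    /** Replays the level 2 events listed for <!ENTITY e1 '&#38;amp;'>. */
    static String replay() {
        EntityValueCollector c = new EntityValueCollector();
        c.startEntityDecl();
        c.entityName("e1");
        c.startEntityValue('\'');
        c.startEntity("#38");
        c.characters("&");
        c.endEntity("#38");
        c.characters("amp;");
        c.endEntityValue('\'');
        c.endEntityDecl();
        return c.declared;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints e1=&amp;
    }
}
```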

[example] <!ATTLIST%e3;elem%e3;attr%e3;CDATA%e3;'&e2;'%e3;>

[2]  startAttrlistDecl()
[2]    startEntity ("%e3")
[2]      characters ("")
[2]    endEntity ("%e3")

[2]    elementName ("elem")
[2]    startEntity ("%e3")
[2]      characters ("")
[2]    endEntity ("%e3")

[2]    attributeName ("attr")
[2]    startEntity ("%e3")
[2]      characters ("")
[2]    endEntity ("%e3")

[2]    attributeType ("CDATA")
[2]    startEntity ("%e3")
[2]      characters ("")
[2]    endEntity ("%e3")

[2]    startDefaultValue ('\'')
[2]      startEntity ("e2")
[2]        characters ("M & M")
[2]      endEntity ("e2")
[2]    endDefaultValue ('\'')
[2]    startEntity ("%e3")
[2]      characters ("")
[2]    endEntity ("%e3")

[2]  endAttrlistDecl()

[1]  startAttlist ("elem")
[1]    attributeDecl ("elem", "attr", null, "CDATA", null, "M & M")
[1]  endAttlist ("elem")

Uff. It is monstrous, but it is what I need.  :-((  Eh, I don't know what the
best way is. "Help me Forrest, help me."  :-)

 >
 > Again, what about extension interfaces to XNI that can be
 > added later? I'd rather have a simple set of interfaces in
 > XNI Core and add more specific ones later. Thoughts?

I agree with extension interfaces, but why should they not be there right
now? Why wait? Let's try to specify those interfaces now.

Thanks very much for a better solution for level[2|3].  :-)

Libor

Re: problem about parameter entity handling of XNI

Posted by Andy Clark <an...@apache.org>.

Libor Kramolis wrote:
> I think that when a parser user wants to know about all references in a
> document, all occurrences should be reported.
> 
> [...]
> 
> So, I would like to use the parser like a lexical analyzer - all used tokens
> (references, including character references) are important to me.

There has to be a line drawn somewhere. What information do you
expect to receive from the declarations in lines 5-6?

 [1] <!ENTITY e1 '&#38;amp;'>
 [2] <!ENTITY e2 'M &e1; M'>
 [3] <!ENTITY % e3 ''>
 [4] <!ENTITY % e4 'EMPTY'>
 [5] <!ELEMENT%e3;elem%e3;%e4;%e3;>
 [6] <!ATTLIST%e3;elem%e3;attr%e3;CDATA%e3;'&e2;'%e3;>

And what about extraneous whitespace? Do you expect to receive
callbacks to report all use of whitespace?

Again, what about extension interfaces to XNI that can be
added later? I'd rather have a simple set of interfaces in
XNI Core and add more specific ones later. Thoughts?

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org

Re: problem about parameter entity handling of XNI

Posted by Libor Kramolis <li...@netbeans.com>.

Andy Clark wrote:

> Ryosuke Nanba wrote:
> 
>> interface XMLContentModelHandler {
>>   void startContentModel(String elementName);
>>   void endContentModel();
>>   void pcData();
>>   void empty();
>>   void any();
>>   void element(String elementName);
>>   void occurence(short occurence);
>>   void separator(short separator);
>>   void startGroup();
>>   void endGroup();
>> }
> 
> 
> What I liked about the original handler was that it was very
> easy to implement the separation between the various content
> models. And there's no redundant information passed to the 
> handler, either.

I think this separation is not so important, and the same behaviour is easy to
implement with the new methods too.

> 
> However, this *is* the kind of thing you want if you are
> trying to find out exactly what the parameter entity
> contains. So, ultimately, the question to answer is how
> much information do you want the parser to communicate
> to the handler?
> 
> How about this?
> 
>   <!-- in external subset -->
>   <!ENTITY % implied-space ''>
>   <!ELEMENT%implied-space;a EMPTY>
> 
> Or this?
> 
>   <!-- in external subset -->
>   <!ENTITY % required-string 'CDATA #REQUIRED'>
>   <!ATTLIST a attr %required-string;>
> 
> Do we want to make it possible to obtain this kind of
> information from the parser? We should carefully decide 
> exactly how far we are going to go with this and what 
> are the tradeoffs involved.

I think that when a parser user wants to know about all references in a
document, all occurrences should be reported.

I know our requirements are too big, but why should Xerces2 not be the best
parser for XML tools too? I don't want to hurt performance, but I want
to extend the parser's usage. :-)

So, I would like to use the parser like a lexical analyzer - all used tokens
(references, including character references) are important to me.

Thanks very much.

Libor

Re: problem about parameter entity handling of XNI

Posted by Andy Clark <an...@apache.org>.

Ryosuke Nanba wrote:
> interface XMLContentModelHandler {
>   void startContentModel(String elementName);
>   void endContentModel();
>   void pcData();
>   void empty();
>   void any();
>   void element(String elementName);
>   void occurence(short occurence);
>   void separator(short separator);
>   void startGroup();
>   void endGroup();
> }

What I liked about the original handler was that it was very
easy to implement the separation between the various content
models. And there's no redundant information passed to the 
handler, either.

However, this *is* the kind of thing you want if you are
trying to find out exactly what the parameter entity
contains. So, ultimately, the question to answer is how
much information do you want the parser to communicate
to the handler?

How about this?

  <!-- in external subset -->
  <!ENTITY % implied-space ''>
  <!ELEMENT%implied-space;a EMPTY>

Or this?

  <!-- in external subset -->
  <!ENTITY % required-string 'CDATA #REQUIRED'>
  <!ATTLIST a attr %required-string;>

Do we want to make it possible to obtain this kind of
information from the parser? We should carefully decide 
exactly how far we are going to go with this and what 
are the tradeoffs involved.

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org

Re: problem about parameter entity handling of XNI

Posted by Libor Kramolis <li...@netbeans.com>.

Ryosuke Nanba wrote:

> Hi, all.
> 
> Andy Clark wrote:
> 
>> Define the interface a little better so that we can discuss
>> the pros and cons between the two strategies.
> 
> 
> OK. 
> 
> API:
> 
> interface XMLContentModelHandler {

I think you meant XMLDTDContentModelHandler, didn't you?  :-)

>   void startContentModel(String elementName);
>   void endContentModel();
>   void pcData();
>   void empty();
>   void any();
>   void element(String elementName);
>   void occurence(short occurence);
>   void separator(short separator);
>   void startGroup();
>   void endGroup();
> }
> 

I agree. This is better than the originally specified handler. I would like to
vote for this change.

> Example:
> 
>  input:
> 
>    <!ENTITY % a.content "#PCDATA|b">
>    <!ELEMENT a (%a.content;)*>
> 
>  call sequence:
> 
>    startContentModel("a")
>      startGroup()
>        startEntity("%a.content") // XMLDTDHandler
>          pcData()
>          separator(SEPARATOR_CHOICE)
>          element("b")
>        endEntity("%a.content")
>      endGroup()
>      occurence(OCCURS_ZERO_OR_MORE)
>    endContentModel()
> 
> Pros & Cons:
> 
>   pros: parser need not hesitate to fire any(?) XMLContentModelHandler
>         callbacks.
>         easy to understand (?)
> 
>   cons: redundant callbacks.
>         it's hard to get type information (MIXED, CHILDREN, etc.)
>         # do you need type information?

Libor

Re: problem about parameter entity handling of XNI

Posted by Ryosuke Nanba <Ry...@justsystem.co.jp>.

Hi, all.

Andy Clark wrote:
> Define the interface a little better so that we can discuss
> the pros and cons between the two strategies.

OK. 

API:

interface XMLContentModelHandler {
  void startContentModel(String elementName);
  void endContentModel();
  void pcData();
  void empty();
  void any();
  void element(String elementName);
  void occurence(short occurence);
  void separator(short separator);
  void startGroup();
  void endGroup();
}

Example:

 input:

   <!ENTITY % a.content "#PCDATA|b">
   <!ELEMENT a (%a.content;)*>

 call sequence:

   startContentModel("a")
     startGroup()
       startEntity("%a.content") // XMLDTDHandler
         pcData()
         separator(SEPARATOR_CHOICE)
         element("b")
       endEntity("%a.content")
     endGroup()
     occurence(OCCURS_ZERO_OR_MORE)
   endContentModel()

Pros & Cons:

  pros: parser need not hesitate to fire any(?) XMLContentModelHandler
        callbacks.
        easy to understand (?)

  cons: redundant callbacks.
        it's hard to get type information (MIXED, CHILDREN, etc.)
        # do you need type information?
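
On the second con: a handler that does need the type can work it out from
the callbacks themselves, since #PCDATA must come first in a mixed model.
Below is a minimal sketch under that assumption. The class
ContentModelTypeTracker, the Type enum and the constant values are
hypothetical; startEntity/endEntity belong to XMLDTDHandler and are left out.

```java
class ContentModelTypeTracker {
    enum Type { EMPTY, ANY, MIXED, CHILDREN }

    // Hypothetical values; the proposal names these constants but does not define them.
    static final short SEPARATOR_CHOICE = 0;
    static final short OCCURS_ZERO_OR_MORE = 1;

    private Type current;

    /** Type of the most recently completed content model. */
    Type lastType;

    void startContentModel(String elementName) { current = null; }
    void empty() { current = Type.EMPTY; }
    void any() { current = Type.ANY; }
    void pcData() { current = Type.MIXED; }
    void element(String elementName) {
        // The first token decides: an element seen before any #PCDATA means a children model.
        if (current == null) {
            current = Type.CHILDREN;
        }
    }
    void occurence(short occurence) { }
    void separator(short separator) { }
    void startGroup() { }
    void endGroup() { }
    void endContentModel() { lastType = current; }

    /** Replays the call sequence shown above for <!ELEMENT a (%a.content;)*>. */
    static Type mixedExample() {
        ContentModelTypeTracker t = new ContentModelTypeTracker();
        t.startContentModel("a");
        t.startGroup();
        t.pcData();
        t.separator(SEPARATOR_CHOICE);
        t.element("b");
        t.endGroup();
        t.occurence(OCCURS_ZERO_OR_MORE);
        t.endContentModel();
        return t.lastType;
    }

    public static void main(String[] args) {
        System.out.println(mixedExample()); // prints MIXED
    }
}
```

The cost is that the handler learns the type only after the first token
inside the group, not at startContentModel().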
-- 
	Ryosuke Nanba

Re: problem about parameter entity handling of XNI

Posted by Andy Clark <an...@apache.org>.

Ryosuke Nanba wrote:
> OK, that's right. But why must startContentModel() report
> the type of the content model? What do you think about a modified
> API for XMLContentModelHandler such as:
> 
>   void startContentModel(String elementName)
>   void pcData()
>   void empty()
>   void any()
>   void element(String elementName)
>   ...

Define the interface a little better so that we can discuss 
the pros and cons between the two strategies.

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org

Re: problem about parameter entity handling of XNI

Posted by Ryosuke Nanba <rn...@cyber.email.ne.jp>.

Hi, Arnaud & Andy.

Andy wrote:
> What Arnaud means is that when we see the open paren we don't know
> whether it's going to be a child model like "(a)*" or a mixed model
> like "(#PCDATA)". So we don't fire the startContentModel method
> until we know for sure but that's *after* the startEntity.

OK, that's right. But why must startContentModel() report
the type of the content model? What do you think about a modified
API for XMLContentModelHandler such as:

  void startContentModel(String elementName)
  void pcData()
  void empty()
  void any()
  void element(String elementName)
  ...

I think it's not elegant, but it is natural for DTD syntax, and the
parser doesn't need buffering.
---
    Ryosuke Nanba

Re: problem about parameter entity handling of XNI

Posted by Andy Clark <an...@apache.org>.

Arnaud Le Hors wrote:
> You can see the startEntity callback is called too early. But getting it
> called after startContentModel requires buffering the info somewhere so
> that we can fire the event later. It doesn't seem so bad, except that

What Arnaud means is that when we see the open paren we don't know
whether it's going to be a child model like "(a)*" or a mixed model
like "(#PCDATA)". So we don't fire the startContentModel method
until we know for sure but that's *after* the startEntity.

> And PEs are really nasty. They can occur pretty much anywhere. So I'm
> not sure how many such cases there are.

And be recursive! Ouch.

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org

Re: problem about parameter entity handling of XNI

Posted by Arnaud Le Hors <le...@us.ibm.com>.

Hi all,
sorry for the delay in responding, I've been busy with something else.
The request does seem reasonable to me. But, although it may seem quite
easy to support at first, I believe it will actually require quite some
work and may imply a significant overhead. In particular, calling the
callbacks in the correct order is definitely tricky.
In the following example:

     <!ENTITY % a 'a'>
     <!ELEMENT root (%a;)*>

     startEntity("%a",systemId=null,publicId=null,encoding=null) // !!!
       startContentModel(elementName="root",type=TYPE_CHILDREN)
         childrenStartGroup()
           childrenElement(elementName="a")
         endEntity("%a") // !!!
       childrenEndGroup()
     endContentModel()

You can see the startEntity callback is called too early. But getting it
called after startContentModel requires buffering the info somewhere so
that we can fire the event later. It doesn't seem so bad, except that
the PE might be recursive, in which case we'll have more than one
startEntity event to buffer. This means the buffer effectively needs to
be a stack of startEntity events to be fired after startContentModel.
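
A rough sketch of that buffering, to make the cost concrete. This is not
the Xerces implementation: the class DeferredEntityEvents and its method
names are invented for illustration, events are plain strings, and the
pending names are flushed oldest first so that the outermost entity opens
first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class DeferredEntityEvents {
    // startEntity names seen after '(' but before the model type is known.
    private final Deque<String> pending = new ArrayDeque<>();
    // Events in the order they would reach the handler.
    final List<String> delivered = new ArrayList<>();
    private boolean typeKnown;

    /** Called by the scanner when it begins expanding a PE. */
    void scannerStartEntity(String name) {
        if (typeKnown) {
            delivered.add("startEntity(" + name + ")");
        } else {
            pending.addLast(name);
        }
    }

    /** Called once the first token shows whether the model is children or mixed. */
    void scannerTypeKnown(String elementName, String type) {
        delivered.add("startContentModel(" + elementName + ", " + type + ")");
        delivered.add("startGroup()");
        while (!pending.isEmpty()) {
            delivered.add("startEntity(" + pending.removeFirst() + ")");
        }
        typeKnown = true;
    }

    /** Every other event passes straight through. */
    void scannerEvent(String event) {
        delivered.add(event);
    }

    /** Replays <!ELEMENT root (%a;)*> with <!ENTITY % a 'a'>. */
    static List<String> demo() {
        DeferredEntityEvents d = new DeferredEntityEvents();
        d.scannerStartEntity("%a");                  // seen before the type is known
        d.scannerTypeKnown("root", "TYPE_CHILDREN"); // first token 'a' decides the type
        d.scannerEvent("childrenElement(a)");
        d.scannerEvent("endEntity(%a)");
        d.scannerEvent("endGroup()");
        d.scannerEvent("endContentModel()");
        return d.delivered;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The sketch ignores one harder case: an entity that also ends before the
type is known (an empty PE, for example) would need its endEntity
buffered as well.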

And PEs are really nasty. They can occur pretty much anywhere. So I'm
not sure how many such cases there are.

Anyway, this is to say: interesting request, but not easy to fulfill...
:-) I may give it a try but I won't make any promise though.
-- 
Arnaud  Le Hors - IBM Cupertino, XML Strategy Group

Re: problem about parameter entity handling of XNI

Posted by Libor Kramolis <li...@netbeans.com>.

Andy Clark wrote:

> Ryosuke Nanba wrote:
> 
>> 1. I think DTDParser should call XMLDTDHandler.startEntity() when the
>>   parser starts expanding a PE, and XMLDTDHandler.endEntity() when
>>   it has finished expanding the PE. But the following input produces
>>   an invalid method call sequence.
> 
> 
> You are correct; this is a bug.
> 
>> 2. Nested PEs are expanded, but the following input produces only
>>   one pair of start/endEntity() method calls.
> 
> 
> This is more of a feature request. Without looking at the
> code, I don't know whether it's reasonable for the parser
> implementation to handle this (and also whether it's in sync
> with the SAX2 callbacks). But perhaps it's possible to
> support this feature.
> 
> Arnaud: any comments?
> 

Hello,
I would like to know what the current status of this feature is. I would
like to say that it is important to me to get information about all
references used in the parsed document.

Thanks.

Libor

Re: problem about parameter entity handling of XNI

Posted by Andy Clark <an...@apache.org>.

Ryosuke Nanba wrote:
> 1. I think DTDParser should call XMLDTDHandler.startEntity() when the
>   parser starts expanding a PE, and XMLDTDHandler.endEntity() when
>   it has finished expanding the PE. But the following input produces
>   an invalid method call sequence.

You are correct; this is a bug.

> 2. Nested PEs are expanded, but the following input produces only
>   one pair of start/endEntity() method calls.

This is more of a feature request. Without looking at the
code, I don't know whether it's reasonable for the parser
implementation to handle this (and also whether it's in sync
with the SAX2 callbacks). But perhaps it's possible to
support this feature.

Arnaud: any comments?

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org