
Posted to j-dev@xerces.apache.org by Andy Clark <an...@apache.org> on 2001/02/16 11:02:00 UTC

[XNI] Document Information Set

Part of the changes I recently committed was an addition to the 
XMLDocumentHandler interface. I added an emptyElement callback
to differentiate between "<foo></foo>" and "<foo/>" in
the document information set.

This addition was based on feedback by Petr Kuzel.
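A minimal Java sketch of what the extra callback buys a consumer. `SimpleDocHandler` and `MarkupEcho` are hypothetical stand-ins, not the XNI API (the real XMLDocumentHandler callbacks also carry a QName, XMLAttributes and so on). A handler that wants to reproduce "<foo/>" overrides emptyElement. One that doesn't care can fall back to start+end, so no information is lost either way. The Java 8 default method is a modern convenience used only for brevity.

```java
// Hypothetical, simplified callbacks; the real XNI methods take more arguments.
interface SimpleDocHandler {
    void startElement(String name);
    void endElement(String name);

    // Fired only for "<name/>". Handlers that don't care about the
    // distinction can inherit this default, which degrades to start+end.
    default void emptyElement(String name) {
        startElement(name);
        endElement(name);
    }
}

// Re-serializes elements, optionally keeping the "<foo/>" form.
class MarkupEcho implements SimpleDocHandler {
    private final StringBuilder out = new StringBuilder();
    private final boolean keepEmptyForm;

    MarkupEcho(boolean keepEmptyForm) { this.keepEmptyForm = keepEmptyForm; }

    public void startElement(String name) { out.append('<').append(name).append('>'); }
    public void endElement(String name)   { out.append("</").append(name).append('>'); }

    @Override
    public void emptyElement(String name) {
        if (keepEmptyForm) {
            out.append('<').append(name).append("/>");
        } else {
            SimpleDocHandler.super.emptyElement(name); // writes "<foo></foo>"
        }
    }

    String result() { return out.toString(); }
}
```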

Regarding the request to obtain information about character
entities in the document, I believe that this can be supported 
with the current interface as a parser feature. Character
entities (and the 5 built-in entities such as "lt") would be
reported through the standard start/endEntity callbacks. But 
I think that the default setting for this feature is "off"
so that character entities are NOT reported by default.
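A toy model of the proposal, assuming a boolean feature that is off by default. `CharRefScanner` and `setReportCharRefs` are hypothetical names, not Xerces classes or feature IDs. Only decimal references and the five built-in entities are handled. With the feature off, the expanded characters merge silently into the surrounding text. With it on, each reference is bracketed by start/endEntity events.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy content scanner (hypothetical, not Xerces code) returning a callback trace.
class CharRefScanner {
    private static final Map<String, String> BUILT_INS = Map.of(
        "lt", "<", "gt", ">", "amp", "&", "apos", "'", "quot", "\"");

    private boolean reportCharRefs = false; // assumed feature, default "off"

    void setReportCharRefs(boolean on) { reportCharRefs = on; }

    List<String> scan(String content) {
        List<String> events = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        int i = 0;
        while (i < content.length()) {
            int semi = content.indexOf(';', i);
            if (content.charAt(i) == '&' && semi > i + 1) {
                String ref = content.substring(i + 1, semi);
                String name;
                String value;
                if (ref.startsWith("#")) {
                    int cp = Integer.parseInt(ref.substring(1)); // decimal only
                    name = "#" + cp;
                    value = new String(Character.toChars(cp));
                } else {
                    name = ref;
                    value = BUILT_INS.get(ref);
                    if (value == null) {
                        throw new IllegalArgumentException("unknown entity: " + ref);
                    }
                }
                if (reportCharRefs) {
                    flush(text, events);
                    events.add("startEntity(" + name + ")");
                    events.add("characters(" + value + ")");
                    events.add("endEntity(" + name + ")");
                } else {
                    text.append(value); // feature off: no entity boundaries
                }
                i = semi + 1;
            } else {
                text.append(content.charAt(i++));
            }
        }
        flush(text, events);
        return events;
    }

    // Emits any pending plain text as one characters() event.
    private static void flush(StringBuilder text, List<String> events) {
        if (text.length() > 0) {
            events.add("characters(" + text + ")");
            text.setLength(0);
        }
    }
}
```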

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org

Re: [XNI] Document Information Set

Posted by Andy Clark <an...@apache.org>.

Petr Kuzel wrote:
> It's not implemented yet, so I can implement it.

Great. (Saw your posted patch but it hasn't been reviewed, yet.)

> XNI definition change for start/endEntity():
> If the name starts with "#", it is a character entity. Such
> events occur only if the feature is on.

Yep.

> If only we had streamed attributes (their values). There is no
> change, except that getEntityCount() returns a higher number if
> charrefs are present. These can then be obtained by getEntityName(), etc.

Yep.

> AttributeList is deprecated and all its information can be obtained
> from Attributes. So I think that a wrapper/adapter is adequate.

Now that I think about it, we don't need to have XMLAttributes
extend AttributeList. Of course, in the case of SAX1, we'll have 
to do extra work to pass the attributes to the DocumentHandler.
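That extra work amounts to a read-only view over the SAX2 attributes. Below is a sketch using the real org.xml.sax interfaces; the class name `AttributeListAdapter` is hypothetical. SAX2's own org.xml.sax.helpers.XMLReaderAdapter performs a similar conversion internally when it feeds a SAX1 DocumentHandler.

```java
import org.xml.sax.AttributeList;
import org.xml.sax.Attributes;

// Presents SAX2 Attributes through the deprecated SAX1 AttributeList interface.
@SuppressWarnings("deprecation")
class AttributeListAdapter implements AttributeList {
    private final Attributes atts;

    AttributeListAdapter(Attributes atts) { this.atts = atts; }

    public int getLength()               { return atts.getLength(); }
    // SAX1 has no namespace support, so expose the raw (qualified) name.
    public String getName(int i)         { return atts.getQName(i); }
    public String getType(int i)         { return atts.getType(i); }
    public String getValue(int i)        { return atts.getValue(i); }
    public String getType(String qName)  { return atts.getType(qName); }
    public String getValue(String qName) { return atts.getValue(qName); }
}
```

Because it only delegates, the adapter costs one small object per startElement rather than a copy of the attribute data.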

So I guess we're back at the question of whether we continue to
support SAX1. I'm on the fence; Arnaud is in support of backward
compatibility; and I haven't heard anything from other people,
yet. I can see that both sides of the argument have merit. What
does everyone else think?

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org

Re: [XNI] Document Information Set

Posted by Petr Kuzel <Pe...@sun.com>.

Andy Clark wrote:
> 
> Petr Kuzel wrote:
> > > Regarding the request to obtain information about character
> > > entities in the document, I believe that this can be supported
> > > with the current interface as a parser feature. Character
> > > entities (and the 5 built-in entities such as "lt") would be
> > > reported through the standard start/endEntity callbacks. But
> > > I think that the default setting for this feature is "off"
> > > so that character entities are NOT reported by default.
> >
> > Sounds good.
> 
> Okay. I talked about this with Arnaud and he was the one that
> suggested this solution. So I guess that makes him the person
> to implement it! :)

It's not implemented yet, so I can implement it.

XNI definition change for start/endEntity():
If the name starts with "#", it is a character entity. Such
events occur only if the feature is on.

Event flow for charrefs in an element content:

  *startEntity("#88", null, null);
    characters( ...the character... );
  *endEntity("#88");

* - the difference
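On the receiving end, the "#" prefix alone is enough to tell character references apart from general entities. The sketch below uses a hypothetical simplified callback interface, with the three-argument startEntity shown in the flow above; the real XNI signatures may differ. It puts the original reference back when re-serializing content.

```java
// Hypothetical, simplified callbacks mirroring the event flow above.
interface EntityEvents {
    void startEntity(String name, String publicId, String systemId);
    void endEntity(String name);
    void characters(String text);
}

// Re-serializes content, writing "&#NN;" back in place of the expanded
// character whenever a "#"-named entity is reported.
class CharRefPreserver implements EntityEvents {
    private final StringBuilder out = new StringBuilder();
    private boolean insideCharRef = false;

    // The proposed convention: character entity names start with '#'.
    static boolean isCharacterEntity(String name) {
        return name.startsWith("#");
    }

    public void startEntity(String name, String publicId, String systemId) {
        if (isCharacterEntity(name)) {
            out.append('&').append(name).append(';');
            insideCharRef = true;
        }
        // General entities are not marked here; their replacement text
        // still arrives via characters() and is copied as-is.
    }

    public void endEntity(String name) {
        if (isCharacterEntity(name)) {
            insideCharRef = false;
        }
    }

    public void characters(String text) {
        // Text inside a charref is skipped: the reference was already written.
        if (!insideCharRef) {
            out.append(text);
        }
    }

    String result() { return out.toString(); }
}
```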

Event flow for charrefs in attvalue:

If only we had streamed attributes (their values). There is no
change, except that getEntityCount() returns a higher number if
charrefs are present. These can then be obtained by getEntityName(), etc.
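A rough model of that attribute-side bookkeeping follows. It assumes a per-attribute record of entity names with offsets into the expanded value. `AttrValueInfo`, `getEntityOffset` and `getEntityLength` are hypothetical. Only getEntityCount() and getEntityName() are named in the thread, and the real XMLAttributes methods would also be indexed by attribute.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-attribute record of the entities inside its value.
class AttrValueInfo {
    private final StringBuilder value = new StringBuilder();
    private final List<String> names = new ArrayList<>();
    private final List<Integer> offsets = new ArrayList<>();
    private final List<Integer> lengths = new ArrayList<>();

    // Expands decimal "&#NN;" refs in a raw attribute value. When
    // reportCharRefs is on, each one is also recorded as an entity.
    AttrValueInfo(String raw, boolean reportCharRefs) {
        int i = 0;
        while (i < raw.length()) {
            int semi = raw.indexOf(';', i);
            if (raw.startsWith("&#", i) && semi > i + 2) {
                int cp = Integer.parseInt(raw.substring(i + 2, semi));
                String ch = new String(Character.toChars(cp));
                if (reportCharRefs) {
                    names.add("#" + cp);
                    offsets.add(value.length()); // position in expanded value
                    lengths.add(ch.length());
                }
                value.append(ch);
                i = semi + 1;
            } else {
                value.append(raw.charAt(i++));
            }
        }
    }

    String getValue()           { return value.toString(); }
    int getEntityCount()        { return names.size(); }
    String getEntityName(int i) { return names.get(i); }
    int getEntityOffset(int i)  { return offsets.get(i); }
    int getEntityLength(int i)  { return lengths.get(i); }
}
```

The expanded value is identical either way; only the entity count changes, which is the "no change but a higher count" behavior described above.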
 
> > May I also again open an issue related to the XMLAttributes interface?
> >
> >   Why does it extend the deprecated sax.AttributeList?
> >     what about an adapter?
> 
> That is an open question -- do we continue to support SAX1
> parsers in Xerces2 or have people move up to SAX2? I used to
> think that we needed to support the original SAX (which is
> why XMLAttributes extends AttributeList) but now I'm not
> so sure. I think it would be fine to only support SAX2 --
> I've already modified the samples to use only SAX2 and
> provide support for setting a SAX1 parser via an adapter.

AttributeList is deprecated and all its information can be obtained
from Attributes. So I think that a wrapper/adapter is adequate.

  Cc.

-- 
Petr Kuzel <pkuzel@netbeans.com>, Sun Microsystems
: Forte Tools <http://www.sun.com/forte/ffj/ie/>
: XML and Jini <http://jini.netbeans.org/> modules

Re: [XNI] Document Information Set

Posted by Andy Clark <an...@apache.org>.

Petr Kuzel wrote:
> > Regarding the request to obtain information about character
> > entities in the document, I believe that this can be supported
> > with the current interface as a parser feature. Character
> > entities (and the 5 built-in entities such as "lt") would be
> > reported through the standard start/endEntity callbacks. But
> > I think that the default setting for this feature is "off"
> > so that character entities are NOT reported by default.
> 
> Sounds good.

Okay. I talked about this with Arnaud and he was the one that
suggested this solution. So I guess that makes him the person
to implement it! :)

> May I also again open an issue related to the XMLAttributes interface?
> 
>   Why does it extend the deprecated sax.AttributeList?
>     what about an adapter?

That is an open question -- do we continue to support SAX1
parsers in Xerces2 or have people move up to SAX2? I used to
think that we needed to support the original SAX (which is
why XMLAttributes extends AttributeList) but now I'm not
so sure. I think it would be fine to only support SAX2 --
I've already modified the samples to use only SAX2 and
provide support for setting a SAX1 parser via an adapter.
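In the direction described here, a SAX2 sample driven by a SAX1 parser, the standard org.xml.sax.helpers.ParserAdapter wraps a SAX1 Parser as a SAX2 XMLReader. The sketch below runs against the JDK's bundled parser. Getting the SAX1 Parser through the deprecated JAXP SAXParser.getParser() is just one convenient way to obtain one, and the class name is hypothetical. The reverse direction is covered by org.xml.sax.helpers.XMLReaderAdapter.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.ParserAdapter;

@SuppressWarnings("deprecation")
class Sax2SampleOverSax1 {
    // Feeds a SAX2 ContentHandler from a SAX1 Parser via ParserAdapter and
    // returns "qName/attributeCount" for each element.
    static List<String> elementNames(String xml) throws Exception {
        Parser sax1 = SAXParserFactory.newInstance().newSAXParser().getParser();
        XMLReader reader = new ParserAdapter(sax1); // SAX1 Parser -> SAX2 XMLReader
        List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                names.add(qName + "/" + atts.getLength());
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return names;
    }
}
```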

>   I still miss an isSpecified(..) method.
>     Is the workaround "getNonNormalizedValue(..) != null" equivalent?

No, I don't believe that is an acceptable solution. First,
I think it's fine for a defaulted attribute value to have
a non-normalized form. Consider: "<!ATTLIST foo bar NMTOKEN
'  baz  '>". Also, that method isn't implemented, yet... ;)

As for the missing isSpecified method, I just added it to 
the Xerces2 design and source. I still have to modify the
validator to set the value and the DOMParser to use the
value when constructing attribute nodes. Would you care to
submit a patch for that on the current Xerces2 code?
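Once the validator sets the flag and the DOMParser honors it, DOM users see it through Attr.getSpecified(). The sketch below uses the JDK's built-in DOM parser (a descendant of Xerces2, not the 2001 code under discussion), and the class name is hypothetical. The attribute filled in from the DTD default reports false; the one written in the start tag reports true.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class SpecifiedDemo {
    // "bar" is only defaulted by the internal DTD; "qux" appears in the tag.
    static final String XML =
        "<!DOCTYPE foo [<!ATTLIST foo bar NMTOKEN '  baz  ' qux CDATA 'd'>]>"
        + "<foo qux='given'/>";

    // Returns {bar.getSpecified(), qux.getSpecified()}.
    static boolean[] specifiedFlags() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
        Element foo = doc.getDocumentElement();
        Attr bar = foo.getAttributeNode("bar"); // defaulted from the DTD
        Attr qux = foo.getAttributeNode("qux"); // present in the start tag
        return new boolean[] { bar.getSpecified(), qux.getSpecified() };
    }
}
```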

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org

Re: [XNI] Document Information Set

Posted by Petr Kuzel <Pe...@netbeans.com>.

Andy Clark wrote:
 
> Regarding the request to obtain information about character
> entities in the document, I believe that this can be supported
> with the current interface as a parser feature. Character
> entities (and the 5 built-in entities such as "lt") would be
> reported through the standard start/endEntity callbacks. But
> I think that the default setting for this feature is "off"
> so that character entities are NOT reported by default.

Sounds good.

May I also again open an issue related to the XMLAttributes interface?

  Why does it extend the deprecated sax.AttributeList?
    what about an adapter?

  I still miss an isSpecified(..) method.
    Is the workaround "getNonNormalizedValue(..) != null" equivalent?

  
  Comments are welcome
  Cc.

-- 
Petr Kuzel <pkuzel@netbeans.com>, Sun Microsystems
: Forte Tools <http://www.sun.com/forte/ffj/ie/>
: XML and Jini <http://jini.netbeans.org/> modules

Re: [XNI] Document Information Set

Posted by Andy Clark <an...@apache.org>.

Ted Leung wrote:
> I'm agreeing with adding the callback, don't worry.  This whole
> thing would be easier if we just Unicode-ized s-expressions.

Yeah. If I find the guy that decided to use these crazy
angle bracket things, I'll kill him! ;)

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org

Re: [XNI] Document Information Set

Posted by Ted Leung <tw...@sauria.com>.

I'm agreeing with adding the callback, don't worry.  This whole
thing would be easier if we just Unicode-ized s-expressions.

Ted
----- Original Message ----- 
From: "Andy Clark" <an...@apache.org>
To: <xe...@xml.apache.org>
Sent: Sunday, February 18, 2001 3:37 PM
Subject: Re: [XNI] Document Information Set


> Ted Leung wrote:
> > See the "For interoperability" note in Sec 3.1 of the XML 1.0SE 
> > rec for a justification for doing this.
> 
> The XML spec only describes what XML should look like in its
> serialized form. It doesn't state what the information set
> should look like and I don't see any problem with supplying
> more information via XNI than what is stated in the XML spec.
> 
> Xerces users want this information so I think it's reasonable
> to accommodate such requests, when reasonable.
> 
> -- 
> Andy Clark * IBM, TRL - Japan * andyc@apache.org

Re: [XNI] Document Information Set

Posted by Andy Clark <an...@apache.org>.

Ted Leung wrote:
> See the "For interoperability" note in Sec 3.1 of the XML 1.0SE 
> rec for a justification for doing this.

The XML spec only describes what XML should look like in its
serialized form. It doesn't state what the information set
should look like and I don't see any problem with supplying
more information via XNI than what is stated in the XML spec.

Xerces users want this information so I think it's reasonable
to accommodate such requests, when reasonable.

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org

Re: [XNI] Document Information Set

Posted by Ted Leung <tw...@sauria.com>.

----- Original Message ----- 
From: "Andy Clark" <an...@apache.org>
To: <xe...@xml.apache.org>
Sent: Friday, February 16, 2001 2:02 AM
Subject: [XNI] Document Information Set


> Part of the changes I recently committed was an addition to the 
> XMLDocumentHandler interface. I added an emptyElement callback
> to differentiate between "<foo></foo>" and "<foo/>" in
> the document information set.

See the "For interoperability" note in Sec 3.1 of the XML 1.0SE rec for
a justification for doing this.

> This addition was based on feedback by Petr Kuzel.
> 
> Regarding the request to obtain information about character
> entities in the document, I believe that this can be supported 
> with the current interface as a parser feature. Character
> entities (and the 5 built-in entities such as "lt") would be
> reported through the standard start/endEntity callbacks. But 
> I think that the default setting for this feature is "off"
> so that character entities are NOT reported by default.
> 
> -- 
> Andy Clark * IBM, TRL - Japan * andyc@apache.org