Posted to j-dev@xerces.apache.org by Michael Glavassevich <mr...@ca.ibm.com> on 2003/11/03 17:48:50 UTC

Recent changes to handling of undeclared attributes.

Hi all,

Over the weekend I noticed Venu's commit for DOM Level 3 TypeInfo.  This 
interface requires that the typeName [1] be null when the type declared 
for an attribute is unknown. The infoset [2] says when there is no 
declaration for an attribute the attribute type property "... has no 
value" and also that "applications should treat no value and unknown as 
equivalent to a value of CDATA".

The problem is that XNI does not expose information about whether an 
attribute was declared in the DTD, and setting the type of an attribute to 
null isn't allowed in XNI (nor is this an expected value when the 
configuration is used in a SAX context). In the current implementation 
you'll get a NullPointerException if you try reading a 'null' type back 
from XMLAttributesImpl.  All that said, it seems that there needs to be a 
change to align with the infoset.

I propose that we add a property to Augmentations for attributes which 
specifies whether the attribute was declared in the DTD.  We'd also need 
to give it a name. Perhaps something like "ATTRIBUTE_DECL".  To me, this 
seems like the cleanest way of exposing this information without making 
any XNI changes or relying on casting to get at implementation methods.
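A minimal sketch of how such a per-attribute flag might be produced and consumed. This is illustrative only: a plain Map stands in for the per-attribute Augmentations object so the example runs without Xerces on the classpath, the key name "ATTRIBUTE_DECL" is only the tentative name floated above, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class AttributeDeclSketch {
    // Tentative key name from the proposal; not an established constant.
    static final String ATTRIBUTE_DECL = "ATTRIBUTE_DECL";

    // Producer side (for example a DTD-aware pipeline component): record
    // whether a declaration was seen. The Map is a stand-in for the
    // attribute's Augmentations.
    static Map<String, Object> augment(boolean declaredInDtd) {
        Map<String, Object> augs = new HashMap<>();
        augs.put(ATTRIBUTE_DECL, Boolean.valueOf(declaredInDtd));
        return augs;
    }

    // Consumer side (for example a DOM builder): the pipeline keeps
    // reporting "CDATA" for undeclared attributes; only the DOM view
    // maps "not declared" to a null type name.
    static String domTypeName(String xniType, Map<String, Object> augs) {
        boolean declared = Boolean.TRUE.equals(augs.get(ATTRIBUTE_DECL));
        return declared ? xniType : null;
    }

    public static void main(String[] args) {
        System.out.println(domTypeName("ID", augment(true)));
        System.out.println(domTypeName("CDATA", augment(false)));
    }
}
```

In this sketch a missing flag is treated the same as "not declared", so a pipeline that never sets the flag would make every attribute look undeclared to DOM. Whether that is the right default is a design choice the proposal leaves open.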

What does everyone else think?

[1] http://www.w3.org/TR/2003/WD-DOM-Level-3-Core-20030609/core.html#TypeInfo-typeName
[2] http://www.w3.org/TR/xml-infoset/#infoitem.attribute

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

Re: Recent changes to handling of undeclared attributes.

Posted by Venu <K....@Sun.COM>.

Michael Glavassevich wrote:
> Hi Venu,
> 
> I meant that this is required by DOM, but is not correct for XNI or SAX,
> and may not be correct for other XML APIs.
> 
> The description of getType for org.apache.xerces.xni.XMLAttributes states
> that: 'The attribute type is one of the strings "CDATA", "ID", "IDREF",
> "IDREFS", "NMTOKEN", "NMTOKENS", "ENTITY", "ENTITIES", or "NOTATION"
> (always in upper case). If the parser has not read a declaration for the
> attribute, or if the parser does not report attribute types, then it must
> return the value "CDATA" as stated in the XML 1.0 Recommendation (clause
> 3.3.3, "Attribute-Value Normalization"). For an enumerated attribute that
> is not a notation, the parser will report the type as "NMTOKEN".'
> 
Yes, that's true, and we could still achieve that by assuming that the type
is CDATA when the type is found to be null.

> Returning null from the getType methods means that an attribute is absent
> or you've specified an index that is out of bounds. It would appear to me
> that XNI doesn't allow null to be returned for other reasons, and
> explicitly says that it should return "CDATA" if no declaration is
> present. This is why we don't check internally for null values.
> 
> Unless we're opening up XNI to more changes, I think the appropriate thing
> is to do something other than treat this as null everywhere.
> Someone could always specify their own parser configuration, and if it
> behaves according to XNI, DOM won't get the null it was expecting.
> 
> Also I believe the parser components should try to be API neutral where
> possible. If DOM needs null for undeclared attributes, then the pipeline
> should expose enough information for DOM to determine this value, rather
> than forcing this value within the internals.

Agreed. Please go ahead and implement the changes you proposed; I will
revert my check-ins.

-venu


> This may work for some
> requirements, but it becomes a bit of a mess when two or more APIs have
> conflicting requirements that are difficult (or impossible) to satisfy
> simultaneously.
> 
> On Tue, 4 Nov 2003, Venu wrote:
> 
> 
>>Sorry, I had missed checking in XMLAttributesImpl.java.
>>
>>As you already pointed out in [2] and [1], the attribute type should be null
>>when a declaration is not found. My intent is to update all the places in
>>Xerces where the type is referenced (e.g. add a check for null, etc.).
>>If everyone feels this is not the appropriate thing to do, then providing a
>>variable as suggested is the easiest alternative.
>>Depending on the suggestions, I would either go ahead and make the changes
>>required when a SAXParser is used or back out the current check-ins.
>>
>>All, please let us know your views at the earliest.
>>
>>Regards,
>>Venu
> 
> 
> ---------------------------
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: mrglavas@ca.ibm.com
> E-mail: mrglavas@apache.org
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
> For additional commands, e-mail: xerces-j-dev-help@xml.apache.org
> 





Re: Recent changes to handling of undeclared attributes.

Posted by Michael Glavassevich <mr...@apache.org>.
Hi Venu,

I meant that this is required by DOM, but is not correct for XNI or SAX,
and may not be correct for other XML APIs.

The description of getType for org.apache.xerces.xni.XMLAttributes states
that: 'The attribute type is one of the strings "CDATA", "ID", "IDREF",
"IDREFS", "NMTOKEN", "NMTOKENS", "ENTITY", "ENTITIES", or "NOTATION"
(always in upper case). If the parser has not read a declaration for the
attribute, or if the parser does not report attribute types, then it must
return the value "CDATA" as stated in the XML 1.0 Recommendation (clause
3.3.3, "Attribute-Value Normalization"). For an enumerated attribute that
is not a notation, the parser will report the type as "NMTOKEN".'

Returning null from the getType methods means that an attribute is absent
or you've specified an index that is out of bounds. It would appear to me
that XNI doesn't allow null to be returned for other reasons, and
explicitly says that it should return "CDATA" if no declaration is
present. This is why we don't check internally for null values.
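The contract described above can be sketched as follows. This is a simplified, hypothetical stand-in (it is not XMLAttributesImpl, and the class and method names are made up for illustration): the type is normalized to "CDATA" when it is stored, so readers only ever see null for an index that is out of bounds.

```java
import java.util.ArrayList;
import java.util.List;

public class AttributeTypeContractSketch {
    private final List<String> types = new ArrayList<>();

    // Adds an attribute. A null declaredType models "no declaration was
    // read"; it is stored as "CDATA" so no null type is ever kept.
    void add(String declaredType) {
        types.add(declaredType == null ? "CDATA" : declaredType);
    }

    // Returns the stored type, or null only when the index is out of bounds.
    String getType(int index) {
        if (index < 0 || index >= types.size()) {
            return null;
        }
        return types.get(index);
    }

    public static void main(String[] args) {
        AttributeTypeContractSketch attrs = new AttributeTypeContractSketch();
        attrs.add(null);   // undeclared attribute
        attrs.add("ID");   // declared attribute
        System.out.println(attrs.getType(0));
        System.out.println(attrs.getType(1));
        System.out.println(attrs.getType(5));
    }
}
```

Because the fallback is applied on the write side, code that reads a type for a valid index never needs its own null check, which matches the reasoning given above for why none exists internally.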

Unless we're opening up XNI to more changes, I think the appropriate thing
is to do something other than treat this as null everywhere.
Someone could always specify their own parser configuration, and if it
behaves according to XNI, DOM won't get the null it was expecting.

Also I believe the parser components should try to be API neutral where
possible. If DOM needs null for undeclared attributes, then the pipeline
should expose enough information for DOM to determine this value, rather
than forcing this value within the internals. This may work for some
requirements, but it becomes a bit of a mess when two or more APIs have
conflicting requirements that are difficult (or impossible) to satisfy
simultaneously.

On Tue, 4 Nov 2003, Venu wrote:

> Sorry, I had missed checking in XMLAttributesImpl.java.
>
> As you already pointed out in [2] and [1], the attribute type should be null
> when a declaration is not found. My intent is to update all the places in
> Xerces where the type is referenced (e.g. add a check for null, etc.).
> If everyone feels this is not the appropriate thing to do, then providing a
> variable as suggested is the easiest alternative.
> Depending on the suggestions, I would either go ahead and make the changes
> required when a SAXParser is used or back out the current check-ins.
>
> All, please let us know your views at the earliest.
>
> Regards,
> Venu

---------------------------
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org



Re: Recent changes to handling of undeclared attributes.

Posted by Venu <K....@Sun.COM>.

Michael Glavassevich wrote:
> 
> Hi all,
> 
> Over the weekend I noticed Venu's commit for DOM Level 3 TypeInfo.  This 
> interface requires that the typeName [1] be null when the type declared 
> for an attribute is unknown. The infoset [2] says when there is no 
> declaration for an attribute the attribute type property "... has no 
> value" and also that "applications should treat no value and unknown as 
> equivalent to a value of CDATA".
> 
> The problem is that XNI does not expose information about whether an 
> attribute was declared in the DTD, and setting the type of an attribute 
> to null isn't allowed in XNI (nor is this an expected value when the 
> configuration is used in a SAX context). In the current implementation 
> you'll get a NullPointerException if you try reading a 'null' type back 
> from XMLAttributesImpl.  All that said, it seems that there needs to be 
> a change to align with the infoset.

Sorry, I had missed checking in XMLAttributesImpl.java.

As you already pointed out in [2] and [1], the attribute type should be null
when a declaration is not found. My intent is to update all the places in
Xerces where the type is referenced (e.g. add a check for null, etc.).
If everyone feels this is not the appropriate thing to do, then providing a
variable as suggested is the easiest alternative.
Depending on the suggestions, I would either go ahead and make the changes
required when a SAXParser is used or back out the current check-ins.

All, please let us know your views at the earliest.

Regards,
Venu

> 
> I propose that we add a property to Augmentations for attributes which 
> specifies whether the attribute was declared in the DTD.  We'd also need 
> to give it a name. Perhaps something like "ATTRIBUTE_DECL".  To me, this 
> seems like the cleanest way of exposing this information without making 
> any XNI changes or relying on casting to get at implementation methods.
> 
> What does everyone else think?
> 
> [1] http://www.w3.org/TR/2003/WD-DOM-Level-3-Core-20030609/core.html#TypeInfo-typeName
> 
> [2] http://www.w3.org/TR/xml-infoset/#infoitem.attribute
> 
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: mrglavas@ca.ibm.com
> E-mail: mrglavas@apache.org





Re: Recent changes to handling of undeclared attributes.

Posted by Michael Glavassevich <mr...@apache.org>.
On Mon, 3 Nov 2003, Andy Clark wrote:

> You mean instead of implementing the XMLDTDHandler to
> get that information?

SAXParser and DOMParser already implement XMLDTDHandler. You could
conceivably figure out if an attribute was declared by tracking all the
attributeDecl events in some structure. It doesn't seem like building such
a thing and querying it for each attribute would perform as well as
attaching this information to each attribute.

Also, when the user is caching grammars it doesn't look like this approach
would work, because I don't believe we generate any DTD events.
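For comparison, the tracking alternative discussed above might look roughly like this. It is a sketch only: the two-argument attributeDecl method is a simplified stand-in for the real callback (which takes more parameters), and the class name is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

public class DeclaredAttributeTracker {
    // One entry per (element, attribute) pair seen in a declaration.
    private final Set<String> declared = new HashSet<>();

    // Simplified stand-in for an attributeDecl event from the DTD.
    void attributeDecl(String elementName, String attributeName) {
        declared.add(key(elementName, attributeName));
    }

    // Queried once per attribute while processing each start tag.
    boolean isDeclared(String elementName, String attributeName) {
        return declared.contains(key(elementName, attributeName));
    }

    // XML names cannot contain a space, so it is a safe separator here.
    private static String key(String elementName, String attributeName) {
        return elementName + ' ' + attributeName;
    }

    public static void main(String[] args) {
        DeclaredAttributeTracker tracker = new DeclaredAttributeTracker();
        tracker.attributeDecl("book", "id");
        System.out.println(tracker.isDeclared("book", "id"));
        System.out.println(tracker.isDeclared("book", "lang"));
    }
}
```

Each attribute costs a key construction and a set lookup, and the set stays empty whenever no DTD events are delivered, so every attribute would then be reported as undeclared. Those are the two drawbacks raised above.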

---------------------------
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org



Re: Recent changes to handling of undeclared attributes.

Posted by Andy Clark <an...@apache.org>.
Michael Glavassevich wrote:
> I propose that we add a property to Augmentations for attributes which 
> specifies whether the attribute was declared in the DTD.  We'd also need 
> to give it a name. Perhaps something like "ATTRIBUTE_DECL".  To me, this 
> seems like the cleanest way of exposing this information without making 
> any XNI changes or relying on casting to get at implementation methods.

You mean instead of implementing the XMLDTDHandler to
get that information?

-- 
Andy Clark * andyc@apache.org

