
Posted to j-users@xerces.apache.org by Jacob Kjome <ho...@visi.com> on 2013/08/28 06:58:08 UTC

DOM Document.cloneNode(true), in-memory validation, and Attr.getSpecified()

When I parse a document, with validation against a DTD, attributes that 
weren't explicitly specified in the parsed XML, but have default values in the 
DTD, correctly show up as "Attr.getSpecified() == false".  So far, so good.

However, after cloning the document (Document clone = (Document) 
doc.cloneNode(true)) all the equivalent cloned Attrs show up as 
"Attr.getSpecified() == true".  Is that to be expected or a bug?  And a 
side-question: should I expect the value of Document.getDocumentURI() to be 
preserved after cloning?  Because that value appears to be discarded.
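
For reference, the situation described above can be reproduced with a sketch along these lines. It assumes a JAXP parser backed by Xerces (such as the one bundled with the JDK); the class name, the inline DTD, the "status" attribute, and the system ID are all made up for illustration. What the clone reports is up to the implementation, so the program only prints it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CloneSpecifiedDemo {
    // Hypothetical document: the DTD gives "status" a default value,
    // and the instance does not specify it.
    static final String XML =
        "<!DOCTYPE doc [\n"
      + "  <!ELEMENT doc EMPTY>\n"
      + "  <!ATTLIST doc status CDATA \"draft\">\n"
      + "]>\n"
      + "<doc/>";

    static Document parse() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        InputSource in = new InputSource(new StringReader(XML));
        in.setSystemId("file:///tmp/doc.xml"); // made-up system ID
        return factory.newDocumentBuilder().parse(in);
    }

    static Attr statusOf(Document d) {
        return d.getDocumentElement().getAttributeNode("status");
    }

    // An implementation is free to drop the attribute on clone, so allow null.
    static String describe(Attr a) {
        return a == null ? "absent" : "specified=" + a.getSpecified();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse();
        Document clone = (Document) doc.cloneNode(true);
        System.out.println("original: " + describe(statusOf(doc)));
        System.out.println("clone:    " + describe(statusOf(clone)));
        System.out.println("original URI: " + doc.getDocumentURI());
        System.out.println("clone URI:    " + clone.getDocumentURI());
    }
}
```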

Assuming it is to be expected, I figured that I could recover the original 
"Attr.getSpecified() == false" by performing in-memory validation, based on 
my reading of the DOM API doc [1][2].  I did this by calling 
DOMConfiguration.setParameter("validate", Boolean.TRUE) (after having first 
called Document.setDocumentURI(systemIdOfParsedDocument) so that a relatively 
defined DTD can be resolved) and then calling Document.normalizeDocument().
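
The revalidation steps just described might look roughly like this. Only standard DOM Level 3 calls are used; the class and method names, the sample document, and the error-collecting handler are illustrative. Whether a given implementation supports "validate" is checked with canSetParameter rather than assumed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.DOMError;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class Revalidate {
    /**
     * Re-validates doc in memory and returns whatever messages the
     * implementation reported. systemId is the URI the document was parsed
     * from; it is restored so that a relative DTD reference can be resolved.
     */
    static List<String> revalidate(Document doc, String systemId) {
        final List<String> messages = new ArrayList<>();
        doc.setDocumentURI(systemId);

        DOMConfiguration config = doc.getDomConfig();
        if (config.canSetParameter("validate", Boolean.TRUE)) {
            config.setParameter("validate", Boolean.TRUE);
        } else {
            messages.add("\"validate\" is not supported by this implementation");
        }
        config.setParameter("error-handler", new DOMErrorHandler() {
            @Override
            public boolean handleError(DOMError error) {
                messages.add(error.getMessage());
                return true; // keep going after an error
            }
        });

        doc.normalizeDocument();
        return messages;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document with a DTD-defaulted attribute.
        String xml = "<!DOCTYPE doc [<!ELEMENT doc EMPTY>"
                   + "<!ATTLIST doc status CDATA \"draft\">]><doc/>";
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Document clone = (Document) doc.cloneNode(true);
        System.out.println(revalidate(clone, "file:///tmp/doc.xml"));
    }
}
```

An empty list here only means that no errors were reported; it says nothing about whether the "specified" flags were changed.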

The validation appears to succeed (no errors are reported), but I still get 
"Attr.getSpecified() == true" for defaulted attributes that didn't explicitly 
exist in the XML.  Is this a bug, or is it simply not possible to recover this 
information after a clone that sets these Attrs as "specified == true", even 
using in-memory validation?

I'd like to be able to preserve the "specified" state in the clone so that 
when the document is serialized, I can avoid printing these attributes by 
skipping those with "specified == false".  Seems to me this should be 
possible.  If not, why not?
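
The serialization goal above could be sketched as a small hand-written writer that leaves out attributes whose value came from a DTD default. This is not how the Xerces serializer works; the class name and sample document are invented, and the sketch handles only elements and text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SpecifiedOnlyWriter {
    /** Writes elements and text only; other node types are ignored in this sketch. */
    static void write(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(escape(node.getNodeValue()));
            return;
        }
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return;
        }
        Element e = (Element) node;
        out.append('<').append(e.getTagName());
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            if (!a.getSpecified()) {
                continue; // value came from the DTD default, so leave it out
            }
            out.append(' ').append(a.getName()).append("=\"")
               .append(escape(a.getValue())).append('"');
        }
        if (!e.hasChildNodes()) {
            out.append("/>");
            return;
        }
        out.append('>');
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            write(c, out);
        }
        out.append("</").append(e.getTagName()).append('>');
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Parses a made-up document and writes it back without defaulted attributes. */
    static String demo() throws Exception {
        String xml = "<!DOCTYPE doc [<!ELEMENT doc (item)*><!ELEMENT item (#PCDATA)>"
                   + "<!ATTLIST item id CDATA #IMPLIED status CDATA \"draft\">]>"
                   + "<doc><item id=\"a\">x</item></doc>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        write(doc.getDocumentElement(), out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

This works on the parsed document because the parser marks the defaulted "status" attribute as unspecified; on a clone that reports every attribute as specified, the same writer would print them all.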


[1] http://docs.oracle.com/javase/7/docs/api/org/w3c/dom/Attr.html#getSpecified%28%29
[2] http://docs.oracle.com/javase/7/docs/api/org/w3c/dom/DOMConfiguration.html


thanks,

Jake

---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org

Re: DOM Document.cloneNode(true), in-memory validation, and Attr.getSpecified()

Posted by Jacob Kjome <ho...@visi.com>.

Understood.  By the same token, since the spec doesn't hold us back, it means 
that any behavior we determine to be desirable can be applied to cloning.

Seems maintaining Attr specified state would be both desirable (can't think of 
any harm, only benefits) and, I would think, fairly simple.  Xerces does a 
fine job for the most part.  But it can be improved.  The bug I provided a 
patch for [1] and a fix to maintain specified state (maybe I'll look into 
providing a patch?) would be welcome improvements.

[1] https://issues.apache.org/jira/browse/XERCESJ-1597


Thanks,

Jake

On Wed, 28 Aug 2013 13:36:27 -0400
 Michael Glavassevich <mr...@ca.ibm.com> wrote:
> Been quite a while since I've looked at this area but thought I'd point out 
> that the DOM specification says "cloning Document, DocumentType, Entity, 
> and Notation nodes is implementation dependent". If you're cloning 
> Document nodes it can do whatever the implementer felt was reasonable 
> which may or may not be what you hoped for. From an API perspective the 
> behaviour is undefined.
> 
> Thanks.
> 
> Michael Glavassevich
> XML Technologies and WAS Development
> IBM Toronto Lab
> E-mail: mrglavas@ca.ibm.com
> E-mail: mrglavas@apache.org

Re: DOM Document.cloneNode(true), in-memory validation, and Attr.getSpecified()

Posted by Michael Glavassevich <mr...@ca.ibm.com>.

Been quite a while since I've looked at this area but thought I'd point out 
that the DOM specification says "cloning Document, DocumentType, Entity, 
and Notation nodes is implementation dependent". If you're cloning 
Document nodes it can do whatever the implementer felt was reasonable 
which may or may not be what you hoped for. From an API perspective the 
behaviour is undefined.

Thanks.

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

Re: DOM Document.cloneNode(true), in-memory validation, and Attr.getSpecified()

Posted by Jacob Kjome <ho...@visi.com>.

I see.  So, once the attributes are set as specified during cloning, there's 
no going back, even after in-memory validation since it will have been as if 
the parser had found the attributes explicitly specified in the original XML.  
As such, the validation can't mark them as unspecified since, from all 
appearances, they were specified in the XML.  Correct?

What about UserDataHandlers?  Would it be possible, at parse time, to add them 
to Attr nodes that are unspecified so that when the clone occurs (setting 
these attributes to specified), the UserDataHandlers can reset these 
attributes back to unspecified?  I haven't looked into how that would work, 
but it would be nice to know whether to bother researching it or not.
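
One shape the UserDataHandler idea could take is sketched below. The standard Attr interface has no setter for the specified flag, so this handler cannot reset it; instead it carries a marker over to the copy, which a serializer could then consult. The key name and class are invented here, and whether handlers are actually invoked for Attr nodes when a whole Document is cloned is not something this thread confirms.

```java
import org.w3c.dom.Attr;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.UserDataHandler;

class UnspecifiedMarker implements UserDataHandler {
    static final String KEY = "example.unspecified"; // made-up key

    /** Tags every unspecified Attr at or below root; call this right after parsing. */
    static void tag(Node root) {
        if (root.getNodeType() == Node.ELEMENT_NODE) {
            NamedNodeMap attrs = root.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Attr a = (Attr) attrs.item(i);
                if (!a.getSpecified()) {
                    a.setUserData(KEY, Boolean.TRUE, new UnspecifiedMarker());
                }
            }
        }
        for (Node c = root.getFirstChild(); c != null; c = c.getNextSibling()) {
            tag(c);
        }
    }

    @Override
    public void handle(short operation, String key, Object data, Node src, Node dst) {
        boolean copied = operation == NODE_CLONED || operation == NODE_IMPORTED;
        if (copied && dst != null) {
            dst.setUserData(key, data, this); // carry the marker over to the copy
        }
    }

    /** True if the attribute, or the attribute it was copied from, was tagged. */
    static boolean wasUnspecified(Attr a) {
        return Boolean.TRUE.equals(a.getUserData(KEY));
    }
}
```

A serializer would then skip any Attr for which wasUnspecified returns true, regardless of what getSpecified() reports on the copy.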

The documentURI issue is much easier to deal with, since I can copy the value 
prior to the clone and reset it manually after the clone.
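
A minimal sketch of that documentURI workaround, with a made-up class name and URI:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class CloneKeepingUri {
    /** Deep-clones doc and carries the document URI over by hand. */
    static Document cloneWithUri(Document doc) {
        String uri = doc.getDocumentURI();              // remember before cloning
        Document clone = (Document) doc.cloneNode(true);
        clone.setDocumentURI(uri);                      // restore on the copy
        return clone;
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("doc"));
        doc.setDocumentURI("file:///tmp/doc.xml");      // made-up URI
        System.out.println(cloneWithUri(doc).getDocumentURI());
    }
}
```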

While I'm at it, I figure I'll lobby for application of a patch I submitted a 
while back related to cloning and Attr ID'ness...
https://issues.apache.org/jira/browse/XERCESJ-1597

The following issue also seems somewhat related...
https://issues.apache.org/jira/browse/XERCESJ-1430

Thanks,

Jake


Re: DOM Document.cloneNode(true), in-memory validation, and Attr.getSpecified()

Posted by Mukul Gandhi <mu...@apache.org>.

Here are my opinions on these questions. Sorry, I haven't been able to run any
tests yet.

On Wed, Aug 28, 2013 at 10:28 AM, Jacob Kjome <ho...@visi.com> wrote:

> However, after cloning the document (Document = (Document)
> doc.cloneNode(true)) all the equivalent cloned Attrs show up as
> "Attr.getSpecified() == true".  Is that to be expected or a bug?

I think this is expected behavior. The cloneNode() documentation at
http://xerces.apache.org/xerces2-j/javadocs/api/org/w3c/dom/Node.html mentions:

"In addition, clones of unspecified Attr nodes are specified". I think this
should settle the question.

> And a side-question: should I expect the value of Document.getDocumentURI()
> be preserved after cloning?  Because that value appears to be discarded.

The cloneNode() spec doesn't mention any constraints on this aspect. So
if the document URI is not preserved after a clone, I would say this is
compliant with the spec of this method. Beyond that, going by the wording of
the spec, I think the clone operates on an object representation of the
original document, and objects do not have a URI, so a clone
operation that does not preserve the document URI looks acceptable to me.

> I'd like to be able to preserve the "specified" state in the clone so that
> when the document is serialized, I can avoid printing these attributes by
> skipping those with "specified == false".  Seems to me this should be
> possible.  If not, why not?
>

I think this is a sensible use case, but the default implementation
doesn't make it possible. It might be addressed by external
application logic. You may keep the "specified" state of
attributes external to your main document (or, more challenging,
read it directly from the DTD), and during serialization use this extra
information to affect the serialization in your desired way. Another
approach may be to let the serializer emit the unspecified attributes and
then treat this output, together with a DTD or external metadata, as a unit
in your application.
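
As one possible reading of "keeping the specified state external", the sketch below walks the original and its clone side by side and collects the clone's Attr nodes whose counterparts were unspecified. The class and method names are hypothetical, and it assumes the clone has the same shape as the original.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import org.w3c.dom.Attr;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

class DefaultedAttrs {
    /**
     * Returns the Attr nodes of copy whose counterpart in original was not
     * specified. The set compares nodes by identity, not by equals().
     */
    static Set<Attr> collect(Node original, Node copy) {
        Set<Attr> defaulted =
                Collections.newSetFromMap(new IdentityHashMap<Attr, Boolean>());
        walk(original, copy, defaulted);
        return defaulted;
    }

    private static void walk(Node o, Node c, Set<Attr> out) {
        if (o.getNodeType() == Node.ELEMENT_NODE
                && c.getNodeType() == Node.ELEMENT_NODE) {
            NamedNodeMap attrs = o.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Attr a = (Attr) attrs.item(i);
                Attr twin = (Attr) c.getAttributes().getNamedItem(a.getName());
                if (!a.getSpecified() && twin != null) {
                    out.add(twin);
                }
            }
        }
        // Visit children pairwise; stop if the two trees diverge in length.
        Node oc = o.getFirstChild();
        Node cc = c.getFirstChild();
        while (oc != null && cc != null) {
            walk(oc, cc, out);
            oc = oc.getNextSibling();
            cc = cc.getNextSibling();
        }
    }
}
```

During serialization of the clone, any attribute found in the returned set would be skipped, whatever its own getSpecified() says.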

>
> --
> Regards,
> Mukul Gandhi <j-...@xerces.apache.org>