Posted to j-users@xerces.apache.org by Michael Glavassevich <mr...@ca.ibm.com> on 2006/04/16 18:17:58 UTC

Re: best approach to whole document cloning in Xerces2?

Hi Jake,

The behaviour of Document.cloneNode(true) [1] is implementation dependent. 
In Xerces it will create a new Document and then import the children from 
the original document. I would be really surprised if reparsing the 
document performed better than an in-memory copy (unless you had a 
UserDataHandler [2] registered which does some heavy operation in response 
to the cloning/importing).
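The UserDataHandler caveat is easy to observe. Below is a minimal Java sketch. It assumes JAXP's default DocumentBuilderFactory, which in the JDK is an internal copy of Xerces. The class name CloneHandlerDemo and the user-data key "audit" are made up for illustration. The sketch registers a handler on the Document node and records the operation codes the DOM reports when the document is deep-cloned. Per DOM Level 3 Core, the handler should be told NODE_CLONED. Any expensive work placed in the handler would be repeated on every clone.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.UserDataHandler;
import org.xml.sax.InputSource;

public final class CloneHandlerDemo {
    /** Parses a tiny template, registers a handler on it, clones it, and returns the operations seen. */
    public static List<Short> cloneAndRecord() throws Exception {
        Document template = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader("<page><title>Hello</title></page>")));
        List<Short> ops = new ArrayList<>();
        // The key "audit" is hypothetical. Whatever a real handler does here
        // (logging, copying attached data, ...) is paid again on every clone.
        UserDataHandler handler = (operation, key, data, src, dst) -> ops.add(operation);
        template.setUserData("audit", "template", handler);
        template.cloneNode(true);
        return ops;
    }

    public static void main(String[] args) throws Exception {
        List<Short> ops = cloneAndRecord();
        System.out.println(ops.contains(UserDataHandler.NODE_CLONED));
    }
}
```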

[1] http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#ID-3A0ED0A4
[2] http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#UserDataHandler

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

Jacob Kjome <ho...@visi.com> wrote on 04/16/2006 02:17:10 AM:

> 
> I'm wondering what's the best approach to cloning an entire 
> document?  Would it be better to keep a master copy in memory and 
> then create a new document and import the other document in there, or 
> would it be better to simply reparse the document every time (where 
> the document is used over and over again as a template, a copy is 
> created and manipulated on each HTTP request, then serialized to the 
> browser)?  If I keep the document in memory and know I am dealing 
> with the Xerces2 implementation, can I just call cloneNode(true) and 
> get an identical copy of the whole document, including doctype, 
> entities, entity references, etc...?  Again, would this be more 
> efficient than reparsing the document each time with, say, the 
> Xerces2 DOMParser?  Is there a clear-cut answer to this, or does it 
> depend on document size or other aspects of the document or environment?
> 
> thanks,
> 
> Jake
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@xml.apache.org
> For additional commands, e-mail: general-help@xml.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org

Re: best approach to whole document cloning in Xerces2?

Posted by Michael Glavassevich <mr...@ca.ibm.com>.

Jacob Kjome <ho...@visi.com> wrote on 04/17/2006 09:17:53 AM:

> At 11:17 AM 4/16/2006, you wrote:
>  >Hi Jake,
>  >
>  >The behaviour of Document.cloneNode(true) [1] is implementation dependent.
>  >In Xerces it will create a new Document and then import the children from
>  >the original document.
> 
> Which would leave out the DTD, I suppose.

I believe it does copy DocumentType nodes, though there's no guarantee 
that other DOM implementations will do that.

> So, it would make more 
> sense to create my own document and do something like this, right?...
> 
>              DOMImplementation domImpl = document.getImplementation();
>              String documentElement = document.getDoctype().getName();
>              DocumentType docType = domImpl.createDocumentType(documentElement,
>                      document.getDoctype().getPublicId(), document.getDoctype().getSystemId());
>              Document doc = domImpl.createDocument("", documentElement, docType);
>              Node node = doc.importNode(document.getDocumentElement(), true);
>              doc.replaceChild(node, doc.getDocumentElement());
> 
> This is what I do currently to get a copy of the template DOM at 
> runtime, but I just want to make sure I'm doing it the most correct 
> and efficient way possible.
> 
> Of course, this leaves out any internal subset and entity nodes, 
> no?

Right.

> How would I clone it all?

The implementation of Xerces' Document.cloneNode() should be able to do 
that.
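Here is a quick way to check this against a given Xerces build. It is a sketch that assumes the JDK's default JAXP parser. The class name CloneDoctypeCheck and the sample DOCTYPE are hypothetical. The template declares an entity in its internal subset, so no external DTD has to be fetched. After cloneNode(true), the copy is expected to carry its own DocumentType node with the same name and the declared Entity node. Other DOM implementations may behave differently, as noted above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

public final class CloneDoctypeCheck {
    // Hypothetical template: internal subset only, so the parser never loads an external DTD.
    public static final String XML =
            "<!DOCTYPE page [<!ENTITY greet \"hello\">]><page>&greet;</page>";

    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document original = parse(XML);
        Document copy = (Document) original.cloneNode(true);
        DocumentType dt = copy.getDoctype();
        // A new DocumentType node owned by the copy, not shared with the original.
        System.out.println(dt != null && dt != original.getDoctype());
        System.out.println(dt.getName());
        // The entity declared in the internal subset should come along as well.
        System.out.println(dt.getEntities().getNamedItem("greet") != null);
        // JAXP expands entity references by default, so the root holds plain text.
        System.out.println(copy.getDocumentElement().getTextContent());
    }
}
```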

> Is it possible via the DOM interfaces?

You cannot import DocumentType nodes using the DOM API (see
http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#Core-Document-importNode).
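The restriction can be observed directly. This sketch uses a hypothetical class named ImportDoctypeCheck and assumes the JDK's default DOM implementation. Importing a DocumentType into a second document is expected to fail with DOMException.NOT_SUPPORTED_ERR. That is why an import-based copy has to rebuild the DOCTYPE by hand from its name, public ID and system ID, and loses the internal subset along the way.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;

public final class ImportDoctypeCheck {
    /** Tries to import a DocumentType into another Document; returns the DOMException code, or -1 if none. */
    public static int importDoctypeErrorCode() throws Exception {
        DOMImplementation impl = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation();
        DocumentType doctype = impl.createDocumentType("page", null, null);
        Document source = impl.createDocument(null, "page", doctype);
        Document target = impl.createDocument(null, "page", null);
        try {
            target.importNode(source.getDoctype(), true);
            return -1; // import unexpectedly succeeded
        } catch (DOMException e) {
            return e.code;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(importDoctypeErrorCode() == DOMException.NOT_SUPPORTED_ERR);
    }
}
```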

>  > I would be really surprised if reparsing the
>  >document performed better than an in-memory copy (unless you had a
>  >UserDataHandler [2] registered which does some heavy operation in response
>  >to the cloning/importing).
>  >
> 
> I kind of figured this, but I just wanted to make sure that the
> caching of template DOMs that I'm doing makes sense.
> 
> Jake
> 
>  >[1] http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#ID-3A0ED0A4
>  >[2] http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#UserDataHandler

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

Re: best approach to whole document cloning in Xerces2?

Posted by Jacob Kjome <ho...@visi.com>.

At 11:17 AM 4/16/2006, you wrote:
 >Hi Jake,
 >
 >The behaviour of Document.cloneNode(true) [1] is implementation dependent.
 >In Xerces it will create a new Document and then import the children from
 >the original document.

Which would leave out the DTD, I suppose.  So, it would make more 
sense to create my own document and do something like this, right?...

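             DOMImplementation domImpl = document.getImplementation();
             // getDoctype() returns null when the template has no DOCTYPE
             DocumentType srcType = document.getDoctype();
             String documentElement = srcType.getName();
             // Only the name and public/system IDs are copied, not the internal subset
             DocumentType docType = domImpl.createDocumentType(documentElement,
                     srcType.getPublicId(), srcType.getSystemId());
             // DOM Level 3 Core: use null, not "", to mean "no namespace"
             Document doc = domImpl.createDocument(null, documentElement, docType);
             // Deep-import the template root and swap it in for the placeholder root
             Node node = doc.importNode(document.getDocumentElement(), true);
             doc.replaceChild(node, doc.getDocumentElement());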

This is what I do currently to get a copy of the template DOM at 
runtime, but I just want to make sure I'm doing it the most correct 
and efficient way possible.

Of course, this leaves out any internal subset and entity nodes, 
no?  How would I clone it all?  Is it possible via the DOM interfaces?

 > I would be really surprised if reparsing the
 >document performed better than an in-memory copy (unless you had a
 >UserDataHandler [2] registered which does some heavy operation in response
 >to the cloning/importing).
 >

I kind of figured this, but I just wanted to make sure that the
caching of template DOMs that I'm doing makes sense.
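The caching pattern described here (parse once, copy per request) might look roughly like the following. TemplateCache is a hypothetical name, and the sketch assumes JAXP's default parser. Each call returns an independent deep copy made with cloneNode(true), so changes made while handling one request do not leak into the template or into other copies.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical helper: parse a template once, hand out a deep copy per request. */
public final class TemplateCache {
    private final Document template;

    public TemplateCache(String xml) throws Exception {
        this.template = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Returns an independent copy that the caller may modify and serialize. */
    public synchronized Document newCopy() {
        // Locked because the Xerces DOM is not guaranteed to be thread-safe,
        // even for reads, and the template is shared across requests.
        return (Document) template.cloneNode(true);
    }

    public static void main(String[] args) throws Exception {
        TemplateCache cache = new TemplateCache("<page><title>Hello</title></page>");
        Document copy = cache.newCopy();
        copy.getElementsByTagName("title").item(0).setTextContent("Changed");
        System.out.println(copy.getDocumentElement().getTextContent());            // Changed
        System.out.println(cache.newCopy().getDocumentElement().getTextContent()); // Hello
    }
}
```

newCopy() is synchronized because, as far as I know, the Xerces DOM makes no thread-safety guarantees even for read-only access (deferred node expansion changes internal state on reads). So the shared template should not be cloned from several threads at once without a lock.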

Jake

 >[1] http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#ID-3A0ED0A4
 >[2] http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#UserDataHandler