
Posted to general@xml.apache.org by Jacob Kjome <ho...@visi.com> on 2006/04/16 08:17:10 UTC

best approach to whole document cloning in Xerces2?

I'm wondering what's the best approach to cloning an entire 
document?  Would it be better to keep a master copy in memory and 
then create a new document and import the other document in there, or 
would it be better to simply reparse the document every time (where 
the document is used over and over again as a template, a copy is 
created and manipulated on each HTTP request, then serialized to the 
browser)?  If I keep the document in memory and know I am dealing 
with the Xerces2 implementation, can I just call cloneNode(true) and 
get an identical copy of the whole document, including doctype, 
entities, entity references, etc...?  Again, would this be more 
efficient than reparsing the document each time with, say, the 
Xerces2 DOMParser?  Is there a clear-cut answer to this, or does it 
depend on document size or other aspects of the document or environment?

thanks,

Jake


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@xml.apache.org
For additional commands, e-mail: general-help@xml.apache.org

Re: best approach to whole document cloning in Xerces2?

Posted by Michael Glavassevich <mr...@ca.ibm.com>.

Jacob Kjome <ho...@visi.com> wrote on 04/17/2006 09:17:53 AM:

> At 11:17 AM 4/16/2006, you wrote:
>  >Hi Jake,
>  >
>  >The behaviour of Document.cloneNode(true) [1] is implementation dependent.
>  >In Xerces it will create a new Document and then import the children from
>  >the original document.
> 
> Which would leave out the DTD, I suppose.

I believe it does copy DocumentType nodes, though there's no guarantee 
that other DOM implementations will do that.
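A minimal sketch of the whole-document clone, assuming the JDK's built-in JAXP parser (its DOM is derived from Xerces) and a small made-up template with one internal-subset entity. The class and method names (WholeDocumentClone, cloneWhole) are hypothetical. Because copying of the DocumentType is implementation dependent, the sketch prints what the clone actually contains instead of assuming it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

class WholeDocumentClone {
    // Hypothetical template: a doctype with one internal-subset entity.
    static final String TEMPLATE_XML =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE page [ <!ENTITY site \"example.org\"> ]>\n"
        + "<page><title>&site;</title></page>";

    // Parses the template; checked parser exceptions are wrapped.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse template", e);
        }
    }

    // Deep-clones the Document node itself. What gets copied (doctype,
    // entities, internal subset) is up to the DOM implementation.
    static Document cloneWhole(Document template) {
        return (Document) template.cloneNode(true);
    }

    public static void main(String[] args) {
        Document template = parse(TEMPLATE_XML);
        Document copy = cloneWhole(template);
        DocumentType type = copy.getDoctype();
        System.out.println("doctype: " + (type == null ? "(none)" : type.getName()));
        System.out.println("entities: " + (type == null ? 0 : type.getEntities().getLength()));

        // The copy is a separate tree: editing it leaves the template alone.
        copy.getDocumentElement().setAttribute("lang", "en");
        System.out.println("template changed: "
            + template.getDocumentElement().hasAttribute("lang"));
    }
}
```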

> So, it would make more 
> sense to create my own document and do something like this, right?...
> 
>              DOMImplementation domImpl = document.getImplementation();
>              String documentElement = document.getDoctype().getName();
>              DocumentType docType = domImpl.createDocumentType(documentElement,
>                      document.getDoctype().getPublicId(), document.getDoctype().getSystemId());
>              Document doc = domImpl.createDocument("", documentElement, docType);
>              Node node = doc.importNode(document.getDocumentElement(), true);
>              doc.replaceChild(node, doc.getDocumentElement());
> 
> This is what I do currently to get a copy of the template DOM at 
> runtime, but I just want to make sure I'm doing it the most correct 
> and efficient way possible.
> 
> Of course, this leaves out any internal subset and entity nodes, 
> no?

Right.

> How would I clone it all?

The implementation of Xerces' Document.cloneNode() should be able to do 
that.

> Is it possible via the DOM interfaces?

You cannot import DocumentType nodes using the DOM API; see Document.importNode
in DOM Level 3 Core:
http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#Core-Document-importNode
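The restriction is easy to observe. This sketch, again assuming the JDK's Xerces-derived DOM and using hypothetical helper names, tries to import a DocumentType into a fresh document; per DOM Level 3 Core, importNode should raise a DOMException with code NOT_SUPPORTED_ERR for this node type.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ImportDoctypeDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse template", e);
        }
    }

    static Document newEmptyDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the DOMException code raised by importNode, or -1 if the
    // import unexpectedly succeeded.
    static int tryImportDoctype(Document source) {
        Document target = newEmptyDocument();
        try {
            target.importNode(source.getDoctype(), true);
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    public static void main(String[] args) {
        Document source = parse("<!DOCTYPE page [ <!ENTITY site \"example.org\"> ]><page/>");
        int code = tryImportDoctype(source);
        System.out.println(code == DOMException.NOT_SUPPORTED_ERR
            ? "importNode rejected the DocumentType (NOT_SUPPORTED_ERR)"
            : "unexpected result: " + code);
    }
}
```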

>  > I would be really surprised if reparsing the
>  >document performed better than an in-memory copy (unless you had a
>  >UserDataHandler [2] registered which does some heavy operation in response
>  >to the cloning/importing).
>  >
> 
> I kind of figured this, but I just wanted to make sure that the 
> caching of template DOMs that I'm doing makes sense.
> 
> Jake
> 
>  >[1]
>  >http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#ID-3A0ED0A4
>  >[2]
>  >http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#UserDataHandler
>  >
>  >Michael Glavassevich
>  >XML Parser Development
>  >IBM Toronto Lab
>  >E-mail: mrglavas@ca.ibm.com
>  >E-mail: mrglavas@apache.org

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org


Re: best approach to whole document cloning in Xerces2?

Posted by Jacob Kjome <ho...@visi.com>.

At 11:17 AM 4/16/2006, you wrote:
 >Hi Jake,
 >
 >The behaviour of Document.cloneNode(true) [1] is implementation dependent.
 >In Xerces it will create a new Document and then import the children from
 >the original document.

Which would leave out the DTD, I suppose.  So, it would make more 
sense to create my own document and do something like this, right?...

             DOMImplementation domImpl = document.getImplementation();
             String documentElement = document.getDoctype().getName();
             DocumentType docType = domImpl.createDocumentType(documentElement,
                     document.getDoctype().getPublicId(), document.getDoctype().getSystemId());
             Document doc = domImpl.createDocument("", documentElement, docType);
             Node node = doc.importNode(document.getDocumentElement(), true);
             doc.replaceChild(node, doc.getDocumentElement());

This is what I do currently to get a copy of the template DOM at 
runtime, but I just want to make sure I'm doing it the most correct 
and efficient way possible.

Of course, this leaves out any internal subset and entity nodes, 
no?  How would I clone it all?  Is it possible via the DOM interfaces?
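A somewhat more defensive version of the import-based copy, offered as a sketch rather than a recommended replacement. It assumes a JAXP/Xerces DOM, tolerates templates without a DOCTYPE, and takes the namespace of the placeholder root from the source root, so it is null rather than "" when there is no namespace (DOM Level 3 uses null to mean "no namespace"). As the question suspects, only the doctype's name, public ID and system ID are recreated, so entity declarations from the internal subset do not survive; the example prints the entity count to show this. ImportBasedCopy and copyViaImport are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ImportBasedCopy {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse template", e);
        }
    }

    // Rebuilds a document via DOMImplementation + importNode. Only the doctype's
    // name, public ID and system ID are recreated; the internal subset and any
    // Entity/Notation nodes are lost.
    static Document copyViaImport(Document template) {
        DOMImplementation impl = template.getImplementation();
        DocumentType srcType = template.getDoctype();
        Element srcRoot = template.getDocumentElement();

        DocumentType newType = null;
        if (srcType != null) { // not every template has a DOCTYPE
            newType = impl.createDocumentType(
                srcType.getName(), srcType.getPublicId(), srcType.getSystemId());
        }

        // Placeholder root mirrors the source root's namespace (null if none);
        // it is replaced by the imported root straight away.
        Document copy = impl.createDocument(srcRoot.getNamespaceURI(), srcRoot.getNodeName(), newType);
        Node importedRoot = copy.importNode(srcRoot, true);
        copy.replaceChild(importedRoot, copy.getDocumentElement());
        return copy;
    }

    public static void main(String[] args) {
        Document template = parse(
            "<!DOCTYPE page [ <!ENTITY site \"example.org\"> ]><page><title>&site;</title></page>");
        Document copy = copyViaImport(template);
        System.out.println("root: " + copy.getDocumentElement().getNodeName());
        System.out.println("doctype: " + copy.getDoctype().getName());
        System.out.println("entities kept: " + copy.getDoctype().getEntities().getLength());
    }
}
```

Top-level comments or processing instructions outside the document element are also not carried over by this approach.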

 > I would be really surprised if reparsing the
 >document performed better than an in-memory copy (unless you had a
 >UserDataHandler [2] registered which does some heavy operation in response
 >to the cloning/importing).
 >

I kind of figured this, but I just wanted to make sure that the 
caching of template DOMs that I'm doing makes sense.
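For the template-caching pattern described here, a sketch that keeps both a parsed master copy and the raw bytes, so the two strategies from the original question can be compared side by side. TemplateCache and its methods are hypothetical names. Synchronizing cloneCopy is a precaution, since the Xerces DOM is not documented as thread-safe and one cached template would be shared across HTTP request threads. The timing loop in main is only a rough indication; results will depend on document size, JVM warm-up and the environment.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class TemplateCache {
    private final byte[] source;     // raw template bytes, for the reparse strategy
    private final Document template; // parsed once, for the clone strategy

    TemplateCache(String xml) {
        this.source = xml.getBytes(StandardCharsets.UTF_8);
        this.template = parse(source);
    }

    private static Document parse(byte[] bytes) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse template", e);
        }
    }

    // Strategy 1: deep-clone the cached DOM; access to the shared template
    // is serialized as a precaution.
    synchronized Document cloneCopy() {
        return (Document) template.cloneNode(true);
    }

    // Strategy 2: reparse the cached bytes for every request.
    Document reparseCopy() {
        return parse(source);
    }

    // Rough wall-clock comparison; a proper measurement needs warm-up and a
    // benchmark harness such as JMH.
    public static void main(String[] args) {
        StringBuilder xml = new StringBuilder("<page>");
        for (int i = 0; i < 500; i++) {
            xml.append("<row id=\"").append(i).append("\">cell ").append(i).append("</row>");
        }
        TemplateCache cache = new TemplateCache(xml.append("</page>").toString());

        int requests = 200;
        long t0 = System.nanoTime();
        for (int i = 0; i < requests; i++) cache.cloneCopy();
        long t1 = System.nanoTime();
        for (int i = 0; i < requests; i++) cache.reparseCopy();
        long t2 = System.nanoTime();
        System.out.printf("clone: %d ms, reparse: %d ms%n",
            (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```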

Jake

 >[1]
 >http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#ID-3A0ED0A4
 >[2]
 >http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#UserDataHandler
 >
 >Michael Glavassevich
 >XML Parser Development
 >IBM Toronto Lab
 >E-mail: mrglavas@ca.ibm.com
 >E-mail: mrglavas@apache.org
 >

Re: best approach to whole document cloning in Xerces2?

Posted by Michael Glavassevich <mr...@ca.ibm.com>.

Hi Jake,

The behaviour of Document.cloneNode(true) [1] is implementation dependent. 
In Xerces it will create a new Document and then import the children from 
the original document. I would be really surprised if reparsing the 
document performed better than an in-memory copy (unless you had a 
UserDataHandler [2] registered which does some heavy operation in response 
to the cloning/importing).
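To see where a UserDataHandler would add cost, this sketch attaches user data with a handler to the root element of a small made-up template and then clones the whole document. It assumes the JDK's Xerces-derived DOM; because that implementation copies a Document's children by importing them, the handler may report NODE_IMPORTED rather than NODE_CLONED, so the sketch simply records whatever operation codes it receives. UserDataOnClone and its methods are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.UserDataHandler;
import org.xml.sax.InputSource;

class UserDataOnClone {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse template", e);
        }
    }

    // Attaches user data with a handler to the root element, clones the whole
    // document, and returns the operation codes the handler was called with.
    static List<Short> cloneAndRecord(Document template) {
        List<Short> operations = new ArrayList<>();
        UserDataHandler handler = (operation, key, data, src, dst) -> {
            // Any expensive work placed here is paid again on every copy.
            operations.add(operation);
        };
        template.getDocumentElement().setUserData("marker", "template-root", handler);
        template.cloneNode(true);
        return operations;
    }

    static String name(short op) {
        switch (op) {
            case UserDataHandler.NODE_CLONED:   return "NODE_CLONED";
            case UserDataHandler.NODE_IMPORTED: return "NODE_IMPORTED";
            default:                            return "other (" + op + ")";
        }
    }

    public static void main(String[] args) {
        Document template = parse("<page><title>Hello</title></page>");
        for (short op : cloneAndRecord(template)) {
            System.out.println("handler saw " + name(op));
        }
    }
}
```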

[1] 
http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#ID-3A0ED0A4
[2] 
http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#UserDataHandler

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

