Posted to j-users@xerces.apache.org by Jonathan Whitall <fi...@yahoo.com> on 2003/04/17 18:58:02 UTC

Deferred nodes

I know that Xerces-J's deferred nodes don't actually
map their children to themselves until a function that
needs that information is called, such as
getChildNodes().  Once instantiated, though, my
impression is that they are instantiated forever.  Is
this true, or is there a way to un-instantiate these
child nodes?  

Having a way to do this would be very handy for
extremely large XML documents which cannot fit into
memory when every single node is instantiated.

Please let me know,

Jonathan

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org

Re: Deferred nodes

Posted by Andy Clark <an...@apache.org>.

Jonathan Whitall wrote:
> getChildNodes().  Once instantiated, though, my
> impression is that they are instantiated forever.  Is
> this true, or is there a way to un-instantiate these
> child nodes?  

Joe is right. The DOM nodes would have to be removed
from the tree so that they're eligible for garbage
collection. The Xerces DOM implementation does not
have a way of doing this.

However, it should be noted that there is nothing in
the DOM specification that would stop someone from
writing an implementation that stores the information
in different ways. For example, the node info could
be stored entirely in memory, but in a more compact
format, with the implementation handing out lightweight
node proxy objects only on request. You would have
more object churn, but it wouldn't be as heavyweight
as keeping every node object in memory.
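
To make the proxy idea concrete, here is a minimal sketch in Java.
It is not Xerces code and does not implement org.w3c.dom.Node; the
class name CompactTree, the NodeProxy type and the array layout are
all assumptions made for illustration. Node data lives in parallel
arrays indexed by an int id, and a small proxy object is created
only when a caller asks for a node.

```java
import java.util.Arrays;

// Hypothetical sketch: tree data held in parallel arrays, with
// short-lived proxy objects handed out on request.
public class CompactTree {
    private String[] name = new String[8];
    private int[] parent = new int[8];
    private int[] firstChild = new int[8];
    private int[] lastChild = new int[8];
    private int[] nextSibling = new int[8];
    private int size = 0;

    /** Adds a node under parentId (-1 for the root); returns its id. */
    public int add(String nodeName, int parentId) {
        if (size == name.length) grow();
        int id = size++;
        name[id] = nodeName;
        parent[id] = parentId;
        firstChild[id] = lastChild[id] = nextSibling[id] = -1;
        if (parentId >= 0) {
            if (firstChild[parentId] < 0) {
                firstChild[parentId] = id;               // first child
            } else {
                nextSibling[lastChild[parentId]] = id;   // append to sibling chain
            }
            lastChild[parentId] = id;
        }
        return id;
    }

    /** Doubles the capacity of every backing array. */
    private void grow() {
        int n = name.length * 2;
        name = Arrays.copyOf(name, n);
        parent = Arrays.copyOf(parent, n);
        firstChild = Arrays.copyOf(firstChild, n);
        lastChild = Arrays.copyOf(lastChild, n);
        nextSibling = Arrays.copyOf(nextSibling, n);
    }

    /** Creates a fresh proxy for the node with the given id. */
    public NodeProxy node(int id) {
        return new NodeProxy(id);
    }

    /** Lightweight view of one node; holds only the id. */
    public final class NodeProxy {
        private final int id;

        private NodeProxy(int id) { this.id = id; }

        public String getName() { return name[id]; }

        public NodeProxy getParent() { return proxyOrNull(parent[id]); }

        public NodeProxy getFirstChild() { return proxyOrNull(firstChild[id]); }

        public NodeProxy getNextSibling() { return proxyOrNull(nextSibling[id]); }

        private NodeProxy proxyOrNull(int other) {
            return other < 0 ? null : new NodeProxy(other);
        }
    }

    public static void main(String[] args) {
        CompactTree tree = new CompactTree();
        int root = tree.add("doc", -1);
        tree.add("a", root);
        tree.add("b", root);
        // Walk the root's children through proxies.
        for (NodeProxy c = tree.node(root).getFirstChild(); c != null; c = c.getNextSibling()) {
            System.out.println(c.getName());
        }
    }
}
```

Each navigation call allocates a new proxy, which is the extra
object churn mentioned above; the proxies become garbage as soon as
the caller drops them, while the arrays hold the document itself.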

Another idea would be to section the document into
a number of index caches on disk. Then, when a node
is requested but is not currently in memory, it can
be loaded off of the disk. This approach would allow
you to load and access documents with nearly unlimited
size. Unfortunately, it would be difficult to handle
editing of the document.

> Having a way to do this would be very handy for
> extremely large XML documents which cannot fit into
> memory when every single node is instantiated.

In the past I have considered writing a "native" DOM
model within Xerces, something akin to the DTM in the
Xalan project. But time and resources never allowed
for that kind of experimentation.

-- 
Andy Clark * andyc@apache.org

Re: Deferred nodes

Posted by Joseph Kesselman <ke...@us.ibm.com>.

>is there a way to un-instantiate these child nodes? 

Only by editing the document tree -- remove them from their parent and let 
GC recover the memory they were using.
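
A minimal sketch of that edit, using only the standard JAXP and DOM
interfaces. The class and method names (DetachSubtree, detachAll)
and the sample document are made up for illustration. Note that the
memory can only be reclaimed if the application holds no other
references to the detached nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DetachSubtree {
    /** Parses an XML string into a DOM Document. */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    /** Detaches every element named tagName; returns how many were removed. */
    public static int detachAll(Document doc, String tagName) {
        // getElementsByTagName returns a live list, so keep removing
        // the first entry until the list is empty.
        NodeList live = doc.getElementsByTagName(tagName);
        int removed = 0;
        while (live.getLength() > 0) {
            Node n = live.item(0);
            n.getParentNode().removeChild(n);
            removed++;
        }
        return removed;
    }

    public static void main(String[] args) {
        Document doc = parse("<log><entry>a</entry><keep/><entry>b</entry></log>");
        int removed = detachAll(doc, "entry");
        System.out.println("detached " + removed + " subtree(s)");
    }
}
```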

If memory use is an issue, you may want to consider a SAX-based solution, 
building an in-memory model only for the portions of the document where 
you need one. (That model could be a DOM, or could be something more 
specialized.) More coding is involved, of course, and this requires that 
you be able to define in advance which portions of the document are 
"interesting" and trim away the rest without altering other behavior of 
your application.
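
One way such a SAX-based approach might look is sketched below. It
assumes the "interesting" portions can be identified by element
name alone; the class name SelectiveDomBuilder, the extract helper
and the callback are hypothetical. A small DOM subtree is built for
each matching element, handed to the callback, and then dropped, so
only one subtree is in memory at a time.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SelectiveDomBuilder extends DefaultHandler {
    private final String target;          // element name treated as "interesting"
    private final Consumer<Element> sink; // receives each completed subtree
    private final Document factory;       // used only to create nodes
    private Node current;                 // null while outside a target subtree

    public SelectiveDomBuilder(String target, Consumer<Element> sink) {
        this.target = target;
        this.sink = sink;
        try {
            factory = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (current == null && !qName.equals(target)) {
            return; // uninteresting content: build nothing
        }
        Element e = factory.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            e.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        if (current != null) {
            current.appendChild(e);
        }
        current = e;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.appendChild(factory.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) {
            return;
        }
        Node parent = current.getParentNode();
        if (parent == null) {
            sink.accept((Element) current); // subtree complete; hand it off
        }
        current = parent; // becomes null again once the subtree is closed
    }

    /** Returns the text content of every target element in the input. */
    public static List<String> extract(String xml, String target) {
        List<String> out = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new SelectiveDomBuilder(target, e -> out.add(e.getTextContent())));
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse document", ex);
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<log><noise>x</noise><entry>a<b>b</b></entry><entry>c</entry></log>";
        System.out.println(extract(xml, "entry"));
    }
}
```

The sketch ignores namespaces and matches on the qualified name
only; a real application would probably need a richer notion of
"interesting" than a single element name.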

(DOM Level 3 has proposed adding a filtering mechanism to its Load/Save 
APIs, which would provide a similarly selective loading mechanism. I don't 
know whether Xerces has prototyped that yet.)
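
For reference, in the DOM Level 3 Load and Save interfaces as they
ship in the org.w3c.dom.ls package of current Java releases, this
mechanism is the LSParserFilter interface. The sketch below assumes
a DOM implementation that supports the "LS" feature and honors
parser filters; the class name FilteredLoad and the sample element
names are made up. It rejects a named element at start-tag time so
that its subtree is never added to the tree.

```java
import java.io.StringReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.traversal.NodeFilter;

public class FilteredLoad {
    /** Loads xml into a DOM, leaving out every element named rejectedTag. */
    public static Document load(String xml, final String rejectedTag) {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementationLS ls =
                    (DOMImplementationLS) registry.getDOMImplementation("LS");
            LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);

            parser.setFilter(new LSParserFilter() {
                @Override
                public short startElement(Element element) {
                    // Called when the start tag has been read, before any
                    // children exist; rejecting here drops the whole subtree.
                    return rejectedTag.equals(element.getNodeName())
                            ? NodeFilter.FILTER_REJECT
                            : NodeFilter.FILTER_ACCEPT;
                }

                @Override
                public short acceptNode(Node node) {
                    return NodeFilter.FILTER_ACCEPT; // keep everything else
                }

                @Override
                public int getWhatToShow() {
                    return NodeFilter.SHOW_ELEMENT;
                }
            });

            LSInput input = ls.createLSInput();
            input.setCharacterStream(new StringReader(xml));
            return parser.parse(input);
        } catch (Exception e) {
            throw new IllegalStateException("could not load document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = load("<doc><bulk><x/></bulk><keep>k</keep></doc>", "bulk");
        System.out.println(doc.getElementsByTagName("bulk").getLength());
    }
}
```

The sample rejects a non-root element on purpose; how a given
implementation treats a filter that rejects the document element
itself is worth checking before relying on it.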

______________________________________
Joe Kesselman, IBM Next-Generation Web Technologies: XML, XSL and more. 
"may'ron DaroQbe'chugh vaj bIrIQbej"  ("Put down the squeezebox and nobody 
gets hurt.")