Posted to dev@xalan.apache.org by Sc...@lotus.com on 2001/03/15 04:53:53 UTC

Architectural Change Proposal: Direct DTM

At the moment Xalan processes its source tree data via the DOM API, with
some extensions.  The main problem with this is that a node has to be
represented as an object with identity, which carries a per-node cost in
memory and object creation.  I believe we've just about reached the limit
of direct DOM processing.

An alternative is an index-based API, something like:
dtm.getData(nodeID), dtm.getNameID(nodeID),
dtm.getNextSiblingID(nodeID),  dtm.dispatchCharacterEvent(nodeID,
contentHandler), etc.  Xalan would walk this API directly.
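
To make the idea concrete, here is a minimal sketch of what an index-based
tree of this kind could look like.  It is an illustration only, not Xalan
code: the class name SketchDTM, the addNode and walk helpers, the NULL
constant and the parallel-array layout are all assumptions made for the
example.  Only the accessor names getData, getNameID and getNextSiblingID
come from the proposal above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical index-based tree: a node is an int ID into parallel arrays. */
final class SketchDTM {
    static final int NULL = -1; // "no such node"

    // One slot per node; the node ID is the array index.
    private int[] nameID = new int[8];
    private int[] firstChild = new int[8];
    private int[] lastChild = new int[8];   // kept only so appends are O(1)
    private int[] nextSibling = new int[8];
    private String[] data = new String[8];
    private int size = 0;

    // Names are stored once and referred to by integer ID.
    private final List<String> names = new ArrayList<>();

    /** Appends a node under parentID (or NULL for a root) and returns its ID. */
    int addNode(int parentID, String name, String text) {
        if (size == nameID.length) grow();
        int id = size++;
        nameID[id] = intern(name);
        data[id] = text;
        firstChild[id] = lastChild[id] = nextSibling[id] = NULL;
        if (parentID != NULL) {
            if (firstChild[parentID] == NULL) firstChild[parentID] = id;
            else nextSibling[lastChild[parentID]] = id;
            lastChild[parentID] = id;
        }
        return id;
    }

    // Traversal returns plain ints, so no object is created per step.
    int getNameID(int nodeID)        { return nameID[nodeID]; }
    String getData(int nodeID)       { return data[nodeID]; }
    int getFirstChildID(int nodeID)  { return firstChild[nodeID]; }
    int getNextSiblingID(int nodeID) { return nextSibling[nodeID]; }
    String getName(int id)           { return names.get(id); }

    // Linear search keeps the sketch short; a real table would hash.
    private int intern(String name) {
        int i = names.indexOf(name);
        if (i >= 0) return i;
        names.add(name);
        return names.size() - 1;
    }

    private void grow() {
        int n = nameID.length * 2;
        nameID = Arrays.copyOf(nameID, n);
        firstChild = Arrays.copyOf(firstChild, n);
        lastChild = Arrays.copyOf(lastChild, n);
        nextSibling = Arrays.copyOf(nextSibling, n);
        data = Arrays.copyOf(data, n);
    }

    /** Depth-first walk that appends each node's name followed by a space. */
    static void walk(SketchDTM dtm, int nodeID, StringBuilder out) {
        out.append(dtm.getName(dtm.getNameID(nodeID))).append(' ');
        for (int c = dtm.getFirstChildID(nodeID); c != NULL;
                 c = dtm.getNextSiblingID(c)) {
            walk(dtm, c, out);
        }
    }
}
```

In this sketch, nodes added in document order get increasing IDs, and two
elements with the same name share one name ID, so a name test becomes an
integer comparison.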

The default implementation for this API would be based on XalanJ1's DTM,
though there will be some fairly heavy modifications.  The reason that we
did not bring the DTM into XalanJ2 is that, if callers request a DOM
interface, each node handed back needs an object implementing it.  So,
though the DTM was much smaller than Stree, traversal was more
expensive.  But, if you're just
returning integer IDs, traversal can be just as fast or faster.  Also, in
the original DTM, we used the Xerces String table, but this version will
use a much more efficient approach.

The problem with this is it makes it harder to consume a foreign DOM... a
table would have to be constructed that mapped IDs to Nodes.  But, since
this could be done incrementally, this might not really be too bad.  And it
may actually make DOM processing faster, because we would end up with
document order indexes.
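
One way the incremental ID-to-Node table might work is sketched below.
This is an assumption-laden illustration, not a design from Xalan: the
class LazyDomIndex and its methods are hypothetical names, attributes are
ignored, and it assumes the foreign DOM hands back the same Node object
each time for the same node.  Nodes are numbered lazily in document order,
only as far as a lookup needs.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import org.w3c.dom.Node;

/** Hypothetical lazy mapping between DOM Nodes and document-order int IDs. */
final class LazyDomIndex {
    private final List<Node> nodeByID = new ArrayList<>();
    private final Map<Node, Integer> idByNode = new IdentityHashMap<>();
    private final Node root;
    private Node cursor; // next node to number, or null once the tree is done

    LazyDomIndex(Node root) {
        this.root = root;
        this.cursor = root;
    }

    /** Returns the node's ID, numbering further nodes only if needed. */
    int getID(Node node) {
        Integer id = idByNode.get(node);
        while (id == null && cursor != null) {
            indexNext();
            id = idByNode.get(node);
        }
        if (id == null) {
            throw new IllegalArgumentException("node is not under this root");
        }
        return id;
    }

    /** Returns the node for an ID, numbering further nodes only if needed. */
    Node getNode(int id) {
        while (id >= nodeByID.size() && cursor != null) indexNext();
        return nodeByID.get(id);
    }

    /** How many nodes have been numbered so far. */
    int indexedCount() {
        return nodeByID.size();
    }

    private void indexNext() {
        Node n = cursor;
        idByNode.put(n, nodeByID.size());
        nodeByID.add(n);
        cursor = following(n);
    }

    // Next node in document order within the subtree of root, or null.
    private Node following(Node n) {
        if (n.getFirstChild() != null) return n.getFirstChild();
        while (n != null && n != root) {
            if (n.getNextSibling() != null) return n.getNextSibling();
            n = n.getParentNode();
        }
        return null;
    }
}
```

Because IDs are handed out in document order, comparing the position of
two nodes reduces to comparing two ints once both have been numbered.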

How much work would it be to adapt Xalan to this approach?  I think most of
the work in Xalan would be fairly mechanical.  None of the interfaces would
change, so this should be invisible to calling applications, except that
things should become much faster and consume less memory.

I'm pretty hot on this and would like to get it done soon... say over the
next six weeks.

Thoughts?

-scott

Re: Architectural Change Proposal: Direct DTM

Posted by Gary L Peskin <ga...@firstech.com>.

Scott_Boag@lotus.com wrote:
> 
> At the moment Xalan processes its source tree data via the DOM API, with
> some extensions.  The main problem with this is that a node has to be
> represented as an object with identity, which carries a per-node cost in
> memory and object creation.  I believe we've just about reached the limit
> of direct DOM processing.
> 
> An alternative is an index-based API, something like:
> dtm.getData(nodeID), dtm.getNameID(nodeID),
> dtm.getNextSiblingID(nodeID),  dtm.dispatchCharacterEvent(nodeID,
> contentHandler), etc.  Xalan would walk this API directly.
> 
> The default implementation for this API would be based on XalanJ1's DTM,
> though there will be some fairly heavy modifications.  The reason that we
> did not bring the DTM into XalanJ2 is that, if callers request a DOM
> interface, each node handed back needs an object implementing it.  So,
> though the DTM was much smaller than Stree, traversal was more
> expensive.  But, if you're just
> returning integer IDs, traversal can be just as fast or faster.  Also, in
> the original DTM, we used the Xerces String table, but this version will
> use a much more efficient approach.
> 
> The problem with this is it makes it harder to consume a foreign DOM... a
> table would have to be constructed that mapped IDs to Nodes.  But, since
> this could be done incrementally, this might not really be too bad.  And it
> may actually make DOM processing faster, because we would end up with
> document order indexes.
> 
> How much work would it be to adapt Xalan to this approach?  I think most of
> the work in Xalan would be fairly mechanical.  None of the interfaces would
> change, so this should be invisible to calling applications, except that
> things should become much faster and consume less memory.
> 
> I'm pretty hot on this and would like to get it done soon... say over the
> next six weeks.
> 
> Thoughts?

Scott --

This sounds like a +1 to me from a performance standpoint.  I'm kind of
sad that we're moving away from an OO model and into some old-style
hacks to get around the performance issue.  It reminds me of the old
days when we had a "record identifier" in column 1 of our 80-column
cards!  I suppose we'll lose the extensibility benefits of the OO model,
but these are exactly what are causing the performance problems.

Hopefully, the structure of the indexed item (array, vector or whatever)
will be well documented so that we can follow it.

Gary