Posted to j-users@xalan.apache.org by Simon Kitching <si...@ecnetwork.co.nz> on 2004/11/09 02:28:54 UTC

Request for information on XPathAPI/CachedXPathAPI

Hi,

I've currently got some code working with XPathAPI, and am now looking
at improving its performance. The javadocs say that CachedXPathAPI can
be faster, but I'm not clear on what the conditions for its use are.

The application I'm working on processes many XML input documents.

for each one:
  parse document into a DOM (using Xerces)
  apply a series of fixed xpath expressions until one matches
  apply a couple of fixed xpath expressions to extract some data
  //  other processing

All this is being done within a single thread.
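
The loop described above could be sketched roughly as follows. This is a minimal illustration that swaps in the standard javax.xml.xpath API (JAXP) in place of Xalan's XPathAPI so that it needs nothing beyond the JDK; the class name FirstMatchClassifier and the sample expressions are hypothetical. It only avoids re-parsing the fixed expression strings for every document. Whether the underlying engine rebuilds its internal document model on each evaluation is implementation-dependent and is not addressed here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: compiles a fixed list of XPaths once, reuses them per document. */
final class FirstMatchClassifier {
    private final List<XPathExpression> compiled = new ArrayList<>();

    FirstMatchClassifier(String... expressions) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        for (String expr : expressions) {
            try {
                // Compile once up front; the same objects are reused for every document.
                compiled.add(xpath.compile(expr));
            } catch (XPathExpressionException ex) {
                throw new IllegalArgumentException("Bad XPath: " + expr, ex);
            }
        }
    }

    /** Returns the index of the first expression that selects a node, or -1 if none does. */
    int firstMatch(Document doc) {
        for (int i = 0; i < compiled.size(); i++) {
            try {
                Node hit = (Node) compiled.get(i).evaluate(doc, XPathConstants.NODE);
                if (hit != null) {
                    return i;
                }
            } catch (XPathExpressionException ex) {
                throw new IllegalStateException("Evaluation failed for expression " + i, ex);
            }
        }
        return -1;
    }

    /** Parses an XML string into a DOM Document using the default JAXP parser. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
    }
}
```

One instance would be created at start-up and firstMatch(...) called once per parsed document. Since everything runs in a single thread, sharing the compiled expressions is safe; XPathExpression objects are not documented as thread-safe, so a multi-threaded variant would need one instance per thread.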

Would I be right in guessing that what my current code is doing under
the hood when I apply N xpaths to a single DOM is creating a "DTM"
representation N times (i.e. each time I call XPathAPI.eval(...))? If so,
I clearly want to avoid this. Any suggestions how?

I have re-read the javadocs for CachedXPathAPI several times, but am
still not clear whether I can use it in the above situation or not.
The javadocs seem to imply that a CachedXPathAPI is somehow "bound" to a
specific document (aka DTM). I can't see how this happens, though. Is
there an implicit assumption that all the eval(...) methods are passed
Node objects from the same document, and that the DTM is built when the
first eval function is called?

It would seem reasonable to me to convert my DOM Document into a DTM
then call low-level xpath functions that take DTM integer "node ids"
rather than Node parameters. However I can't see any way of creating a
DTM from a Document. Have I missed something obvious?

All suggestions gratefully received...

Regards,

Simon



Re: Request for information on XPathAPI/CachedXPathAPI

Posted by Simon Kitching <si...@ecnetwork.co.nz>.
On Wed, 2004-11-10 at 03:33, Joseph Kesselman wrote:
> 
> 
> CachedXPathAPI differs from XPathAPI in that it holds its document models
> (or document model adapters, if you're searching a DOM) in memory between
> XPath evaluations. This means you skip a lot of set-up work.
[snip]

Hi Joe,

Thanks for your hints. Between those and reading the source code I
managed to optimise my app's xpath usage.

I did spot a few issues with the CachedXPathAPI class which I will be
posting about separately.

Thanks,

Simon



Re: Request for information on XPathAPI/CachedXPathAPI

Posted by Joseph Kesselman <ke...@us.ibm.com>.



CachedXPathAPI differs from XPathAPI in that it holds its document models
(or document model adapters, if you're searching a DOM) in memory between
XPath evaluations. This means you skip a lot of set-up work.

The major downside, of course, is that retaining this data structure ties
up memory.

A secondary disadvantage is that, since this model doesn't track changes to
the source data, the cache can cause erroneous results if you alter the
document between queries; it's your responsibility to keep track of when
such changes might have occurred and to flush the cache (by discarding the
CachedXPathAPI object and obtaining a new one) before your next XPath
search.

(There was an experiment in progress to try to create a compromise adapter
-- the DOM2DTM2 class -- which *would* track changes to the source DOM, at
the cost of being less performant. Due to some impedance mismatch issues
between the DOM and DTM APIs, this would work only for DOMs which had
uniquely hashable Node objects, which limited its applicability, and it was
set aside in favor of the XDM idea, which itself has been somewhat stalled
due to other time commitments.)


Basically: If you use CachedXPathAPI, pass it a DOMSource, and remember to
discard and recreate this set-up every time the DOM has changed between
XPath calls, all the right things should happen.
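
The "discard and recreate" discipline can be made harder to forget by routing every access through a small holder. The sketch below is only an illustration of that pattern: the Resettable class is hypothetical, and it is deliberately generic so that it compiles without Xalan on the classpath. With Xalan available, the factory would presumably be CachedXPathAPI::new.

```java
import java.util.function.Supplier;

/** Hypothetical holder: keeps one helper instance and recreates it after invalidate(). */
final class Resettable<T> {
    private final Supplier<T> factory;
    private T current;

    Resettable(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the current helper, creating it lazily on first use or after invalidate(). */
    T get() {
        if (current == null) {
            current = factory.get();
        }
        return current;
    }

    /** Call after every DOM mutation so the stale helper (and its cache) is dropped. */
    void invalidate() {
        current = null;
    }
}
```

Code that modifies the DOM would call invalidate() immediately afterwards, and code that queries would always go through get(), so a stale cache is never consulted. The holder is not synchronized, on the assumption of single-threaded use.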

______________________________________
Joe Kesselman, IBM Next-Generation Web Technologies: XML, XSL and more.
"The world changed profoundly and unpredictably the day Tim Berners Lee
got bitten by a radioactive spider." -- Rafe Culpin, in r.m.filk