Posted to j-users@xalan.apache.org by Bryan Kearney <bk...@avolent.com> on 2002/06/11 16:11:19 UTC

RE: General Xalan architecture question: how does SAX event model work with XPath?

Given that Xerces is running incrementally: if we provide a stream input
source of XML, is there a better way to provide it? In our case, the
stream is collected in a string buffer, and then the string buffer is
given to the StreamSource. Would it make sense to avoid the internal
buffer and write directly to the stream source? Perhaps that would allow
Xerces to pull the data?

-- bk

>> -----Original Message-----
>> From: Joseph Kesselman/Watson/IBM [mailto:keshlam@us.ibm.com]
>> Sent: Tuesday, June 11, 2002 7:31 AM
>> To: xalan-j-users
>> Subject: Re: General Xalan architecture question: how does SAX event model work with XPath?
>> 
>> 
>> 
>> > In some cases, we can also do a better job with incremental
>> > transforms when using Xerces specifically as a parser, since one of
>> > the classes takes advantage of Xerces-specific features.
>> 
>> The main advantage is that Xerces is specifically designed so it can
>> function as an incremental parser. Using other parsers in incremental
>> mode requires a multiple-thread handshaking solution which adds some
>> overhead to the process.
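The multiple-thread handshaking mentioned above can be sketched with only
the JDK's SAX API: a push parser runs on a background thread and hands
each event to the consumer through a one-slot blocking queue, so the
consumer pulls events on demand and the parser waits in between. This
illustrates the general technique only; it is not Xalan's implementation,
and ThreadedPullSketch and its methods are hypothetical names.

```java
import java.io.StringReader;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ThreadedPullSketch {
    private static final String EOF = "eof";
    // Capacity 1: at most one parsed event waits for the consumer.
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
    private boolean done;

    public ThreadedPullSketch(String xml) {
        Thread parserThread = new Thread(() -> {
            try {
                SAXParserFactory.newInstance().newSAXParser().parse(
                        new InputSource(new StringReader(xml)),
                        new DefaultHandler() {
                            @Override
                            public void startElement(String uri, String local,
                                                     String qName, Attributes atts) {
                                put("start:" + qName);
                            }

                            @Override
                            public void endElement(String uri, String local, String qName) {
                                put("end:" + qName);
                            }
                        });
            } catch (Exception e) {
                // A real implementation would pass the error on to the consumer.
            } finally {
                put(EOF);
            }
        });
        parserThread.setDaemon(true);
        parserThread.start();
    }

    // Blocks the parser thread until the consumer has taken the previous event.
    private void put(String event) {
        try {
            queue.put(event);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Pulls the next event on demand; returns null once the document has ended.
    public String next() {
        if (done) return null;
        try {
            String event = queue.take();
            if (EOF.equals(event)) {
                done = true;
                return null;
            }
            return event;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

Every event crosses a thread boundary and a lock, which is the kind of
overhead the message refers to; a parser that can be asked to parse a
little at a time on the caller's own thread avoids it.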
>> 
>> > Note that if you feed us a DOMSource as the XML document, we don't
>> > bother to use incremental since it's all in memory already.
>> 
>> For what it's worth: DOM2DTM is always built incrementally... but of
>> course relies on the DOM being entirely present (or appearing to be; it
>> could be incremental itself as long as that's hidden behind the DOM
>> APIs). Note, however, that both are built in document order -- SAX2DTM
>> because that's how SAX presents the info, DOM2DTM because that was the
>> simplest way to overcome the difficulty of mapping an arbitrary DOM node
>> to an integer. DOM2DTM might be improved for some implementations of
>> the DOM, such as the Xerces DOM, which have additional information
>> available; that's a pending project.
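A conceptual sketch of numbering DOM nodes lazily in document order, so an
integer handle can be handed out for a node without walking the whole tree
first. It uses only the org.w3c.dom API; DocOrderHandles is a hypothetical
name, and the sketch ignores attributes and the many other details a real
DTM must handle, so it should not be read as DOM2DTM's actual code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DocOrderHandles {
    private final Node root;
    private final List<Node> nodes = new ArrayList<>();                 // handle -> node
    private final Map<Node, Integer> handles = new IdentityHashMap<>(); // node -> handle
    private Node pending; // next node to number, in document order

    public DocOrderHandles(Node root) {
        this.root = root;
        this.pending = root;
    }

    // Numbers one more node; returns false once the whole tree is numbered.
    private boolean numberNext() {
        if (pending == null) return false;
        handles.put(pending, nodes.size());
        nodes.add(pending);
        pending = following(pending);
        return true;
    }

    // Pre-order successor of n within root's subtree (attributes are skipped).
    private Node following(Node n) {
        if (n.getFirstChild() != null) return n.getFirstChild();
        while (n != null && n != root) {
            if (n.getNextSibling() != null) return n.getNextSibling();
            n = n.getParentNode();
        }
        return null;
    }

    // Returns n's handle, numbering just enough of the tree to reach it (-1 if absent).
    public int handleOf(Node n) {
        Integer h;
        while ((h = handles.get(n)) == null) {
            if (!numberNext()) return -1;
        }
        return h;
    }

    // Returns the node for a handle, or null if the tree has fewer nodes.
    public Node nodeOf(int handle) {
        while (handle >= nodes.size()) {
            if (!numberNext()) return null;
        }
        return nodes.get(handle);
    }

    public int numbered() {
        return nodes.size();
    }

    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Asking for a node that has not been numbered yet forces numbering of
everything before it in document order; that is the price of using
document order as the node-to-integer mapping.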
>> 
>> > In terms of only keeping the minimum you need in memory when using
>> > SAX, we're still working on that. One technique is called pruning,
>> > where you periodically delete the DTM nodes of the XML from memory
>> > after you know you don't need them anymore in the transformation
>> > process. The problem is knowing when you no longer need the nodes...
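A minimal sketch of pruning in a streaming SAX handler, assuming the
application already knows a safe point: here, that each hypothetical
<rec> element is independent, so once it ends it can be processed and its
retained nodes discarded. A general XSLT processor cannot make that
assumption, which is exactly the problem described above. PruningSketch
and its methods are hypothetical, and "retained nodes" is only a counter
standing in for real tree storage.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PruningSketch extends DefaultHandler {
    private final String recordName;          // element assumed to be a safe pruning point
    private final List<String> results = new ArrayList<>();
    private final StringBuilder recordText = new StringBuilder();
    private boolean inRecord;
    private int liveNodes;                    // nodes currently retained for the open record
    private int peakLiveNodes;

    public PruningSketch(String recordName) {
        this.recordName = recordName;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (qName.equals(recordName)) inRecord = true;
        if (inRecord) retain();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inRecord) {
            recordText.append(ch, start, length);
            retain();
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (qName.equals(recordName)) {
            // Safe point: the record is complete and, by assumption, nothing
            // later refers back to it, so process it and then prune it.
            results.add(recordText.toString());
            recordText.setLength(0);
            liveNodes = 0;
            inRecord = false;
        }
    }

    private void retain() {
        liveNodes++;
        peakLiveNodes = Math.max(peakLiveNodes, liveNodes);
    }

    public static PruningSketch run(String xml, String recordName) {
        PruningSketch h = new PruningSketch(recordName);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), h);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return h;
    }

    public List<String> results() {
        return results;
    }

    public int peakLiveNodes() {
        return peakLiveNodes;
    }
}
```

Peak retention stays at the size of one record no matter how many records
the document holds, because each is dropped as soon as it ends.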
>> 
>> There's some discussion of the opportunities, and issues, deep in the
>> archives of this mailing list. We're actually doing some limited
>> "tail-pruning" now, in our low-level handling of Result Tree
>> Fragments. It might not be very hard to generalize that. The problem,
>> as always, is finding time to develop and refine that code.
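Tail-pruning can be pictured with an append-only node table: a fragment is
built at the end of the table, and when it is no longer needed the table
is truncated back to the position where the fragment started. This sketch
assumes fragments are released in the reverse of the order they were
created (as with nested variable scopes); TailPruningTable is a
hypothetical name, and this is not Xalan's code.

```java
import java.util.ArrayList;
import java.util.List;

public class TailPruningTable {
    private final List<String> nodes = new ArrayList<>(); // append-only node store

    // Appends a node and returns its integer handle.
    public int add(String node) {
        nodes.add(node);
        return nodes.size() - 1;
    }

    // Remembers where a new fragment starts.
    public int mark() {
        return nodes.size();
    }

    // Drops everything appended since the mark. Only valid when the fragment
    // being released is the most recently built one (LIFO release order).
    public void pruneTo(int mark) {
        if (mark < 0 || mark > nodes.size()) {
            throw new IllegalArgumentException("bad mark " + mark);
        }
        nodes.subList(mark, nodes.size()).clear();
    }

    public String get(int handle) {
        return nodes.get(handle);
    }

    public int size() {
        return nodes.size();
    }
}
```

Handles below the mark stay valid, but handles above it are reused by
later nodes, so truncation is only safe once nothing refers to the pruned
fragment.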
>> 
>> ______________________________________
>> Joe Kesselman  / IBM Research
>> 
>>