Posted to j-users@xalan.apache.org by Christopher Giblin <CG...@zurich.ibm.com> on 2002/05/21 09:12:54 UTC

Streaming transformations for huge files?

Hi

Is a DOM always instantiated for transformation, regardless of whether the
factory is a SAXTransformerFactory or not?

I have XML files 100-200MB in size and I wish to transform and insert their
records into a DB using a Xalan extension to perform the INSERT. I get
OutOfMemoryError, no surprise. I init the transformation with
SAXTransformerFactory and therefore use TransformerHandler, XMLReader and
write to a StreamResult.
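
For reference, the wiring described above looks roughly like the following.
This is a minimal sketch using only the standard JAXP and SAX APIs; the class
name, the helper method and the tiny demo stylesheet are made up for
illustration and are not the real code or data. Wiring things up this way only
means the input arrives as SAX events; whether the processor still builds an
in-memory tree of the whole input behind the TransformerHandler depends on the
implementation.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical class name; illustrates the SAX-driven pipeline only.
class SaxPipelineSketch {

    // Hypothetical demo stylesheet: writes each record id followed by ';'.
    static final String DEMO_XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/records'>"
        + "<xsl:for-each select='record'>"
        + "<xsl:value-of select='@id'/><xsl:text>;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xml, String xsl) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        // Only cast if the factory advertises SAX support.
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("factory has no SAX support");
        }
        SAXTransformerFactory stf = (SAXTransformerFactory) tf;

        // The TransformerHandler is a SAX ContentHandler backed by the stylesheet.
        TransformerHandler handler =
            stf.newTransformerHandler(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        // The XMLReader pushes parse events straight into the handler.
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // XSLT processing needs namespace-aware events
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<records><record id='1'/><record id='2'/></records>";
        System.out.println(transform(xml, DEMO_XSL));
    }
}
```

In the real case the StringReader sources would be files and the stylesheet
would call the extension that performs the INSERT.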

Intuitively I envision a mode where the transformation is streaming, 'on
the fly' with the transformation output streaming out before the entire
input document has been parsed. On the other hand, my sense of reality
says a stylesheet cannot be correctly applied until the entire document
is instantiated as a DOM.

Sorry if this has been discussed before - any pointers for performing
really large transformations?

Thanks, chris


Re: Streaming transformations for huge files?

Posted by Robert Scovell <ro...@mac.com>.
I quote from O'Reilly's 'Java and XML', p122:

"... beware the DOM for excessively large data!"

I know no more than this, so please don't ask me any supplementaries,
but there are parsers that contain a feature called 'deferred DOM', which
only places nodes currently being parsed into memory. There's a memory
management gain but a performance loss, of course.

Rob
