Posted to c-users@xalan.apache.org by Anton Khodakivskiy <ak...@gmail.com> on 2007/12/18 07:47:19 UTC

xslt transforming large XML files 1gig+

---------- Forwarded message ----------
From: Anton Khodakivskiy <ak...@gmail.com>
Date: Dec 18, 2007 8:39 AM
Subject: xslt transforming large XML files 1gig+
To: xalan-c-users@xml.apache.org


Hello

I'm looking for a generic way to transform large XML files - possibly 1 gig
and more. As you can understand, my biggest concerns are memory usage and
performance.
I have just tried the command-line tool Xalan.exe, and it looks like it
loads the whole XML into memory - I'm not sure why, but I expect that it
parses the XML into a DOM. Is it possible to use a SAX-based XML parser for
the XSLT transformation in Xalan, or something like this?

Also, I have read that "it's not recommended to use XSLT on big XML files",
but I haven't found a meaningful explanation. What do you think? Are there
any alternative ways to do generic XML transformations that satisfy my
needs (big XMLs)?


Thanks
Anton

Re: xslt transforming large XML files 1gig+

Posted by David Bertoni <db...@apache.org>.
Anton Khodakivskiy wrote:
> Thanks Dave, that was helpful.
> 
> Are there any other XSLT libraries which parse the XML as a stream and
> don't consume that much memory? I have read about Saxon-SA, and they
> claim the library supports XML files of up to 20 gigs. I will test it
> shortly. Too bad it's commercial. There is also a commercial
> implementation from Intel which is supposed to handle large XMLs.

You can try the free version of Saxon, if you don't want to use a 
commercial version.  I'm not sure how Saxon's source tree implementation 
works, but it has historically been an in-memory one.

I don't know much about the Intel library, but perhaps they have a trial 
version you can use for testing purposes.
> 
> I tried the Joost STX library yesterday, and it works pretty well, by the way.

Well, if STX works for your requirements, then that's the way to go.

Dave

Re: xslt transforming large XML files 1gig+

Posted by Anton Khodakivskiy <ak...@gmail.com>.
Thanks Dave, that was helpful.
Are there any other XSLT libraries which parse the XML as a stream and do
not consume that much memory? I have read about Saxon-SA, and they claim
the library supports XML files of up to 20 gigs. I will test it shortly.
Too bad it's commercial. There is also a commercial implementation from
Intel which is supposed to handle large XMLs.

I tried the Joost STX library yesterday, and it works pretty well, by the way.

On Dec 18, 2007 9:09 AM, David Bertoni <db...@apache.org> wrote:

> Anton Khodakivskiy wrote:
> >
> >
> > ---------- Forwarded message ----------
> > From: *Anton Khodakivskiy* <akhodakivskiy@gmail.com>
> > Date: Dec 18, 2007 8:39 AM
> > Subject: xslt transforming large XML files 1gig+
> > To: xalan-c-users@xml.apache.org
> >
> >
> > Hello
> >
> > I'm looking for a generic way to transform large XML files - possibly 1
> > gig and more. As you can understand, my biggest concerns are memory
> > usage and performance.
> > I have just tried the command-line tool Xalan.exe, and it looks like it
> > loads the whole XML into memory - I'm not sure why, but I expect that it
> > parses the XML into a DOM. Is it possible to use a SAX-based XML parser
> > for the XSLT transformation in Xalan, or something like this?
> Xalan-C doesn't use the DOM per se, although it does use a tree
> representation of the input XML.  The differences are primarily related to
> reducing memory usage by implementing a read-only tree, which is all
> that's necessary for XSLT processing.
>
> Because the XPath language provides random access to the source tree, most
> XSLT processors use an in-memory representation, rather than trying to do
> streaming processing.  If you can reduce your transformation to a
> streaming subset of XPath, you might try STX:
>
> http://www.xml.com/pub/a/2003/02/26/stx.html
>
> >
> > Also, I have read that "it's not recommended to use XSLT on big XML
> > files", but I haven't found a meaningful explanation. What do you think?
> > Are there any alternative ways to do generic XML transformations that
> > satisfy my needs (big XMLs)?
>
> I think you'll find that Xalan-C's memory footprint for a 1GB XML document
> will be much less than 1GB of memory, although it can vary widely
> depending on the document.  In addition, for documents that have a lot of
> repeated text nodes, you can enable pooling of text nodes to further
> reduce the memory footprint of the source tree.
>
> Whether something's "recommended" or not depends on your requirements.  A
> blanket statement like that doesn't reflect every possible set of
> requirements in the real world.
>
> Dave
>

Re: xslt transforming large XML files 1gig+

Posted by David Bertoni <db...@apache.org>.
Anton Khodakivskiy wrote:
> 
> 
> ---------- Forwarded message ----------
> From: *Anton Khodakivskiy* <akhodakivskiy@gmail.com>
> Date: Dec 18, 2007 8:39 AM
> Subject: xslt transforming large XML files 1gig+
> To: xalan-c-users@xml.apache.org
> 
> 
> Hello
> 
> I'm looking for a generic way to transform large XML files - possibly 1
> gig and more. As you can understand, my biggest concerns are memory usage
> and performance.
> I have just tried the command-line tool Xalan.exe, and it looks like it
> loads the whole XML into memory - I'm not sure why, but I expect that it
> parses the XML into a DOM. Is it possible to use a SAX-based XML parser
> for the XSLT transformation in Xalan, or something like this?
Xalan-C doesn't use the DOM per se, although it does use a tree 
representation of the input XML.  The differences are primarily related to 
reducing memory usage by implementing a read-only tree, which is all that's 
necessary for XSLT processing.
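
To make that concrete, here is a minimal sketch of the usual XalanTransformer 
flow (file names are placeholders); the whole read-only source tree is built 
in memory before the stylesheet runs:

#include <xalanc/Include/PlatformDefinitions.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <xalanc/XalanTransformer/XalanTransformer.hpp>
#include <iostream>

int main()
{
    XALAN_USING_XERCES(XMLPlatformUtils)
    XALAN_USING_XALAN(XalanTransformer)

    // Both Xerces and Xalan must be initialized before any transformation.
    XMLPlatformUtils::Initialize();
    XalanTransformer::initialize();

    int theResult = 0;

    {
        XalanTransformer theTransformer;

        // Parses input.xml into Xalan's read-only source tree, applies
        // stylesheet.xsl, and writes output.xml.  Returns 0 on success.
        theResult = theTransformer.transform("input.xml", "stylesheet.xsl",
                                             "output.xml");

        if (theResult != 0)
        {
            std::cerr << theTransformer.getLastError() << std::endl;
        }
    }

    // Terminate in reverse order of initialization.
    XalanTransformer::terminate();
    XMLPlatformUtils::Terminate();

    return theResult;
}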

Because the XPath language provides random access to the source tree, most 
XSLT processors use an in-memory representation, rather than trying to do 
streaming processing.  If you can reduce your transformation to a streaming 
subset of XPath, you might try STX:

http://www.xml.com/pub/a/2003/02/26/stx.html
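
If your transformation really is streamable and simple, you can also 
hand-roll it directly on the Xerces-C SAX2 API instead of using an XSLT or 
STX processor at all. A rough sketch, assuming a flat document of 
hypothetical <record> elements (no nested children) whose text should be 
printed one per line; nothing is buffered beyond the current event, so 
memory stays flat however large the file is. (The XMLSize_t length 
parameter matches Xerces-C 3.x; 2.x declared it unsigned int.)

#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLString.hpp>
#include <xercesc/sax2/SAX2XMLReader.hpp>
#include <xercesc/sax2/XMLReaderFactory.hpp>
#include <xercesc/sax2/DefaultHandler.hpp>

#include <cstring>
#include <iostream>

XERCES_CPP_NAMESPACE_USE

// Prints the text content of each <record> element (a made-up name for
// this sketch) on its own line, without building any tree.
class RecordHandler : public DefaultHandler
{
public:
    RecordHandler() : fInRecord(false) {}

    void startElement(const XMLCh* const, const XMLCh* const localname,
                      const XMLCh* const, const Attributes&)
    {
        char* const name = XMLString::transcode(localname);
        // Assumes <record> elements contain only text, no child elements.
        fInRecord = (std::strcmp(name, "record") == 0);
        XMLString::release(&name);
    }

    void endElement(const XMLCh* const, const XMLCh* const, const XMLCh* const)
    {
        if (fInRecord)
            std::cout << '\n';
        fInRecord = false;
    }

    // SAX may deliver one element's text in several chunks, and the buffer
    // is not guaranteed to be null-terminated, so honor the length.
    void characters(const XMLCh* const chars, const XMLSize_t length)
    {
        if (!fInRecord)
            return;

        XMLCh* const chunk = new XMLCh[length + 1];
        XMLString::copyNString(chunk, chars, length);
        chunk[length] = 0;

        char* const text = XMLString::transcode(chunk);
        std::cout << text;
        XMLString::release(&text);
        delete [] chunk;
    }

private:
    bool fInRecord;
};

int main(int argc, char* argv[])
{
    if (argc != 2)
        return 1;

    XMLPlatformUtils::Initialize();
    {
        SAX2XMLReader* const reader = XMLReaderFactory::createXMLReader();
        RecordHandler handler;
        reader->setContentHandler(&handler);
        reader->setErrorHandler(&handler);
        reader->parse(argv[1]);   // streams the file; constant memory
        delete reader;
    }
    XMLPlatformUtils::Terminate();
    return 0;
}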

> 
> Also, I have read that "it's not recommended to use XSLT on big XML
> files", but I haven't found a meaningful explanation. What do you think?
> Are there any alternative ways to do generic XML transformations that
> satisfy my needs (big XMLs)?

I think you'll find that Xalan-C's memory footprint for a 1GB XML document 
will be much less than 1GB of memory, although it can vary widely depending 
on the document.  In addition, for documents that have a lot of repeated 
text nodes, you can enable pooling of text nodes to further reduce the 
memory footprint of the source tree.
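
If you're building the source tree yourself rather than using the 
command-line tool, the pooling knob lives on the parser liaison; this is a 
hedged sketch from memory, so treat the setPoolAllText() call as an 
assumption and verify it against your version's headers:

#include <xalanc/Include/PlatformDefinitions.hpp>
#include <xalanc/XalanSourceTree/XalanSourceTreeDOMSupport.hpp>
#include <xalanc/XalanSourceTree/XalanSourceTreeParserLiaison.hpp>

XALAN_USING_XALAN(XalanSourceTreeDOMSupport)
XALAN_USING_XALAN(XalanSourceTreeParserLiaison)

void buildPooledSourceTree()
{
    // The canonical liaison/DOM-support pairing from the Xalan-C samples.
    XalanSourceTreeDOMSupport     theDOMSupport;
    XalanSourceTreeParserLiaison  theLiaison(theDOMSupport);

    theDOMSupport.setParserLiaison(&theLiaison);

    // ASSUMED API: pool identical text-node values so repeated strings
    // are stored only once in the source tree.
    theLiaison.setPoolAllText(true);
}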

Whether something's "recommended" or not depends on your requirements.  A 
blanket statement like that doesn't reflect every possible set of 
requirements in the real world.

Dave