Posted to users@cocoon.apache.org by Norman Barker <no...@comsine.co.uk> on 2005/02/03 10:42:22 UTC

large XML file sizes

Hi,

I am handling GML documents with Cocoon, and they are very verbose (about
11 MB in size).

I am receiving the compressed data from an HTTP POST and then using a
custom generator to decompress this data and pass it on down the
pipeline.  Is there a way to improve this bottleneck?

The generator has to hold the entire 11 MB since it is decompressing,
and then in places the XSL has to hold the whole XML file in memory to do
a sort.

What does everyone else do with very large XML documents and Cocoon?  I
thought the fact that it was SAX would make it OK, but it is slow and
occasionally falls over.

Thanks, 

Norman 


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
For additional commands, e-mail: users-help@cocoon.apache.org


Re: large XML file sizes

Posted by Upayavira <uv...@upaya.co.uk>.
Norman Barker wrote:
> Hi,
> 
> I am handling GML documents with Cocoon, and they are very verbose (about
> 11 MB in size).
> 
> I am receiving the compressed data from an HTTP POST and then using a
> custom generator to decompress this data and pass it on down the
> pipeline.  Is there a way to improve this bottleneck?
> 
> The generator has to hold the entire 11 MB since it is decompressing,
> and then in places the XSL has to hold the whole XML file in memory to do
> a sort.
> 
> What does everyone else do with very large XML documents and Cocoon?  I
> thought the fact that it was SAX would make it OK, but it is slow and
> occasionally falls over.

Why does the generator need to hold the entire file? If the content is 
gzipped, you can just wrap a GZIPInputStream around the input stream you 
get from the servlet container, and hand that GZIPInputStream to the parser.
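
To make that concrete, here is a minimal standalone sketch of the idea in
plain Java, outside Cocoon. The class GzipSaxSketch, the CountingHandler and
the gzip() helper are made-up names for illustration only; in a real
generator the input stream would come from the request and the events would
go to the pipeline's content handler rather than to a counter. It assumes
the data is gzip-compressed, not zip or some other format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class GzipSaxSketch {

    // Hypothetical stand-in for the next pipeline stage: it only counts
    // start tags, so nothing from the document is retained in memory.
    static class CountingHandler extends DefaultHandler {
        int elements = 0;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elements++;
        }
    }

    // Wraps the compressed stream in a GZIPInputStream and hands it
    // straight to the SAX parser. The parser pulls decompressed bytes on
    // demand, so the full document is never buffered by this code.
    static int countElements(InputStream compressed)
            throws IOException, SAXException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        CountingHandler handler = new CountingHandler();
        reader.setContentHandler(handler);
        try (InputStream in = new GZIPInputStream(compressed)) {
            reader.parse(new InputSource(in));
        }
        return handler.elements;
    }

    // Test helper only: gzips a byte array so the sketch has input to read.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(raw);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] gz = gzip("<root><a/><a/><b/></root>"
                .getBytes(StandardCharsets.UTF_8));
        System.out.println(countElements(new ByteArrayInputStream(gz)));
    }
}
```

The point of the sketch is only that decompression and parsing can be
chained as streams. Anything downstream that buffers the whole document
(such as a sort in XSLT) will still need the memory.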

If you want to dynamically process this, you're gonna have fun, on any 
system. However, what you want to do is avoid holding stuff in memory, 
and use streaming technologies wherever possible. One to look at is the 
STX block. I don't know if it can give you everything you want, but it 
might help as a replacement for XSLT that is much more 'streaming' 
focussed, and therefore can handle files of arbitrary size. I wouldn't 
expect it to handle the sorting, though.

Hope that helps.

Regards, Upayavira


