Posted to j-users@xalan.apache.org by Rajesh Raheja <rr...@yahoo.com> on 2003/12/08 22:13:49 UTC

Memory Consumption for Large XML Transformation (50MB to 1GB) with SAX

We are trying to transform very large XML documents (30MB to 1GB in size) and
were planning on using an XSLT engine for it.  Our tests showed that passing in
the document as a DOM typically crashed with out-of-memory errors.

However, even when passing in SAX events, memory consumption was around
FIVE times the document size (e.g. the 50MB input document consumed 250MB of
JVM memory).

We would appreciate any input on ways to improve memory consumption, and more
generally: is XSLT the way to go for such large documents?  What are the
alternatives?  (By the way, we tried asking the customer to reduce or break up
the document; that is not feasible!)

Thanks
Rajesh


__________________________________
Do you Yahoo!?
New Yahoo! Photos - easier uploading and sharing.
http://photos.yahoo.com/

Re: Memory Consumption for Large XML Transformation (50MB to 1GB) with SAX

Posted by da...@us.ibm.com.



Hi Nick,

No, it's not really been exposed, but it would be fairly easy to do so.  Do
you want to file a Bugzilla request for an enhancement to do that?  That's
the best way to track such things, and the way to make sure it gets into
the next release.

In the meantime, you can experiment with this, but you may need to rebuild
the Xalan binaries, depending on how you are using the APIs.  For the exact
location in the code, you can look at the constructor for
XalanSourceTreeDocument, which has a parameter called fPoolAllText
controlling this behavior.

By the way, XalanSourceTreeDocument always pools whitespace-only text
nodes, as that is a win 99.9% of the time.

Please post back if you have any more questions.

Dave



From:    Nick Bastin <nbastin@opnet.com>
To:      xalan-c-users@xml.apache.org
cc:      (bcc: David N Bertoni/Cambridge/IBM)
Date:    01/14/2004 02:14 PM
Subject: Re: Memory Consumption for Large XML Transformation (50MB to 1GB) with SAX
Please respond to xalan-c-users






Re: Memory Consumption for Large XML Transformation (50MB to 1GB) with SAX

Posted by Nick Bastin <nb...@opnet.com>.
On Dec 8, 2003, at 7:18 PM, david_n_bertoni@us.ibm.com wrote:

> Xalan-C's model is very compact and you should see lower memory usage than
> Xalan-J on 32-bit architectures.  On 64-bit architectures, memory usage is
> probably comparable.  Of course, all of this depends on the kinds of
> documents you're transforming.  Content-heavy documents will use more
> memory than markup-heavy ones.  Even such details as the number of repeated
> element and attribute names will affect how much memory the document
> requires.  There is also an undocumented mode in Xalan-C which pools all
> text nodes, which can be very useful for documents that have lots of
> repeated values.

Gee, is there any documentation on this mode anywhere?  ;-) I did a 
quick search of the mailing list archives, but I may not be searching 
for the right strings.

--
Nick


Re: Memory Consumption for Large XML Transformation (50MB to 1GB) with SAX

Posted by da...@us.ibm.com.



Hi Rajesh,

Xalan-C's model is very compact and you should see lower memory usage than
Xalan-J on 32-bit architectures.  On 64-bit architectures, memory usage is
probably comparable.  Of course, all of this depends on the kinds of
documents you're transforming.  Content-heavy documents will use more
memory than markup-heavy ones.  Even such details as the number of repeated
element and attribute names will affect how much memory the document
requires.  There is also an undocumented mode in Xalan-C which pools all
text nodes, which can be very useful for documents that have lots of
repeated values.
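To illustrate why pooling text nodes helps documents with many repeated values, here is a hypothetical Java sketch. It is not Xalan-C's implementation (the real switch is the fPoolAllText parameter of the XalanSourceTreeDocument constructor); the class TextPoolDemo and its methods are made up for illustration. Each text value is routed through a HashMap so that identical values share one String instance instead of one copy per node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical illustration of text pooling; not Xalan-C's actual implementation. */
public class TextPoolDemo extends DefaultHandler {
    private final Map<String, String> pool = new HashMap<>(); // one shared instance per distinct value
    private final List<String> values = new ArrayList<>();    // stands in for the tree's text nodes
    private final StringBuilder buf = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buf.append(ch, start, length); // SAX may split one text node across several calls
    }

    private void flushText() {
        if (buf.length() == 0) {
            return;
        }
        String text = buf.toString();
        buf.setLength(0);
        // Reuse the first instance seen for this value; later duplicate copies can be reclaimed.
        values.add(pool.computeIfAbsent(text, k -> k));
    }

    List<String> values() { return values; }

    int distinctCount() { return pool.size(); }

    static TextPoolDemo parse(String xml) throws Exception {
        TextPoolDemo handler = new TextPoolDemo();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        TextPoolDemo d = parse("<prices><cur>USD</cur><cur>USD</cur><cur>EUR</cur></prices>");
        // prints "3 text nodes, 2 distinct"
        System.out.println(d.values().size() + " text nodes, " + d.distinctCount() + " distinct");
    }
}
```

The pool itself costs a map entry per distinct value, so this trade only pays off when values actually repeat, which matches the "lots of repeated values" caveat above.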

On the other hand, you may want to investigate whether your transformations
can be streamed, which would make them good candidates for a simple SAX
filter rather than XSLT.
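A minimal Java sketch of the streaming approach, assuming the transformation can be decided element by element (here, hypothetically, dropping every <audit> element together with its content). The class names StreamingFilter and DropElementFilter and the element name are illustrative only. The filter extends the standard org.xml.sax.helpers.XMLFilterImpl and keeps only a depth counter, so its own state does not grow with document size. Serialization is handed to a JAXP identity Transformer; common JAXP implementations write identity-transform output as events arrive rather than building a tree, but that is worth confirming with your processor and a large test file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class StreamingFilter {

    /** Hypothetical filter: drops every element with the given local name, plus its content. */
    static class DropElementFilter extends XMLFilterImpl {
        private final String dropName;
        private int skipDepth = 0; // > 0 while inside a dropped element

        DropElementFilter(XMLReader parent, String dropName) {
            super(parent);
            this.dropName = dropName;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if (skipDepth > 0 || localName.equals(dropName)) {
                skipDepth++; // entering, or nested inside, a dropped subtree
                return;
            }
            super.startElement(uri, localName, qName, atts); // pass through unchanged
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (skipDepth > 0) {
                skipDepth--;
                return;
            }
            super.endElement(uri, localName, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (skipDepth == 0) {
                super.characters(ch, start, length);
            }
        }
        // Note: if the transformer registers a lexical handler, XMLFilterImpl forwards it to
        // the parser, so XML comments and CDATA markers bypass this filter.
    }

    /** Parses xml, filters it, and serializes the result with a JAXP identity transform. */
    static String transform(String xml, String dropName) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // so localName is populated
        XMLReader parser = spf.newSAXParser().getXMLReader();
        XMLReader filter = new DropElementFilter(parser, dropName);

        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(
                "<a><b>keep</b><audit>drop <x/> me</audit><b>too</b></a>", "audit"));
    }
}
```

For a real 1GB input, the StringReader and StringWriter would be replaced by an InputSource over a FileInputStream and a StreamResult over a File, so neither side holds the whole document in memory.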

If you decide to use XSLT, you should try both Xalan-C and Xalan-J to see
which one works best with some typical stylesheets and documents.  Just
make sure you have a representative sample for your testing.

Dave



From:    Rajesh Raheja <rraheja1@yahoo.com>
To:      xalan-c-users@xml.apache.org, xalan-j-users@xml.apache.org
cc:      (bcc: David N Bertoni/Cambridge/IBM)
Date:    12/08/2003 01:13 PM
Subject: Memory Consumption for Large XML Transformation (50MB to 1GB) with SAX
Please respond to rraheja


