You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@poi.apache.org by Wessel van Norel <de...@gmail.com> on 2011/01/24 20:05:42 UTC

OutOfMemoryError while writing a large xlsx file

Hello all,

Today I tried to change my HSSFWorkbook based code to XSSFWorkbook,
because the sheet I'm generating needs to contain more then 65k rows
(about 99k now). The last HSSFWorkbook I created was about 20M in
size. Knowing xmlx is XML based I expected writing the XSSFWorkbook
would take longer and need more memory, but I didn't expect it to need
> 2Gb of memory to be generated (my heap space is currently > 4Gb but
quite some is used by other parts of the application that is
generating the xlsx file). It needs even more memory and didn't
complete the writing process so I can't give you any detailed
information about the end result (unfortunately the code that
generates the XML depends on lots of EJB objects, so I can't do it
without the JBoss application server).

I read about the BigGridDemo:
http://www.docjar.org/html/api/org/apache/poi/xssf/usermodel/examples/BigGridDemo.java.html

And I found the following post:
http://apache-poi.1045710.n5.nabble.com/Yet-another-quot-Out-of-memory-quot-td3316218.html

So with the last post in mind I hope I can give Nick (and all others
on this list) a bit more detail, so hopefully this problem can be
resolved before I have to use the BigGridDemo, which would take a
considerable rewrite of my Excel exporting code.

I've created a 4.5Gb heapdump and let the eclipse memory analyzer do
it's trick. I've attached a few screenshots of the results, hoping
that these contain enough information for you. I'm not at liberty to
share the heapdump.

The biggest object is: org.apache.poi.xssf.usermodel.XSSFSheet. This
sheet is 2Gb in size. It contains almost 2 million
org.apache.xmlbeans.impl.store.Xobj$ElementXobj objects, totalling
1.7Gb in heap space retained.

The XSSFCells total 180M of memory, still a lot, but not anywhere near the 2Gb.


Unfortunately I've no experience, yet, with writing XML files using
XMLBeans. So I'm not sure if it's doing it the SAX or DOM way when
writing XML files. But looking at the memory usage, my first guess
would be that it's doing it the DOM way.


If you have anymore questions which can be answered by looking in
different ways at the heapdump, don't hesitate to ask.

I hope one of you is able to help me out here, or at least point me
where I need to poke in the POI code to change the way this XML file
is being written.

Thanks in advance.

Regards,
Wessel van Norel


Re: OutOfMemoryError while writing a large xlsx file

Posted by Nick Burch <ni...@alfresco.com>.
On Wed, 2 Feb 2011, Joseph D. Wagner wrote:
> I've noticed it doesn't override equals or hashCode.  It's my 
> understanding that the garbage collector uses hashCode to determine when 
> an object is eligible for collection.

Not sure most of the garbage collectors do, though the lack of methods 
could potentially slow down certain list/map/lookup operations

I'm sure there was a good talk at ApacheCon in Atlanta a few months back 
that covered the garbage collector in good detail, but annoyingly I've 
forgotten who gave it so I can point you at it... This blog post looks 
fairly good though:
http://techfeast-hiranya.blogspot.com/2010/11/taming-java-garbage-collector.html

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


Re: OutOfMemoryError while writing a large xlsx file

Posted by "Joseph D. Wagner" <th...@josephdwagner.info>.
On 01/24/2011 11:05 AM, Wessel van Norel wrote:
> The biggest object is: org.apache.poi.xssf.usermodel.XSSFSheet. This
> sheet is 2Gb in size. It contains almost 2 million
> org.apache.xmlbeans.impl.store.Xobj$ElementXobj objects, totalling
> 1.7Gb in heap space retained.
ElementXobj is an inner class of the abstract class Xobj, defined in:
xmlbeans/src/store/org/apache/xmlbeans/impl/store/Xobj.java

I've noticed it doesn't override equals or hashCode.  It's my 
understanding that the garbage collector uses hashCode to determine when 
an object is eligible for collection.  Without an override, the garbage 
collector may be missing some opportunities.

Can someone confirm if my understanding of the garbage collector 
correct?  I'm new at this, and I don't want to take what I think is an 
issue to the XmlBeans group only to look like an idiot.

Thanks.

Joseph D. Wagner

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


Re: OutOfMemoryError while writing a large xlsx file

Posted by Nick Burch <ni...@alfresco.com>.
On Mon, 24 Jan 2011, Wessel van Norel wrote:
> The biggest object is: org.apache.poi.xssf.usermodel.XSSFSheet. This
> sheet is 2Gb in size. It contains almost 2 million
> org.apache.xmlbeans.impl.store.Xobj$ElementXobj objects, totalling
> 1.7Gb in heap space retained.

Are you using the full ooxml-schemas jar, or the ooxml-lite one? If memory 
serves, they should have the same memory footprint (though that's worth 
double checking!), but the ooxml-schemas one should have more helpful 
classnames in it.

Any chance you could double check with the footprint of the two ooxml 
schemas jar (full and lite) on the memory footprint, and also see if the 
class names are more specific about the memory use on the full schema 
case?

(If it isn't more specific, can you drill down to try to find out what xml 
element(s) are mostly being used by the Xobjs)


> Unfortunately I've no experience, yet, with writing XML files using 
> XMLBeans. So I'm not sure if it's doing it the SAX or DOM way when 
> writing XML files. But looking at the memory usage, my first guess would 
> be that it's doing it the DOM way.

as memory serves, what happens is that the XSSF usermodel has an xssf 
specific object for most of the key parts (row, cell, style etc), and a 
backing xmlbeans object. Most reads and writes go through the XSSF class 
and read/write from the underlying xmlbeans object. At write time, 
xmlbeans serialises from the DOM view of its classes.

(For HSSF, there's a similar situation with HSSF usermodel classes and the 
underlying HSSF records)


Quick check - is there much memory difference between when you've built 
the sheet, and when you're writing it out? i.e. is it the process of 
building the huge sheet that takes all the memory, or does memory use peak 
when writing it out to the stream?

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org