Posted to user@poi.apache.org by Randall Davis <da...@csail.mit.edu> on 2014/05/14 18:19:10 UTC

HSSF memory use

Just joined the list, so apologies if this is a well known issue.

I have been using HSSF for a number of years in a Java application and love its
functionality. The app reads raw data and produces a modest-sized xls file for
each raw data file; the xls file has three sheets: two with 200 rows of 5 columns,
and one with 2000 rows of 3 columns.

Recently I've run into memory issues: each spreadsheet produced seems to
permanently consume ~3MB of memory, and now that I've got more than a
thousand data files to process (and the same number of xls files to produce),
I'm running out of memory. In principle I suppose I could do 500 files at a
time, but for a variety of reasons it's much easier to do them all in one pass.

I've checked carefully, and it's unlikely that anything else in my code is
causing the memory consumption (files are closed, etc.).

I've looked around and seen hints that there are some issues around HSSF
resources (cell styles, fonts?), and I've briefly looked at BigGridDemo.java.
In my code I used HSSF objects like any other Java objects, i.e., creating new
ones with abandon as needed, unaware that this might be a problem.

Is this a known issue, and should I imitate the BigGridDemo style of creating
a template, so that objects like cells are created only once but are given new
values for each new xls file?
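
For concreteness, here is a rough sketch of the kind of reuse I have in mind,
with one shared style per workbook instead of one per cell (the class name and
data format are made up, and I may be misusing the API):

import java.io.FileOutputStream;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellStyle;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

public class StyleReuseSketch {
    public static void main(String[] args) throws Exception {
        Workbook wb = new HSSFWorkbook();

        // Create each distinct style ONCE per workbook and share it;
        // an .xls file can only hold about 4,000 cell styles in total.
        CellStyle numeric = wb.createCellStyle();
        numeric.setDataFormat(wb.createDataFormat().getFormat("0.00"));

        Sheet sheet = wb.createSheet("data");
        for (int r = 0; r < 200; r++) {
            Row row = sheet.createRow(r);
            for (int c = 0; c < 5; c++) {
                Cell cell = row.createCell(c);
                cell.setCellValue(r + c * 0.1);
                cell.setCellStyle(numeric); // shared style, not a new one per cell
            }
        }
        try (FileOutputStream out = new FileOutputStream("out.xls")) {
            wb.write(out);
        }
    }
}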

All advice/suggestions welcome.

thanks






Re: HSSF memory use

Posted by Adrian Lynch <ad...@concreteplatform.com>.
Are you able to use XSSF? If so, you can use the streaming version, SXSSF.
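
If writing .xlsx instead of .xls is acceptable, a minimal sketch looks roughly
like this; the 100 is the number of rows kept in memory at once, and the class
name is made up:

import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class SxssfSketch {
    public static void main(String[] args) throws Exception {
        // Keep at most 100 rows per sheet in memory; older rows are
        // flushed to a temporary file as you go.
        SXSSFWorkbook wb = new SXSSFWorkbook(100);
        try (FileOutputStream out = new FileOutputStream("out.xlsx")) {
            Sheet sheet = wb.createSheet("data");
            for (int r = 0; r < 2000; r++) {
                Row row = sheet.createRow(r);
                for (int c = 0; c < 3; c++) {
                    row.createCell(c).setCellValue(r + c);
                }
            }
            wb.write(out);
        } finally {
            wb.dispose(); // delete the temporary backing files
        }
    }
}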

A



Re: HSSF memory use

Posted by Nick Burch <ap...@gagravarr.org>.
On Wed, 14 May 2014, Randall Davis wrote:
> Recently I've run into memory issues because each spreadsheet produced 
> seems to permanently consume ~3MB of memory, and now that I've got more 
> than a thousand data files to process (and the same number of xls files 
> to produce), I'm running out of memory. In principle I can do 500 files 
> at a time I suppose, but there are a variety of reasons why it's much 
> easier to do them all in one pass.

As long as you're closing the output streams and input streams, and not
holding references to anything in maps / lists / etc. that you don't null out
or clear, memory should be released after a few GC runs.
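
For instance, something along these lines keeps each workbook local to one
method call, so nothing survives once the file is written (a sketch only;
class and method names are made up, and the actual conversion is elided):

import java.io.File;
import java.io.FileOutputStream;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

public class BatchConvert {

    // The workbook and stream are local to this method, so no reference
    // escapes and both become eligible for GC when it returns.
    static void convert(File rawFile, File xlsFile) throws Exception {
        HSSFWorkbook wb = new HSSFWorkbook();
        wb.createSheet("data");
        // ... read rawFile and populate the workbook here ...
        try (FileOutputStream out = new FileOutputStream(xlsFile)) {
            wb.write(out); // stream is closed by try-with-resources
        }
    }

    public static void main(String[] args) throws Exception {
        for (String path : args) {
            convert(new File(path), new File(path + ".xls"));
        }
    }
}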

I can only suggest you check that you're not accidentally keeping references
to objects in maps / other functions / caches / etc.

Otherwise you'll need to dust off a profiler or similar, and use that to
track down where the memory is going and what kinds of objects it's tied up
in.

Nick
