You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@poi.apache.org by "Andrew C. Oliver" <ac...@apache.org> on 2003/06/25 16:22:49 UTC
Re: generating spreadsheets and memory utilization

This is probably better on the developers list.  (though if I recall I think
we voted to combine the too but never got around to it)

On 6/25/03 9:26 AM, "Humphrey, Jack" <JH...@coremetrics.com> wrote:

> Thanks for the tip. My prototype seems to use about 15% less memory and runs
> 77% faster with the performance branch code. FYI, the code is using CSV data
> to generate a spreadsheet with 17 columns and almost 50000 rows.
> 

Everything is flattened into parallel arrays instead of objects.  Yuckier to
debug..  

> I'm curious what optimizations have been made so far and if there are any
> ideas for more. I saw the note about possibly using 1.4 memory-mapped IO to
> reduce memory overhead -- I'm very interested in that possibility.
>

Well we probably won't do that just yet.  Some folks are still struggling by
us dropping 1.2 support (though they might have spoken up when we posed it
to the list).

YOU can change some of the (presently compile time) defaults for
arraylist/hashmap sizes.  Meaning if you look into ValueRecordAggregate,
there are some constants used for the initial size of the array list.  This
can make a BIG difference.  Smaller values are best for smaller sheets.
Larger values are much better for larger sheets.  If you have a decent mix,
this is best set at the average or so.  The way this works is that an array
list when new'd gets an initial size for the underlying array. so:

List mine = new ArrayList(10);

Is roughly equivalent to:

Object mine = new Object();
Object myarray = new Object[10];

So big deal, an object instantiation and an array... So what.  Well when you
reach the bounds of the array list, that 11th entry the equivlent of this
code runs:

Object[] oldarray = myarray;
Myarray = new Object[myarray.length *2];
System.arraycopy(oldarray,0,myarray,0,oldarray.length);

Thus think how many times this would execute on a LARGE spreadsheet if 10
were used (the default is larger).  Think of all the memory sitting there to
be garbage collected.

So right now you can grep for lines like:

 private final static int DEFAULT_ROWS=10000;
 private final static int DEFAULT_COLS=256;
 
Or

public final static int INITIAL_CAPACITY = 20;

And decide what's best based on its use (generally its commented).
Obviously most of these barely make a difference, the one in
ValueRecordsAggregate is the mother of them all.  This advice is not
necessarily true for the main branch.

In the near future we'll put this in the constructor and if you choose,
you'll be able to offer "advice" to the constructor as to what kind of
spreadsheet your generating.  It also seems to me that we should almost be
able to generate some kind of predictive (based on history) algorithm for
this, but I question whether it would be worth the synchronized call versus
a manual guess.

-andy

> Jack
> 
> -----Original Message-----
> From: Andrew C. Oliver
> To: POI Users List
> Sent: 6/24/2003 3:27 PM
> Subject: Re: generating spreadsheets and memory utilization
> 
> I'll be merging performance into the head very soon...   When is very
> soon?
> As soon as a I have a contiguous block of time to keep it all in my head
> at
> once. :-)
> 
> "cvs co -r performance-branch jakarta-poi"
> 
> On 6/24/03 3:17 PM, "Danny Mui" <da...@muibros.com> wrote:
> 
>> Cocoon uses the same API at the end of the day.
>> 
>> We have a performance branch that reduces memory by a good amount
>> (40%?).  Pull it down from CVS and see if the results are better?
>> 
>> Humphrey, Jack wrote:
>> 
>>> Hi,
>>> 
>>> I am writing some code to generate speadsheets and hope to be able to
> use
>>> POI. In my prototyping, I ran into the known issue of requiring large
>>> amounts of memory to generate a large spreadsheet. To generate a 15MB
> xls,
>>> it is taking up to 290MB of memory. I am writing a server application
> and
>>> that level of memory utilization is not acceptable.
>>> 
>>> Do I have any alternatives? If I use the Cocoon serializer and write
>>> Gnumeric XML instead, does it still require as much memory? Any tips?
>>> 
>>> Thanks.
>>> 
>>> Jack Humphrey
>>> 
>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
>>> For additional commands, e-mail: poi-user-help@jakarta.apache.org
>>> 
>>>  
>>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
>> For additional commands, e-mail: poi-user-help@jakarta.apache.org
>> 

-- 
Andrew C. Oliver
http://www.superlinksoftware.com/poi.jsp
Custom enhancements and Commercial Implementation for Jakarta POI

http://jakarta.apache.org/poi
For Java and Excel, Got POI?