You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@poi.apache.org by Brett Knights <kn...@ledgerdemain.com> on 2005/06/14 09:14:49 UTC

Time to add/update cells

Hello,

I have recently tried to use POI to add a few thousand rows to a 
spreadsheet.
It doesn't make much difference if I start with an almost blank 
spreadsheet or one with dummy values in the all the cells that will be 
populated on a run of known size.

I have 42 columns.
Operations move fairly quickly for the first 600 to 650 rows and then 
slow down considerably.

e.g.
On a test run on a 1GHz Windows machine:
Time to update the first 600 rows takes about 8 seconds.
Time to update the following rows to 1850 takes about 4 minutes. 
At around row 700 the code is updating around 13 rows a second.
By row 1800 it's down to 5 rows a second.

On a live run with better hardware than my test setup populating 6500 
rows takes close to 25 minutes. This is on a 2GHz Debian machine.

On neither machine does memory use approach the max allocated.

jdk 1.4.2
poi-2.5.1-final-20040804

Any help, tips, pointers etc would be most appreciated. It would make my 
life easier if I don't have to redo this as a csv.

TIA

Brett Knights


---------------------------------------------------------------------
To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/


Re: Time to add/update cells

Posted by an...@superlinksoftware.com.
1000 per second isn't bad provided there are a few cells in there (which 
you have).  that 42000 cells per second, which
is pretty good actually considering.  Thats 0.000023809524 seconds per 
cell.... I'm happy with that :-)

The "stupid thing" that you're doing is that your pauses could be longer 
in between.  meaning if my heap is like 2gb and my sheet is only like 
10k then it will be a
LOOOOONG time before it hits the 70% mark.  The trouble is that as the 
sheet gets bigger...the number of full collections become more 
frequent...thus say instead of those ints you put a BIG FAT string that 
took a lot more bytes each...  You could kill perofrmance by keeping it 
all in memory and hitting the 70% mark more frequently (default is 70 = 
major or full collection depending)...  So big heap == better 
performance as counter-intuitive as that seems.  Isn't that cool?

-Andy

Avik Sengupta wrote:

>On a 1.8GHz linux (2.6) with jdk1.4.2, I can insert rows at approx 1000
>per second. Except for brief pauses due to garbage collection, the rate
>is resonably consistent. I've tried inserting 5000 rows with 42 cols
>each. 
>
>Attached is a graph showing this performance, the behaviour should be
>pretty obvious (and its quite what you would expect). The code follows,
>tell me I didnt do anything stupid!
>
>public void testPerf() {
>    	
>        HSSFWorkbook wb = new HSSFWorkbook();
>        HSSFSheet s = wb.createSheet();
>        HSSFRow r ;
>        HSSFCell c;
>        long t = System.currentTimeMillis();
>        long newT =0;
>        for (int i=0;i < 5000;i++) {
>        	if (i % 100 == 0 && i>0) {
>        		newT=System.currentTimeMillis();
>        		//System.out.println("Rows " + (i-100) +" to "+i+" inserted in
>" +(newT-t) + " ms - at " + (100000/(newT-t) + " rows per sec"));
>        		System.out.println(100000/(newT-t));
>        		t=newT;
>        	}
>        	r = s.createRow(i);
>        	for (short j=0;j<42;j++) {
>        		c = r.createCell(j);
>        		c.setCellValue(i+j);
>        	}
>        	
>        }
>
>
>
>On Tue, 2005-06-14 at 07:44 -0400, acoliver@apache.org wrote:
>  
>
>>Thats some odd time.  I achieve much better on my box.  First off set 
>>your MINIMUM heap size so that it doesn't grow and shrink.  Second off, 
>>make sure the heap is at least 50-60% larger than the high water mark of 
>>usage (due to generational garbage collection.  Last off, use the 2.6 
>>kernel.  Lastly, I'd like to see that code, it may not be POI at all.
>>
>>-Andy
>>
>>Brett Knights wrote:
>>    
>>
>>>Hello,
>>>
>>>I have recently tried to use POI to add a few thousand rows to a 
>>>spreadsheet.
>>>It doesn't make much difference if I start with an almost blank 
>>>spreadsheet or one with dummy values in the all the cells that will be 
>>>populated on a run of known size.
>>>
>>>I have 42 columns.
>>>Operations move fairly quickly for the first 600 to 650 rows and then 
>>>slow down considerably.
>>>
>>>e.g.
>>>On a test run on a 1GHz Windows machine:
>>>Time to update the first 600 rows takes about 8 seconds.
>>>Time to update the following rows to 1850 takes about 4 minutes. At 
>>>around row 700 the code is updating around 13 rows a second.
>>>By row 1800 it's down to 5 rows a second.
>>>
>>>On a live run with better hardware than my test setup populating 6500 
>>>rows takes close to 25 minutes. This is on a 2GHz Debian machine.
>>>
>>>On neither machine does memory use approach the max allocated.
>>>
>>>jdk 1.4.2
>>>poi-2.5.1-final-20040804
>>>
>>>Any help, tips, pointers etc would be most appreciated. It would make my 
>>>life easier if I don't have to redo this as a csv.
>>>
>>>TIA
>>>
>>>Brett Knights
>>>
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
>>>Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
>>>The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/
>>>.
>>>
>>>      
>>>
>>    
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/


Re: Time to add/update cells

Posted by Brett Knights <kn...@ledgerdemain.com>.
Sorry for opening a can of worms here.

I was intending to post my code so I thought I'd refactor it a bit to make the intent clearer. I don't really know what I did but a run now takes about about the same time as outputting csv. On my 
machine it went from over 4 minutes to just over 2 minutes.

Anyway I noticed the same slowdown after 600 rows when outputting csv so POI isn't implicated at all in my problem.
Very curious though as I use the same framework (FWIW processing XML with Saxon) all the time and have never noticed a problem like this before.

Thanks for the examples. The incredulity helped too as it forced me to do more testing to try to prove POI was part of the problem. :-)

Brett Knights

>>>> I have recently tried to use POI to add a few thousand rows to a 
>>>> spreadsheet.
>>>> It doesn't make much difference if I start with an almost blank 
>>>> spreadsheet or one with dummy values in the all the cells that will 
>>>> be populated on a run of known size.
>>>>
>>>> I have 42 columns.
>>>> Operations move fairly quickly for the first 600 to 650 rows and 
>>>> then slow down considerably.
>>>>
>>>> e.g.
>>>> On a test run on a 1GHz Windows machine:
>>>> Time to update the first 600 rows takes about 8 seconds.
>>>> Time to update the following rows to 1850 takes about 4 minutes. At 
>>>> around row 700 the code is updating around 13 rows a second.
>>>> By row 1800 it's down to 5 rows a second.
>>>>
>>>> On a live run with better hardware than my test setup populating 
>>>> 6500 rows takes close to 25 minutes. This is on a 2GHz Debian machine.
>>>>
>>>> On neither machine does memory use approach the max allocated.
>>>>
>>>> jdk 1.4.2
>>>> poi-2.5.1-final-20040804
>>>>
>>>> Any help, tips, pointers etc would be most appreciated. It would 
>>>> make my
>>>> life easier if I don't have to redo this as a csv.
>>>>
>>>> TIA
>>>>
>>>> Brett Knights

---------------------------------------------------------------------
To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/


Re: Time to add/update cells

Posted by ac...@apache.org.
You STILL use POI 1.1?????  ON A 64 bit VM???? 

The 64 bit being substantially slower on the SAME heap size is unshocking.

The fact that the HP VM is so slow is a bit shocking!

The linear progression of time despite heap adjustment on the Windows 
version is iteresting.  It probably means that you
need more memory for the same code on the HP.  Meaning I would expect a 
small decrease in time every time you increase the heap (as
counter intuitive as that seems) with the amount of cells you wrote.  
Just because the generational garbage collection has more space to work with
(despite having more to scan, it should do GC LESS often than it would 
have to otherwise)

-Andy



Mikael Sitruk wrote:

>Hi
>
>I've also recently conducted a performance test on windows and hp platform
>on 32&64 bits. 
>The tests were originally done to check the overhead of a wrapper to POI
>which allows sending data in buffer and break excel files according to
>criteria (max length field/cell values ...) 
>I've used poi 1.10.
>Basically the 1000 rows test is less than 1 sec, but the tests are up to
>750,000 rows. 
>The results are in the attached excel
>Mikael.S
>
>-----Original Message-----
>From: Avik Sengupta [mailto:avik.sengupta@itellix.com] 
>Sent: Tuesday, June 14, 2005 16:55
>To: acoliver@apache.org
>Cc: poi-user@jakarta.apache.org
>Subject: Re: Time to add/update cells
>
>On a 1.8GHz linux (2.6) with jdk1.4.2, I can insert rows at approx 1000
>per second. Except for brief pauses due to garbage collection, the rate
>is resonably consistent. I've tried inserting 5000 rows with 42 cols
>each. 
>
>Attached is a graph showing this performance, the behaviour should be
>pretty obvious (and its quite what you would expect). The code follows,
>tell me I didnt do anything stupid!
>
>public void testPerf() {
>    	
>        HSSFWorkbook wb = new HSSFWorkbook();
>        HSSFSheet s = wb.createSheet();
>        HSSFRow r ;
>        HSSFCell c;
>        long t = System.currentTimeMillis();
>        long newT =0;
>        for (int i=0;i < 5000;i++) {
>        	if (i % 100 == 0 && i>0) {
>        		newT=System.currentTimeMillis();
>        		//System.out.println("Rows " + (i-100) +" to "+i+"
>inserted in
>" +(newT-t) + " ms - at " + (100000/(newT-t) + " rows per sec"));
>        		System.out.println(100000/(newT-t));
>        		t=newT;
>        	}
>        	r = s.createRow(i);
>        	for (short j=0;j<42;j++) {
>        		c = r.createCell(j);
>        		c.setCellValue(i+j);
>        	}
>        	
>        }
>
>
>
>On Tue, 2005-06-14 at 07:44 -0400, acoliver@apache.org wrote:
>  
>
>>Thats some odd time.  I achieve much better on my box.  First off set 
>>your MINIMUM heap size so that it doesn't grow and shrink.  Second off, 
>>make sure the heap is at least 50-60% larger than the high water mark of 
>>usage (due to generational garbage collection.  Last off, use the 2.6 
>>kernel.  Lastly, I'd like to see that code, it may not be POI at all.
>>
>>-Andy
>>
>>Brett Knights wrote:
>>    
>>
>>>Hello,
>>>
>>>I have recently tried to use POI to add a few thousand rows to a 
>>>spreadsheet.
>>>It doesn't make much difference if I start with an almost blank 
>>>spreadsheet or one with dummy values in the all the cells that will be 
>>>populated on a run of known size.
>>>
>>>I have 42 columns.
>>>Operations move fairly quickly for the first 600 to 650 rows and then 
>>>slow down considerably.
>>>
>>>e.g.
>>>On a test run on a 1GHz Windows machine:
>>>Time to update the first 600 rows takes about 8 seconds.
>>>Time to update the following rows to 1850 takes about 4 minutes. At 
>>>around row 700 the code is updating around 13 rows a second.
>>>By row 1800 it's down to 5 rows a second.
>>>
>>>On a live run with better hardware than my test setup populating 6500 
>>>rows takes close to 25 minutes. This is on a 2GHz Debian machine.
>>>
>>>On neither machine does memory use approach the max allocated.
>>>
>>>jdk 1.4.2
>>>poi-2.5.1-final-20040804
>>>
>>>Any help, tips, pointers etc would be most appreciated. It would make my
>>>      
>>>
>
>  
>
>>>life easier if I don't have to redo this as a csv.
>>>
>>>TIA
>>>
>>>Brett Knights
>>>
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
>>>Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
>>>The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/
>>>.
>>>
>>>      
>>>
>>    
>>
>>------------------------------------------------------------------------
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
>>Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
>>The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/
>>


-- 
Andrew C. Oliver
SuperLink Software, Inc.

Java to Excel using POI
http://www.superlinksoftware.com/services/poi
Commercial support including features added/implemented, bugs fixed.


---------------------------------------------------------------------
To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/


RE: Time to add/update cells

Posted by Mikael Sitruk <mi...@bezeqint.net>.
Hi

I've also recently conducted a performance test on windows and hp platform
on 32&64 bits. 
The tests were originally done to check the overhead of a wrapper to POI
which allows sending data in buffer and break excel files according to
criteria (max length field/cell values ...) 
I've used poi 1.10.
Basically the 1000 rows test is less than 1 sec, but the tests are up to
750,000 rows. 
The results are in the attached excel
Mikael.S

-----Original Message-----
From: Avik Sengupta [mailto:avik.sengupta@itellix.com] 
Sent: Tuesday, June 14, 2005 16:55
To: acoliver@apache.org
Cc: poi-user@jakarta.apache.org
Subject: Re: Time to add/update cells

On a 1.8GHz linux (2.6) with jdk1.4.2, I can insert rows at approx 1000
per second. Except for brief pauses due to garbage collection, the rate
is resonably consistent. I've tried inserting 5000 rows with 42 cols
each. 

Attached is a graph showing this performance, the behaviour should be
pretty obvious (and its quite what you would expect). The code follows,
tell me I didnt do anything stupid!

public void testPerf() {
    	
        HSSFWorkbook wb = new HSSFWorkbook();
        HSSFSheet s = wb.createSheet();
        HSSFRow r ;
        HSSFCell c;
        long t = System.currentTimeMillis();
        long newT =0;
        for (int i=0;i < 5000;i++) {
        	if (i % 100 == 0 && i>0) {
        		newT=System.currentTimeMillis();
        		//System.out.println("Rows " + (i-100) +" to "+i+"
inserted in
" +(newT-t) + " ms - at " + (100000/(newT-t) + " rows per sec"));
        		System.out.println(100000/(newT-t));
        		t=newT;
        	}
        	r = s.createRow(i);
        	for (short j=0;j<42;j++) {
        		c = r.createCell(j);
        		c.setCellValue(i+j);
        	}
        	
        }



On Tue, 2005-06-14 at 07:44 -0400, acoliver@apache.org wrote:
> Thats some odd time.  I achieve much better on my box.  First off set 
> your MINIMUM heap size so that it doesn't grow and shrink.  Second off, 
> make sure the heap is at least 50-60% larger than the high water mark of 
> usage (due to generational garbage collection.  Last off, use the 2.6 
> kernel.  Lastly, I'd like to see that code, it may not be POI at all.
> 
> -Andy
> 
> Brett Knights wrote:
> > Hello,
> > 
> > I have recently tried to use POI to add a few thousand rows to a 
> > spreadsheet.
> > It doesn't make much difference if I start with an almost blank 
> > spreadsheet or one with dummy values in the all the cells that will be 
> > populated on a run of known size.
> > 
> > I have 42 columns.
> > Operations move fairly quickly for the first 600 to 650 rows and then 
> > slow down considerably.
> > 
> > e.g.
> > On a test run on a 1GHz Windows machine:
> > Time to update the first 600 rows takes about 8 seconds.
> > Time to update the following rows to 1850 takes about 4 minutes. At 
> > around row 700 the code is updating around 13 rows a second.
> > By row 1800 it's down to 5 rows a second.
> > 
> > On a live run with better hardware than my test setup populating 6500 
> > rows takes close to 25 minutes. This is on a 2GHz Debian machine.
> > 
> > On neither machine does memory use approach the max allocated.
> > 
> > jdk 1.4.2
> > poi-2.5.1-final-20040804
> > 
> > Any help, tips, pointers etc would be most appreciated. It would make my

> > life easier if I don't have to redo this as a csv.
> > 
> > TIA
> > 
> > Brett Knights
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
> > Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
> > The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/
> > .
> > 
> 
> 
-- 


Re: Time to add/update cells

Posted by Avik Sengupta <av...@itellix.com>.
On a 1.8GHz linux (2.6) with jdk1.4.2, I can insert rows at approx 1000
per second. Except for brief pauses due to garbage collection, the rate
is resonably consistent. I've tried inserting 5000 rows with 42 cols
each. 

Attached is a graph showing this performance, the behaviour should be
pretty obvious (and its quite what you would expect). The code follows,
tell me I didnt do anything stupid!

public void testPerf() {
    	
        HSSFWorkbook wb = new HSSFWorkbook();
        HSSFSheet s = wb.createSheet();
        HSSFRow r ;
        HSSFCell c;
        long t = System.currentTimeMillis();
        long newT =0;
        for (int i=0;i < 5000;i++) {
        	if (i % 100 == 0 && i>0) {
        		newT=System.currentTimeMillis();
        		//System.out.println("Rows " + (i-100) +" to "+i+" inserted in
" +(newT-t) + " ms - at " + (100000/(newT-t) + " rows per sec"));
        		System.out.println(100000/(newT-t));
        		t=newT;
        	}
        	r = s.createRow(i);
        	for (short j=0;j<42;j++) {
        		c = r.createCell(j);
        		c.setCellValue(i+j);
        	}
        	
        }



On Tue, 2005-06-14 at 07:44 -0400, acoliver@apache.org wrote:
> Thats some odd time.  I achieve much better on my box.  First off set 
> your MINIMUM heap size so that it doesn't grow and shrink.  Second off, 
> make sure the heap is at least 50-60% larger than the high water mark of 
> usage (due to generational garbage collection.  Last off, use the 2.6 
> kernel.  Lastly, I'd like to see that code, it may not be POI at all.
> 
> -Andy
> 
> Brett Knights wrote:
> > Hello,
> > 
> > I have recently tried to use POI to add a few thousand rows to a 
> > spreadsheet.
> > It doesn't make much difference if I start with an almost blank 
> > spreadsheet or one with dummy values in the all the cells that will be 
> > populated on a run of known size.
> > 
> > I have 42 columns.
> > Operations move fairly quickly for the first 600 to 650 rows and then 
> > slow down considerably.
> > 
> > e.g.
> > On a test run on a 1GHz Windows machine:
> > Time to update the first 600 rows takes about 8 seconds.
> > Time to update the following rows to 1850 takes about 4 minutes. At 
> > around row 700 the code is updating around 13 rows a second.
> > By row 1800 it's down to 5 rows a second.
> > 
> > On a live run with better hardware than my test setup populating 6500 
> > rows takes close to 25 minutes. This is on a 2GHz Debian machine.
> > 
> > On neither machine does memory use approach the max allocated.
> > 
> > jdk 1.4.2
> > poi-2.5.1-final-20040804
> > 
> > Any help, tips, pointers etc would be most appreciated. It would make my 
> > life easier if I don't have to redo this as a csv.
> > 
> > TIA
> > 
> > Brett Knights
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
> > Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
> > The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/
> > .
> > 
> 
> 
-- 


Re: Time to add/update cells

Posted by ac...@apache.org.
Thats some odd time.  I achieve much better on my box.  First off set 
your MINIMUM heap size so that it doesn't grow and shrink.  Second off, 
make sure the heap is at least 50-60% larger than the high water mark of 
usage (due to generational garbage collection.  Last off, use the 2.6 
kernel.  Lastly, I'd like to see that code, it may not be POI at all.

-Andy

Brett Knights wrote:
> Hello,
> 
> I have recently tried to use POI to add a few thousand rows to a 
> spreadsheet.
> It doesn't make much difference if I start with an almost blank 
> spreadsheet or one with dummy values in the all the cells that will be 
> populated on a run of known size.
> 
> I have 42 columns.
> Operations move fairly quickly for the first 600 to 650 rows and then 
> slow down considerably.
> 
> e.g.
> On a test run on a 1GHz Windows machine:
> Time to update the first 600 rows takes about 8 seconds.
> Time to update the following rows to 1850 takes about 4 minutes. At 
> around row 700 the code is updating around 13 rows a second.
> By row 1800 it's down to 5 rows a second.
> 
> On a live run with better hardware than my test setup populating 6500 
> rows takes close to 25 minutes. This is on a 2GHz Debian machine.
> 
> On neither machine does memory use approach the max allocated.
> 
> jdk 1.4.2
> poi-2.5.1-final-20040804
> 
> Any help, tips, pointers etc would be most appreciated. It would make my 
> life easier if I don't have to redo this as a csv.
> 
> TIA
> 
> Brett Knights
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
> Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
> The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/
> .
> 


-- 
Andrew C. Oliver
SuperLink Software, Inc.

Java to Excel using POI
http://www.superlinksoftware.com/services/poi
Commercial support including features added/implemented, bugs fixed.

---------------------------------------------------------------------
To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/