Posted to dev@poi.apache.org by Mikael Sitruk <mi...@bezeqint.net> on 2003/02/09 22:26:23 UTC

Performance Benchmark

Hi to all, 

I've run a benchmark on two branches: 'performance-branch' and
'performance'. I put the results here in this email, but perhaps an Excel
file would be more appropriate.

The benchmark was performed on two Excel files, noted A & B:

A: MS Excel file of 10,483 KB (1 sheet of 65520 rows by 10 columns, each
cell contains a single character)
B: MS Excel file of 20,956 KB (2 sheets of 65520 rows by 10 columns, each
cell contains a single character)

I ran the benchmark because I need to create very large workbooks
with an acceptable amount of memory. The results follow.

Test #1 - branch: performance
-----------------------------
Memory size  : 128MB
Excel        : A & B
Run status   : failed
run time (ms): N/A
new file size: N/A
Comment      : N/A

Test #2 - branch: performance
-----------------------------
Memory size  : 256MB
Excel        : A
Run status   : success
run time (ms): 55680
new file size: 10339 (KB)
Comment      : The new file is slightly smaller than the original one,
but it is fully compatible with Excel, i.e. I opened it in Excel and
the data is OK. See (*)
 
   
Test #3 - branch: performance
-----------------------------
Memory size  : 256MB
Excel        : B
Run status   : failure
run time (ms): N/A
new file size: N/A
Comment      : N/A

Test #4 - branch: performance
-----------------------------
Memory size  : 300MB
Excel        : B
Run status   : failure
run time (ms): N/A
new file size: N/A
Comment      : N/A

Test #5 - branch: performance
-----------------------
Memory size  : 400MB
Excel        : B
Run status   : Success
run time (ms): 114765
new file size: 20666 KB
Comment      : N/A
------------------------------------------------------------------------

Test #1 - branch: performance-branch
------------------------------------
Memory size  : 128MB
Excel        : A & B
Run status   : failed
run time (ms): N/A
new file size: N/A
Comment      : N/A

Test #2 - branch: performance-branch
------------------------------------
Memory size  : 256MB
Excel        : A
Run status   : success
run time (ms): 25687 (after third run - first run took 40668, the second
33000)
new file size: 10331 (KB)
Comment      : The new file is slightly smaller than the original one,
but it is fully compatible with Excel, i.e. I opened it in Excel and
the data is OK.
Another interesting thing is that each run took less time than the one
before; I presume this is due to the CPU architecture.
   
Test #3 - branch: performance-branch
-----------------------------
Memory size  : 256MB
Excel        : B
Run status   : failure
run time (ms): N/A
new file size: N/A
Comment      : N/A

Test #4 - branch: performance-branch
-----------------------------
Memory size  : 300MB
Excel        : B
Run status   : failure
run time (ms): N/A
new file size: N/A
Comment      : N/A

Test #5 - branch: performance-branch
-----------------------
Memory size  : 400MB
Excel        : B
Run status   : failure
run time (ms): N/A
new file size: N/A
Comment      : N/A

Test #6 - branch: performance-branch
-----------------------
Memory size  : 450MB
Excel        : B
Run status   : success
run time (ms): 236991
new file size: 20,658 KB
Comment      : Here again the file size is slightly smaller.

The most interesting thing is that, paradoxically, this branch
(performance-branch) needed more memory than the 'performance' branch:
the test with 400MB failed in this branch and succeeded in the
'performance' branch.
 
==========================================================

(*) difference between original file and created file

Using Biff I have the following diffs.
Orig file (just after the COUNTRY record):
  ============================================
  Offset 0x57e (1406)
  recordid = 0x1c1, size =8
  [UNKNOWN RECORD:1c1]
      .id        = 1c1
  [/UNKNWON RECORD]

  -----UNKNOWN----------------------------------
  00000000 C1 01 00 00 54 8D 01 00                         ....T...

  -----UNKNOWN----------------------------------
  ============================================
  Offset 0x58a (1418)
  recordid = 0xfc, size =48
  [SST]
      .numstrings     = 9ff60
      .uniquestrings  = a
      .string_0      = A
      .string_1      = a
      .string_2      = b
      .string_3      = B
      .string_4      = c
      .string_5      = d
      .string_6      = e
      .string_7      = f
      .string_8      = g
      .string_9      = h
  [/SST]


The new file has instead:
============================================
Offset 0x57e (1406)
recordid = 0xfc, size =48
[SST]
    .numstrings     = 9ff60
    .uniquestrings  = a
    .string_0      = A
    .string_1      = a
    .string_2      = b
    .string_3      = B
    .string_4      = c
    .string_5      = d
    .string_6      = e
    .string_7      = f
    .string_8      = g
    .string_9      = h
[/SST]

============================================

So there is a difference: the new file no longer has the unknown 0x1c1
record, and of course after this record the offsets are not the same.
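
As a cross-check of the dump above, here is a small sketch that redoes the arithmetic using only values quoted in this email. The one outside fact it relies on is that a BIFF record has a 4-byte header (2-byte id plus 2-byte size) in front of its body; the class and method names are made up for illustration.

```java
// Sanity check of the figures in the dump, using only values quoted in this email.
class BiffDumpCheck {
    // A BIFF record is a 4-byte header (2-byte id, 2-byte size) followed by its body.
    static final int RECORD_HEADER_BYTES = 4;

    // Total bytes a record occupies in the stream, header included.
    static int recordLength(int bodySize) {
        return RECORD_HEADER_BYTES + bodySize;
    }

    public static void main(String[] args) {
        int origSstOffset = 0x58a;  // SST offset in the original file
        int newSstOffset = 0x57e;   // SST offset in the rewritten file
        int unknownBodySize = 8;    // "size =8" reported for record 0x1c1

        System.out.println("offset shift      = " + (origSstOffset - newSstOffset));
        System.out.println("record 0x1c1 size = " + recordLength(unknownBodySize));

        // numstrings is printed in hex in the dump.
        int numStrings = Integer.parseInt("9ff60", 16);
        System.out.println("numstrings        = " + numStrings);
        System.out.println("65520 rows x 10   = " + (65520 * 10));
    }
}
```

The first two figures agree, so the shift of the SST record is accounted for by the missing 0x1c1 record alone. It does not by itself explain the full difference in file size. The last two figures also agree, which suggests numstrings is simply the cell count of one sheet.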

Mikael.S

Re: Performance Benchmark

Posted by "Andrew C. Oliver" <ac...@apache.org>.
sure.  why not.

Glen Stampoultzis wrote:

> Would you like me to remove this tag?
>
> -- Glen
>
> At 05:03 PM 9/02/2003 -0500, you wrote:
>
>> Note.  I accidentally tagged the main branch "performance" before 
>> creating the performance-branch.  So "performance" is actually just a 
>> tag before I applied my patches and "performance-branch" is where 
>> they actually were.  The tag "performance" is roughly irrelevant at 
>> this point.  (I just goofed).

Re: Performance Benchmark

Posted by Glen Stampoultzis <gs...@iprimus.com.au>.
Would you like me to remove this tag?

-- Glen

At 05:03 PM 9/02/2003 -0500, you wrote:
>Note.  I accidentally tagged the main branch "performance" before creating 
>the performance-branch.  So "performance" is actually just a tag before I 
>applied my patches and "performance-branch" is where they actually 
>were.  The tag "performance" is roughly irrelevant at this point.  (I just 
>goofed).


Re: Performance Benchmark

Posted by "Andrew C. Oliver" <ac...@apache.org>.
Note.  I accidentally tagged the main branch "performance" before 
creating the performance-branch.  So "performance" is actually just a 
tag before I applied my patches and "performance-branch" is where they 
actually were.  The tag "performance" is roughly irrelevant at this 
point.  (I just goofed).

On to the details.  The results are roughly what I'd expect.  Note that 
I did not have a sheet of all single-character cells in mind; I was 
thinking more about your average scenario.  In this case the 
performance-branch would cause an expansion.

Here is why.  In order to keep the data structure simple, the labelsst 
index stored in each string cell is expanded to a double.  This lets us 
store it in the same array as the numeric cells.  So for the 10 meg 
sheet the memory saved by reducing the object count still pays off, but 
on the 20 meg sheet the memory expansion of the labelsst indices 
actually manages to surpass the savings in objects.  I imagine tweaking 
the INITIAL_CAPACITY constants for such a large sheet would probably 
bring this back into line (I plan to make these configurable), since 
near the end the array copies required are probably quite large.
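
To make the trade-off concrete, here is a minimal sketch of the idea, not the actual performance-branch code. It assumes a single growable double[] that doubles when full; the class name, the doubling policy and the INITIAL_CAPACITY value of 16 are all assumptions for illustration. It shows both costs mentioned above: each string index takes 8 bytes as a double instead of 4 as an int, and every growth step copies the whole array.

```java
import java.util.Arrays;

// Hypothetical sketch, not POI code: one growable double[] holds both numeric
// cell values and string-table (SST) indexes widened to double.
class CellValueArray {
    static final int INITIAL_CAPACITY = 16; // assumed value, for illustration only

    private double[] values = new double[INITIAL_CAPACITY];
    private int size = 0;
    private long elementsCopied = 0; // total elements copied by growth so far

    void addNumber(double v) { append(v); }

    // An SST index is an int (4 bytes); stored as a double it costs 8 bytes.
    void addSstIndex(int sstIndex) { append(sstIndex); }

    int sstIndexAt(int i) { return (int) values[i]; }

    private void append(double v) {
        if (size == values.length) {
            elementsCopied += size;
            // Assumed growth policy: double the array, copying every element.
            values = Arrays.copyOf(values, values.length * 2);
        }
        values[size++] = v;
    }

    long elementsCopied() { return elementsCopied; }
    int capacity() { return values.length; }

    public static void main(String[] args) {
        CellValueArray cells = new CellValueArray();
        int cellCount = 65520 * 10; // one sheet from the benchmark files
        for (int i = 0; i < cellCount; i++) {
            cells.addSstIndex(i % 10); // ten unique strings, as in the SST dump
        }
        System.out.println("bytes as int index : " + (long) cellCount * Integer.BYTES);
        System.out.println("bytes as double    : " + (long) cellCount * Double.BYTES);
        System.out.println("final capacity     : " + cells.capacity());
        System.out.println("elements copied    : " + cells.elementsCopied());
    }
}
```

With a small starting capacity the last growth steps copy hundreds of thousands of elements, and the old and new arrays are both live during each copy, which is one plausible reason a larger initial capacity could help on very big sheets.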

So what is left to do?  Well, we need to test/debug with complex sheets, 
make all the unit tests work, and make the INITIAL_CAPACITY constants 
configurable.  Meaning I should be able to say "Okay I'm about to 
open a 10k spreadsheet, so don't optimize for opening a big sheet" or 
"Okay I'm about to open a 4mb spreadsheet, so optimize for about that" 
or "I'm one of those crazy people who generates 20mb spreadsheets, 
please optimize for that".  This way the arrays and such are sized for 
20mb only when asked for, and not made that big for small sheets.
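
One way such a hint could look, purely as a sketch: the caller passes an expected file size and the initial array capacity is derived from it. Nothing here is an existing POI API. The names are hypothetical, and the 16 bytes per cell figure is only a rough guess taken from file A in the benchmark (10,483 KB for 655,200 cells).

```java
// Hypothetical sketch, not POI API: derive an initial array capacity from an
// expected file size instead of using a fixed INITIAL_CAPACITY constant.
class CapacityHint {
    static final int DEFAULT_CAPACITY = 16;       // assumed default for small sheets
    static final int ASSUMED_BYTES_PER_CELL = 16; // rough guess, to be tuned by measurement

    static int initialCapacityFor(long expectedFileBytes) {
        long cells = expectedFileBytes / ASSUMED_BYTES_PER_CELL;
        // Stay safely below the largest array size a JVM will allocate.
        long capped = Math.min(cells, Integer.MAX_VALUE - 8);
        return (int) Math.max(DEFAULT_CAPACITY, capped);
    }

    public static void main(String[] args) {
        // The three cases from the email: a 10k, a 4mb and a 20mb spreadsheet.
        long[] hints = { 10L * 1024, 4L * 1024 * 1024, 20L * 1024 * 1024 };
        for (long hint : hints) {
            System.out.println(hint + " bytes -> capacity " + initialCapacityFor(hint));
        }
    }
}
```

A size in bytes is used as the hint because that is what a caller usually knows before opening a file; a row or cell count would work just as well if it is available.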

-Andy
