Posted to dev@poi.apache.org by 葛上昌司 <ku...@altpaper.net> on 2011/10/25 10:52:10 UTC

SSTRecord.serialize() performance improvement patch for huge hssf output

Hi, POI developers.

I'm currently working on the HSSF output routines to improve performance
and memory efficiency when writing huge '.xls' files on a server.

Today I'm posting the first-step patches, which give a 2 to 4 times
performance improvement and add some conveniences to the serialization
of the SST (shared string table) record.
Although the patches are based on the 3.7 stable release, they also
seem to be valid for the 3.8 development release.
Please apply them with the 'patch' program, e.g. 'patch -p 0 < HOGE.patch'.
I tried to apply them with Eclipse, but every attempt failed.

The features are split into the two patches attached to this mail,
'first-sstser-verify37.patch' and 'second-io-performance-hack37.patch',
to keep the issues separate.
POI developers can easily confirm the results and performance of my
patches by applying the two patches one after the other.
Please see sections 1 and 2 below for details of each patch.

1. 'first-sstser-verify37.patch'
This patch provides two new features.
The first is a byte-level test method named
'testSSTRecord_DigestCheck()' that checks the correctness of a hack.
The second is an enum named 'SerializeFunction' that makes it easy to
dispatch to the various serialization methods of the SST record.

Both features are added to
org.apache.poi.hssf.record.TestSSTRecord.java.
The patch also contains some trivial accessibility changes to other
methods.

By default, the enum 'SerializeFunction' has two constants: 'Memory',
which serializes to memory, and 'StreamFile', which serializes to a raw
file.
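A minimal sketch of such a dispatching enum follows, assuming each serialization method only needs to turn one record into bytes. SerializeFunctionSketch and ByteWriter are hypothetical stand-ins, not classes from the patch or from POI; only the constant names mirror 'Memory' and 'StreamFile'.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;

/** Hypothetical stand-in for a record that can serialize itself (e.g. an SST record). */
interface ByteWriter {
    void writeTo(OutputStream out) throws IOException;
}

/** Hypothetical sketch of an enum that dispatches one record to different output targets. */
enum SerializeFunctionSketch {
    /** Serializes into an in-memory buffer. */
    Memory {
        @Override
        byte[] serialize(ByteWriter record) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try {
                record.writeTo(bos);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return bos.toByteArray();
        }
    },
    /** Serializes into a temporary raw file, then reads the bytes back for comparison. */
    StreamFile {
        @Override
        byte[] serialize(ByteWriter record) {
            File tmp = null;
            try {
                tmp = File.createTempFile("sst", ".bin");
                try (OutputStream out = new FileOutputStream(tmp)) {
                    record.writeTo(out);
                }
                return Files.readAllBytes(tmp.toPath());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } finally {
                if (tmp != null) {
                    tmp.delete();   // clean up the temporary file
                }
            }
        }
    };

    /** Runs the record's serialization through this output target and returns the bytes. */
    abstract byte[] serialize(ByteWriter record);
}
```

Constant-specific bodies keep each output path in one place, so a test can loop over SerializeFunctionSketch.values() and compare every result with the Memory output.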

The test method 'testSSTRecord_DigestCheck()' verifies, by matching
message digests, that the 'Memory' and 'StreamFile' methods produce
identical output bytes and that their output is identical to the
output of the original code.
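A byte-level digest comparison of this kind can be sketched roughly as below. DigestCheckSketch is a hypothetical helper, not the patch's test code, and SHA-256 is an assumption, since the mail does not say which digest algorithm is used.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper comparing serialized outputs by message digest. */
final class DigestCheckSketch {
    private DigestCheckSketch() {}

    /** Hashes serialized bytes; SHA-256 is an assumed choice of algorithm. */
    static byte[] digest(byte[] serialized) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(serialized);
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /** True when two serializations hash to the same digest, i.e. byte-identical output. */
    static boolean sameOutput(byte[] reference, byte[] candidate) {
        return MessageDigest.isEqual(digest(reference), digest(candidate));
    }
}
```

Storing only the digest of the original code's output lets a test check later variants without keeping the full reference bytes around.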

If you want to check this yourself, please run the whole set of unit
tests (ant test-all?) in the POI project; the patch automatically adds
the 'testSSTRecord_DigestCheck()' method to them.

2. 'second-io-performance-hack37.patch'
The second patch contains some output performance hacks in the
org.apache.poi.hssf.record.cont and org.apache.poi.util packages.

The essential feature of the patch is to extend LittleEndianOutput (and
its implementation classes) so that it can write out a String itself,
in either ASCII or UTF16LE.
This extension moves the frequent polymorphic calls to
UnknownLengthRecordOutput#writeShort() and writeByte(), made in
ContinuableRecordOutput#writeCharacterData(), inside the output classes.
Internalizing the calls lets the JVM avoid the cost of polymorphic
dispatch, because each class's write loop can be inlined within that
class.
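To illustrate the idea, here is a hypothetical little-endian output class (ArrayLEOutput is not a POI class) offering both the per-character writeShort() style and bulk string methods whose loop lives inside the class. It is a sketch of the technique under these assumptions, not the patch's implementation; the 8-bit method keeps only the low byte of each char.

```java
import java.util.Arrays;

/** Hypothetical growable little-endian output with per-char and bulk string writes. */
final class ArrayLEOutput {
    private byte[] buf = new byte[64];
    private int pos;

    private void ensure(int extra) {
        if (pos + extra > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, pos + extra));
        }
    }

    /** Original style: callers invoke this once per 8-bit character. */
    void writeByte(int v) {
        ensure(1);
        buf[pos++] = (byte) v;
    }

    /** Original style: callers invoke this once per UTF-16 character. */
    void writeShort(int v) {
        ensure(2);
        buf[pos++] = (byte) v;          // low byte first
        buf[pos++] = (byte) (v >>> 8);  // then high byte
    }

    /** Bulk style: writes chars [from, from + count) as UTF-16LE in one internal loop. */
    void writeUTF16LE(String s, int from, int count) {
        ensure(count * 2);
        for (int i = from; i < from + count; i++) {
            char c = s.charAt(i);
            buf[pos++] = (byte) c;
            buf[pos++] = (byte) (c >>> 8);
        }
    }

    /** Bulk style: writes chars [from, from + count) as 8-bit values (low byte only). */
    void writeASCII(String s, int from, int count) {
        ensure(count);
        for (int i = from; i < from + count; i++) {
            buf[pos++] = (byte) s.charAt(i);
        }
    }

    byte[] toByteArray() {
        return Arrays.copyOf(buf, pos);
    }
}
```

The caller makes one call per string segment instead of one per character, so the hot loop runs inside a single concrete class where the JIT can inline it.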

Furthermore, the template adapters LittleEndianOutputAdapter,
LittleEndianOutputByteStreamAdatper and LittleEndianOutputFilterAdapter
are provided to make it easier to build implementation classes of
LittleEndianOutput.
Using this class tree, I implemented
LittleEndianOutputBufferedRandomAccessFile for the performance checks.
It writes to a random access file through a buffer and also supports
the DelayableLittleEndianOutput interface.
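A rough sketch of the adapter idea, assuming the adapter derives the little-endian helpers from a single abstract write(byte[], int, int). LEOutputAdapterSketch, BufferedRAFOutputSketch and patchShortAt() are hypothetical names; patchShortAt() only hints at why random access helps delayed writes, such as a record length known only later, and is not the DelayableLittleEndianOutput API.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

/** Hypothetical adapter: subclasses supply one bulk write; LE helpers are derived from it. */
abstract class LEOutputAdapterSketch {
    private final byte[] scratch = new byte[2];

    abstract void write(byte[] b, int off, int len);

    void writeByte(int v) {
        scratch[0] = (byte) v;
        write(scratch, 0, 1);
    }

    void writeShort(int v) {
        scratch[0] = (byte) v;          // low byte first
        scratch[1] = (byte) (v >>> 8);
        write(scratch, 0, 2);
    }
}

/** Hypothetical RandomAccessFile output that batches small writes in a buffer. */
final class BufferedRAFOutputSketch extends LEOutputAdapterSketch implements Closeable {
    private final RandomAccessFile raf;
    private final byte[] buf;
    private int count;

    BufferedRAFOutputSketch(File file, int bufferSize) {
        try {
            RandomAccessFile f = new RandomAccessFile(file, "rw");
            f.setLength(0);              // start from an empty file
            this.raf = f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.buf = new byte[bufferSize];
    }

    @Override
    void write(byte[] b, int off, int len) {
        try {
            if (len >= buf.length) {     // large write: bypass the buffer
                flushBuffer();
                raf.write(b, off, len);
            } else {
                if (count + len > buf.length) {
                    flushBuffer();
                }
                System.arraycopy(b, off, buf, count, len);
                count += len;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Overwrites a 2-byte LE value at an earlier offset, then returns to the end. */
    void patchShortAt(long position, int v) {
        try {
            flushBuffer();
            long end = raf.getFilePointer();
            raf.seek(position);
            raf.write(v & 0xFF);
            raf.write((v >>> 8) & 0xFF);
            raf.seek(end);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void flushBuffer() throws IOException {
        raf.write(buf, 0, count);
        count = 0;
    }

    @Override
    public void close() {
        try {
            try {
                flushBuffer();
            } finally {
                raf.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```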

The features of the patch can be enabled and disabled by flipping two
boolean flags, ContinuableRecordOutput.useFasterWrite and
UnknownLengthRecordOutput.useFasterWrite.
The performance under each combination of these flags is investigated
below.
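A hypothetical sketch of a static switch of this kind. FasterWriteToggleSketch and its useFasterWrite field only mirror the naming of the patch's flags; the point is that flipping the flag changes the code path but must not change the output bytes.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch: a static flag selects the original or the faster write path. */
final class FasterWriteToggleSketch {
    private FasterWriteToggleSketch() {}

    /** Hypothetical flag, named after the patch's useFasterWrite flags. */
    static boolean useFasterWrite = true;

    /** Stand-in for a generic output that the slow path calls once per byte. */
    interface ByteSink {
        void writeByte(int v);
    }

    static byte[] writeUTF16LE(String text) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        if (useFasterWrite) {
            // Faster path: encode the whole string in one loop, then one bulk write.
            byte[] encoded = new byte[text.length() * 2];
            for (int i = 0; i < text.length(); i++) {
                char c = text.charAt(i);
                encoded[2 * i] = (byte) c;
                encoded[2 * i + 1] = (byte) (c >>> 8);
            }
            bos.write(encoded, 0, encoded.length);
        } else {
            // Original-style path: one interface call per byte.
            ByteSink sink = bos::write;
            for (int i = 0; i < text.length(); i++) {
                char c = text.charAt(i);
                sink.writeByte(c & 0xFF);
                sink.writeByte(c >>> 8);
            }
        }
        return bos.toByteArray();
    }
}
```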

You can verify the correctness of this patch by running the test
'testSSTRecord_DigestCheck()' from the previous
'first-sstser-verify37.patch' for each example serialization method.
The example methods in 'SerializeFunction' are Memory,
DirectRandomAccessFile, StreamFile and LZFCompressFile.
Please note that the LZFCompressFile method requires
compress-lzf-0.8.4.jar or later, which can be fetched from the Maven
repository.
See http://www.jarvana.com/jarvana/archive-details/com/ning/compress-lzf/0.8.4/compress-lzf-0.8.4.jar
for more details of the jar.

The performance of this patch is measured in the table below by
invoking the newly created method TestSSTRecord.testSSTRecordPerformance(),
with the small code change from 'int N = 1<<10' to 'int N = 1<<20',
on JDK 1.6.0_26 with the options '-Xmx1224m -server'.
The elapsed time in seconds to serialize 2^20 SSTRecords was measured
60 times.
The statistics were then calculated after excluding the 40 most extreme
measurements for each serialization method, to discard unreliable
measurements.
Each entry gives the mean and the standard deviation of the
serialization time.
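The summary statistics can be sketched as follows. The mail does not say exactly how the 40 extreme measurements are chosen or which standard deviation formula is used, so this assumes 20 are dropped from each end of the sorted timings and uses the sample (n - 1) deviation; TrimmedStatsSketch is a hypothetical name.

```java
import java.util.Arrays;

/** Hypothetical sketch: drop extreme timings, then compute mean and standard deviation. */
final class TrimmedStatsSketch {
    private TrimmedStatsSketch() {}

    /**
     * Returns {mean, standardDeviation} of the samples left after dropping the
     * dropEachSide lowest and dropEachSide highest values (assumed trimming rule).
     */
    static double[] trimmedMeanAndStdDev(double[] samples, int dropEachSide) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int from = dropEachSide;
        int to = sorted.length - dropEachSide;
        int n = to - from;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two samples after trimming");
        }
        double sum = 0;
        for (int i = from; i < to; i++) {
            sum += sorted[i];
        }
        double mean = sum / n;
        double squares = 0;
        for (int i = from; i < to; i++) {
            double d = sorted[i] - mean;
            squares += d * d;
        }
        // Sample standard deviation (divide by n - 1), an assumption.
        return new double[] {mean, Math.sqrt(squares / (n - 1))};
    }
}
```

With 60 timings, trimmedMeanAndStdDev(times, 20) would summarize the middle 20.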

Java: Oracle JDK 1.6.0_26
Options: -Xmx1224m -server
CPU: Intel Core i5-2400
OS: Windows 7

Optimization  SerializeFunction       Mean (s)  Std. dev. (s)
------------  ----------------------  --------  -------------
U@T/C@T       Memory                  0.248     0.002
U@T/C@T       DirectRandomAccessFile  0.94      0.024
U@T/C@T       StreamFile              0.936     0.072
U@T/C@T       LZFCompressFile         0.362     0.002

U@F/C@T       Memory                  0.213     0.002
U@F/C@T       DirectRandomAccessFile  0.881     0.046
U@F/C@T       StreamFile              0.827     0.039
U@F/C@T       LZFCompressFile         0.438     0.004

U@T/C@F       Memory                  0.744     0.001
U@T/C@F       DirectRandomAccessFile  0.939     0.029
U@T/C@F       StreamFile              0.901     0.031
U@T/C@F       LZFCompressFile         0.658     0.002

U@F/C@F       Memory                  1.011     0.005
U@F/C@F       DirectRandomAccessFile  1.29      0.003
U@F/C@F       StreamFile              0.837     0.039
U@F/C@F       LZFCompressFile         0.902     0.003

MANUAL_HACK   Memory                  0.237     0.002
MANUAL_HACK   DirectRandomAccessFile  1.174     0.094
MANUAL_HACK   StreamFile              0.806     0.042
MANUAL_HACK   LZFCompressFile         0.384     0.002

Please see the 'ser_perf.png' image attached to this mail, which charts
the mean time of each serialization method under each optimization.
In the 'Optimization' column, 'U@[TF]/C@[TF]' indicates whether the
flags UnknownLengthRecordOutput.useFasterWrite and
ContinuableRecordOutput.useFasterWrite are TRUE or FALSE, so 'U@F/C@F'
is identical to the original method.
The 'MANUAL_HACK' rows show the results when the current POI class tree
around the SSTSerializer is fully refactored and optimized as far as
possible.
#Note: The 'MANUAL_HACK' code is not included in this patch because it
changes the class tree too much.

From the table and chart, I conclude that setting
'ContinuableRecordOutput.useFasterWrite=true' and
'UnknownLengthRecordOutput.useFasterWrite=true'
is 2 to 4 times faster than the original method in the CPU-bound
cases, such as Memory and LZFCompressFile.
Furthermore, compared with the result of the full manual hack, the
performance is reasonable and close to optimal for the small source
code changes of the patch.
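The 2 to 4 times figure can be checked with simple arithmetic on the mean times in the table, dividing the U@F/C@F (original) time by the U@T/C@T time. SpeedupFromTable is a hypothetical helper and only recomputes the reported numbers.

```java
/** Hypothetical helper: speedup of U@T/C@T over the original U@F/C@F, from the table's means. */
final class SpeedupFromTable {
    private SpeedupFromTable() {}

    static double speedup(double originalSecs, double patchedSecs) {
        return originalSecs / patchedSecs;
    }

    public static void main(String[] args) {
        String[] methods = {"Memory", "DirectRandomAccessFile", "StreamFile", "LZFCompressFile"};
        double[] original = {1.011, 1.29, 0.837, 0.902};  // U@F/C@F mean times (s)
        double[] patched = {0.248, 0.94, 0.936, 0.362};   // U@T/C@T mean times (s)
        for (int i = 0; i < methods.length; i++) {
            System.out.printf("%-22s %.2fx%n", methods[i], speedup(original[i], patched[i]));
        }
    }
}
```

This gives about 4.08x for Memory and 2.49x for LZFCompressFile, but only 1.37x for DirectRandomAccessFile and 0.89x (slightly slower) for StreamFile, which matches the CPU-bound versus disk-bound split.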

On the other hand, the poor performance in the disk-bound cases, such
as DirectRandomAccessFile and StreamFile, still needs to be fixed, and
this patch does not solve those cases.
In my experience, disk resources are more limited than CPU resources in
server processing, even though disk performance in the disk-bound cases
should also be improved.
These facts indicate that a compressing writer such as LZFCompressFile
is necessary to improve the throughput when writing huge '.xls' files
on a server.
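The compress-lzf library used by LZFCompressFile is not part of the JDK, so the sketch below swaps in java.util.zip's Deflater at BEST_SPEED to show the same trade of CPU time for fewer bytes written to disk. CompressedWriterSketch is a hypothetical name and this is not the patch's LZF writer; in a real writer the ByteArrayOutputStream would be the file's output stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical sketch: spend CPU on fast compression so fewer bytes reach the disk. */
final class CompressedWriterSketch {
    private CompressedWriterSketch() {}

    /** Compresses data with Deflate at BEST_SPEED (a stand-in for LZF). */
    static byte[] compress(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);  // favor speed over ratio
        try (OutputStream out = new DeflaterOutputStream(bos, deflater, 64 * 1024)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            deflater.end();  // a caller-supplied Deflater is not ended by close()
        }
        return bos.toByteArray();
    }

    /** Restores the original bytes, e.g. to verify the output. */
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }
}
```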

--
Shoji Kuzukami
Information Infrastructure Development Co., Ltd.
7-3-1 Hongo Bunkyo-ku
Tokyo, Japan 113-0033

kuz+poi@iidev.co.jp
http://www.altpaper.net/

Re: SSTRecord.serialize() performance improvement patch for huge hssf output

Posted by Nick Burch <ni...@alfresco.com>.

Hi

Thanks for this, it looks very interesting. Any chance you could open two 
new issues in bugzilla for this? One for each of the two areas of 
improvement. If you attach the patches to the respective issues, then they 
won't get lost!

Thanks
Nick
