Posted to dev@pdfbox.apache.org by "Michael Doswald (JIRA)" <ji...@apache.org> on 2016/07/21 14:37:20 UTC
[jira] [Updated] (PDFBOX-3433) Optimize image conversion in LosslessFactory
[ https://issues.apache.org/jira/browse/PDFBOX-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Doswald updated PDFBOX-3433:
------------------------------------
Attachment: pdfbox-performance-PDFBOX-3433.zip
PDFBOX-3433_Optimize_image_conversion_in_LosslessFactory_rev1.patch
The proposed patch (rev1) contains changes for LosslessFactory and COSStream.
* LosslessFactory: Read pixels line by line instead of pixel by pixel
* LosslessFactory: Pre-size buffer for grayscale images
* LosslessFactory: For RGB images create byte-buffer directly, without ByteArrayOutputStream. This prevents unnecessary copying of the resulting data array
* LosslessFactory: Pre-size the buffer for the FLATE_DECODE output
* COSStream: Override the write(byte[],int,int) method for the FilterOutputStreams created by the class. Otherwise the default implementation loops over the byte array and calls write(int) for each byte
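
To illustrate the COSStream point, here is a minimal sketch, not the code from the patch. The class names BulkWriteSketch, CountingSink and BulkFilterOutputStream are made up for this example. It only shows that java.io.FilterOutputStream forwards an array write one byte at a time unless write(byte[],int,int) is overridden.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class BulkWriteSketch {

    /** Sink that counts how the data arrives (for demonstration only). */
    static class CountingSink extends ByteArrayOutputStream {
        int singleByteWrites;
        int bulkWrites;

        @Override
        public synchronized void write(int b) {
            singleByteWrites++;
            super.write(b);
        }

        @Override
        public synchronized void write(byte[] b, int off, int len) {
            bulkWrites++;
            super.write(b, off, len);
        }
    }

    /** Hypothetical wrapper that hands array writes to the target in one call. */
    static class BulkFilterOutputStream extends FilterOutputStream {
        BulkFilterOutputStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            // Skip the per-byte loop of FilterOutputStream.write(byte[],int,int).
            out.write(b, off, len);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1024];

        CountingSink plainSink = new CountingSink();
        try (OutputStream plain = new FilterOutputStream(plainSink)) {
            plain.write(data, 0, data.length);
        }

        CountingSink bulkSink = new CountingSink();
        try (OutputStream bulk = new BulkFilterOutputStream(bulkSink)) {
            bulk.write(data, 0, data.length);
        }

        System.out.println("default:  " + plainSink.singleByteWrites + " single-byte writes");
        System.out.println("override: " + bulkSink.bulkWrites + " bulk write(s)");
    }
}
```

With the plain FilterOutputStream every byte reaches the sink through write(int). With the override the sink sees a single array write.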
The attached JMH benchmark contains two methods that measure the conversion of RGB and B/W images. The performance numbers on my systems are as follows:
Desktop RGB:
OLD: PdfBoxBenchmark.convertImage avgt 129.281 ± 1.926 ms/op
NEW: PdfBoxBenchmark.convertImage avgt 106.143 ± 1.425 ms/op
Desktop B/W:
OLD: PdfBoxBenchmark.convertImageBW avgt 37.467 ± 0.516 ms/op
NEW: PdfBoxBenchmark.convertImageBW avgt 29.554 ± 1.176 ms/op
Embedded RGB:
OLD: PdfBoxBenchmark.convertImage avgt 1600.929 ± 12.577 ms/op
NEW: PdfBoxBenchmark.convertImage avgt 1126.266 ± 42.487 ms/op
Embedded B/W:
OLD: PdfBoxBenchmark.convertImageBW avgt 1011.356 ± 29.348 ms/op
NEW: PdfBoxBenchmark.convertImageBW avgt 975.063 ± 35.642 ms/op
Because the patch pre-sizes the buffers and prevents unnecessary copying, the allocation rate was also reduced (measurements from desktop):
OLD:
PdfBoxBenchmark.convertImage:·gc.alloc.rate avgt 352.563 ± 6.565 MB/sec
PdfBoxBenchmark.convertImage:·gc.alloc.rate.norm avgt 48880952.800 ± 243056.403 B/op
PdfBoxBenchmark.convertImageBW:·gc.alloc.rate avgt 518.062 ± 9.545 MB/sec
PdfBoxBenchmark.convertImageBW:·gc.alloc.rate.norm avgt 20213248.643 ± 215.935 B/op
NEW:
PdfBoxBenchmark.convertImage:·gc.alloc.rate avgt 153.795 ± 2.445 MB/sec
PdfBoxBenchmark.convertImage:·gc.alloc.rate.norm avgt 17121575.040 ± 108565.130 B/op
PdfBoxBenchmark.convertImageBW:·gc.alloc.rate avgt 40.888 ± 0.594 MB/sec
PdfBoxBenchmark.convertImageBW:·gc.alloc.rate.norm avgt 1268892.004 ± 76947.484 B/op
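
For readers who want a picture of the row-wise, pre-sized approach without opening the patch, here is a rough sketch. It is an illustration under my own assumptions, not the LosslessFactory code. The class RowWiseRgbSketch and the method toRgbBytes are hypothetical names, and the Flate encoding and the creation of the PDImageXObject are left out. It uses BufferedImage.getRGB(startX, startY, w, h, rgbArray, offset, scansize) to fetch one line per call and writes straight into a byte array whose final size is known up front.

```java
import java.awt.image.BufferedImage;

class RowWiseRgbSketch {

    /** Reads one row per getRGB call and writes into a pre-sized array. */
    static byte[] toRgbBytes(BufferedImage image) {
        int width = image.getWidth();
        int height = image.getHeight();
        // 3 bytes per pixel, so the final size is known before reading anything.
        byte[] rgbBytes = new byte[width * height * 3];
        int[] row = new int[width]; // reused for every line
        int pos = 0;
        for (int y = 0; y < height; y++) {
            // Fetch the whole line at once instead of calling getRGB(x, y) per pixel.
            image.getRGB(0, y, width, 1, row, 0, width);
            for (int argb : row) {
                rgbBytes[pos++] = (byte) (argb >> 16); // red
                rgbBytes[pos++] = (byte) (argb >> 8);  // green
                rgbBytes[pos++] = (byte) argb;         // blue
            }
        }
        return rgbBytes;
    }

    public static void main(String[] args) {
        BufferedImage image = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        image.setRGB(0, 0, 0xFF0000);
        image.setRGB(1, 0, 0x0000FF);
        System.out.println(toRgbBytes(image).length + " bytes");
    }
}
```

Since the array never has to grow and no toByteArray() copy is needed at the end, a loop like this allocates little beyond the result itself and one row buffer.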
I'm curious about your opinions.
> Optimize image conversion in LosslessFactory
> --------------------------------------------
>
> Key: PDFBOX-3433
> URL: https://issues.apache.org/jira/browse/PDFBOX-3433
> Project: PDFBox
> Issue Type: Improvement
> Components: PDModel
> Affects Versions: 2.0.2
> Environment: Ubuntu 14.04.4 LTS
> Reporter: Michael Doswald
> Priority: Trivial
> Labels: optimization, performance
> Attachments: PDFBOX-3433_Optimize_image_conversion_in_LosslessFactory_rev1.patch, pdfbox-performance-PDFBOX-3433.zip
>
>
> Conversion of BufferedImage objects into PDImageXObject objects could be optimized by
> * Pre-sizing the buffers
> * Reading whole lines of pixels instead of pixel-by-pixel
> * Preventing unnecessary copying of byte arrays