Posted to dev@pdfbox.apache.org by "Michael Doswald (JIRA)" <ji...@apache.org> on 2016/07/21 14:37:20 UTC

[jira] [Updated] (PDFBOX-3433) Optimize image conversion in LosslessFactory

     [ https://issues.apache.org/jira/browse/PDFBOX-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Doswald updated PDFBOX-3433:
------------------------------------
    Attachment: pdfbox-performance-PDFBOX-3433.zip
                PDFBOX-3433_Optimize_image_conversion_in_LosslessFactory_rev1.patch

The proposed patch (rev1) contains changes for LosslessFactory and COSStream. 

* LosslessFactory: Read pixels line by line instead of pixel by pixel
* LosslessFactory: Pre-size the buffer for grayscale images
* LosslessFactory: For RGB images, create the byte buffer directly, without a ByteArrayOutputStream. This avoids an unnecessary copy of the resulting data array
* LosslessFactory: Pre-size the buffer for the FLATE_DECODE output
* COSStream: Override the write(byte[],int,int) method of the FilterOutputStreams created by the class. Otherwise the default implementation loops over the byte array and calls write(int) for each byte
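
To illustrate the COSStream point, here is a minimal sketch of the effect, not the code from the attached patch. The class names BulkFilterOutputStream, BulkWriteDemo and CountingSink are hypothetical and exist only for this example. The sink counts how often each write method is called when 1000 bytes pass through a plain java.io.FilterOutputStream and through a subclass that forwards the whole range.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical wrapper for illustration; not the actual COSStream code. */
class BulkFilterOutputStream extends FilterOutputStream {
    BulkFilterOutputStream(OutputStream out) {
        super(out);
    }

    // FilterOutputStream.write(byte[], int, int) calls write(int) once per byte.
    // Forwarding the range hands the data to the wrapped stream in a single call;
    // bounds checking is left to the wrapped stream.
    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
    }
}

class BulkWriteDemo {
    /** Sink that records which write method was used. */
    static final class CountingSink extends ByteArrayOutputStream {
        int singleByteWrites;
        int bulkWrites;

        @Override
        public synchronized void write(int b) {
            singleByteWrites++;
            super.write(b);
        }

        @Override
        public synchronized void write(byte[] b, int off, int len) {
            bulkWrites++;
            super.write(b, off, len);
        }
    }

    /** Writes n bytes through a filter stream; returns {singleByteWrites, bulkWrites}. */
    static int[] countWrites(boolean overrideBulkWrite, int n) {
        CountingSink sink = new CountingSink();
        OutputStream filter = overrideBulkWrite
                ? new BulkFilterOutputStream(sink)
                : new FilterOutputStream(sink);
        try (OutputStream os = filter) {
            os.write(new byte[n], 0, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new int[] {sink.singleByteWrites, sink.bulkWrites};
    }

    public static void main(String[] args) {
        int[] plain = countWrites(false, 1000);
        int[] bulk = countWrites(true, 1000);
        System.out.println("default:  " + plain[0] + " single-byte, " + plain[1] + " bulk writes");
        System.out.println("override: " + bulk[0] + " single-byte, " + bulk[1] + " bulk writes");
    }
}
```

With the default implementation the sink receives 1000 single-byte writes; with the override it receives one bulk write.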

The attached JMH benchmark contains two methods that measure the conversion speed for RGB and B/W images. The results on my systems are as follows:

Desktop RGB:
OLD: PdfBoxBenchmark.convertImage    avgt   129.281 ±  1.926  ms/op
NEW: PdfBoxBenchmark.convertImage    avgt   106.143 ±  1.425  ms/op

Desktop B/W:
OLD: PdfBoxBenchmark.convertImageBW  avgt    37.467 ±  0.516  ms/op
NEW: PdfBoxBenchmark.convertImageBW  avgt    29.554 ±  1.176  ms/op

Embedded RGB:
OLD: PdfBoxBenchmark.convertImage    avgt  1600.929 ± 12.577  ms/op
NEW: PdfBoxBenchmark.convertImage    avgt  1126.266 ± 42.487  ms/op

Embedded B/W:
OLD: PdfBoxBenchmark.convertImageBW  avgt  1011.356 ± 29.348  ms/op
NEW: PdfBoxBenchmark.convertImageBW  avgt   975.063 ± 35.642  ms/op

Because the patch pre-sizes the buffers and prevents unnecessary copying, the allocation rate was also reduced (measurements from the desktop system):

OLD:
PdfBoxBenchmark.convertImage:·gc.alloc.rate         avgt       352.563 ±      6.565  MB/sec
PdfBoxBenchmark.convertImage:·gc.alloc.rate.norm    avgt  48880952.800 ± 243056.403    B/op
PdfBoxBenchmark.convertImageBW:·gc.alloc.rate       avgt       518.062 ±      9.545  MB/sec
PdfBoxBenchmark.convertImageBW:·gc.alloc.rate.norm  avgt  20213248.643 ±    215.935    B/op

NEW:
PdfBoxBenchmark.convertImage:·gc.alloc.rate         avgt       153.795 ±      2.445  MB/sec
PdfBoxBenchmark.convertImage:·gc.alloc.rate.norm    avgt  17121575.040 ± 108565.130    B/op
PdfBoxBenchmark.convertImageBW:·gc.alloc.rate       avgt        40.888 ±      0.594  MB/sec
PdfBoxBenchmark.convertImageBW:·gc.alloc.rate.norm  avgt   1268892.004 ±  76947.484    B/op
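
As a rough illustration of where the lower allocation numbers come from, the following sketch combines row-wise reading with a pre-sized result array. It is not the code from the attached patch; the class name RowWiseRgbReader and the method toRgbBytes are hypothetical. It assumes an image small enough that width * height * 3 fits into an int, and it relies only on BufferedImage.getRGB from the JDK.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper for illustration; not the actual LosslessFactory code. */
class RowWiseRgbReader {

    /** Returns the image as packed R,G,B bytes, three per pixel, row by row. */
    static byte[] toRgbBytes(BufferedImage image) {
        int width = image.getWidth();
        int height = image.getHeight();

        // Allocated once at its final size, so nothing has to grow or be copied
        // afterwards. Assumes width * height * 3 does not overflow an int.
        byte[] rgb = new byte[width * height * 3];

        // One row of packed ARGB pixels, reused for every line of the image.
        int[] row = new int[width];

        int pos = 0;
        for (int y = 0; y < height; y++) {
            // One call per line instead of one getRGB(x, y) call per pixel.
            image.getRGB(0, y, width, 1, row, 0, width);
            for (int argb : row) {
                rgb[pos++] = (byte) (argb >> 16); // red
                rgb[pos++] = (byte) (argb >> 8);  // green
                rgb[pos++] = (byte) argb;         // blue
            }
        }
        return rgb;
    }

    public static void main(String[] args) {
        BufferedImage image = new BufferedImage(3, 2, BufferedImage.TYPE_INT_RGB);
        image.setRGB(0, 0, 0x112233);
        byte[] rgb = toRgbBytes(image);
        System.out.println(rgb.length + " bytes for a 3x2 image");
    }
}
```

Writing into a ByteArrayOutputStream without an initial size would instead grow the internal array several times, and toByteArray() would copy it once more at the end.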

I'm curious about your opinions.

> Optimize image conversion in LosslessFactory
> --------------------------------------------
>
>                 Key: PDFBOX-3433
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3433
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: PDModel
>    Affects Versions: 2.0.2
>         Environment: Ubuntu 14.04.4 LTS
>            Reporter: Michael Doswald
>            Priority: Trivial
>              Labels: optimization, performance
>         Attachments: PDFBOX-3433_Optimize_image_conversion_in_LosslessFactory_rev1.patch, pdfbox-performance-PDFBOX-3433.zip
>
>
> Conversion of BufferedImage objects into PDImageXObject objects could be optimized by
> * Pre-sizing the buffers
> * Reading whole lines of pixels instead of pixel-by-pixel
> * Preventing unnecessary copying of byte arrays



