Posted to dev@pdfbox.apache.org by "Daniel Gredler (JIRA)" <ji...@apache.org> on 2018/11/21 05:42:00 UTC
[jira] [Commented] (PDFBOX-4300) Reduce im memory buffers when creating grayscale images

    [ https://issues.apache.org/jira/browse/PDFBOX-4300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694272#comment-16694272 ] 

Daniel Gredler commented on PDFBOX-4300:
----------------------------------------

I was going to create a new issue, but it looks like this may fit here...

I was looking at the {{LosslessFactory}} class today with a view to using it mainly for grayscale and bitonal images. Performance was worse than expected, regardless of the compression level chosen ({{org.apache.pdfbox.filter.deflatelevel}}). Based on some local profiling with the default compression level, {{createFromGrayImage}} spends about 30% of its time applying the Flate filter and the remaining 70% shuttling pixel data around ({{getRGB}}, etc.). It seems to me that this method should be able to assume that the data buffer of the image's raster is a {{DataBufferByte}}, and simply use that buffer directly:
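{code:java}
    private static PDImageXObject createFromGrayImage(BufferedImage image, PDDocument document)
            throws IOException
    {
        // Assumes the raster is backed by a single DataBufferByte, which holds for
        // TYPE_BYTE_GRAY and TYPE_BYTE_BINARY but not for TYPE_USHORT_GRAY
        // (DataBufferUShort, so this cast would throw ClassCastException).
        // Also assumes the image is not a subimage, so rows are stored back to back
        // (padded only to a byte boundary, as PDF expects).
        byte[] pixels = ((DataBufferByte) image.getRaster().getDataBuffer()).getData();
        // 8 for TYPE_BYTE_GRAY, 1 for a default two-color TYPE_BYTE_BINARY image
        int bpc = image.getColorModel().getPixelSize();
        return prepareImageXObject(document, pixels,
                image.getWidth(), image.getHeight(), bpc, PDDeviceGray.INSTANCE);
    }
{code}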
As expected, performance improved *drastically* with this change – to roughly on par with PNG file creation using {{ImageIO.write}}. The output looks good to the naked eye, but {{LosslessFactoryTest}} fails the grayscale assertion on line 95, where things seem to be off by a very small amount:
{noformat}
Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.242 s <<< FAILURE! - in org.apache.pdfbox.pdmodel.graphics.image.LosslessFactoryTest
testCreateLosslessFromImageRGB(org.apache.pdfbox.pdmodel.graphics.image.LosslessFactoryTest)  Time elapsed: 0.6 s  <<< FAILURE!
junit.framework.AssertionFailedError: (3,0) expected: <FFFFFFFF> but was: <FFFEFEFE>;  expected:<-1> but was:<-65794>
	at org.apache.pdfbox.pdmodel.graphics.image.LosslessFactoryTest.testCreateLosslessFromImageRGB(LosslessFactoryTest.java:95)
{noformat}
Does this seem like a valid approach, both to improve performance and reduce memory usage? If so, any idea why some of the pixels are slightly different after the change?
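The cast in the {{createFromGrayImage}} snippet silently assumes a particular raster layout. A minimal JDK-only sketch of the same direct-buffer idea, with those assumptions made explicit, might look like the following. It accepts only {{TYPE_BYTE_GRAY}} images whose single {{DataBufferByte}} holds exactly {{width * height}} bytes in row order, and returns {{null}} otherwise so that a caller could fall back to a per-pixel path. The class and method names are hypothetical, not PDFBox API. Bitonal ({{TYPE_BYTE_BINARY}}) images would need an extra check that the palette maps 0 to black and 1 to white before their packed bytes could be reused.

{code:java}
import java.awt.image.BufferedImage;
import java.awt.image.ComponentSampleModel;
import java.awt.image.DataBufferByte;
import java.awt.image.Raster;

public class GrayBufferSketch
{
    /**
     * Returns the raster's backing byte array when it can be handed to a PDF
     * DeviceGray image as-is, or null when the caller should fall back to a
     * per-pixel copy. Only TYPE_BYTE_GRAY (8 bits per sample) is accepted.
     */
    static byte[] rawGrayBytesOrNull(BufferedImage image)
    {
        if (image.getType() != BufferedImage.TYPE_BYTE_GRAY)
        {
            return null;
        }
        Raster raster = image.getRaster();
        if (!(raster.getDataBuffer() instanceof DataBufferByte)
                || !(raster.getSampleModel() instanceof ComponentSampleModel))
        {
            return null;
        }
        DataBufferByte buffer = (DataBufferByte) raster.getDataBuffer();
        ComponentSampleModel sampleModel = (ComponentSampleModel) raster.getSampleModel();
        int width = image.getWidth();
        int height = image.getHeight();
        // A subimage shares its parent's buffer: its rows are then wider than the image,
        // the buffer is larger than width * height, or the origin is translated.
        boolean tightlyPacked = buffer.getNumBanks() == 1
                && buffer.getOffset() == 0
                && sampleModel.getPixelStride() == 1
                && sampleModel.getScanlineStride() == width
                && raster.getSampleModelTranslateX() == 0
                && raster.getSampleModelTranslateY() == 0
                && buffer.getData().length == width * height;
        // getData() returns the backing array itself, so no pixel data is copied
        return tightlyPacked ? buffer.getData() : null;
    }

    public static void main(String[] args)
    {
        BufferedImage gray = new BufferedImage(4, 3, BufferedImage.TYPE_BYTE_GRAY);
        gray.getRaster().setSample(1, 2, 0, 200);
        byte[] raw = rawGrayBytesOrNull(gray);
        // row-major layout: sample (x=1, y=2) sits at index 2 * width + 1
        System.out.println(raw == null ? "fallback" : "direct, sample = " + (raw[2 * 4 + 1] & 0xFF));
        // a subimage shares the parent's buffer, so it must take the fallback path
        System.out.println(rawGrayBytesOrNull(gray.getSubimage(1, 1, 2, 2)) == null ? "fallback" : "direct");
    }
}
{code}

The subimage check matters because {{BufferedImage.getSubimage}} returns an image that shares its parent's data buffer, so the backing array also contains pixels outside the subimage.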

> Reduce im memory buffers when creating grayscale images
> -------------------------------------------------------
>
>                 Key: PDFBOX-4300
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4300
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: PDModel
>    Affects Versions: 2.0.11
>            Reporter: Jesse Long
>            Priority: Minor
>              Labels: optimization
>         Attachments: PDFBOX-4300-1.patch
>
>
> LosslessFactory uses ByteArrayOutputStreams when creating PDF image data. First, it creates a ByteArrayOutputStream (BAOS) to hold the raw image data, then a second BAOS to hold the Flate-encoded data. Finally, the Flate-encoded data is written to the PDImageXObject's stream.
> We could instead create an empty PDStream, give it a filter, and write the image data directly into the stream. We would then instantiate a PDImageXObject, passing it the already created stream.
> This would dramatically reduce the RAM requirement when a scratch file is in play.
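The data flow proposed in the description (encode while writing, rather than staging the raw and the encoded bytes in two {{ByteArrayOutputStream}}s) can be sketched with the JDK alone. This is a minimal sketch assuming 8-bit grayscale input: rows are pulled from the raster one at a time and pushed through a zlib encoder ({{java.util.zip.DeflaterOutputStream}}, whose output format is what PDF's FlateDecode filter expects) into whatever {{OutputStream}} the caller supplies. The class and method names are hypothetical, and the {{ByteArrayOutputStream}} in {{main}} only stands in for the destination; in PDFBox that destination would presumably be the output stream of the {{PDStream}} created up front, which this sketch does not model.

{code:java}
import java.awt.image.BufferedImage;
import java.awt.image.Raster;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class StreamingFlateSketch
{
    /**
     * Pulls one row of 8-bit gray samples at a time and pushes it through a
     * zlib (Flate) encoder straight into 'destination'. Neither the raw nor the
     * encoded image is staged in memory; only one row buffer is held.
     * Closing the encoder also closes 'destination'.
     */
    static void writeGrayFlate(BufferedImage image, OutputStream destination, int level)
            throws IOException
    {
        Raster raster = image.getRaster();
        int width = image.getWidth();
        int[] samples = new int[width];
        byte[] row = new byte[width];
        Deflater deflater = new Deflater(level);
        try (DeflaterOutputStream out = new DeflaterOutputStream(destination, deflater))
        {
            for (int y = 0; y < image.getHeight(); y++)
            {
                raster.getSamples(0, y, width, 1, 0, samples);
                for (int x = 0; x < width; x++)
                {
                    row[x] = (byte) samples[x];
                }
                out.write(row);
            }
        }
        finally
        {
            // a Deflater passed in explicitly is not released by close()
            deflater.end();
        }
    }

    /** Decodes zlib data; used here only to check the round trip. */
    static byte[] inflate(byte[] encoded) throws IOException
    {
        ByteArrayOutputStream decoded = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(encoded)))
        {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1)
            {
                decoded.write(buf, 0, n);
            }
        }
        return decoded.toByteArray();
    }

    public static void main(String[] args) throws IOException
    {
        BufferedImage gray = new BufferedImage(300, 200, BufferedImage.TYPE_BYTE_GRAY);
        for (int y = 0; y < 200; y++)
        {
            for (int x = 0; x < 300; x++)
            {
                gray.getRaster().setSample(x, y, 0, (x + y) & 0xFF);
            }
        }
        // stands in for the PDF stream's output stream
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        writeGrayFlate(gray, target, Deflater.DEFAULT_COMPRESSION);
        byte[] decoded = inflate(target.toByteArray());
        System.out.println("encoded " + target.size() + " bytes, decoded " + decoded.length);
    }
}
{code}

With this shape, the pixel data held in memory at any time is one row plus the encoder's internal state, instead of the full raw image plus its encoded copy; whether the encoded bytes then stay in RAM depends entirely on the destination stream (for example, one backed by a scratch file).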



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
