Posted to dev@pdfbox.apache.org by "Daniel Gredler (JIRA)" <ji...@apache.org> on 2018/11/21 05:42:00 UTC
[jira] [Commented] (PDFBOX-4300) Reduce in memory buffers when creating grayscale images
[ https://issues.apache.org/jira/browse/PDFBOX-4300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694272#comment-16694272 ]
Daniel Gredler commented on PDFBOX-4300:
----------------------------------------
I was going to create a new issue, but it looks like this may fit here...
I was looking at the {{LosslessFactory}} class today, thinking about using it mainly with grayscale and bitonal images. Performance was worse than expected, regardless of the compression level chosen ({{org.apache.pdfbox.filter.deflatelevel}}). Based on some local profiling with the default compression level, {{createFromGrayImage}} spends about 30% of its time applying the flate filter and the remaining 70% shuttling pixel data around ({{getRGB}}, etc.). It seems to me that this method should be able to assume that the image raster's data buffer is a {{DataBufferByte}} and use that buffer directly:
{code:java}
private static PDImageXObject createFromGrayImage(BufferedImage image, PDDocument document)
        throws IOException
{
    // Assumes the raster is backed by a DataBufferByte (e.g. TYPE_BYTE_GRAY or TYPE_BYTE_BINARY).
    byte[] pixels = ((DataBufferByte) image.getRaster().getDataBuffer()).getData();
    // Bits per component, taken from the color model (typically 8 for gray, 1 for bitonal).
    int bpc = image.getColorModel().getPixelSize();
    return prepareImageXObject(document, pixels,
            image.getWidth(), image.getHeight(), bpc, PDDeviceGray.INSTANCE);
}
{code}
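The cast above only holds when the raster really is byte-backed. As a rough sketch of a guard around that assumption, using only {{java.awt.image}}: the class name {{GrayRasterSketch}} and the method {{directGrayBytes}} are hypothetical and not part of PDFBox. It returns {{null}} when the direct path does not obviously apply, so a caller could fall back to the existing pixel-by-pixel code.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;
import java.awt.image.DataBufferByte;

// Hypothetical helper for illustration only; not part of PDFBox.
class GrayRasterSketch
{
    // Returns the backing byte array of an 8-bit gray image, or null when the
    // direct path is not obviously safe and the caller should fall back.
    static byte[] directGrayBytes(BufferedImage image)
    {
        DataBuffer buffer = image.getRaster().getDataBuffer();
        if (image.getType() != BufferedImage.TYPE_BYTE_GRAY
                || !(buffer instanceof DataBufferByte))
        {
            return null;
        }
        byte[] data = ((DataBufferByte) buffer).getData();
        // A sub-image shares its parent's (larger) buffer, so require an exact fit.
        if (data.length != image.getWidth() * image.getHeight())
        {
            return null;
        }
        return data;
    }

    public static void main(String[] args)
    {
        BufferedImage gray = new BufferedImage(5, 3, BufferedImage.TYPE_BYTE_GRAY);
        byte[] data = directGrayBytes(gray);
        System.out.println(data == null ? "fallback needed" : data.length + " bytes available");
    }
}
```

The exact-fit check is deliberately conservative: it rejects images created with {{getSubimage}}, whose data buffer also contains pixels outside the visible region.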
As expected, performance improved *drastically* with this change – to roughly on par with PNG file creation using {{ImageIO.write}}. The output looks good to the naked eye, but {{LosslessFactoryTest}} fails the grayscale assertion on line 95, where things seem to be off by a very small amount:
{code}
Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.242 s <<< FAILURE! - in org.apache.pdfbox.pdmodel.graphics.image.LosslessFactoryTest
testCreateLosslessFromImageRGB(org.apache.pdfbox.pdmodel.graphics.image.LosslessFactoryTest) Time elapsed: 0.6 s <<< FAILURE!
junit.framework.AssertionFailedError: (3,0) expected: <FFFFFFFF> but was: <FFFEFEFE>; expected:<-1> but was:<-65794>
    at org.apache.pdfbox.pdmodel.graphics.image.LosslessFactoryTest.testCreateLosslessFromImageRGB(LosslessFactoryTest.java:95)
{code}
Does this seem like a valid approach, both to improve performance and reduce memory usage? If so, any idea why some of the pixels are slightly different after the change?
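One way to narrow down the second question might be to compare, for the same image, what the two code paths see. {{BufferedImage.getRGB}} returns values in the default sRGB color model and converts through the image's {{ColorModel}} when that differs, whereas the data buffer holds the raw samples. The sketch below counts pixels where the two disagree; {{GrayPathDiff}} and {{countMismatches}} are hypothetical names, and this is only a diagnostic aid, not a statement of what actually causes the test failure.

```java
import java.awt.image.BufferedImage;
import java.awt.image.Raster;

// Hypothetical diagnostic for illustration only; not part of PDFBox or its tests.
class GrayPathDiff
{
    // Counts pixels where the low byte reported by getRGB() differs from
    // the raw sample stored in band 0 of the raster.
    static int countMismatches(BufferedImage image)
    {
        Raster raster = image.getRaster();
        int mismatches = 0;
        for (int y = 0; y < image.getHeight(); y++)
        {
            for (int x = 0; x < image.getWidth(); x++)
            {
                int viaColorModel = image.getRGB(x, y) & 0xFF;
                int rawSample = raster.getSample(x, y, 0);
                if (viaColorModel != rawSample)
                {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }
}
```

Running it against the gray image built in {{LosslessFactoryTest}} would show whether the two paths already differ before any PDF encoding takes place.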
> Reduce in memory buffers when creating grayscale images
> -------------------------------------------------------
>
> Key: PDFBOX-4300
> URL: https://issues.apache.org/jira/browse/PDFBOX-4300
> Project: PDFBox
> Issue Type: Improvement
> Components: PDModel
> Affects Versions: 2.0.11
> Reporter: Jesse Long
> Priority: Minor
> Labels: optimization
> Attachments: PDFBOX-4300-1.patch
>
>
> LosslessFactory uses ByteArrayOutputStreams when creating PDF image data. First, it creates a BAOS in which to store the raw data, then a second BAOS in which to store the flate-encoded data. Finally, the flate-encoded data is written to the PDImageXObject's stream.
> We could instead create an empty PDStream, give it a filter, and write the image data directly into the stream. We would then instantiate a PDImageXObject, giving it the already created stream.
> This would dramatically reduce the RAM requirement if a scratch file is in play.
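The direct-write idea from the issue description can be illustrated with {{java.util.zip}} alone. This is a minimal sketch under stated assumptions: {{StreamingFlateSketch}} and {{writeRows}} are hypothetical names, and the {{sink}} parameter merely stands in for whatever output stream the filtered PDStream would provide. No PDFBox API is used here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.DeflaterOutputStream;

// Hypothetical sketch for illustration only; uses java.util.zip, not PDFBox.
class StreamingFlateSketch
{
    // Pushes pixel rows through a deflater straight into 'sink', one row at a
    // time, so neither the raw nor the encoded image is collected in an
    // intermediate ByteArrayOutputStream.
    static void writeRows(byte[][] rows, OutputStream sink) throws IOException
    {
        DeflaterOutputStream out = new DeflaterOutputStream(sink);
        for (byte[] row : rows)
        {
            out.write(row);
        }
        // Emit the remaining compressed data without closing 'sink'.
        out.finish();
    }
}
```

If the sink is backed by a scratch file, the only image-sized data held in memory would be whatever the caller already has.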