Posted to dev@pdfbox.apache.org by "Ben Manes (Jira)" <ji...@apache.org> on 2019/12/31 03:08:00 UTC

[jira] [Commented] (PDFBOX-4726) PDFRenderer uses excessive memory

    [ https://issues.apache.org/jira/browse/PDFBOX-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005908#comment-17005908 ] 

Ben Manes commented on PDFBOX-4726:
-----------------------------------

I narrowed it down to a few PDFs that cause the excessive memory use. The raw data seems to take 256 MB per image. That accounts for 2.4 GB, with another 2.4 GB not reachable from GC roots. I presume that means it was promoted to the old generation for being a humongous object and G1 didn't know to reclaim it (perhaps there is a good G1 knob to help here). Either way, since this data is temporary, I think avoiding creating it in memory at all would be very beneficial. I'd much rather use disk for this instead. The stack trace below shows where the image data is inflated during rendering.
{code:java}
Quartz: entity.765c4033-ca7f-4d69-b910-cd9027507e88_at_1577736533434 [DAEMON] State: RUNNABLE tid: 43
java.util.zip.Inflater.inflateBytesBytes(long, byte[], int, int, byte[], int, int) Inflater.java
java.util.zip.Inflater.inflate(byte[], int, int) Inflater.java:385
java.util.zip.Inflater.inflate(byte[]) Inflater.java:471
org.apache.pdfbox.filter.FlateFilter.decompress(InputStream, OutputStream) FlateFilter.java:83
org.apache.pdfbox.filter.FlateFilter.decode(InputStream, OutputStream, COSDictionary, int) FlateFilter.java:50
org.apache.pdfbox.filter.Filter.decode(InputStream, OutputStream, COSDictionary, int, DecodeOptions) Filter.java:87
org.apache.pdfbox.cos.COSInputStream.create(List, COSDictionary, InputStream, ScratchFile, DecodeOptions) COSInputStream.java:84
org.apache.pdfbox.cos.COSStream.createInputStream(DecodeOptions) COSStream.java:175
org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(DecodeOptions) PDStream.java:241
org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.createInputStream(DecodeOptions) PDImageXObject.java:735
org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.from8bit(PDImage, WritableRaster, Rectangle, int, int, int) SampledImageReader.java:373
org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(PDImage, Rectangle, int, COSArray) SampledImageReader.java:226
org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(Rectangle, int) PDImageXObject.java:444
org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage() PDImageXObject.java:425
org.apache.pdfbox.rendering.PageDrawer.drawImage(PDImage) PageDrawer.java:1116
org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(Operator, List) DrawObject.java:63
org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(Operator, List) PDFStreamEngine.java:872
org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDContentStream) PDFStreamEngine.java:506
org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDContentStream) PDFStreamEngine.java:480
org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDPage) PDFStreamEngine.java:153
org.apache.pdfbox.rendering.PageDrawer.drawPage(Graphics, PDRectangle) PageDrawer.java:268
org.apache.pdfbox.rendering.PDFRenderer.renderImage(int, float, ImageType, RenderDestination) PDFRenderer.java:321
org.apache.pdfbox.rendering.PDFRenderer.renderImage(int, float, ImageType) PDFRenderer.java:243
org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(int, float) PDFRenderer.java:215 {code}
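
To put the 256 MB figure in perspective, here is a back-of-the-envelope sketch of how a decoded raster grows with image dimensions, and why a single array of that size would count as humongous for G1. The class name, the 8192 x 8192 x 4 dimensions, and the 32 MiB region size are assumptions for illustration only; the actual image dimensions are not stated in this comment, and the region size depends on the JDK version and heap settings.

```java
// Rough estimate of the heap needed for one fully decoded image raster.
// RasterFootprint is a hypothetical helper, not part of PDFBox.
final class RasterFootprint {
    // Assumption: a 32 MiB G1 region; the real value depends on JDK version and flags.
    static final long ASSUMED_G1_REGION_BYTES = 32L * 1024 * 1024;

    // Uncompressed size in bytes: one sample set per pixel, no compression.
    static long uncompressedBytes(int width, int height, int bytesPerPixel) {
        return (long) width * height * bytesPerPixel;
    }

    // G1 treats an object larger than half a region as humongous.
    static boolean isHumongousForG1(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    public static void main(String[] args) {
        // Hypothetical dimensions chosen only because they yield 256 MiB.
        long bytes = uncompressedBytes(8192, 8192, 4);
        System.out.println(bytes / (1024 * 1024) + " MiB");                   // prints "256 MiB"
        System.out.println(isHumongousForG1(bytes, ASSUMED_G1_REGION_BYTES)); // prints "true"
    }
}
```

Under these assumptions, roughly ten such images would add up to the 2.4 GB seen in the heap dump.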

> PDFRenderer uses excessive memory
> ---------------------------------
>
>                 Key: PDFBOX-4726
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4726
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Ben Manes
>            Priority: Major
>         Attachments: heap.png, instance.png, stacktrace.png
>
>
> {{PDFRenderer.renderImage}} uses a BufferedImage with only in-memory data. This data is uncompressed and can use excessive memory. This occurs even when a {{MemoryUsageSetting}} is configured on the document to use disk space, which should be honored.
> This [stackoverflow answer|https://stackoverflow.com/a/53205617/19450] suggests using a {{WritableRaster}} backed by a temporary file. This change cannot be done in user code and requires updating the {{PDFRenderer}}.
> I am currently trying to track down a PDF that caused out-of-memory issues. From the heap dump, only a few {{BufferedImages}} were in memory, but their uncompressed data took 6 GB.
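
The idea of a {{WritableRaster}} backed by a temporary file can be sketched with JDK classes alone. This is a minimal illustration of the general technique, not PDFBox code and not necessarily what the linked answer does line for line. All class and method names below ({{MappedImages}}, {{MappedDataBuffer}}, {{FileBackedRaster}}, {{createRgb}}) are hypothetical. It assumes an 8-bit RGB image, a raster small enough to fit in a single memory mapping (under 2 GB), and a platform where the temporary file can be deleted on exit.

```java
import java.awt.Point;
import java.awt.Transparency;
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.ComponentColorModel;
import java.awt.image.DataBuffer;
import java.awt.image.PixelInterleavedSampleModel;
import java.awt.image.SampleModel;
import java.awt.image.WritableRaster;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: builds a BufferedImage whose pixels live in a temp file.
final class MappedImages {
    // DataBuffer whose samples are stored in a memory-mapped file, not a heap array.
    static final class MappedDataBuffer extends DataBuffer {
        private final MappedByteBuffer buffer;

        MappedDataBuffer(MappedByteBuffer buffer, int size) {
            super(DataBuffer.TYPE_BYTE, size);
            this.buffer = buffer;
        }

        @Override
        public int getElem(int bank, int i) {
            return buffer.get(i) & 0xFF; // bytes are signed; samples are 0..255
        }

        @Override
        public void setElem(int bank, int i, int val) {
            buffer.put(i, (byte) val);
        }
    }

    // Plain subclass so the protected WritableRaster constructor can be used
    // with a custom DataBuffer type.
    static final class FileBackedRaster extends WritableRaster {
        FileBackedRaster(SampleModel model, DataBuffer data) {
            super(model, data, new Point(0, 0));
        }
    }

    static BufferedImage createRgb(int width, int height) {
        int bands = 3;
        long size = (long) width * height * bands;
        if (size > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("raster too large for one mapping");
        }
        try {
            Path file = Files.createTempFile("raster", ".bin");
            file.toFile().deleteOnExit();
            MappedByteBuffer mapped;
            try (FileChannel channel = FileChannel.open(
                    file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // The mapping stays valid after the channel is closed.
                mapped = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
            }
            SampleModel model = new PixelInterleavedSampleModel(
                    DataBuffer.TYPE_BYTE, width, height, bands, width * bands,
                    new int[] {0, 1, 2});
            WritableRaster raster =
                    new FileBackedRaster(model, new MappedDataBuffer(mapped, (int) size));
            ColorModel colorModel = new ComponentColorModel(
                    ColorSpace.getInstance(ColorSpace.CS_sRGB),
                    false, false, Transparency.OPAQUE, DataBuffer.TYPE_BYTE);
            return new BufferedImage(colorModel, raster, false, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BufferedImage image = createRgb(64, 48);
        image.getRaster().setSample(1, 2, 0, 200); // write the red sample at (1, 2)
        System.out.println(image.getRaster().getSample(1, 2, 0));
    }
}
```

The pixel data sits outside the Java heap and the operating system pages it to and from disk as needed, at the cost of slower access than a plain array. Since the renderer creates its own {{BufferedImage}}, a caller cannot substitute an image like this without a change to {{PDFRenderer}}.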


