Posted to dev@pdfbox.apache.org by "Ben Manes (Jira)" <ji...@apache.org> on 2020/01/28 04:58:00 UTC
[jira] [Issue Comment Deleted] (PDFBOX-4726) PDFRenderer uses excessive memory

     [ https://issues.apache.org/jira/browse/PDFBOX-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Manes updated PDFBOX-4726:
------------------------------
    Comment: was deleted

(was: This occurred again; see the 1-26-20 attachments. The threads were in {{PDFRenderer}}, and the heap was consumed by {{BufferedImage}} instances and {{int[]}} arrays. The application code doing this work is below; it tries to be as memory-friendly as possible. Still, it fails because of PDFBox.

About 50% of the heap is eligible for GC, but because of the large objects, G1 does not collect it before the failure. This is on JDK 11, and I will probably try switching to JDK 13 with Shenandoah. Ideally, however, the image would not be rendered fully in memory, and the maximum would be bounded by the target dimensions.

Do you have any advice here?

{code}
private static final Dimensions TARGET_DIMENSIONS = Dimensions.create(1650, 1650);
private static final String FORMAT = "jpg";

/** Renders the page to an image and returns the file path. */
private Path renderPage(Context context, Pdf pdf, PdfMetadata metadata,
    PDDocument document, int pageNumber) throws IOException {
  BufferedImage image = null;
  try {
    String name = String.format("page_%d.%s", (pageNumber + 1), FORMAT);
    Path path = context.storage().tempDirectory(pdf.getUniqueId()).resolve(name);

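    // Scale factors that map the crop box (in points) onto the target size.
    PDRectangle cropBox = document.getPage(pageNumber).getCropBox();
    float scaleY = TARGET_DIMENSIONS.getHeight() / cropBox.getHeight();
    float scaleX = TARGET_DIMENSIONS.getWidth() / cropBox.getWidth();
    // Use the larger of the two factors.
    float scaleBy = Math.max(scaleX, scaleY);

    // Render the whole page into an in-memory BufferedImage at that scale.
    image = new PDFRenderer(document).renderImage(pageNumber, scaleBy);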
    ImageIO.write(image, FORMAT, path.toFile());
    return path;
  } finally {
    if (image != null) {
      image.getGraphics().dispose();
      image.flush();
    }
  }
}
{code})
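
For reference, the difference between the two ways of combining the scale factors can be sketched in plain Java. This is only an illustration under assumptions: the class and method names are hypothetical and not part of PDFBox, and the US Letter crop box (612 x 792 points) is an example input, not the page from the report. Taking the smaller factor keeps both sides within the target box, while taking the larger factor (as {{Math.max}} does in the snippet above) lets the longer side exceed it.

```java
/** Hypothetical helper, not part of PDFBox: picks a render scale for a page. */
final class ScaleToFit {

    /** Largest scale at which both sides of the page stay within the target box. */
    static float fitWithin(float pageWidth, float pageHeight,
                           float targetWidth, float targetHeight) {
        float scaleX = targetWidth / pageWidth;
        float scaleY = targetHeight / pageHeight;
        return Math.min(scaleX, scaleY);
    }

    /** Smallest scale at which both sides reach the target; one side may overshoot. */
    static float cover(float pageWidth, float pageHeight,
                       float targetWidth, float targetHeight) {
        float scaleX = targetWidth / pageWidth;
        float scaleY = targetHeight / pageHeight;
        return Math.max(scaleX, scaleY);
    }

    public static void main(String[] args) {
        // Example input (assumption): US Letter crop box in points, 1650 x 1650 target.
        float w = 612f, h = 792f, target = 1650f;
        float fit = fitWithin(w, h, target, target);
        float cover = cover(w, h, target, target);
        System.out.printf("fit:   scale=%.3f, about %.0f x %.0f px%n", fit, w * fit, h * fit);
        System.out.printf("cover: scale=%.3f, about %.0f x %.0f px%n", cover, w * cover, h * cover);
    }
}
```

Either factor could then be passed as the scale argument of {{renderImage}}; which one is appropriate depends on whether the target dimensions are meant as an upper bound or a lower bound.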

> PDFRenderer uses excessive memory
> ---------------------------------
>
>                 Key: PDFBOX-4726
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4726
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Ben Manes
>            Priority: Major
>         Attachments: heap 1-26-20.png, heap.png, instance.png, reachability 1-26-20.png, stacktrace.png
>
>
> {{PDFRenderer.renderImage}} uses a {{BufferedImage}} with only in-memory data. This is uncompressed and can use excessive memory. This occurs despite {{MemoryUsageSetting}} being configured on the document to use disk space, which should be honored.
> This [stackoverflow answer|https://stackoverflow.com/a/53205617/19450] suggests using a {{WritableRaster}} backed by a temporary file. This change cannot be done in user code and requires updating the {{PDFRenderer}}.
> I am currently trying to track down a PDF that caused out-of-memory issues. From the heap dump, only a few {{BufferedImage}} instances were in memory, but they took 6 GB of uncompressed data.
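
To put the numbers above in perspective, here is a rough back-of-the-envelope sketch. It assumes an int-packed raster that stores one 4-byte {{int}} per pixel, which is consistent with the {{int[]}} arrays seen in the heap dump but is still an assumption about the image type in use. It also reads "6 GB" as 6 GiB. The class and method names are hypothetical.

```java
/** Hypothetical helper: estimates the size of uncompressed int-packed rasters. */
final class RasterMemory {

    /** Bytes needed for a width x height raster that stores one int per pixel. */
    static long intPackedBytes(int width, int height) {
        // Widen to long before multiplying so very large images do not overflow int.
        return (long) width * height * Integer.BYTES;
    }

    /** Number of int-packed pixels that fit into the given number of bytes. */
    static long pixelsForBudget(long bytes) {
        return bytes / Integer.BYTES;
    }

    public static void main(String[] args) {
        long targetBytes = intPackedBytes(1650, 1650);
        System.out.println("1650 x 1650 raster: " + targetBytes + " bytes");

        long sixGiB = 6L * 1024 * 1024 * 1024;
        System.out.println("Pixels in 6 GiB:    " + pixelsForBudget(sixGiB));
    }
}
```

Under these assumptions, a 1650 x 1650 raster needs about 10.4 MiB, so a handful of images adding up to several gigabytes would have to be far larger than the target dimensions. Which images those were can only be confirmed from the heap dump itself.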



--
This message was sent by Atlassian Jira
(v8.3.4#803005)