Posted to dev@pdfbox.apache.org by "Tilman Hausherr (Jira)" <ji...@apache.org> on 2019/09/11 19:03:00 UTC
[jira] [Commented] (PDFBOX-4649) High CPU load an memory usage, when converting PDF to Image

    [ https://issues.apache.org/jira/browse/PDFBOX-4649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927909#comment-16927909 ] 

Tilman Hausherr commented on PDFBOX-4649:
-----------------------------------------

With PDFDebugger I can display these files at 72 dpi in 5 seconds with -Xmx4g. If I set the CPU to "ridiculous speed" mode, it goes down to 2 seconds. Additional time will be needed to save the files.

Rendering will be slower at higher dpi. It went up to about 3 seconds at 400% zoom, which is about 288 dpi.
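
The relation between zoom and dpi comes from the PDF default of 72 user-space units per inch, so 100% corresponds to 72 dpi and 400% to 288 dpi. The following is a minimal sketch of that arithmetic, not part of PDFBox; the class and method names are made up for illustration, and the A4 page size mentioned afterwards is an assumption, since the size of the attached files is not stated here.

```java
// Illustrative helper only; these names are hypothetical and not a PDFBox API.
final class RenderSizeEstimate {
    // PDF user space defaults to 72 units per inch, so 100% zoom equals 72 dpi.
    static final float POINTS_PER_INCH = 72f;

    // Converts a zoom percentage (100 = actual size) into dots per inch.
    static float dpiForZoomPercent(float zoomPercent) {
        return POINTS_PER_INCH * zoomPercent / 100f;
    }

    // Number of pixels in the raster for a page of the given size in points.
    static long pixelsFor(float widthPt, float heightPt, float dpi) {
        long widthPx = (long) Math.ceil(widthPt * dpi / POINTS_PER_INCH);
        long heightPx = (long) Math.ceil(heightPt * dpi / POINTS_PER_INCH);
        return widthPx * heightPx;
    }
}
```

For an assumed A4 page (595 x 842 pt), pixelsFor gives 500,990 pixels at 72 dpi and 8,015,840 pixels at 288 dpi, so the raster to fill grows 16-fold when the dpi is quadrupled.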

You can increase speed slightly by changing
{code}
PDDocument.load(Files.newInputStream(filePath, StandardOpenOption.READ))
{code}
to
{code}
PDDocument.load(filePath.toFile())
{code}


> High CPU load an memory usage, when converting PDF to Image
> -----------------------------------------------------------
>
>                 Key: PDFBOX-4649
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4649
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.16
>            Reporter: Willie Chieukam
>            Priority: Critical
>         Attachments: 331577-5_b_19ez1.pdf, 332699-5_c_19ez7.pdf, 335520-5_c_19ezb.pdf, 335521-5_c_19ezd.pdf
>
>
> Hello!
> We are running a business web application that uses PDFBox to convert
>  PDF files to images using pdfRenderer.renderImageWithDPI(parameters).
> When we try to convert the attached PDFs, the CPU load of Tomcat, which runs in a Docker container on OpenShift, rises and the process appears to hang. The Tomcat process becomes unresponsive and we get a memory overflow. The server load is also very high during that time.
> We are using:
> + org.apache.pdfbox:pdfbox:2.0.16
> + org.apache.pdfbox:pdfbox-tools:2.0.16
> + org.apache.pdfbox:jbig2-imageio:3.0.2
> Our code looks like this:
> {code:java}
>     public void saveImageFromPDF(Path filePath, Path imagePath, Integer IMAGE_DPI, Float IMAGE_QUALITY) {
>         try (PDDocument pddocument = PDDocument.load(Files.newInputStream(filePath, StandardOpenOption.READ))) {
>             PDFRenderer pdfRenderer = new PDFRenderer(pddocument);
>             for (int i = 0; i < pddocument.getNumberOfPages(); i++) {
>                 try (OutputStream outputStream = documentServiceUtility
>                         .getFileOutputStream(imagePath.resolve(Integer.toString(i) + "." + IMAGE_FILE_EXTENSION))) {
>                     BufferedImage bufferedImage = pdfRenderer.renderImageWithDPI(i, IMAGE_DPI, ImageType.BINARY);
>                     ImageIOUtil.writeImage(bufferedImage, IMAGE_FILE_EXTENSION, outputStream, IMAGE_DPI, IMAGE_QUALITY);
>                     LOG.debug("Image of document {} successfully saved.",
>                             imagePath.resolve(Integer.toString(i) + "." + IMAGE_FILE_EXTENSION));
>                 } catch (Throwable ex) {
>                     throw new NiehoffPDDocumentHanderException(filePath, ex);
>                 }
>             }
>         } catch (Exception e) {
>             throw new NiehoffPDDocumentHanderException(filePath, e);
>         }
>     }
> {code}
> Line throwing the exception
> *{color:#FF0000}BufferedImage bufferedImage = pdfRenderer.renderImageWithDPI(i, IMAGE_DPI, ImageType.BINARY);{color}*
>   
>  Do you have any idea how to prevent this?
> Thank you very much and best regards,
>  Willie


