Posted to dev@pdfbox.apache.org by "Tilman Hausherr (Jira)" <ji...@apache.org> on 2020/08/28 14:59:00 UTC

[jira] [Comment Edited] (PDFBOX-4943) PDF Merge of large document, memory usage?

    [ https://issues.apache.org/jira/browse/PDFBOX-4943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186592#comment-17186592 ] 

Tilman Hausherr edited comment on PDFBOX-4943 at 8/28/20, 2:58 PM:
-------------------------------------------------------------------

Your document contains a lot of structural information (the "structure tree") that can't be put into the temp file. The temp file holds only streams.

The question is whether you need it. The structure tree exists for accessibility. If your PDF files are created only for printing, you can remove it, which would save memory the next time the PDF is loaded.
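
As a minimal sketch of what removing that part could look like with PDFBox 2.0.x: the class name StructureTreeStripper and the two command-line arguments (input and output path) are made up for illustration, and the approach assumes that dropping the /StructTreeRoot entry from the document catalog is enough for your files. The first load still needs the large heap; only the re-saved copy should be lighter to load.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.pdmodel.PDDocument;

// Hypothetical helper class, not part of PDFBox.
class StructureTreeStripper
{
    /**
     * Removes the /StructTreeRoot entry from the document catalog.
     * Returns true if the entry was present before removal.
     */
    static boolean strip(PDDocument doc)
    {
        COSDictionary catalog = doc.getDocumentCatalog().getCOSObject();
        boolean present = catalog.containsKey(COSName.STRUCT_TREE_ROOT);
        catalog.removeItem(COSName.STRUCT_TREE_ROOT);
        return present;
    }

    // Usage (assumed): java StructureTreeStripper <input.pdf> <output.pdf>
    public static void main(String[] args) throws IOException
    {
        File in = new File(args[0]);
        File out = new File(args[1]);
        // Streams go to a temp file; the structure tree itself still sits on the heap.
        try (PDDocument doc = PDDocument.load(in, MemoryUsageSetting.setupTempFileOnly()))
        {
            strip(doc);
            // Saving writes only objects still reachable from the catalog.
            doc.save(out);
        }
    }
}
```

The tagged-content markers inside the page content streams are left untouched here; without the structure tree they carry no accessibility information, so this is only suitable for files that do not need to stay accessible.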


was (Author: tilman):
Your document has a lot of structural stuff (the "structure tree"), that can't be put into the temp file. The temp file holds only streams.

> PDF Merge of large document, memory usage?
> ------------------------------------------
>
>                 Key: PDFBOX-4943
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4943
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 2.0.21
>         Environment: Windows, java x64
>            Reporter: Richard Stafford
>            Priority: Major
>
> We are trying to use the PDFMergerUtility to merge a number of PDF files, but are exhausting the Java heap with one specific PDF file. This file is about 50 MB, but merging with it requires a Java heap of 8 GB. We've tried using setupTempFileOnly(), but that doesn't seem to help.
> Looking further, just doing a PDDocument.load() operation for this file uses 3.5 GB of heap, regardless of the MemoryUsageSetting values.
> For instance, with the following main():
> public static void main(String[] args)
> {
>     try
>     {
>         // load the document, buffering streams in a temp file only
>         PDDocument sourceDoc = PDDocument.load(new File("c:\\tmp\\testfile.PDF"),
>                 MemoryUsageSetting.setupTempFileOnly().setTempDir(new File("c:\\tmp")));
>         sourceDoc.close();
>         LogMessage.log("Completed");
>     }
>     catch (Exception e)
>     {
>         LogMessage.log("Exception in document load: " + e.toString());
>     }
> }
> With a breakpoint set at sourceDoc.close(), the heap has grown to 3.5 GB.
>  
> Our test case can be downloaded from:
> [https://s3.amazonaws.com/webdl.equorum.com/misc/testfile.pdf]
>  
> Thanks,
> Rich Stafford
> Chief Scientist
> eQuorum Corporation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)