Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2019/01/26 14:52:00 UTC
[jira] [Commented] (PDFBOX-4442) Loading files larger than available memory

    [ https://issues.apache.org/jira/browse/PDFBOX-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753098#comment-16753098 ] 

Tilman Hausherr commented on PDFBOX-4442:
-----------------------------------------

You are using the correct parameter, {{MemoryUsageSetting.setupTempFileOnly()}}, but I guess the PDF structures (i.e. the many references) still use a lot of RAM because of their complexity. The solution would be to parse on demand, but there is no implementation of that in the repository. The 8 GB file size doesn't mean much: the structures may be compressed so heavily that they need a multiple of that once parsed.
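
To illustrate the point about compression, here is a minimal sketch that uses only the JDK's {{java.util.zip}} classes, not PDFBox. It deflates synthetic text that merely resembles many small dictionaries full of references and compares the sizes. The class name, the method names and the generated dictionary content are hypothetical and chosen for illustration; a real PDF may behave quite differently.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class; it is not part of PDFBox.
final class CompressionRatioDemo {

    // Builds synthetic text that only resembles many small PDF dictionaries with references.
    static byte[] repetitiveDictionaries(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= count; i++) {
            sb.append("<< /Type /Annot /Parent ").append(i)
              .append(" 0 R /Next ").append(i + 1).append(" 0 R >>\n");
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    // Compresses the input with the deflate algorithm.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompresses deflate data; wraps the checked exception to keep callers simple.
    static byte[] inflate(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && inflater.needsInput()) {
                    break; // truncated input: stop instead of looping forever
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("Input is not valid deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = repetitiveDictionaries(100_000);
        byte[] packed = deflate(raw);
        System.out.printf("raw=%d bytes, deflated=%d bytes, ratio=%.1fx%n",
                raw.length, packed.length, (double) raw.length / packed.length);
        System.out.println("round trip ok: " + Arrays.equals(raw, inflate(packed)));
    }
}
```

The sketch only compares raw bytes with deflated bytes. The parsed form needs more memory again, because each dictionary entry becomes a Java object in a map, which is where the stack trace in the issue runs out of heap ({{COSDictionary.setItem}} calling {{HashMap.put}}).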

If you can share the PDF, please send a link to the file; I have a PC with a lot of RAM.

> Loading files larger than available memory
> ------------------------------------------
>
>                 Key: PDFBOX-4442
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4442
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.13
>            Reporter: Krzysztof Podsiadło
>            Priority: Major
>
> I am trying to load a huge (8 GB) PDF. As a result I am getting an OutOfMemoryError. Is it even possible to load a file larger than the available memory?
> Sample program:
> {code:java}
> public static void main(String[] args) {
>     File file = new File("pdf_8Gb.pdf");
>     try(InputStream inputStream = new FileInputStream(file)) {
>         try (final PDDocument document = PDDocument.load(inputStream, MemoryUsageSetting.setupTempFileOnly())) { //line 13
>             System.out.println("Success");
>         } catch (final InvalidPasswordException e) {
>             e.printStackTrace();
>         }
>     } catch (IOException e) {
>         e.printStackTrace();
>     }
> }{code}
>  
> Exception stacktrace:
> {code:java}
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>     at java.base/java.util.LinkedHashMap.newNode(LinkedHashMap.java:256)
>     at java.base/java.util.HashMap.putVal(HashMap.java:626)
>     at java.base/java.util.HashMap.put(HashMap.java:607)
>     at org.apache.pdfbox.cos.COSDictionary.setItem(COSDictionary.java:217)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:304)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:212)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:864)
>     at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:904)
>     at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:873)
>     at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:793)
>     at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:753)
>     at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:187)
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1200)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1116)
>     at pdfbox.test.PdfLoader.main(PdfLoader.java:13){code}


