Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2019/04/06 08:41:00 UTC

[jira] [Updated] (PDFBOX-4507) OutOfMemoryError - tika1.19.1.jar

     [ https://issues.apache.org/jira/browse/PDFBOX-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-4507:
------------------------------------
    Affects Version/s: 2.0.14

> OutOfMemoryError - tika1.19.1.jar
> ---------------------------------
>
>                 Key: PDFBOX-4507
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4507
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.12, 2.0.14
>            Reporter: Ashish Tiwari
>            Priority: Major
>         Attachments: testCmplData.pdf
>
>
> I am trying to parse a PDF file and I am getting an OutOfMemoryError (OOM).
> The stack trace is below. I was facing a similar issue with DOCX files as well, but that is working now with the changes suggested in the following ticket:
> https://issues.apache.org/jira/browse/TIKA-2847
> PS: This issue happens only when I have -Xmx512m configured; if I change it to 1g, parsing works fine.
> {code:java}
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
> at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
> at java.nio.CharBuffer.allocate(CharBuffer.java:335)
> at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:795)
> at org.apache.pdfbox.pdfparser.BaseParser.isValidUTF8(BaseParser.java:782)
> at org.apache.pdfbox.pdfparser.BaseParser.parseCOSName(BaseParser.java:762)
> at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:278)
> at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:212)
> at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:862)
> at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:84)
> at org.apache.pdfbox.pdfparser.COSParser.parseObjectStream(COSParser.java:994)
> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:880)
> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:794)
> at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:754)
> at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:185)
> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:220)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1160)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1133)
> at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:154)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> {code}
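
Since the report depends on the heap limit (-Xmx512m fails, 1g works), it can help when reproducing to confirm which limit the JVM actually picked up. The following is a minimal sketch, not part of the original report: the class name HeapCheck and its helper method are hypothetical, and only the standard Runtime.maxMemory() call is used. Note that the value reported by the JVM may be somewhat lower than the -Xmx setting, depending on the garbage collector.

```java
public class HeapCheck {

    /** Converts a byte count to whole mebibytes, rounding down. */
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Upper bound on the heap that this JVM will attempt to use.
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap (MiB): " + toMiB(maxHeapBytes));
    }
}
```

Running it with the same options as the failing process (for example, java -Xmx512m HeapCheck) shows whether the setting took effect before the PDF is parsed.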



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: dev-help@pdfbox.apache.org