Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2018/03/21 06:21:00 UTC

[jira] [Comment Edited] (PDFBOX-4162) OutOfMemoryError in PDExtendedGraphicsState#getLineDashPattern

    [ https://issues.apache.org/jira/browse/PDFBOX-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407482#comment-16407482 ] 

Andreas Lehmkühler edited comment on PDFBOX-4162 at 3/21/18 6:20 AM:
---------------------------------------------------------------------

I've fixed the suspicious code part, but I'm not convinced that this will fix the issue. 
How about a sample pdf?


was (Author: lehmi):
I've fixed the suspicious code part, but I'm not convinced that this will fix the issue. 

> OutOfMemoryError in PDExtendedGraphicsState#getLineDashPattern
> --------------------------------------------------------------
>
>                 Key: PDFBOX-4162
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4162
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.8
>            Reporter: Andreas Hubold
>            Assignee: Andreas Lehmkühler
>            Priority: Critical
>
> I'm getting an OutOfMemoryError from PDFBox when parsing a certain PDF using the Apache Tika App v 1.17 - which uses PDFBox 2.0.8 internally. This is reproducible even with 8GB heap. 
>  
> The OutOfMemoryError happens in org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState#getLineDashPattern, which contains this piece of suspicious code: 
> {code:java}
> COSArray dp = (COSArray) dict.getDictionaryObject( COSName.D );
> if( dp != null )
> {
>     COSArray array = new COSArray();
>     dp.addAll(dp);
> {code}
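> The last line is wrong. It appends all elements of 'dp' to 'dp' itself, duplicating the elements in the list. Because 'dp' is the array held by the dictionary, it grows again on every call to getLineDashPattern. Maybe the intention was to add the elements to the newly created array instead.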
>  
> Stacktrace: 
> {noformat}
> [Full GC (Allocation Failure)  4225609K->4224664K(5989888K), 32,9544686 secs]
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOf(Arrays.java:3210)
>     at java.util.Arrays.copyOf(Arrays.java:3181)
>     at java.util.ArrayList.grow(ArrayList.java:261)
>     at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:235)
>     at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:227)
>     at java.util.ArrayList.addAll(ArrayList.java:579)
>     at org.apache.pdfbox.cos.COSArray.addAll(COSArray.java:124)
>     at org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState.getLineDashPattern(PDExtendedGraphicsState.java:280)
>     at org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState.copyIntoGraphicsState(PDExtendedGraphicsState.java:89)
>     at org.apache.pdfbox.contentstream.operator.state.SetGraphicsStateParameters.process(SetGraphicsStateParameters.java:61)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:838)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:495)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:469)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)
>     at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:139)
>     at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391)
>     at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:147)
>     at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
>     at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
>     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:117)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:168)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:205)
>     at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:486)
>     at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:145){noformat}
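
The growth described in the report can be reproduced without PDFBox. The following is only a minimal sketch: it uses a plain java.util.ArrayList in place of COSArray, and the class and method names (DashPatternDemo, buggyCopy, fixedCopy) are hypothetical, not part of PDFBox. Note that the ArrayList Javadoc calls addAll on the list itself formally undefined; the doubling shown here is what OpenJDK does in practice, which matches the stack trace above. The "fixed" variant follows the reporter's guess about the intent and is not necessarily the change that was committed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the reported code path; not PDFBox API.
class DashPatternDemo {

    // Mirrors the reported line: the source list is appended to itself,
    // so the caller's list doubles and the new list stays empty.
    static List<Float> buggyCopy(List<Float> dp) {
        List<Float> array = new ArrayList<>();
        dp.addAll(dp);
        return array;
    }

    // Assumed intent (the reporter's guess): copy into the new list
    // and leave the source list untouched.
    static List<Float> fixedCopy(List<Float> dp) {
        List<Float> array = new ArrayList<>();
        array.addAll(dp);
        return array;
    }

    public static void main(String[] args) {
        List<Float> dp = new ArrayList<>(Arrays.asList(3f, 2f));
        for (int i = 0; i < 10; i++) {
            buggyCopy(dp);
        }
        // 2 elements doubled 10 times: 2 * 2^10 = 2048 on OpenJDK
        System.out.println(dp.size());

        List<Float> dp2 = new ArrayList<>(Arrays.asList(3f, 2f));
        List<Float> copy = fixedCopy(dp2);
        // Source keeps its 2 elements; the copy holds the same values.
        System.out.println(dp2.size() + " " + copy);
    }
}
```

With this kind of doubling, a two-element pattern would need 2 * 2^30 = 2^31 entries after 30 calls, which is more than a Java array can hold, so a larger heap only delays the failure if the same graphics state is applied often enough.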



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
