Posted to dev@pdfbox.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2017/10/09 18:15:00 UTC

[jira] [Commented] (PDFBOX-3955) new -- very slow processing on truncated PDF

    [ https://issues.apache.org/jira/browse/PDFBOX-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197427#comment-16197427 ] 

ASF subversion and git services commented on PDFBOX-3955:
---------------------------------------------------------

Commit 1811589 from [~lehmi] in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1811589 ]

PDFBOX-3955: don't parse object stream multiple times
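
The diff for r1811589 is not included in this message, so the following is only a rough sketch of the general idea named in the commit message (parse each object stream once and reuse the result), not the actual PDFBox change. The class ObjectStreamCache, its methods, and the parser callback are all hypothetical names made up for illustration; none of them are PDFBox API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongFunction;

/**
 * Hypothetical helper (not PDFBox code): remembers the objects extracted
 * from each object stream so the stream is parsed at most once.
 */
class ObjectStreamCache<T> {
    // Parsed results, keyed by the object number of the object stream.
    private final Map<Long, List<T>> parsed = new HashMap<>();
    // Assumed callback that does the expensive parse for one object stream.
    private final LongFunction<List<T>> parser;
    // Counts real parses, only to make the effect of the cache observable.
    private int parseCount = 0;

    ObjectStreamCache(LongFunction<List<T>> parser) {
        this.parser = parser;
    }

    /** Returns the objects of the given stream, parsing it only on first request. */
    List<T> objectsOf(long streamObjectNumber) {
        List<T> cached = parsed.get(streamObjectNumber);
        if (cached == null) {
            parseCount++;
            cached = parser.apply(streamObjectNumber);
            parsed.put(streamObjectNumber, cached);
        }
        return cached;
    }

    int parseCount() {
        return parseCount;
    }
}
```

With a cache like this, a recovery pass that looks up many compressed objects from the same object stream would trigger one parse per stream rather than one parse per lookup. Whether that matches how the real fix is structured is an assumption.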

> new -- very slow processing on truncated PDF
> --------------------------------------------
>
>                 Key: PDFBOX-3955
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3955
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>            Reporter: Tim Allison
>            Assignee: Andreas Lehmkühler
>
> In the latest regression run with PDFBox's 2.x branch, we're now getting very slow processing on a truncated PDF with PDFBox app's {{ExtractText}}:
> http://162.242.228.174/docs/truncated_pdfs/commoncrawl2_likely_broken/7K/7KK53NK5PVKOUGDSQ4FK6542BNPC4SWB
> Turns out this is not an infinite loop.  After 4.5 minutes, {{ExtractText}} eventually ended with: 
> {noformat}
> Exception in thread "main" java.io.IOException: Missing root object specification in trailer.
>         at org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2508)
>         at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:193)
>         at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:240)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1012)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:950)
>         at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:192)
>         at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:82)
>         at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:60)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
