Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2015/07/25 15:33:05 UTC

[jira] [Comment Edited] (PDFBOX-2845) Error parsing PDF

    [ https://issues.apache.org/jira/browse/PDFBOX-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641579#comment-14641579 ] 

Andreas Lehmkühler edited comment on PDFBOX-2845 at 7/25/15 1:32 PM:
---------------------------------------------------------------------

PDFBox is too strict. The spec says
{quote}
The following objects shall not be stored in an object stream:
- .....
- An object representing the value of the Length entry in an object stream dictionary
{quote}
In the given case the length is an indirect object (554 0), but it is not the length of an object stream. It is the length of a plain stream (515 0), and the length object PDFBox is looking for is stored in one of the object streams (592 0), which has not been parsed yet. I removed the check and ran into another problem: there is an infinite-loop check that throws an exception.
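
To illustrate the distinction, here is a minimal sketch of the relaxed check. It is a toy model and not PDFBox code: the class, its fields and methods are all hypothetical, the length value 1234 is made up, and object 600 is an invented uncompressed length for the object stream. It only shows the rule quoted above, namely that a compressed length object is rejected only when it belongs to an object stream. A real parser would additionally have to decode object stream 592 before it could read object 554.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Toy model only; none of these types or methods are PDFBox API.
class LengthCheckSketch {

    /** Minimal stand-in for a cross-reference entry plus the parsed object. */
    static final class Obj {
        final int containerObjStm;    // 0 = stored directly in the file, else the object stream holding it
        final boolean isObjectStream; // true if this object is itself an object stream
        final Integer intValue;       // set when the object is a plain integer
        final Integer lengthRef;      // set when the object is a stream with an indirect /Length

        Obj(int containerObjStm, boolean isObjectStream, Integer intValue, Integer lengthRef) {
            this.containerObjStm = containerObjStm;
            this.isObjectStream = isObjectStream;
            this.intValue = intValue;
            this.lengthRef = lengthRef;
        }
    }

    private final Map<Integer, Obj> objects = new HashMap<>();

    void addInteger(int num, int containerObjStm, int value) {
        objects.put(num, new Obj(containerObjStm, false, value, null));
    }

    void addStream(int num, boolean isObjectStream, int lengthRef) {
        // Streams themselves are never stored inside an object stream.
        objects.put(num, new Obj(0, isObjectStream, null, lengthRef));
    }

    int resolveLength(int streamNum) throws IOException {
        Obj stream = objects.get(streamNum);
        if (stream == null || stream.lengthRef == null) {
            throw new IOException("Not a stream with an indirect /Length: " + streamNum);
        }
        Obj length = objects.get(stream.lengthRef);
        if (length == null || length.intValue == null) {
            throw new IOException("Length object must be defined: " + stream.lengthRef);
        }
        // Relaxed check: only the /Length of an object stream has to be uncompressed.
        if (stream.isObjectStream && length.containerObjStm != 0) {
            throw new IOException("Length of an object stream must not be compressed: "
                    + stream.lengthRef);
        }
        return length.intValue;
    }

    public static void main(String[] args) throws IOException {
        LengthCheckSketch doc = new LengthCheckSketch();
        doc.addInteger(600, 0, 4096);    // hypothetical uncompressed length of object stream 592
        doc.addStream(592, true, 600);   // the object stream
        doc.addInteger(554, 592, 1234);  // length stored inside object stream 592
        doc.addStream(515, false, 554);  // plain stream whose /Length is 554 0
        System.out.println(doc.resolveLength(515));
    }
}
```

With the check written this way, the length of 515 0 resolves even though 554 0 is compressed, while an object stream pointing at a compressed length would still be rejected.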


was (Author: lehmi):
PDFBox is too strict. The spec says
{quote}
The following objects shall not be stored in an object stream:
- .....
- An object representing the value of the Length entry in an object stream dictionary
{quote}
In the given case the length is an indirect object but not the length of an object stream. It's the length of some simple stream and the length PDFBox is looking for is in one of the object streams which isn't yet parsed. I've removed the check and ran into another problem. There is an infinite loop check which throws an exception.

> Error parsing PDF
> -----------------
>
>                 Key: PDFBOX-2845
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2845
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.0
>            Reporter: Christopher Clark
>             Fix For: 2.0.0
>
>
> I get the following error when parsing this pdf:  http://jmlr.csail.mit.edu/proceedings/papers/v28/ranganath13.pdf
> java.io.IOException: Object must be defined and must not be compressed object: 554:0
> Stack trace:
> Exception in thread "main" java.io.IOException: Object must be defined and must not be compressed object: 554:0
>         at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:682)
>         at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:646)
>         at org.apache.pdfbox.pdfparser.COSParser.getLength(COSParser.java:847)
>         at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:906)
>         at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:732)
>         at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:693)
>         at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:646)
>         at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:607)
>         at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:198)
>         at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:225)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:848)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:793)
>         at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:192)
>         at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:81)
>         at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:55)
> Note this problem does not occur in 1.8.9


