Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2016/04/12 12:30:25 UTC

[jira] [Resolved] (PDFBOX-3292) Error reading stream, expected='endstream' actual='' in non-truncated files

     [ https://issues.apache.org/jira/browse/PDFBOX-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-3292.
----------------------------------------
    Resolution: Fixed

COSParser#checkXRefStreamOffset only checked whether a dictionary could be found at the given offset. In rare cases, such as the given PDFs, there is a dictionary at that offset, but not the one we are looking for. I've therefore implemented an additional check that verifies whether the dictionary at the given offset is the right one, and now everything works fine.
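
To illustrate the difference between the two checks, here is a minimal, self-contained sketch. It is not the code committed to PDFBox: the class XRefOffsetCheck, its methods, and the 2048-byte look-ahead window are hypothetical names and values chosen for this example, and it assumes that "the right one" means a stream dictionary carrying a /Type /XRef entry, as the PDF specification requires for cross-reference streams. It also ignores corner cases such as the word "stream" appearing inside a string in the dictionary.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; NOT the actual PDFBox implementation. */
final class XRefOffsetCheck {

    // Hypothetical limit on how many bytes are inspected after the offset.
    private static final int WINDOW = 2048;

    // "<objnum> <gen> obj" followed by the opening "<<" of a dictionary.
    private static final Pattern OBJ_HEADER =
            Pattern.compile("\\s*\\d+\\s+\\d+\\s+obj\\s*<<");

    // A "/Type /XRef" entry; the lookahead rejects longer names such as /XRefFoo.
    private static final Pattern XREF_TYPE =
            Pattern.compile("/Type\\s*/XRef(?![A-Za-z0-9])");

    /** Weak check: is there any object with a dictionary at the offset? */
    static boolean hasDictionaryAt(byte[] pdf, int offset) {
        return OBJ_HEADER.matcher(window(pdf, offset)).lookingAt();
    }

    /** Stricter check: is that dictionary a cross-reference stream dictionary? */
    static boolean isXRefStreamAt(byte[] pdf, int offset) {
        String text = window(pdf, offset);
        Matcher header = OBJ_HEADER.matcher(text);
        if (!header.lookingAt()) {
            return false; // no object with a dictionary here at all
        }
        // Simplification: treat everything up to the first "stream" keyword
        // as the dictionary body. An object without a stream cannot be an
        // xref stream.
        int streamKeyword = text.indexOf("stream", header.end());
        if (streamKeyword < 0) {
            return false;
        }
        String dictBody = text.substring(header.end(), streamKeyword);
        return XREF_TYPE.matcher(dictBody).find();
    }

    // Decode a bounded slice of the file as Latin-1 so bytes map 1:1 to chars.
    private static String window(byte[] pdf, int offset) {
        if (offset < 0 || offset >= pdf.length) {
            return "";
        }
        int length = Math.min(WINDOW, pdf.length - offset);
        return new String(pdf, offset, length, StandardCharsets.ISO_8859_1);
    }
}
```

With this sketch, an offset pointing at an object such as "3 0 obj << /Type /Catalog ... >>" passes hasDictionaryAt but fails isXRefStreamAt, which is the kind of false positive described above.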

[~tallison@mitre.org] Thanks for the report!

> Error reading stream, expected='endstream' actual='' in non-truncated files
> ---------------------------------------------------------------------------
>
>                 Key: PDFBOX-3292
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3292
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.0
>            Reporter: Tim Allison
>            Assignee: Andreas Lehmkühler
>            Priority: Minor
>             Fix For: 2.0.1, 2.1.0
>
>
> When PDF files are truncated, one of the most common exceptions in PDFBox 2.0.0 is:
> {noformat}
> java.io.IOException: Error reading stream, expected='endstream' actual='' at offset 165888
> 	at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:999)
> 	at org.apache.pdfbox.pdfparser.COSParser.parseXrefObjStream(COSParser.java:326)
> 	at org.apache.pdfbox.pdfparser.COSParser.parseXref(COSParser.java:287)
> 	at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:192)
> 	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:249)
> 	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:847)
> {noformat}
> There are two files in govdocs1 that are NOT truncated and trigger this exception in 2.0.0, but were parsed by PDFBox 1.8.11 with the classic parser.


