Posted to dev@pdfbox.apache.org by "Timo Boehme (JIRA)" <ji...@apache.org> on 2013/03/15 09:42:13 UTC
[jira] [Commented] (PDFBOX-1541) expected='endstream' actual='' failure to parse
[ https://issues.apache.org/jira/browse/PDFBOX-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603229#comment-13603229 ]
Timo Boehme commented on PDFBOX-1541:
-------------------------------------
This seems to be a known shortcoming of the standard sequential PDF parser. Please try the non-sequential parser instead. If you are using the API, change your call from PDDocument.load(...) to PDDocument.loadNonSeq(...). If you are using the command-line tools, most of them accept a -nonSeq parameter to switch parsers.
Please report if this helped.
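
For callers who cannot switch parsers outright, one option is to try the sequential parser first and fall back to the non-sequential one only when it fails. The sketch below is an assumption layered on top of the advice above, not something the comment prescribes: the Loader interface, the loadWithFallback helper and the class name are hypothetical, and plain strings stand in for parsed documents so that the example runs without PDFBox on the classpath.

```java
import java.io.IOException;

public class ParserFallback {

    /** Hypothetical helper type: a load operation that may fail with an IOException. */
    public interface Loader<T> {
        T load() throws IOException;
    }

    /**
     * Runs the primary loader; if it throws an IOException, runs the fallback.
     * If the fallback fails too, the first failure is attached as a suppressed
     * exception so that neither stack trace is lost.
     */
    public static <T> T loadWithFallback(Loader<T> primary, Loader<T> fallback)
            throws IOException {
        try {
            return primary.load();
        } catch (IOException first) {
            try {
                return fallback.load();
            } catch (IOException second) {
                second.addSuppressed(first);
                throw second;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the sequential parser failing as in this issue.
        Loader<String> sequential = () -> {
            throw new IOException("expected='endstream' actual=''");
        };
        // Stand-in for the non-sequential parser succeeding.
        Loader<String> nonSequential = () -> "parsed";

        System.out.println(loadWithFallback(sequential, nonSequential)); // prints "parsed"
    }
}
```

With PDFBox itself, the two loaders would wrap PDDocument.load(file) and PDDocument.loadNonSeq(file, null) respectively. The second argument is the scratch-file parameter as it appears in the 1.8 API, so please check the Javadoc of the version you are running before relying on that signature.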
> expected='endstream' actual='' failure to parse
> -----------------------------------------------
>
> Key: PDFBOX-1541
> URL: https://issues.apache.org/jira/browse/PDFBOX-1541
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.7.1
> Environment: Ubuntu 12.04, JDK 1.7
> Reporter: Jinder Aujla
> Attachments: exporeal09_flyer_email3.pdf
>
>
> The following exception is thrown when parsing the attached PDF:
> Caused by: java.io.IOException: expected='endstream' actual='' org.apache.pdfbox.io.PushBackInputStream@2a789924
> at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:597)
> at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:575)
> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)