Posted to dev@pdfbox.apache.org by "Timo Boehme (JIRA)" <ji...@apache.org> on 2013/03/15 09:42:13 UTC

[jira] [Commented] (PDFBOX-1541) expected='endstream' actual='' failure to parse

    [ https://issues.apache.org/jira/browse/PDFBOX-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603229#comment-13603229 ] 

Timo Boehme commented on PDFBOX-1541:
-------------------------------------

This seems to be a known shortcoming of the standard sequential PDF parser. Please try the non-sequential parser instead. If you are using the API,
change PDDocument.load(...) to PDDocument.loadNonSeq(...). If you are using the command-line tools, most of them have a -nonSeq parameter to switch parsers.
Please report whether this helped.
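
For callers who do not want to switch parsers unconditionally, one option is to keep the sequential parser as the first attempt and use the non-sequential one only when the first attempt throws. The following is a minimal sketch of that idea, not something taken from PDFBox itself: ParserFallback, Loader and loadWithFallback are hypothetical names, and the stub loaders in main stand in for the real PDFBox calls so the example runs without the library. It uses lambdas, so it assumes Java 8 or later; on JDK 1.7 the loaders would have to be anonymous classes.

```java
import java.io.IOException;

/** Hypothetical helper, not part of PDFBox: try one parser, then fall back to another. */
class ParserFallback {

    /** One attempt at loading a document; stands in for a PDFBox call. */
    interface Loader<T> {
        T load() throws IOException;
    }

    /** Returns the sequential result, or the non-sequential one if the first attempt fails. */
    static <T> T loadWithFallback(Loader<T> sequential, Loader<T> nonSequential)
            throws IOException {
        try {
            return sequential.load();
        } catch (IOException first) {
            try {
                return nonSequential.load();
            } catch (IOException second) {
                // Keep the original failure visible if both parsers give up.
                second.addSuppressed(first);
                throw second;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Stub loaders so the sketch runs without PDFBox on the classpath.
        // With PDFBox 1.x the two loaders would presumably wrap
        //   PDDocument.load(file)  and  PDDocument.loadNonSeq(file, null)
        String result = loadWithFallback(
                () -> { throw new IOException("expected='endstream' actual=''"); },
                () -> "loaded by non-sequential parser");
        System.out.println(result);
    }
}
```

The PDDocument.loadNonSeq(file, null) call mentioned in the comment assumes the overload that takes a file and an optional scratch file; the available overloads may differ between PDFBox versions, so check the Javadoc for the release in use.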

                
> expected='endstream' actual='' failure to parse
> -----------------------------------------------
>
>                 Key: PDFBOX-1541
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1541
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.7.1
>         Environment: Ubuntu 12.04, JDK 1.7
>            Reporter: Jinder Aujla
>         Attachments: exporeal09_flyer_email3.pdf
>
>
> The following exception is thrown when parsing the attached PDF:
> Caused by: java.io.IOException: expected='endstream' actual='' org.apache.pdfbox.io.PushBackInputStream@2a789924
> 	at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:597)
> 	at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:575)
> 	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira