Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2011/09/21 19:46:10 UTC

[jira] [Commented] (PDFBOX-1122) Parsing Error, Skipping Object

    [ https://issues.apache.org/jira/browse/PDFBOX-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109696#comment-13109696 ] 

Andreas Lehmkühler commented on PDFBOX-1122:
--------------------------------------------

I can't reproduce the described behaviour using PDFReader and ExtractText. I've tried several versions (1.6, 1.5, 1.4 and 1.3.1). Are you sure about the PDFBox version? Which version of Tika are you using?
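
One possibility worth ruling out (an assumption, not something the report confirms) is that the crawler passed an incomplete download to Tika: an empty token where 'endstream' is expected and a startxref position of 0 would both be consistent with a file that ends early. Below is a minimal, standard-library-only sketch of such a check. The class name PdfTailCheck, its methods and the 1024-byte window are hypothetical choices for illustration and are not part of PDFBox or Tika; the code needs Java 7 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of PDFBox or Tika.
class PdfTailCheck {

    // Assumption: the trailer keywords sit within the last 1024 bytes.
    static final int TAIL_SIZE = 1024;

    // Decode the last TAIL_SIZE bytes; ISO-8859-1 maps every byte to one char.
    static String tail(byte[] data) {
        int from = Math.max(0, data.length - TAIL_SIZE);
        return new String(data, from, data.length - from, StandardCharsets.ISO_8859_1);
    }

    // Heuristic: a complete PDF starts with "%PDF-" and has both
    // "startxref" and "%%EOF" near its end.
    static boolean looksComplete(byte[] data) {
        String head = new String(data, 0, Math.min(data.length, 8), StandardCharsets.ISO_8859_1);
        String end = tail(data);
        return head.startsWith("%PDF-")
                && end.contains("startxref")
                && end.contains("%%EOF");
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        String verdict = looksComplete(data)
                ? "trailer found"
                : "no trailer in the last " + TAIL_SIZE + " bytes (possibly truncated)";
        System.out.println(args[0] + ": " + data.length + " bytes, " + verdict);
    }
}
```

Running `java PdfTailCheck file.pdf` against the exact bytes the crawler saved shows whether the trailer is present. This is only a heuristic: a file that passes may still be invalid, but a file that fails has most likely been cut off before it reached the parser.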

> Parsing Error, Skipping Object
> ------------------------------
>
>                 Key: PDFBOX-1122
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1122
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.6.0
>         Environment: Working with Windows 7 in eclipse.
>            Reporter: Raihan Jamal
>              Labels: pdfbox
>             Fix For: 1.7.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Parsing Error, Skipping Object
> java.io.IOException: expected='endstream' actual='' org.apache.pdfbox.io.PushBackInputStream@38011d45
> 	at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:439)
> 	at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:552)
> 	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
> 	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1088)
> 	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1053)
> 	at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:74)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
> 	at org.apache.tika.Tika.parseToString(Tika.java:357)
> 	at edu.uci.ics.crawler4j.crawler.BinaryParser.parse(BinaryParser.java:37)
> 	at edu.uci.ics.crawler4j.crawler.WebCrawler.handleBinary(WebCrawler.java:223)
> 	at edu.uci.ics.crawler4j.crawler.WebCrawler.processPage(WebCrawler.java:462)
> 	at edu.uci.ics.crawler4j.crawler.WebCrawler.run(WebCrawler.java:129)
> 	at java.lang.Thread.run(Thread.java:662)
>         Did not found XRef object at specified startxref position 0
> This is the sample URL where I am facing this problem:
> http://www.qualcomm.com/documents/files/rev-b-enhanced-mobile-broadband-for-all.pdf
> Any suggestions as to why this is happening? Or is it a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira