Posted to dev@pdfbox.apache.org by "Raihan Jamal (JIRA)" <ji...@apache.org> on 2011/09/21 19:14:10 UTC

[jira] [Updated] (PDFBOX-1122) Parsing Error, Skipping Object

     [ https://issues.apache.org/jira/browse/PDFBOX-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raihan Jamal updated PDFBOX-1122:
---------------------------------

    Description: 
Parsing Error, Skipping Object
java.io.IOException: expected='endstream' actual='' org.apache.pdfbox.io.PushBackInputStream@38011d45
	at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:439)
	at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:552)
	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1088)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1053)
	at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:74)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
	at org.apache.tika.Tika.parseToString(Tika.java:357)
	at edu.uci.ics.crawler4j.crawler.BinaryParser.parse(BinaryParser.java:37)
	at edu.uci.ics.crawler4j.crawler.WebCrawler.handleBinary(WebCrawler.java:223)
	at edu.uci.ics.crawler4j.crawler.WebCrawler.processPage(WebCrawler.java:462)
	at edu.uci.ics.crawler4j.crawler.WebCrawler.run(WebCrawler.java:129)
	at java.lang.Thread.run(Thread.java:662)
        Did not found XRef object at specified startxref position 0

This is the sample URL where I am seeing this problem:
http://www.qualcomm.com/documents/files/rev-b-enhanced-mobile-broadband-for-all.pdf

Any suggestions as to why this is happening? Or is it a bug?

  was:
java.io.IOException: expected='endstream' actual='' org.apache.pdfbox.io.PushBackInputStream@38011d45
	at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:439)
	at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:552)
	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1088)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1053)
	at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:74)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
	at org.apache.tika.Tika.parseToString(Tika.java:357)
	at edu.uci.ics.crawler4j.crawler.BinaryParser.parse(BinaryParser.java:37)
	at edu.uci.ics.crawler4j.crawler.WebCrawler.handleBinary(WebCrawler.java:223)
	at edu.uci.ics.crawler4j.crawler.WebCrawler.processPage(WebCrawler.java:462)
	at edu.uci.ics.crawler4j.crawler.WebCrawler.run(WebCrawler.java:129)
	at java.lang.Thread.run(Thread.java:662)
        Did not found XRef object at specified startxref position 0

This is the sample URL where I am seeing this problem:
http://www.qualcomm.com/documents/files/rev-b-enhanced-mobile-broadband-for-all.pdf

Any suggestions as to why this is happening? Or is it a bug?


> Parsing Error, Skipping Object
> ------------------------------
>
>                 Key: PDFBOX-1122
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1122
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.6.0
>         Environment: Windows 7, running in Eclipse.
>            Reporter: Raihan Jamal
>              Labels: pdfbox
>             Fix For: 1.7.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Parsing Error, Skipping Object
> java.io.IOException: expected='endstream' actual='' org.apache.pdfbox.io.PushBackInputStream@38011d45
> 	at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:439)
> 	at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:552)
> 	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
> 	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1088)
> 	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1053)
> 	at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:74)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
> 	at org.apache.tika.Tika.parseToString(Tika.java:357)
> 	at edu.uci.ics.crawler4j.crawler.BinaryParser.parse(BinaryParser.java:37)
> 	at edu.uci.ics.crawler4j.crawler.WebCrawler.handleBinary(WebCrawler.java:223)
> 	at edu.uci.ics.crawler4j.crawler.WebCrawler.processPage(WebCrawler.java:462)
> 	at edu.uci.ics.crawler4j.crawler.WebCrawler.run(WebCrawler.java:129)
> 	at java.lang.Thread.run(Thread.java:662)
>         Did not found XRef object at specified startxref position 0
> This is the sample URL where I am seeing this problem:
> http://www.qualcomm.com/documents/files/rev-b-enhanced-mobile-broadband-for-all.pdf
> Any suggestions as to why this is happening? Or is it a bug?
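
The empty `actual=''` in the exception and the `startxref position 0` in the last log line may indicate that the parser ran out of input before it reached `endstream`. One possible cause, which is an assumption and not something this report confirms, is that the bytes handed to the parser were incomplete, for example because the crawler's download was cut short. The sketch below is a rough way to check a saved copy of the file for that: it looks for the `%%EOF` marker near the end and reads the offset after the last `startxref` keyword. `PdfTailCheck` and its method names are hypothetical; they are not part of PDFBox or Tika, and the code uses only the JDK.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical diagnostic helper; not part of PDFBox or Tika.
class PdfTailCheck {

    // Index of the last occurrence of keyword in data, or -1 if absent.
    static int lastIndexOf(byte[] data, String keyword) {
        byte[] k = keyword.getBytes(StandardCharsets.ISO_8859_1);
        for (int i = data.length - k.length; i >= 0; i--) {
            boolean match = true;
            for (int j = 0; j < k.length; j++) {
                if (data[i + j] != k[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    // True if "%%EOF" appears somewhere in the last 1024 bytes.
    static boolean hasEofMarker(byte[] data) {
        int from = Math.max(0, data.length - 1024);
        byte[] tail = Arrays.copyOfRange(data, from, data.length);
        return lastIndexOf(tail, "%%EOF") >= 0;
    }

    // Offset written after the last "startxref" keyword, or -1 if the
    // keyword is missing or is not followed by a decimal number.
    static long startXref(byte[] data) {
        int pos = lastIndexOf(data, "startxref");
        if (pos < 0) {
            return -1;
        }
        int i = pos + "startxref".length();
        // Skip the whitespace between the keyword and the number.
        while (i < data.length
                && (data[i] == ' ' || data[i] == '\t' || data[i] == '\r' || data[i] == '\n')) {
            i++;
        }
        long value = 0;
        boolean sawDigit = false;
        while (i < data.length && data[i] >= '0' && data[i] <= '9') {
            value = value * 10 + (data[i] - '0');
            sawDigit = true;
            i++;
        }
        return sawDigit ? value : -1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfTailCheck <file.pdf>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("size in bytes:    " + data.length);
        System.out.println("%%EOF near end:   " + hasEofMarker(data));
        System.out.println("startxref offset: " + startXref(data));
    }
}
```

If the saved file has no `%%EOF` near the end, or the `startxref` offset is missing or larger than the file size, the copy is probably incomplete and the download is worth checking before the parser. If both look sane, the file itself may be malformed in a way the parser does not tolerate.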

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira