Posted to dev@pdfbox.apache.org by Uday Venkatadasari <uv...@avalonconsult.com> on 2015/06/12 15:37:26 UTC

[jira] Commented: (PDFBOX-917) Read non-conforming PDFs (attached) without throwing java.io.IOException: expected='endobj' org.apache.pdfbox.io.PushBackInputStream

Hi,

I am using Tika 1.3 to parse PDFs, but I am getting an error for one of
my PDF files. The error is below.
PDFBox version: 1.3.1
java.io.IOException: expected='obj' actual='655' org.apache.pdfbox.io.PushBackInputStream@fe7591
    at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:511)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:172)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:859)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:826)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:53)

Please help me to solve this issue.

Thanks
Uday Venkatadasari
Senior Consultant | Avalon Consulting, LLC
<http://www.avalonconsult.com/>P: 703 635 3302 | M: 631 332 1595
LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> | Google+
<http://www.google.com/+AvalonConsultingLLC> | Twitter
<https://twitter.com/avalonconsult>

Re: [jira] Commented: (PDFBOX-917) Read non-conforming PDFs (attached) without throwing java.io.IOException: expected='endobj' org.apache.pdfbox.io.PushBackInputStream

Posted by Tilman Hausherr <TH...@t-online.de>.
On 12.06.2015 at 15:37, Uday Venkatadasari wrote:
> Hi,
>
> I am using Tika 1.3 to parse PDFs, but I am getting an error for one of
> my PDF files. The error is below.
> PDFBox version: 1.3.1

PDFBox 1.3.1 is from 2010; we're now at 1.8.9. Tika is now at 1.8. So
please try with these versions.

If that doesn't work, also try configuring Tika to use PDFBox's
non-sequential parser.
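
The "try one parser, then the other" advice can be sketched in plain Java.
This is only an illustration of the control flow, not code from PDFBox or
Tika: the class ParserFallback, the Loader interface and loadWithFallback
are hypothetical names, and the two loaders in main are stand-ins. In
PDFBox 1.8.x the real calls would presumably be PDDocument.load(File) and
PDDocument.loadNonSeq(File, RandomAccess), and on the Tika side the switch
is, I believe, PDFParserConfig.setUseNonSequentialParser(true); please
check both against the Javadoc of the versions you install.

```java
import java.io.IOException;

public class ParserFallback {

    /** A load step that may fail with an IOException, like a PDF parser on a damaged file. */
    @FunctionalInterface
    public interface Loader<T> {
        T load() throws IOException;
    }

    /**
     * Runs the primary loader; if it throws an IOException, runs the fallback.
     * If both fail, the fallback's exception is thrown and the primary's
     * exception is attached to it as a suppressed exception.
     */
    public static <T> T loadWithFallback(Loader<T> primary, Loader<T> fallback)
            throws IOException {
        try {
            return primary.load();
        } catch (IOException first) {
            try {
                return fallback.load();
            } catch (IOException second) {
                second.addSuppressed(first);
                throw second;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the sequential parser failing on a non-conforming file.
        // With PDFBox 1.8.x this would be (assumption): () -> PDDocument.load(file)
        Loader<String> sequential = () -> {
            throw new IOException("expected='obj' actual='655'");
        };
        // Stand-in for the non-sequential parser succeeding.
        // With PDFBox 1.8.x this would be (assumption): () -> PDDocument.loadNonSeq(file, null)
        Loader<String> nonSequential = () -> "document loaded by fallback";

        System.out.println(loadWithFallback(sequential, nonSequential));
    }
}
```

Keeping the first exception as a suppressed one means that, if both parsers
fail, the stack trace attached to a JIRA issue shows both errors.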

If it still doesn't work, please open an issue in JIRA and attach your 
PDF file.

Tilman

PS: you posted to the dev list. This is for PDFBox developers. Next 
time, please post to the user list.


