Posted to users@pdfbox.apache.org by Chris Bowditch <bo...@hotmail.com> on 2009/08/13 10:06:12 UTC
PDF Parse Failure only on an IBM JDK?
Hi All,
I am facing a very strange problem with PDFBox 0.8.0 (revision 779577).
On a Sun JDK the PDF parses without error, but on an IBM JDK I get the
following error:
Exception in thread "main" java.io.IOException: Error: Expected an
integer type, actual='ãÃÃ'
at
org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1220)
at
org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:490)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:172)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:736)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:704)
at org.apache.pdfbox.PDFReader.parseDocument(PDFReader.java:322)
at org.apache.pdfbox.PDFReader.openPDFFile(PDFReader.java:285)
at org.apache.pdfbox.PDFReader.main(PDFReader.java:270)
The error can be reproduced using the PDFReader class.
Unfortunately I cannot attach the PDF as-is since it's confidential, but
if anyone knows a tool I can use to obfuscate the PDF please let me know!
My question is how can I debug this error?
Thanks,
Chris
Re: PDF Parse Failure only on an IBM JDK?
Posted by Chris Bowditch <bo...@hotmail.com>.
Chris Bowditch wrote:
> Hi All,
>
> I am facing a very strange problem with PDFBox 0.8.0 (revision 779577)
> On a Sun JDK the PDF parses without error, but on an IBM JDK I get the
> following error:
>
> Exception in thread "main" java.io.IOException: Error: Expected an
> integer type, actual='ãÃÃ'
UPDATE on this issue:
There was actually an error being retried that occurred before the below
error:
java.io.IOException: Error: Expected an integer type, actual='ãÏÓ'
at
org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1220)
at
org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:482)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:172)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:736)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:704)
at org.apache.pdfbox.PDFReader.parseDocument(PDFReader.java:323)
at org.apache.pdfbox.PDFReader.openPDFFile(PDFReader.java:286)
at org.apache.pdfbox.PDFReader.main(PDFReader.java:271)
The characters that fail to parse occur at the start of the PDF:
%PDF-1.4
%âãÏÓ
6 0 obj
<</Filter /FlateDecode
/Length 489
>>
stream
I have debugged the PDFParser class, and the problem lies in the
skipToNextObject method. On the IBM JDK, when the bytes are converted
to a String, some of the bytes are skipped (specifically those with a
negative value), but when the bytes are subsequently unread, the
unreading goes back too far. I'm working on a patch now and will raise
a Jira entry for this.
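To illustrate the mismatch described above, here is a minimal sketch of one way to keep the decoded String and the unread count in step. It is not the PDFBox code and not necessarily what the eventual patch does. The class UnreadSketch and its methods decode and peek are hypothetical names, and it assumes Java 7 or later for StandardCharsets. The idea is to decode with ISO-8859-1, which maps every byte to exactly one char regardless of the platform default charset, and to unread exactly the number of bytes that were read rather than a count derived from the String.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names come from PDFBox.
class UnreadSketch {

    // Decode with an explicit single-byte charset so that the result has
    // one char per input byte on every JDK, including for bytes >= 0x80
    // (which are negative when held in a Java byte).
    static String decode(byte[] buf, int len) {
        return new String(buf, 0, len, StandardCharsets.ISO_8859_1);
    }

    // Read up to max bytes, push back exactly the bytes that were read,
    // and return them as text. max must not exceed the pushback buffer
    // size given to the PushbackInputStream constructor.
    static String peek(PushbackInputStream in, int max) {
        try {
            byte[] buf = new byte[max];
            int n = in.read(buf, 0, max);
            if (n <= 0) {
                return "";
            }
            // Unread by the byte count n, never by the String length.
            in.unread(buf, 0, n);
            return decode(buf, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // '%' followed by the four high bytes of the binary comment line.
        byte[] header = {'%', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, '\n', '6'};
        PushbackInputStream in =
                new PushbackInputStream(new ByteArrayInputStream(header), 16);
        String first = peek(in, 5);
        String second = peek(in, 5);
        System.out.println(first.length() + " chars decoded from 5 bytes; "
                + "stream restored: " + first.equals(second));
    }
}
```

Calling peek twice on the same stream should return the same text if the unread restored the position correctly, which is a quick check to run on both the Sun and IBM JDKs.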
> at
> org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1220)
> at
> org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:490)
> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:172)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:736)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:704)
> at org.apache.pdfbox.PDFReader.parseDocument(PDFReader.java:322)
> at org.apache.pdfbox.PDFReader.openPDFFile(PDFReader.java:285)
> at org.apache.pdfbox.PDFReader.main(PDFReader.java:270)
>
> The error can be reproduced using the PDFReader class.
>
> Unfortunately I cannot attach the PDF as-is since it's confidential but
> if anyone knows a tool I can use to obfuscate the PDF please let me know!
>
> My question is how can I debug this error?
>
> Thanks,
>
> Chris
Thanks,
Chris