Posted to users@pdfbox.apache.org by Chris Bowditch <bo...@hotmail.com> on 2009/08/13 10:06:12 UTC

PDF Parse Failure only on an IBM JDK?

Hi All,

I am facing a very strange problem with PDFBox 0.8.0 (revision 779577).
On a Sun JDK the PDF parses without error, but on an IBM JDK I get the
following error:

Exception in thread "main" java.io.IOException: Error: Expected an integer type, actual='ãÏÓ'
         at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1220)
         at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:490)
         at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:172)
         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:736)
         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:704)
         at org.apache.pdfbox.PDFReader.parseDocument(PDFReader.java:322)
         at org.apache.pdfbox.PDFReader.openPDFFile(PDFReader.java:285)
         at org.apache.pdfbox.PDFReader.main(PDFReader.java:270)

The error can be reproduced using the PDFReader class.

Unfortunately I cannot attach the PDF as-is since it's confidential, but
if anyone knows a tool I can use to obfuscate the PDF, please let me know!

My question is how can I debug this error?

Thanks,

Chris

Re: PDF Parse Failure only on an IBM JDK?

Posted by Chris Bowditch <bo...@hotmail.com>.

Chris Bowditch wrote:

> Hi All,
> 
> I am facing a very strange problem with PDFBox 0.8.0 (revision 779577) 
> On a Sun JDK the PDF parses without error, but on an IBM JDK I get the 
> following error:
> 
> Exception in thread "main" java.io.IOException: Error: Expected an integer type, actual='ãÏÓ'

UPDATE on this issue:

The parser actually hit an earlier error, which it caught and retried,
before the one quoted above:

java.io.IOException: Error: Expected an integer type, actual='ãÏÓ'
         at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1220)
         at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:482)
         at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:172)
         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:736)
         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:704)
         at org.apache.pdfbox.PDFReader.parseDocument(PDFReader.java:323)
         at org.apache.pdfbox.PDFReader.openPDFFile(PDFReader.java:286)
         at org.apache.pdfbox.PDFReader.main(PDFReader.java:271)

The characters that fail to parse occur at the start of the PDF:

%PDF-1.4
%âãÏÓ
6 0 obj
<</Filter /FlateDecode
/Length 489
 >>
stream
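
For reference, every byte in the binary marker on the second line is above
0x7F, which is why these bytes show up as negative values in Java (Java's
byte type is signed). The small sketch below assumes the file stores "âãÏÓ"
as the ISO-8859-1 bytes 0xE2 0xE3 0xCF 0xD3. The class name HeaderBytes is
only for illustration:

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: shows the signed Java values of the PDF header's
// binary marker bytes, assuming they were written as ISO-8859-1.
final class HeaderBytes {

    // Encode the marker text back to its raw bytes (one byte per char in ISO-8859-1).
    static byte[] latin1Bytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "\u00E2\u00E3\u00CF\u00D3" is "âãÏÓ", written with escapes to avoid source-encoding issues.
        for (byte b : latin1Bytes("\u00E2\u00E3\u00CF\u00D3")) {
            // b & 0xFF gives the unsigned value; b itself is negative for 0x80..0xFF.
            System.out.printf("0x%02X -> %d%n", b & 0xFF, b);
        }
    }
}
```

This prints each byte's hex value next to its signed value: -30, -29, -49
and -45.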

I have debugged the PDFParser class, and the problem lies in the
skipToNextObject method, which is called after the first error. On the
IBM JDK, when the bytes are converted to a String, some of them are
dropped (specifically those with a negative value), but when the bytes
are subsequently unread, the unreading goes back too far. I'm working on
a patch now and will raise a Jira entry for this.
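
A minimal sketch of the general idea behind such a fix, under two
assumptions: java.io.PushbackInputStream stands in for the parser's input
stream, and the helper peekConsume and the class PeekAndUnread are
hypothetical names (this is not PDFBox's skipToNextObject or the actual
patch). The point is that the number of bytes to push back is taken from
what read() returned, never from the length of a String decoded from those
bytes, because a decoder is not guaranteed to produce one char per byte.
The sketch also decodes with ISO-8859-1, which maps each byte to exactly
one char:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: peek at a chunk, inspect it as text, push back the rest.
final class PeekAndUnread {

    // Reads up to buf.length bytes, keeps the first `consume` of them and
    // unreads the remainder. Returns the decoded text for inspection only.
    static String peekConsume(PushbackInputStream in, byte[] buf, int consume) throws IOException {
        int n = in.read(buf, 0, buf.length);
        if (n <= 0) {
            return "";
        }
        // ISO-8859-1 maps every byte value 0x00..0xFF to exactly one char,
        // so char offsets in `text` match byte offsets in `buf`.
        String text = new String(buf, 0, n, StandardCharsets.ISO_8859_1);
        int keep = Math.min(consume, n);
        // The pushback length comes from n (bytes actually read), not text.length().
        in.unread(buf, keep, n - keep);
        return text;
    }

    public static void main(String[] args) throws IOException {
        // Start of the PDF from the report: "%âãÏÓ\n6 0 obj".
        byte[] pdf = "%\u00E2\u00E3\u00CF\u00D3\n6 0 obj".getBytes(StandardCharsets.ISO_8859_1);
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(pdf), 16);
        String text = peekConsume(in, new byte[16], 1); // consume only the '%'
        System.out.println("decoded chars: " + text.length());
        System.out.println("next byte: 0x" + Integer.toHexString(in.read() & 0xFF));
    }
}
```

Because the unread count never depends on the decoded text, the stream ends
up positioned correctly even if a JDK's decoder drops or merges bytes, which
is what appears to happen on the IBM JDK here.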

Thanks,

Chris