Posted to users@pdfbox.apache.org by Mouthgalya Ganapathy <mo...@fitchratings.com> on 2015/06/08 22:28:04 UTC

PDF parser error for Spanish/Portuguese files

Hi,
I am using PDFBox 1.8.9 to extract PDF contents (actually through Apache Tika, which in turn uses PDFBox). I am encountering the exceptions below while trying to parse Portuguese or Spanish PDF files. The exceptions differ, but they all seem to be related to handling Spanish or Portuguese characters. Has anybody encountered these exceptions before? Any suggestions for fixing them?

I can attach the PDF files if that would be helpful.


Exception list:


1.) java.lang.RuntimeException: java.io.IOException: Expected='null' actual='n' at offset 4306
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:198)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:205)

2.) java.lang.RuntimeException: java.io.IOException: Unknown dir object c=')' cInt=41 peek=')' peekInt=41 8544
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:198)

3.) java.lang.RuntimeException: java.io.IOException: Error expected floating point number actual='--22.'
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:198)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:205)

4.) java.lang.RuntimeException: java.io.IOException: Error expected floating point number actual='173.0.2'
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:198)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:205)

5.) java.lang.RuntimeException: java.io.IOException: Value is not an integer: -1-15
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:198)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:205)
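
The messages in 3.) to 5.) quote number-like tokens ('--22.', '173.0.2', '-1-15') that are not well-formed numbers. The following is a minimal sketch using only the JDK, not PDFBox's own parsing code, and the class and method names (NumericTokenCheck, isFloat, isInt) are made up for illustration. It only checks whether standard Java number parsing accepts the quoted tokens, which may help tell malformed numeric tokens in a content stream apart from a character-encoding problem. It does not reproduce how PDFBox tokenizes a stream.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of PDFBox or Tika.
class NumericTokenCheck {

    // True if the JDK can parse the token as a float.
    static boolean isFloat(String token) {
        try {
            Float.parseFloat(token);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // True if the JDK can parse the token as a base-10 int.
    static boolean isInt(String token) {
        try {
            Integer.parseInt(token);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The first three tokens are copied from the exception messages;
        // the last three are well-formed variants added for comparison.
        List<String> tokens = Arrays.asList(
                "--22.", "173.0.2", "-1-15", "-22.", "173.0", "-15");
        for (String t : tokens) {
            System.out.println(t + " float=" + isFloat(t) + " int=" + isInt(t));
        }
    }
}
```

If the quoted tokens are rejected here as well, the numbers themselves are likely malformed or split at the wrong place, whatever the language of the document.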


Thanks,
Mouthgalya Ganapathy
Product Development Team


Re: PDF parser error for Spanish/Portuguese files

Posted by Tilman Hausherr <TH...@t-online.de>.
Hi,

Please upload the PDF at a public place.

Tilman


