
Posted to users@pdfbox.apache.org by Max Gravitt <mg...@me.com> on 2010/11/03 01:32:58 UTC

IOException with PDFParser

Hi,
I recently started trying to parse faxes that are converted to PDF and sent via email.  I keep getting the exception below with these files.  Does anyone have thoughts on the root cause, and is there a workaround?
thanks,
MG

IOException
expected='endobj' firstReadAttempt='' secondReadAttempt='' org.pdfbox.io.PushBackInputStream@d2f5f1
org.pdfbox.pdfparser.PDFParser; parseObject; 502
org.pdfbox.pdfparser.PDFParser; parse; 176
org.pdfbox.pdmodel.PDDocument; load; 707
com.josiejune.documentdispatch.models.Document$DocumentParser; getPDFContents; 245
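
Note: the message above says the parser expected the 'endobj' keyword and read nothing on both attempts. One way to narrow this down before blaming the parser is to check the raw bytes of a failing fax file. The sketch below is only a rough diagnostic under stated assumptions: PdfSanityCheck and its methods are hypothetical names (not part of PDFBox), the 1024-byte search window is an arbitrary choice, and the " obj"/"endobj" counts are heuristic because binary stream data can contain those byte sequences by chance.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of PDFBox: coarse structural checks on raw PDF bytes.
class PdfSanityCheck {
    // Assumed search window (in bytes) for the header and the end-of-file marker.
    private static final int WINDOW = 1024;

    // Returns the first index of token within data[from, to), or -1 if it is absent.
    static int indexOf(byte[] data, String token, int from, int to) {
        byte[] t = token.getBytes(StandardCharsets.ISO_8859_1);
        int end = Math.min(to, data.length);
        for (int i = Math.max(0, from); i + t.length <= end; i++) {
            int j = 0;
            while (j < t.length && data[i + j] == t[j]) {
                j++;
            }
            if (j == t.length) {
                return i;
            }
        }
        return -1;
    }

    // Counts non-overlapping occurrences of an ASCII token in data.
    static int count(byte[] data, String token) {
        int n = 0;
        int pos = indexOf(data, token, 0, data.length);
        while (pos >= 0) {
            n++;
            pos = indexOf(data, token, pos + token.length(), data.length);
        }
        return n;
    }

    // True if "%PDF-" appears near the start of the file.
    static boolean hasHeader(byte[] data) {
        return indexOf(data, "%PDF-", 0, WINDOW) >= 0;
    }

    // True if "%%EOF" appears near the end of the file.
    static boolean hasEofMarker(byte[] data) {
        return indexOf(data, "%%EOF", data.length - WINDOW, data.length) >= 0;
    }

    // One-line summary; an obj count above the endobj count hints at truncation or damage.
    static String describe(byte[] data) {
        return "header=" + hasHeader(data)
            + " eof=" + hasEofMarker(data)
            + " obj=" + count(data, " obj")
            + " endobj=" + count(data, "endobj");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfSanityCheck <file.pdf>");
            return;
        }
        System.out.println(describe(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

If a failing file reports eof=false, or noticeably fewer endobj than obj tokens, the attachment may have been truncated or damaged in transit. If the counts look sane, the file may simply use constructs the old parser does not handle.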

Re: IOException with PDFParser

Posted by Max Gravitt <mg...@me.com>.

Great, thanks.  Question for the community: has the latest version of PDFBox been altered to work on Google App Engine?  If so, how can I adopt it?

thanks
MG

On Nov 2, 2010, at 9:30 PM, Andreas Lehmkühler wrote:

> Hi,
> 
> 
> On 03.11.10 01:52, Max Gravitt wrote:
>> Hi,
>> 
>> I should have clarified the question.  I am using this version because I am running the library on Google App Engine, and this is the version that is compatible.  If I can't make this older version compatible with the new PDFs, is there a way to retrofit the most recent version to Google App Engine?
> PDFBox has improved a lot since it moved to Apache, so there will be some differences compared to older versions, but many technical aspects are still the same.
> As I'm not a GAE expert I can't answer your question in detail, but I know from other users that PDFBox has to be modified to work within GAE.
> 
> BR
> Andreas Lehmkühler

Re: IOException with PDFParser

Posted by Andreas Lehmkühler <an...@lehmi.de>.

Hi,


On 03.11.10 01:52, Max Gravitt wrote:
> Hi,
>
> I should have clarified the question.  I am using this version because I am running the library on Google App Engine, and this is the version that is compatible.  If I can't make this older version compatible with the new PDFs, is there a way to retrofit the most recent version to Google App Engine?
PDFBox has improved a lot since it moved to Apache, so there will be some 
differences compared to older versions, but many technical aspects are 
still the same.
As I'm not a GAE expert I can't answer your question in detail, but I 
know from other users that PDFBox has to be modified to work within GAE.

BR
Andreas Lehmkühler


Re: IOException with PDFParser

Posted by Max Gravitt <mg...@me.com>.

Hi,

I should have clarified the question.  I am using this version because I am running the library on Google App Engine, and this is the version that is compatible.  If I can't make this older version compatible with the new PDFs, is there a way to retrofit the most recent version to Google App Engine?

thanks!
MG


Re: IOException with PDFParser

Posted by Andreas Lehmkühler <an...@lehmi.de>.

Hi,

On 03.11.10 01:32, Max Gravitt wrote:
> Hi,
> I recently started trying to parse faxes that are converted to PDF and sent via email.  I keep getting the exception below with these files.  Does anyone have thoughts on the root cause, and is there a workaround?
> thanks,
> MG
>
> IOException
> expected='endobj' firstReadAttempt='' secondReadAttempt='' org.pdfbox.io.PushBackInputStream@d2f5f1
> org.pdfbox.pdfparser.PDFParser; parseObject; 502
> org.pdfbox.pdfparser.PDFParser; parse; 176
> org.pdfbox.pdmodel.PDDocument; load; 707
> com.josiejune.documentdispatch.models.Document$DocumentParser; getPDFContents; 245
According to the stack trace, you're using a quite old (non-Apache) 
version of PDFBox. I suggest updating to a more recent version from [1].

BR
Andreas Lehmkühler


[1] http://pdfbox.apache.org/download.html
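
Note: the reasoning behind "non-Apache" is visible in the package names. The frames in the trace start with org.pdfbox, whereas Apache releases of PDFBox use the org.apache.pdfbox package. A minimal sketch of that check follows; PdfBoxPackageCheck, Origin, and classify are hypothetical names made up for illustration and are not part of any PDFBox release.

```java
// Hypothetical helper: classifies a stack-frame class name by its package prefix.
class PdfBoxPackageCheck {
    enum Origin { PRE_APACHE, APACHE, OTHER }

    static Origin classify(String className) {
        // Apache releases live under org.apache.pdfbox.
        if (className.startsWith("org.apache.pdfbox.")) {
            return Origin.APACHE;
        }
        // Older, pre-Apache releases used the bare org.pdfbox package.
        if (className.startsWith("org.pdfbox.")) {
            return Origin.PRE_APACHE;
        }
        return Origin.OTHER;
    }

    public static void main(String[] args) {
        // Class names copied from the stack trace in this thread.
        String[] frames = {
            "org.pdfbox.pdfparser.PDFParser",
            "org.pdfbox.pdmodel.PDDocument",
            "com.josiejune.documentdispatch.models.Document$DocumentParser"
        };
        for (String frame : frames) {
            System.out.println(classify(frame) + "  " + frame);
        }
    }
}
```

Both PDFBox frames from the trace classify as PRE_APACHE, which matches the diagnosis above. After an upgrade, application imports would need to change from org.pdfbox to org.apache.pdfbox.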