Posted to users@pdfbox.apache.org by Max Gravitt <mg...@me.com> on 2010/11/03 01:32:58 UTC
IOException with PDFParser
Hi,
I recently started to attempt to parse faxes that are PDF'd and sent via email. I continually get the below exception with these types of files. Does anyone have thoughts on the root cause and if there is any workaround?
thanks,
MG
IOException
expected='endobj' firstReadAttempt='' secondReadAttempt='' org.pdfbox.io.PushBackInputStream@d2f5f1
org.pdfbox.pdfparser.PDFParser; parseObject; 502
org.pdfbox.pdfparser.PDFParser; parse; 176
org.pdfbox.pdmodel.PDDocument; load; 707
com.josiejune.documentdispatch.models.Document$DocumentParser; getPDFContents; 245
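The message says the parser expected the keyword 'endobj' and read nothing on both attempts, which may mean an object in the file is never closed or the file ends early. A minimal sketch for checking this on the raw bytes before handing them to PDFBox is below. The class PdfStructureCheck and its methods are hypothetical helpers, not part of PDFBox, and the byte scan is only a heuristic.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of PDFBox: a rough structural check on raw PDF bytes.
class PdfStructureCheck {

    /** Counts how often an ASCII token occurs in the raw bytes. */
    static int count(byte[] data, String token) {
        byte[] t = token.getBytes(StandardCharsets.US_ASCII);
        int hits = 0;
        for (int i = 0; i + t.length <= data.length; i++) {
            int j = 0;
            while (j < t.length && data[i + j] == t[j]) {
                j++;
            }
            if (j == t.length) {
                hits++;
            }
        }
        return hits;
    }

    /** True if the data begins with the "%PDF-" header marker. */
    static boolean hasHeader(byte[] data) {
        int n = Math.min(5, data.length);
        return new String(data, 0, n, StandardCharsets.US_ASCII).equals("%PDF-");
    }

    /** Number of "obj" openings that have no matching "endobj" (heuristic). */
    static int unclosedObjects(byte[] data) {
        int endobj = count(data, "endobj");
        // Every "endobj" also contains "obj", so subtract those to get the openings.
        int openings = count(data, "obj") - endobj;
        return openings - endobj;
    }

    /** True if an "%%EOF" marker appears anywhere in the data. */
    static boolean hasEofMarker(byte[] data) {
        return count(data, "%%EOF") > 0;
    }

    static String describe(byte[] data) {
        return "header=" + hasHeader(data)
                + " unclosedObjects=" + unclosedObjects(data)
                + " eof=" + hasEofMarker(data);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: PdfStructureCheck <file.pdf>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(describe(data));
    }
}
```

A positive unclosedObjects value or a missing %%EOF marker hints at a truncated or malformed file. Binary stream data or plain text can contain these byte sequences by chance, so treat the result as a hint rather than a verdict.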
Re: IOException with PDFParser
Posted by Max Gravitt <mg...@me.com>.
Great, thanks. A question for the community: has the latest version of PDFBox been altered to work on Google App Engine? If so, how can I adopt it?
thanks
MG
On Nov 2, 2010, at 9:30 PM, Andreas Lehmkühler wrote:
> Hi,
>
>
> Am 03.11.10 01:52, schrieb Max Gravitt:
>> Hi,
>>
>> I should have clarified the question. I am using this version because I am running the library on Google App Engine and this is the version that is compatible. If I can't make this older version compatible with the new PDFs, is there a way to retrofit the most recent version to Google App Engine?
> PDFBox has improved a lot since it came to Apache. There will be some differences compared to older versions, but a lot of the technical aspects are still the same.
> As I'm not a GAE expert I can't answer your question in detail, but I know from other users that PDFBox has to be altered to work within GAE.
>
> BR
> Andreas Lehmkühler
Re: IOException with PDFParser
Posted by Andreas Lehmkühler <an...@lehmi.de>.
Hi,
Am 03.11.10 01:52, schrieb Max Gravitt:
> Hi,
>
> I should have clarified the question. I am using this version because I am running the library on Google App Engine and this is the version that is compatible. If I can't make this older version compatible with the new PDFs, is there a way to retrofit the most recent version to Google App Engine?
PDFBox has improved a lot since it came to Apache. There will be some
differences compared to older versions, but a lot of the technical
aspects are still the same.
As I'm not a GAE expert I can't answer your question in detail, but I
know from other users that PDFBox has to be altered to work within GAE.
BR
Andreas Lehmkühler
Re: IOException with PDFParser
Posted by Max Gravitt <mg...@me.com>.
Hi,
I should have clarified the question. I am using this version because I am running the library on Google App Engine and this is the version that is compatible. If I can't make this older version compatible with the new PDFs, is there a way to retrofit the most recent version to Google App Engine?
thanks!
MG
On Nov 2, 2010, at 8:37 PM, Andreas Lehmkühler wrote:
> Hi,
>
> Am 03.11.10 01:32, schrieb Max Gravitt:
>> Hi,
>> I recently started to attempt to parse faxes that are PDF'd and sent via email. I continually get the below exception with these types of files. Does anyone have thoughts on the root cause and if there is any workaround?
>> thanks,
>> MG
>>
>> IOException
>> expected='endobj' firstReadAttempt='' secondReadAttempt='' org.pdfbox.io.PushBackInputStream@d2f5f1
>> org.pdfbox.pdfparser.PDFParser; parseObject; 502
>> org.pdfbox.pdfparser.PDFParser; parse; 176
>> org.pdfbox.pdmodel.PDDocument; load; 707
>> com.josiejune.documentdispatch.models.Document$DocumentParser; getPDFContents; 245
> According to the stack trace you're using a quite old (non-Apache) version of PDFBox. I suggest updating to a more recent version from [1].
>
> BR
> Andreas Lehmkühler
>
>
> [1] http://pdfbox.apache.org/download.html
Re: IOException with PDFParser
Posted by Andreas Lehmkühler <an...@lehmi.de>.
Hi,
Am 03.11.10 01:32, schrieb Max Gravitt:
> Hi,
> I recently started to attempt to parse faxes that are PDF'd and sent via email. I continually get the below exception with these types of files. Does anyone have thoughts on the root cause and if there is any workaround?
> thanks,
> MG
>
> IOException
> expected='endobj' firstReadAttempt='' secondReadAttempt='' org.pdfbox.io.PushBackInputStream@d2f5f1
> org.pdfbox.pdfparser.PDFParser; parseObject; 502
> org.pdfbox.pdfparser.PDFParser; parse; 176
> org.pdfbox.pdmodel.PDDocument; load; 707
> com.josiejune.documentdispatch.models.Document$DocumentParser; getPDFContents; 245
According to the stack trace you're using a quite old (non-Apache)
version of PDFBox. I suggest updating to a more recent version from [1].
BR
Andreas Lehmkühler
[1] http://pdfbox.apache.org/download.html
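The stack trace above shows classes under org.pdfbox, while the Apache releases use the org.apache.pdfbox package. As a rough sketch, the check below classifies a class name by that prefix and probes which variant is on the classpath. The class PdfBoxFlavor and its method names are hypothetical and not part of PDFBox; only the two PDDocument class names are taken from the libraries.

```java
// Hypothetical helper, not part of PDFBox: tells the legacy and Apache variants apart.
class PdfBoxFlavor {

    /** Classifies a fully qualified class name, e.g. one taken from a stack trace. */
    static String classify(String className) {
        if (className.startsWith("org.apache.pdfbox.")) {
            return "apache";
        }
        if (className.startsWith("org.pdfbox.")) {
            return "legacy";
        }
        return "unknown";
    }

    /** True if the named class can be found, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, PdfBoxFlavor.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Reports which PDFBox variant, if any, is visible to this class loader. */
    static String detect() {
        if (isOnClasspath("org.apache.pdfbox.pdmodel.PDDocument")) {
            return "apache";
        }
        if (isOnClasspath("org.pdfbox.pdmodel.PDDocument")) {
            return "legacy";
        }
        return "none";
    }

    public static void main(String[] args) {
        System.out.println("trace class: " + classify("org.pdfbox.pdfparser.PDFParser"));
        System.out.println("classpath:   " + detect());
    }
}
```

If both variants are deployed side by side, detect() reports the Apache one first. That ordering is a choice made for this sketch, not a rule of the library.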