
Posted to users@pdfbox.apache.org by Ga...@sungard.com on 2015/03/06 21:05:48 UTC

PDFParser Error Caused by: org.apache.pdfbox.exceptions.WrappedIOException

Hello,
I am getting a PDFParser error caused by org.apache.pdfbox.exceptions.WrappedIOException.
The complete stack trace is at the following link:
( http://apaste.info/DRD )

I am trying to import a 4 GB PDF into Solr using Tika. I was able to import PDFs of up to 500 MB.

Is there any workaround?

Thanks
G

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org

Re: PDFParser Error Caused by: org.apache.pdfbox.exceptions.WrappedIOException

Posted by John Hewson <jo...@jahewson.com>.

> On 6 Mar 2015, at 12:05, Ganesh.Yadav@sungard.com wrote:
> I am trying to import 4GB Long PDF using Tika into Solr. I was able to import up to 500MB. 

Just checking: you gave Java at least 4 GB of heap, right?

— John
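
John's question can be checked from inside the JVM. The sketch below is a hypothetical helper (the class `HeapCheck` and its `heapCovers` method are illustrative, not part of PDFBox, Tika or Solr). It prints `Runtime.getRuntime().maxMemory()`, the ceiling set by the standard `-Xmx` option, next to the size of the PDF. Treating "heap at least as large as the file" as the threshold follows John's rule of thumb; it is an assumption, not a guarantee that parsing will succeed.

```java
import java.io.File;

/** Hypothetical helper: compares the JVM's maximum heap with a file's size. */
public class HeapCheck {

    /** Returns true if the maximum heap is at least as large as the given number of bytes. */
    static boolean heapCovers(long maxHeapBytes, long fileBytes) {
        return maxHeapBytes >= fileBytes;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java -Xmx<size> HeapCheck <file.pdf>");
            System.exit(1);
        }
        // Upper bound on heap, set by -Xmx or by the JVM's default when -Xmx is absent.
        long maxHeap = Runtime.getRuntime().maxMemory();
        // File.length() returns 0 if the file does not exist.
        long fileSize = new File(args[0]).length();
        System.out.printf("max heap: %,d bytes%n", maxHeap);
        System.out.printf("file:     %,d bytes%n", fileSize);
        if (!heapCovers(maxHeap, fileSize)) {
            System.out.println("Heap is smaller than the file; try a larger -Xmx value.");
        }
    }
}
```

For example, `java -Xmx6g HeapCheck big.pdf` (the `6g` value is only an illustration). The helper reports the heap of the JVM it runs in, so the limit that matters is the one given to the JVM that actually runs Tika.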

Re: PDFParser Error Caused by: org.apache.pdfbox.exceptions.WrappedIOException

Posted by Tilman Hausherr <TH...@t-online.de>.

Sorry, those were the wrong links (the command-line tools are in the pdfbox-app jar). Use these instead:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/1.8.9-SNAPSHOT/
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.0-SNAPSHOT/

Tilman

On 07.03.2015 at 14:21, Tilman Hausherr wrote:
> The best would be to test whether that file can be handled by newer 
> versions of PDFBox (1.8.9 and 2.0)
>
> https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox/1.8.9-SNAPSHOT/ 
>
> https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox/2.0.0-SNAPSHOT/ 

Re: PDFParser Error Caused by: org.apache.pdfbox.exceptions.WrappedIOException

Posted by Tilman Hausherr <TH...@t-online.de>.

The best approach would be to test whether that file can be handled by the
newer versions of PDFBox (1.8.9 and 2.0):

https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox/1.8.9-SNAPSHOT/
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox/2.0.0-SNAPSHOT/

Download the jar files and, for each one:

     - run java -jar <jarfile> ExtractText <yourfile>
     - see what happens
     - tell us the result
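
If several jars need trying, the steps above can be scripted. This is a minimal sketch, assuming the downloaded pdfbox-app jars (the executable ones) and a `java` launcher on the PATH; the class `TryJars`, its `command` helper, and the `6g` heap value are hypothetical choices, the last echoing the heap-size question earlier in the thread. It only runs `java -jar <jarfile> ExtractText <yourfile>` once per jar and prints each exit code; reading the console output is still the "see what happens" step.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical driver that runs the ExtractText command once per pdfbox-app jar. */
public class TryJars {

    /** Builds: java [-Xmx<heap>] -jar <jar> ExtractText <pdf> */
    static List<String> command(String jar, String pdf, String heap) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        if (heap != null) {
            cmd.add("-Xmx" + heap); // JVM options must come before -jar
        }
        cmd.add("-jar");
        cmd.add(jar);
        cmd.add("ExtractText");
        cmd.add(pdf);
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // args: <file.pdf> <jar1> [<jar2> ...]
        if (args.length < 2) {
            System.err.println("usage: java TryJars <file.pdf> <pdfbox-app.jar>...");
            System.exit(1);
        }
        String pdf = args[0];
        for (int i = 1; i < args.length; i++) {
            List<String> cmd = command(args[i], pdf, "6g"); // heap size is an assumption
            System.out.println("running: " + String.join(" ", cmd));
            // inheritIO() shows the child's output and any stack trace on this console.
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            int exit = p.waitFor();
            System.out.println(args[i] + " exited with code " + exit);
        }
    }
}
```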

Your paste indicates a problem in RandomAccessBuffer.java.

Tilman