Posted to dev@tika.apache.org by Chris Mattmann <ma...@apache.org> on 2018/02/22 17:31:37 UTC

Re: Issue with apache Tika

Try UTF-8 encoding the URLs or the parameters themselves. If you are using Tika-Python, then use the Python
encode library…
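One way to read the suggestion above is to percent-encode, as UTF-8, any characters that are not allowed in a URL before passing it to Tika. The sketch below is only an illustration under that assumption: it uses the standard library's urllib.parse, and the helper name sanitize_url and the example URL are hypothetical, not part of Tika-Python.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def sanitize_url(url: str) -> str:
    """Percent-encode (as UTF-8) characters that are not allowed in a URL.

    Hypothetical helper for illustration; not part of Tika or Tika-Python.
    """
    parts = urlsplit(url.strip())
    # "%" stays in the safe set so an already-encoded URL is not encoded twice.
    path = quote(parts.path, safe="/%:@!$&'()*+,;=")
    query = quote(parts.query, safe="=&%/?:@!$'()*+,;")
    fragment = quote(parts.fragment, safe="%/?:@!$&'()*+,;=")
    # The scheme and host are passed through unchanged.
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


if __name__ == "__main__":
    # Made-up URL containing a space and a non-ASCII character.
    print(sanitize_url("http://example.com/a b/é.pdf?q=x y"))
```

The cleaned string can then be handed to Tika-Python in place of the raw URL (for example via parser.from_file, assuming it is being used to fetch URLs). Hosts containing non-ASCII characters are not handled here and would need separate IDNA encoding.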

 

Cheers,

Chris

From: radhia bezzine <be...@gmail.com>
Date: Thursday, February 22, 2018 at 6:03 AM
To: "Mattmann, Chris A (1761)" <ch...@jpl.nasa.gov>
Subject: Issue with apache Tika

 

Hello Dear!

 

I hope you are doing well.

 

I am writing to you because I have an issue running Apache Tika from Python.

I'm trying to parse content and metadata from many URLs (live on the internet).

However, Tika sometimes returns an error like "invalid argument".

I troubleshot the problem and realized that some URLs include forbidden characters, which is why Apache Tika reports "invalid argument".

I really don't know how to deal with this problem. I tried other tools, but I think Tika matches my needs.

 

Thank you very much for your time.

 

Best regards! 

 

Radhia