Posted to dev@tika.apache.org by Chris Mattmann <ma...@apache.org> on 2017/05/03 13:58:09 UTC

Re: Apache Tika

Hi Gorka,

 

See: http://wiki.apache.org/tika/TikaOCR/

Is that what you're looking for? If so, you can simply enable OCR for the Tika REST server and then
point your Tika Python client at that server. Does that help?
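
For what it's worth, here is a rough sketch of that round trip using only the Python standard library. It assumes a Tika server is already running on localhost:9998 (the default port) with Tesseract installed on the same machine, and that the X-Tika-PDFextractInlineImages request header described on the wiki page above is honored by your server version. The function names are just illustrative, not part of any Tika API.

```python
import urllib.request

# Assumption: a Tika REST server is listening here (9998 is its default port).
TIKA_SERVER = "http://localhost:9998"


def build_ocr_request(pdf_bytes, server=TIKA_SERVER):
    """Build a PUT request to the server's /tika endpoint that asks for
    plain text and requests OCR of images embedded in the PDF."""
    headers = {
        "Accept": "text/plain",
        "Content-Type": "application/pdf",
        # Per the TikaOCR wiki page: extract inline images so they can be OCR'd.
        # urllib normalizes header-name casing; HTTP header names are case-insensitive.
        "X-Tika-PDFextractInlineImages": "true",
    }
    url = server.rstrip("/") + "/tika"
    return urllib.request.Request(url, data=pdf_bytes, headers=headers, method="PUT")


def extract_text(pdf_path, server=TIKA_SERVER):
    """Send the PDF to the Tika server and return the extracted text.
    Requires a running server; nothing is sent until this is called."""
    with open(pdf_path, "rb") as f:
        request = build_ocr_request(f.read(), server)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling extract_text("scan.pdf") (a hypothetical file name) should then return whatever text Tika and Tesseract recover. If you are using the tika-python package instead, I believe parser.from_file(path, serverEndpoint="http://localhost:9998") points it at the same server, but check that against the version you have installed.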

 

Cheers,

Chris

From: gorka gallo <go...@gmail.com>
Date: Wednesday, May 3, 2017 at 2:19 AM
To: "Mattmann, Chris A (3010)" <ch...@jpl.nasa.gov>
Subject: Apache Tika

 

Hi Chris, 

 

I am Gorka Gallo, a research technician from Bilbao, Spain.

 

Is there any method to extract embedded images in PDF files with Apache Tika using Python?

 

Thanks,

 

Best regards,

Gorka.