Posted to solr-user@lucene.apache.org by Kranti™ K K Parisa <kr...@gmail.com> on 2010/02/04 13:22:21 UTC

Best OCR API for solr

Hi,

Can anyone list the best OCR APIs available for use with Solr?

The idea is to take a scanned file (the format could be PDF, Word, an image, etc.) as
input and produce an OCR'd file whose contents can then be used for
Solr indexing.

Best Regards,
Kranti K K Parisa

Re: Best OCR API for solr

Posted by Kranti™ K K Parisa <kr...@gmail.com>.
Yes, Tika indexes all formats.

But I am specifically looking for OCR (through Java), at least for PDF or JPEG
images.

Any clues?

Best Regards,
Kranti K K Parisa



On Thu, Feb 4, 2010 at 8:29 PM, mike anderson <sa...@gmail.com> wrote:

> There might be an OCR plugin for Apache Tika (which does exactly this out of
> the box except for OCR capability, I believe).
>
> http://lucene.apache.org/tika/
>
> -mike

Re: Best OCR API for solr

Posted by mike anderson <sa...@gmail.com>.
There might be an OCR plugin for Apache Tika (which does exactly this out of
the box except for OCR capability, I believe).

http://lucene.apache.org/tika/

-mike

