Posted to solr-user@lucene.apache.org by Yasufumi Mizoguchi <ya...@gmail.com> on 2018/07/24 02:47:45 UTC

How to use tika-OCR in data import handler?

Hi,

I am trying to use Tika OCR (Tesseract) in the data import handler,
and found that it processes English documents quite well.

But I am struggling to process other languages, such as
Japanese and Chinese.

So, I want to know how to switch Tesseract OCR's processing
language via the data import handler config or the tikaConfig param.

Any pointers would be appreciated.

Thanks,
Yasufumi