Posted to user@nutch.apache.org by Peyman Faratin <pe...@twosigmaiq.com> on 2018/08/15 14:33:04 UTC

Tesseract/Tika certain pages

Hi

I am new to Nutch and am using version 1.15. I would like Tika to OCR images (via Tesseract), but only for URLs that match certain keywords. I am not sure how to configure Nutch for either part: enabling OCR through Tika, or restricting it to the matching URLs.

Any help would be much appreciated. 

Peyman