Posted to user@nutch.apache.org by Alp Timurhan Çevik <at...@turkguven.com> on 2015/07/20 00:09:28 UTC

2.3.1 and version control

Hello,

I would like to use Tesseract OCR within Nutch in order to parse scanned
PDF files (assuming this is the correct (and only?) way of doing that).
Skimming through the previous emails, I noticed that this is possible with
2.3.1, which works with Tika 1.7+, the version needed for OCR.

I looked through the repositories, Subversion and GitHub, but was not able
to find any tag or branch for 2.3.1. There is one for 2.4, which is in
development and has around 100 open issues.

My question is: is there anywhere I can get 2.3.1? If not, is it safe to
use the 2.4 trunk? Are there any planned release dates? Any other
suggestions?

Best regards,
Alp