Posted to user@nutch.apache.org by Ali Nazemian <al...@gmail.com> on 2014/06/05 21:25:26 UTC

re-crawling with nutch 1.8

Hi,
I recently started using Nutch and I want to use it for whole-web
crawling. The problem is that I have not found a useful tutorial on how to
re-crawl with Nutch. I know that some configuration parameters need to be
changed for re-crawling, and I am aware of them. What I don't know is how
to run the crawler for an initial crawl as the first step and then
re-crawl in the following steps. As far as I can tell, the default crawl
script provided with Nutch cannot be used for this purpose. Could
somebody tell me how to do this? What are the prerequisites? Do I need
a web application server such as Tomcat?
FYI, I am using Nutch 1.8.

Regards.

-- 
A.Nazemian