Posted to dev@nutch.apache.org by Gaurang Patel <ga...@gmail.com> on 2009/10/06 00:18:21 UTC

generate, fetch: Nutch commands

All,

I am a master's student and want to crawl the whole web for my master's
project.

While trying to generate, fetch, and crawl the whole web using Nutch
(following the steps at http://lucene.apache.org/nutch/tutorial8.html), I got
confused by various Nutch terms and their usage:
1) What is the purpose of *crawl_fetch* and of *crawldb*, and how do they
differ? If Nutch stores all the information about URLs in *crawldb*, why is
*crawl_fetch* needed?
2) What do fetch and generate do? Could anyone describe them in detail?
Is there any documentation for Nutch commands such as generate, fetch, etc.?


Thanks & Regards,
Gaurang Patel