
Posted to user@nutch.apache.org by Fabrice Estiévenart <fe...@cetic.be> on 2006/03/23 17:03:14 UTC

Crawling a list of domains without going outside them?

Hello,

I've been exploring Nutch for a few days and it works very nicely. The
tutorial clearly explains how to crawl an intranet or the whole web, but
my task falls somewhere in between.

Starting from a list of URLs stored in a flat text file, I'd like to use
Nutch to completely crawl and index these websites without following
links to other sites. After the first crawl, I'd like to be able to
easily add new URLs to the text file and update the index.

Could anyone help a newbie like me?

Thanks for this great project!!!

Fabrice