Posted to user@nutch.apache.org by Jeff Love <do...@gmail.com> on 2005/10/24 17:02:54 UTC

Beyond the Tutorial

I have been using Nutch for over a year now to run a number of search engine
sites. For most of them I just do the basic intranet crawl by injecting a
list of URLs that we want included in the index. Now I want to go beyond the
basic crawl. Specifically, after the initial crawl I want to be able to add
sites to the index or remove them from it. I also want to set up a cron job
that indexes all new sites that have been added and recrawls sites that have
expired. I have looked for ways to do this but haven't had much luck. Does
anyone have a tutorial or instructions that they use to manage the index
after the initial crawl? Thanks for any help.

--
Jeff Love