Posted to dev@nutch.apache.org by Fuad Efendi <fu...@efendi.ca> on 2005/08/29 04:09:35 UTC

Re-Crawl?

Hello,

I just ran "crawl -depth 5". What should I do if I want to refresh the
databases daily? The easiest way is to delete and recreate them. May I
issue a sequence of commands like the following?

5 times: FetchListTool, Fetcher, UpdateDatabaseTool,
UpdateSegmentsFromDb, IndexSegment, DeleteDuplicates
1 time: IndexMerger
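
To make the order I have in mind concrete, here is a minimal Python sketch
of it. It only lists the steps and does not invoke Nutch at all. The tool
names are the ones from the list above; the function name plan_recrawl and
the use of round 0 for the final step are made up for illustration.

```python
# Tool names come from the list above; nothing here runs Nutch itself.
PER_ROUND = [
    "FetchListTool",
    "Fetcher",
    "UpdateDatabaseTool",
    "UpdateSegmentsFromDb",
    "IndexSegment",
    "DeleteDuplicates",
]
FINAL = ["IndexMerger"]


def plan_recrawl(depth):
    """Return the ordered (round, tool) steps for a re-crawl of the given depth."""
    steps = []
    for round_no in range(1, depth + 1):
        for tool in PER_ROUND:
            steps.append((round_no, tool))
    # Round 0 is a hypothetical marker for the one-off steps after the loop.
    for tool in FINAL:
        steps.append((0, tool))
    return steps


if __name__ == "__main__":
    for round_no, tool in plan_recrawl(5):
        label = "final" if round_no == 0 else "round %d" % round_no
        print(label, tool)
```

For depth 5 this gives 5 x 6 per-round steps plus the single merge at the
end, which is the sequence I am asking about.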


Should I keep the old subfolders of SEGMENTS, or is the easiest way simply
to delete everything except DB?

I am just thinking about how to automate this without any downtime.
Deleting and recreating the databases, deleting the index, creating a new
index, and restarting is not good.

Thanks,
Fuad Efendi