Posted to user@nutch.apache.org by Alex Basa <al...@yahoo.com> on 2008/10/23 15:17:04 UTC

Crawl and Merge questions

Does anyone know which crawl output directories are required after a successful crawl? Are crawldb, indexes, index, linkdb and segments all required for a successful merge?
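The Python snippet below is a minimal sketch, not part of Nutch. It only reports which of the five directories named above exist under each crawl output directory before a merge is attempted. It does not decide which ones mergecrawls.sh actually needs. The script name and the SAN paths in the usage comment are hypothetical.

```python
import os
import sys

# Directory names taken from the question above. Whether mergecrawls.sh needs
# all of them is the open question, so this only reports presence.
CRAWL_SUBDIRS = ["crawldb", "indexes", "index", "linkdb", "segments"]


def missing_subdirs(crawl_dir):
    """Return the names from CRAWL_SUBDIRS that are not directories under crawl_dir."""
    return [name for name in CRAWL_SUBDIRS
            if not os.path.isdir(os.path.join(crawl_dir, name))]


if __name__ == "__main__":
    # Hypothetical usage: python check_crawl_dirs.py /san/crawl1 /san/crawl2
    for crawl_dir in sys.argv[1:]:
        missing = missing_subdirs(crawl_dir)
        status = "all present" if not missing else "missing: " + ", ".join(missing)
        print(f"{crawl_dir}: {status}")
```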

I'm crawling on 5 servers and writing the output to the SAN. Crawling is fast and works fine (up to several million documents). My problem is that merging the indexes with mergecrawls.sh takes a very long time. Is there any performance tuning I can do to speed up mergecrawls.sh?

Thanks in advance,

Alex