Posted to user@nutch.apache.org by Alex Basa <al...@yahoo.com> on 2008/10/23 15:17:04 UTC
Crawl and Merge questions
Does anyone know which crawl output directories are required after a successful crawl? Do crawldb, indexes, index, linkdb and segments all have to be present for a merge to succeed?
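To make the question concrete, here is a minimal sketch that reports which of the five directories named above exist under a given crawl output directory. The function name check_crawl_dir and any path passed to it are hypothetical, and the script only tests for presence; it does not assume which of the directories a merge actually needs.

```shell
#!/bin/sh
# Sketch only: report which of the directory names from this post exist
# under one crawl output directory. check_crawl_dir is a hypothetical helper.
check_crawl_dir() {
    crawl_dir="$1"
    for name in crawldb index indexes linkdb segments; do
        if [ -d "$crawl_dir/$name" ]; then
            echo "$name present"
        else
            echo "$name missing"
        fi
    done
}
```

It could be run once per server's output, for example check_crawl_dir /path/to/crawl1 (a placeholder path), before starting the merge.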
I'm crawling on 5 servers and writing to the SAN. The crawl itself is fast and works fine (up to several million documents). My problem is that merging the indexes with mergecrawls.sh takes a very long time. Is there any performance tuning I can do to speed up mergecrawls.sh?
Thanks in advance,
Alex