Posted to user@nutch.apache.org by kaveh minooie <ka...@plutoz.com> on 2012/02/18 02:41:21 UTC
concurrency and solrindex
Imagine a scenario in which there is a crawldb, and we run generate on
it to get three different segment directories. Then we start concurrent
fetch and parse jobs on those three segments.
Then one of them finishes sooner, and we run updatedb on it.
Now here is the question: is it safe to run invertlinks and solrindex only
on the segment directory that has finished? I am not sure if I am
asking this correctly, but my problem is basically with the crawldb. Both
solrindex and invertlinks get the crawldb as one of their inputs. Is it safe
to use the crawldb for these purposes while there are still "checked out"
URLs (from the two other fetch jobs that are still running) out there?
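To make the scenario concrete, here is a rough sketch of the steps I mean for the one segment that has finished. It only builds the command lines and does not run anything. The command names are the ones mentioned above; the argument order is an assumption based on the Nutch 1.x command-line usage and may differ between versions, so check it against the usage message of bin/nutch. The paths, segment name, and Solr URL are made up for illustration.

```python
from typing import List


def post_fetch_commands(
    crawldb: str, linkdb: str, segment: str, solr_url: str
) -> List[List[str]]:
    """Build, in order, the commands for ONE segment whose fetch and parse
    jobs have finished. Nothing is executed here.

    Assumption: argument order follows Nutch 1.x usage; verify it for
    your version before running anything.
    """
    return [
        # Merge the fetch results of the finished segment into the crawldb.
        ["bin/nutch", "updatedb", crawldb, segment],
        # Invert links for the finished segment only.
        ["bin/nutch", "invertlinks", linkdb, segment],
        # Index the finished segment only; the crawldb is passed as an input.
        ["bin/nutch", "solrindex", solr_url, crawldb, "-linkdb", linkdb, segment],
    ]


if __name__ == "__main__":
    # Hypothetical paths and URL, for illustration only.
    commands = post_fetch_commands(
        crawldb="crawl/crawldb",
        linkdb="crawl/linkdb",
        segment="crawl/segments/SEGMENT_A",
        solr_url="http://localhost:8983/solr",
    )
    for command in commands:
        print(" ".join(command))
```

The other two segments would still be in their fetch jobs while these three commands run, which is exactly the situation I am asking about.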
Thanks,
--
Kaveh Minooie
www.plutoz.com