Posted to user@nutch.apache.org by kaveh minooie <ka...@plutoz.com> on 2012/02/18 02:41:21 UTC

concurrency and solrindex


Imagine a scenario in which there is a crawldb and we run generate on 
it to get three different segment directories. We then start concurrent 
fetch and parse jobs on those three segments.

Then one of them finishes sooner than the others, and we run updatedb on it.

Now here is the question: is it safe to run invertlinks and solrindex 
only on the segment directory that has finished? I am not sure if I am 
asking this correctly, but my problem is basically with the crawldb. Both 
solrindex and invertlinks get the crawldb as one of their inputs. Is it 
safe to use the crawldb for these purposes while there are still 
"checked out" URLs (from the two other fetch jobs that are still 
running) out there?

Thanks,
-- 
Kaveh Minooie

www.plutoz.com