Posted to dev@nutch.apache.org by Stefan Groschupf <sg...@media-style.com> on 2005/11/11 18:32:04 UTC

mapSearcher (was Re: Index update and Google Dance)

Hi Doug,
> In the future I would like to implement a more automated  
> distributed search system than Nutch currently has.  One way to do  
> this might be to use MapReduce.  Each map task's input could be an  
> index and some segment data.  The map method would serve queries,  
> i.e., run a Nutch DistributedSearch.Server.  It would first copy  
> the index out of NDFS to the local disk, for better performance.

I have two questions regarding this mechanism.
First, how do you plan to make the running search servers known to the
master (the search client)? I can imagine a mechanism similar to the one
the tasktracker and jobtracker use, a kind of heartbeat message.
Second, wouldn't this also be a way to solve NUTCH-92
(DistributedSearch incorrectly scores results): first run a MapReduce
job over the indexes that counts terms, then hold the result somehow
in the memory of the master (the search client)? I'm not sure, though,
whether that would be too much data.
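
To make the second idea concrete, here is a minimal sketch in plain Java of the "reduce" side: summing per-index document frequencies into one global table that the search client could keep in memory. It does not use any Nutch or MapReduce API; the class name GlobalTermStats, both method names, and the assumption that each index has already produced a term -> docFreq map are hypothetical. The idf formula log(N / (df + 1)) + 1 is only one common choice and is assumed here for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Nutch: merges per-index term statistics.
final class GlobalTermStats {

    // "Reduce" step: sum the document frequency of each term across all indexes.
    // Each input map is assumed to be the output of a per-index "map" step.
    static Map<String, Integer> mergeDocFreqs(List<Map<String, Integer>> perIndex) {
        Map<String, Integer> global = new HashMap<>();
        for (Map<String, Integer> local : perIndex) {
            for (Map.Entry<String, Integer> e : local.entrySet()) {
                global.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return global;
    }

    // Inverse document frequency computed from the global counts, so every
    // search server would score a term the same way regardless of its local index.
    static double idf(Map<String, Integer> global, long totalDocs, String term) {
        int df = global.getOrDefault(term, 0);
        return Math.log((double) totalDocs / (df + 1)) + 1.0;
    }
}
```

The memory concern raised above applies directly: the merged map holds one entry per distinct term across all indexes, so its size grows with the vocabulary, not with the number of documents.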

Stefan