
Posted to dev@nutch.apache.org by Chester <11...@qq.com> on 2009/01/06 09:56:01 UTC

About Nutch distributed search implement

Hi everyone!

While reading the Nutch source code, one thing confused me: how distributed
search is implemented.

As I understand it, Nutch achieves distributed search by dividing one big
index into many small indexes and then searching them in parallel on several
machines. But in the code, I saw the following:



  public IndexSearcher(Path[] indexDirs, Configuration conf) throws IOException {
    IndexReader[] readers = new IndexReader[indexDirs.length];
    this.conf = conf;
    this.fs = FileSystem.get(conf);
    for (int i = 0; i < indexDirs.length; i++) {
      readers[i] = IndexReader.open(getDirectory(indexDirs[i]));
    }
    init(new MultiReader(readers), conf);
  }


As you know, in a DFS environment the index is probably stored on other
machines, so reading it means going over the network. Here is the problem:
the network is much slower than a local disk, so I think the search will take
too long. I guess the developers have considered this issue, and I'm eager to
know how it works.
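
To make the model I have in mind concrete, here is a minimal sketch in plain
Java. It is not Nutch code: the Shard and Hit types and the search method are
hypothetical names I made up for illustration. Each shard stands for a node
that searches its own small index on local disk; the caller fans the query out
in parallel and merges the best hits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ShardedSearchSketch {

    /** A scored hit; hypothetical stand-in for a real search result. */
    static final class Hit {
        final String doc;
        final double score;
        Hit(String doc, double score) { this.doc = doc; this.score = score; }
    }

    /**
     * One index shard (hypothetical). In a real deployment this would be a
     * remote call to the machine that holds the shard on its local disk.
     */
    interface Shard {
        List<Hit> search(String query, int topN);
    }

    /** Queries every shard in parallel and returns the topN hits overall. */
    static List<Hit> search(List<Shard> shards, String query, int topN) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            // Fan out: one task per shard.
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Shard shard : shards) {
                pending.add(pool.submit(() -> shard.search(query, topN)));
            }
            // Gather: wait for each shard and collect its hits.
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : pending) {
                merged.addAll(f.get());
            }
            // Merge: sort by descending score and keep the best topN.
            merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Two in-memory shards with fixed results, for demonstration only.
        Shard a = (q, n) -> Arrays.asList(new Hit("a1", 0.9), new Hit("a2", 0.4));
        Shard b = (q, n) -> Arrays.asList(new Hit("b1", 0.7), new Hit("b2", 0.1));
        for (Hit h : search(Arrays.asList(a, b), "nutch", 3)) {
            System.out.println(h.doc + " " + h.score);
        }
    }
}
```

With the two fixed shards in main, the merged order is a1, b1, a2. In this
model only the query and the small hit lists cross the network, never the
index files themselves, which is why the constructor above, which opens every
index directory through the FileSystem, surprised me.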