Posted to dev@nutch.apache.org by Stefan Groschupf <sg...@media-style.com> on 2005/12/16 12:13:33 UTC

Re: [Nutch-dev] distributed search

Hi Ledio,
the current version of Nutch is 0.7, or you can also use the 0.8 branch code.
Also, you are using the old mailing lists; I suggest you use the Apache
Nutch user mailing list.
http://lucene.apache.org/nutch/mailing_lists.html
To answer your question: Nutch forwards the query to all search
servers, then collects and reranks the results from the search servers.
So give each of your servers a physically separate slice of your index.
This will improve your performance. Also check that the index parts
are not stored on the same HDD and that your search servers have as
much RAM as possible.
HTH
Stefan
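Stefan's description (send the query to every search server, collect the hits, rerank them) is a scatter-gather pattern. The Java sketch below is a minimal, hypothetical illustration of that pattern, not Nutch's own DistributedSearch code: `ShardSearcher` and `Hit` are made-up stand-ins for a remote search server and its results, and the "rerank" step is assumed to be a plain sort by score. Because the shards are queried concurrently, a query should cost roughly as long as the slowest shard, not the sum of all of them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ScatterGather {

    /** One scored result from a shard (hypothetical type, not Nutch's Hit class). */
    record Hit(String url, float score) {}

    /** Hypothetical stand-in for one remote search server holding one index slice. */
    interface ShardSearcher {
        List<Hit> search(String query, int topK) throws Exception;
    }

    /**
     * Sends the query to every shard concurrently, waits for all replies,
     * merges the per-shard lists and keeps the global top K by score.
     */
    static List<Hit> search(List<ShardSearcher> shards, String query, int topK)
            throws Exception {
        if (shards.isEmpty()) {
            return Collections.emptyList();
        }
        ExecutorService pool = Executors.newFixedThreadPool(shards.size());
        try {
            // Scatter: submit one task per shard; no task waits for the others.
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (ShardSearcher shard : shards) {
                pending.add(pool.submit(() -> shard.search(query, topK)));
            }
            // Gather: collect every shard's hits (get() blocks until that shard answers).
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : pending) {
                merged.addAll(f.get());
            }
            // Rerank: assumed here to be a plain sort by score, highest first.
            merged.sort((a, b) -> Float.compare(b.score(), a.score()));
            return new ArrayList<>(merged.subList(0, Math.min(topK, merged.size())));
        } finally {
            pool.shutdown();
        }
    }
}
```

Creating a thread pool per query keeps the example short; a long-running front end would more likely reuse one shared pool. The sketch also assumes scores from different shards are directly comparable, which is a simplification. The `record` syntax needs Java 16 or later.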




On 16.12.2005 at 03:00, Ledio Ago wrote:

> I was able to set up Nutch searchers in a distributed fashion by
> creating the search-server.txt files in the root of the data
> directory where Tomcat was running. I had a total of 1.9 MM URLs,
> split in half, one half for each searcher.
> I was very surprised to see that the performance numbers I got for
> this setup were not as good as I was expecting. Before I ran this
> setup, I ran the test on a single searcher with all 1.9 MM URLs.
> The results for the distributed setup were the same as, or even
> with, the single-searcher results.
>
> One thing I suspect is that Tomcat is querying the Nutch search
> servers synchronously instead of asynchronously, i.e. querying one
> server at a time, because that would explain a lot.
>
> Can somebody tell me whether this is true?
>
> I'm running Nutch 0.5 on very beefy machines.
>
> Thanks,
>
> Ledio
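For reference, the server list Ledio mentions is a plain text file. In the Nutch 0.7/0.8-era sources, the front end (NutchBean) looks for a file named `search-servers.txt` (plural) in the searcher directory and reads one `host port` pair per line, e.g. `searcher1 9999`; treat the exact name and format as an assumption to verify against your Nutch version. The Java sketch below parses such a list. `ServerListParser` is a hypothetical helper, not Nutch code, and the hostnames and ports used with it are placeholders.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

final class ServerListParser {

    /**
     * Parses "host port" lines (hypothetical helper modeled on the assumed
     * search-servers.txt format). Blank lines and lines without a port are
     * skipped; a non-numeric port throws NumberFormatException.
     */
    static List<InetSocketAddress> parse(String text) throws IOException {
        List<InetSocketAddress> addrs = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] tokens = line.trim().split("\\s+");
                if (tokens.length < 2) {
                    continue; // blank or malformed line
                }
                // createUnresolved avoids a DNS lookup while parsing.
                addrs.add(InetSocketAddress.createUnresolved(
                        tokens[0], Integer.parseInt(tokens[1])));
            }
        }
        return addrs;
    }
}
```

Each address in the resulting list corresponds to one search server that would receive every query.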


RE: [Nutch-dev] distributed search

Posted by Ledio Ago <la...@looksmart.net>.
Thank you, Stefan, for the reply.

I did have separate physical indexes on separate machines, with about 900K URLs on
each of them. I ran Tomcat on one of those boxes and tested the load. I got
the same numbers as I got when I didn't use distributed search.
So I suspected that Tomcat wasn't making asynchronous calls to the Nutch
servers, and that this caused the performance issue.

I'll try versions 0.7 and 0.8 and see what happens. Another thing I'll try
is to put Tomcat on a different machine.

Thanks,
Ledio
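Ledio's hypothesis can be reasoned about with a toy benchmark. If a front end calls N search servers one after another, per-query latency is roughly the sum of their response times; if it calls them concurrently, it is roughly the slowest one. The Java sketch below only simulates that effect with sleeping tasks: `fakeShard` and the 200 ms latency are made-up values, and nothing here measures Nutch or Tomcat.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class FanOutTiming {

    /** Simulated shard call (hypothetical): sleeps to mimic network plus search latency. */
    static Callable<String> fakeShard(String name, long latencyMs) {
        return () -> {
            Thread.sleep(latencyMs);
            return name;
        };
    }

    /** Calls each shard one after another; elapsed time is roughly the SUM of latencies. */
    static long sequentialMillis(List<Callable<String>> shards) throws Exception {
        long start = System.nanoTime();
        for (Callable<String> shard : shards) {
            shard.call();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Calls all shards at once; elapsed time is roughly the MAX latency. */
    static long parallelMillis(List<Callable<String>> shards) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            long start = System.nanoTime();
            List<Future<String>> futures = pool.invokeAll(shards); // waits for all tasks
            for (Future<String> f : futures) {
                f.get(); // rethrows any shard failure
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<String>> shards =
                List.of(fakeShard("shard-1", 200), fakeShard("shard-2", 200));
        System.out.println("sequential ~" + sequentialMillis(shards) + " ms");
        System.out.println("parallel   ~" + parallelMillis(shards) + " ms");
    }
}
```

This models a single query in isolation. Under a load test, where many queries run at once, throughput may also depend on how busy each machine is, which is one reason moving Tomcat off a search-server box is worth trying.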


-----Original Message-----
From: Stefan Groschupf [mailto:sg@media-style.com]
Sent: Fri 16-Dec-05 3:13 AM
To: dev@nutch.org
Cc: nutch-developers@lists.sourceforge.net
Subject: Re: [Nutch-dev] distributed seach
 