Posted to user@nutch.apache.org by og...@yahoo.com on 2005/10/15 17:32:48 UTC

Re: [Nutch-general] RE: Nutch Search Speed Concern

Hello,

--- Paul Harrison <pa...@personifi.com> wrote:

> I too would love to hear some answers on this one.  We have a
> 100-million-page implementation on 5 machines with 4 GB of RAM and
> 2 SATA drives of 250 GB each.  Part of what I have noticed is that
> Lucene does some sort of strange caching: if you repeat a search,
> the results come back quite quickly.

That's probably your OS/FS caching.  Lucene doesn't cache anything.

> I too have noticed that different terms have different search
> response times, and that the problem gets worse with the number of
> terms in the query.

Yes, that makes sense.  More complex queries will have to dig through
the index more than simple ones, consequently taking more time to
return hits.
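A toy model may make the point about multi-term queries concrete. It is a sketch under simplifying assumptions, not Lucene's code: the index is an in-memory dict mapping each term to a sorted list of document ids (a "posting list"), and the cost of an OR query is approximated by how many postings it has to visit. Real engines do more work per posting (scoring, and positions for phrase queries), and the names build_index and search_or are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Build a toy inverted index: term -> sorted list of doc ids."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        # set() so a term appearing twice in one doc is posted once;
        # doc ids are appended in increasing order, so lists stay sorted.
        for term in set(text.lower().split()):
            index[term].append(doc_id)
    return index


def search_or(index, terms):
    """Toy OR query: return (sorted hits, number of postings visited)."""
    hits = set()
    visited = 0
    for term in terms:
        postings = index.get(term, [])
        visited += len(postings)  # every posting of every query term is read
        hits.update(postings)
    return sorted(hits), visited


docs = [
    "nutch search engine",
    "lucene search library",
    "nutch crawler",
    "search speed",
]
index = build_index(docs)
for query in (["nutch"], ["nutch", "search"], ["nutch", "search", "speed"]):
    print(query, search_or(index, query))
```

In this toy run the number of postings visited grows from 2 to 5 to 6 as terms are added, even though the third term contributes no new hits: work tracks the terms' posting-list lengths rather than the size of the result set.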

> I have also noticed that distributed search has problems.  The main
> search machine waits on the other machines to serve up their results
> before it responds.  So it appears that your search is only as fast
> as your slowest-responding machine or the timeout, whichever comes
> first.

I'm no expert, but this sounds reasonable to me - what if your closest
matches happen to be in the index on the slowest search server?
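The latency behaviour described above can be modelled in a few lines. This is a hypothetical sketch, not Nutch's distributed-search client: it assumes the front end queries every index server in parallel and waits until all have answered or a single timeout fires, whichever comes first. The function name, parameters and latency figures are made up for illustration.

```python
def fan_out(latencies_ms, timeout_ms):
    """Toy model of a front end that sends one query to every index
    server in parallel, then waits until all have answered or the
    timeout fires, whichever happens first.

    Returns (response_time_ms, ids of servers whose hits get merged).
    """
    if not latencies_ms:
        return 0, []
    # Servers that answer within the timeout contribute their hits.
    answered = [i for i, t in enumerate(latencies_ms) if t <= timeout_ms]
    # Parallel requests: total wait is the slowest server, capped by the timeout.
    response_ms = min(max(latencies_ms), timeout_ms)
    return response_ms, answered


# Five servers; server 2 is having a bad moment.
print(fan_out([40, 55, 3000, 60, 45], timeout_ms=1000))  # (1000, [0, 1, 3, 4])
print(fan_out([40, 55, 70, 60, 45], timeout_ms=1000))    # (70, [0, 1, 2, 3, 4])
```

The trade-off is visible in the first call: a shorter timeout caps the response time, but whatever hits live only on the slow server (possibly the best matches) are left out of the merged result.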

Otis

> If anyone has any suggestions on tuning the distributed search, or
> general suggestions on speeding up retrieval times with a large set,
> I am all ears.
> 
> Thanks,
> 
> Paul  
> 
> -----Original Message-----
> From: TL [mailto:drunkiegq@yahoo.com] 
> Sent: Thursday, October 13, 2005 12:15 PM
> To: nutch-user@lucene.apache.org
> Subject: Nutch Search Speed Concern
> 
> Search Speed
> 
> What are the most important factors in Nutch/Lucene's
> search speed?
> 
> I've been testing Nutch's search speed on a search
> pool with about 100M records (separated evenly into 30
> segments), and discovered that certain search terms
> have a significantly higher search time than others.
> Some searches take 30 ms while others take upwards of
> 3000 ms.
> 
> At first, there seemed to be a direct relationship
> between the total number of results from a given query
> and the time it took to complete. But after further
> testing, that relationship did not hold true for all
> cases. There seem to be other factors that directly
> affect the speed of a search.
> 
> Has anyone else encountered this issue? Or does anyone
> have some insight into the impact of certain factors on
> search speed?
> 
> Thanks.
> 
> - T
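TL's observation that result count alone does not predict search time fits a simple model: an AND query pays roughly for the lengths of its terms' posting lists, not for the number of hits it returns. The sketch below assumes a plain merge-intersection of two sorted posting lists and uses cursor advances as a stand-in for work. Lucene can skip ahead within posting lists, so its real costs differ; the function name and the synthetic doc ids are hypothetical.

```python
def intersect(a, b):
    """Toy AND query: merge-intersect two sorted posting lists.

    Returns (hits, steps), where steps is the total number of cursor
    advances across both lists, a rough stand-in for the work done.
    """
    i = j = 0
    hits = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            hits.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return hits, i + j


# Two very common terms that co-occur in only one document.
common_a = list(range(0, 100_000, 2))                     # even doc ids
common_b = sorted(list(range(1, 100_000, 2)) + [50_000])  # odd ids plus one shared id
# Two rare terms that also co-occur in only one document.
rare_a = [3, 7]
rare_b = [7, 9]

print(intersect(common_a, common_b))  # ([50000], 100000)
print(intersect(rare_a, rare_b))      # ([7], 3)
```

Both queries return exactly one hit, yet the first does tens of thousands of times more work, which is one plausible reason two queries with similar result counts can differ by orders of magnitude in response time.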