Posted to user@nutch.apache.org by og...@yahoo.com on 2005/10/15 17:32:48 UTC
Re: [Nutch-general] RE: Nutch Search Speed Concern
Hello,
--- Paul Harrison <pa...@personifi.com> wrote:
> I too would love to hear some answers on this one. We have a 100 million
> page implementation on 5 machines, 4 GB of RAM, and 2 SATA drives of 250 GB
> each. Part of what I have noticed is that Lucene does some sort of strange
> caching, in that if you repeat a search, the results come back quite
> quickly. I too have noticed that different terms have
That's probably your OS/FS caching. Lucene doesn't cache anything.
> different search response times and that the problem gets worse with the
> number of terms in the query.
Yes, that makes sense. More complex queries will have to dig through
the index more than simple ones, consequently taking more time to
return hits.
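
To illustrate the point, here is a toy sketch in Python. It is not how Lucene is implemented; the tiny in-memory index, the sample documents, and the function names (build_index, search_all_terms) are all made up for the example. It only counts how many postings an AND query has to read, under the assumption that every posting of every query term is read.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term in sorted(set(text.lower().split())):
            index[term].append(doc_id)
    return index

def search_all_terms(index, terms):
    """Return (matching doc ids, postings read) for an AND query."""
    postings_read = 0
    result = None
    for term in terms:
        postings = index.get(term, [])
        postings_read += len(postings)  # assumption: every posting is read
        ids = set(postings)
        result = ids if result is None else result & ids
    return sorted(result or []), postings_read

# Hypothetical sample data, only for the illustration.
docs = [
    "nutch search speed",
    "lucene search index",
    "nutch crawl index",
    "search speed tuning",
]
index = build_index(docs)
one_term = search_all_terms(index, ["speed"])
two_terms = search_all_terms(index, ["search", "speed"])
print(one_term)   # ([0, 3], 2)
print(two_terms)  # ([0, 3], 5)
```

Both queries return the same two hits, but the two-term query reads 5 postings instead of 2. In this toy model the work depends on how common each query term is, not only on how many results come back.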
> I have also noticed that distributed search has problems. The main search
> machine waits on the other machines to serve up their results before it
> will respond. So it appears that your search is only as fast as your
> slowest responding machine or whenever the timeout hits (whichever comes
> first).
I'm no expert, but this sounds reasonable to me - what if your closest
matches happen to be in the index on the slowest search server?
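
A rough sketch of that trade-off in Python follows. This is not Nutch's distributed search code; the server names, delays, scores, and the functions query_server and distributed_search are invented for the example. Each "server" is simulated with a sleep, and the front end merges whatever arrives before the timeout.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

def query_server(name, delay, hits):
    """Stand-in for a remote search server: sleeps, then returns its hits."""
    time.sleep(delay)
    return hits

def distributed_search(servers, timeout):
    """Fan the query out, wait up to `timeout` seconds, merge what arrived."""
    pool = ThreadPoolExecutor(max_workers=len(servers))
    futures = [pool.submit(query_server, *s) for s in servers]
    start = time.monotonic()
    done, not_done = wait(futures, timeout=timeout)
    elapsed = time.monotonic() - start
    pool.shutdown(wait=False)  # do not block on servers that missed the deadline
    # Hits are (score, doc) pairs; highest score first.
    merged = sorted((hit for f in done for hit in f.result()), reverse=True)
    return merged, len(not_done), elapsed

# Hypothetical servers: (name, response delay in seconds, hits).
servers = [
    ("fast", 0.01, [(0.4, "a")]),
    ("medium", 0.05, [(0.7, "b")]),
    ("slow", 0.5, [(0.9, "c")]),
]
partial = distributed_search(servers, timeout=0.2)
full = distributed_search(servers, timeout=2.0)
print(partial[0], partial[1])  # [(0.7, 'b'), (0.4, 'a')] 1
print(full[0], full[1])        # [(0.9, 'c'), (0.7, 'b'), (0.4, 'a')] 0
```

With the short timeout the answer comes back sooner, but the best-scoring hit, which lives on the slow server, is missing. With the long timeout the result is complete, and the total time is roughly that of the slowest server.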
Otis
> If anyone has any suggestions on tuning the distributed search or
> general suggestions on speeding up retrieval times with a large set, I am
> all ears.
>
> Thanks,
>
> Paul
>
> -----Original Message-----
> From: TL [mailto:drunkiegq@yahoo.com]
> Sent: Thursday, October 13, 2005 12:15 PM
> To: nutch-user@lucene.apache.org
> Subject: Nutch Search Speed Concern
>
> Search Speed
>
> What are the most important factors in nutch/lucene's
> search speed?
>
> I've been testing nutch's search speed on a search
> pool with about 100M records (separated evenly into 30
> segments), and discovered that certain search terms
> have a significantly higher search time than others.
> Some searches take 30 ms while others take upwards of
> 3000 ms.
>
> At first, there seemed to be a direct relationship
> between the total number of results from a given query
> and the time it took to complete. But after further
> testing, that relationship did not hold true for all
> cases. There seem to be other factors that directly
> affect the speed of a search.
>
> Has anyone else encountered this issue? Or have some
> insight into the impact of certain factors on search
> speed?
>
> Thanks.
>
> - T