Posted to java-user@lucene.apache.org by Daniel Herlitz <da...@pricerunner.com> on 2005/04/06 23:10:52 UTC

Search performance under high load

Hi everybody,

We have been using Lucene for about one year now with great success. 
Recently though the index has grown noticeably and so has the number of 
searches. I was wondering if anyone would like to comment on these 
figures and say whether a similar setup works for them?

Index size: ~2.5 GB, on disk
Number of fields: ~30
Number of indexed fields: ~10
Server: Linux, Intel(R) Xeon(TM) CPU 3.00GHz, 3GB, dedicated to Lucene 
searches.
Java: Sun 1.5, -Xmx1200m
Load: Approaching 2000 requests / hour.
Queries: The query strings are of highly differing complexity, from 
simple x:y to long queries involving conjunctions, disjunctions and 
wildcard queries.

90% of the queries run brilliantly. The problem is that 10% of the queries 
(simple or not) take a long time, on average more than 10 seconds, 
sometimes several minutes.

We have managed to track down these figures to the calls to 
IndexSearcher.search(Query). We have seen up to about 10 searches 
concurrently executing.

We have tried to run the server on different machines and with different 
versions of Java. We have no OutOfMemoryErrors.

I am curious about what to expect from Lucene when it comes to 
searching. There are lots of figures about the indexing speed (no 
question about that, it's incredibly fast!). But what about searching? 
And searching with the kind of load we have. Anyone in the same 
situation as we are? Comments? Suggestions?

Thanks
Daniel




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Search performance under high load

Posted by David Spencer <da...@tropo.com>.
Daniel Herlitz wrote:

> Hi everybody,
> 
> We have been using Lucene for about one year now with great success. 
> Recently though the index has grown noticeably and so has the number of 
> searches. I was wondering if anyone would like to comment on these 
> figures and say if it works for them?
> 
> Index size: ~2.5 GB, on disk
> Number of fields: ~30
> Number of indexed fields: ~10
> Server: Linux, Intel(R) Xeon(TM) CPU 3.00GHz, 3GB, dedicated to Lucene 
> searches.
> Java: Sun 1.5, -Xmx1200m

For perf tuning on 1.4+ VMs I always try these flags too:

-server
-XX:CompileThreshold=100
-Xverify:none

And also worth considering is giving a -Xms value equal to -Xmx.



> Load: Approaching 2000 requests / hour.
> Queries: The query strings are of highly differing complexity, from 
> simple x:y to long queries involving conjunctions, disjunctions and 
> wildcard queries.
> 
> 90% of the queries run brilliantly. Problem is that 10% of the queries 
> (simple or not) take a long time, on average more than 10 seconds, 
> sometimes several minutes.
> 
> We have managed to track down these figures to the calls to 
> IndexSearcher.search(Query). We have seen up to about 10 searches 
> concurrently executing.
> 
> We have tried to run the server on different machines and with different 
> versions of Java. We have no OutOfMemoryErrors.
> 
> I am curious about what to expect from Lucene when it comes to 
> searching. There are lots of figures about the indexing speed (no 
> question about that, it's incredibly fast!). But what about searching? 
> And searching with the kind of load we have. Anyone in the same 
> situation as we are? Comments? Suggestions?

Well, in a benchmark I was doing recently, fuzzy queries were the problem 
in the mix I had. To be fair, a fuzzy search is really just a big 
query, as it expands the query to cover all "similar" terms.

Also of interest is what's the problem w/ the long running queries - are 
they slowing down the response time for the other users w/ shorter 
queries?

I've never done this, but you could consider a thread pool to execute 
the queries, and once a query takes more than, say, a second, you lower 
its priority.

Also, I'd have a rule like no more than "n" slow queries can run at 
once, so you queue up slow queries if there are lots of them executing.
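
A minimal sketch of the "no more than n slow queries at once" rule, using
two fixed-size thread pools. Everything here is an assumption for
illustration: the class name SearchScheduler is hypothetical, the search is
passed in as a plain Callable rather than a Lucene call, and the caller is
assumed to know in advance which queries are likely to be slow (for example
fuzzy or wildcard queries). It does not lower the priority of a query that
is already running.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical scheduler: slow searches get their own small pool. */
class SearchScheduler {
    private final ExecutorService fastPool;
    private final ExecutorService slowPool;

    SearchScheduler(int fastThreads, int maxConcurrentSlow) {
        this.fastPool = Executors.newFixedThreadPool(fastThreads);
        // The pool size is the cap "n": extra slow searches wait in the
        // pool's queue instead of competing with the fast ones.
        this.slowPool = Executors.newFixedThreadPool(maxConcurrentSlow);
    }

    /** Runs the search on the fast or the slow pool, as classified by the caller. */
    <T> Future<T> submit(Callable<T> search, boolean expectedSlow) {
        return (expectedSlow ? slowPool : fastPool).submit(search);
    }

    /** Stops accepting new searches; already queued ones still run. */
    void shutdown() {
        fastPool.shutdown();
        slowPool.shutdown();
    }
}
```

Because slow searches never occupy a fast-pool thread, a burst of expensive
queries should delay only other expensive queries.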



> 
> Thanks
> Daniel


Re: Search performance under high load

Posted by Paul Elschot <pa...@xs4all.nl>.
Daniel,

On Thursday 07 April 2005 00:54, Chris Hostetter wrote:
> 
> : Queries: The query strings are of highly differing complexity, from
> : simple x:y to long queries involving conjunctions, disjunctions and
> : wildcard queries.
> :
> : 90% of the queries run brilliantly. Problem is that 10% of the queries
> : (simple or not) take a long time, on average more than 10 seconds,
> : sometimes several minutes.
> 
> without knowing the nature of the queries, these numbers are not outside
> the realm of possibility.  there have been examples on the list in the
> last few days of how BooleanQueries constructed with deep nesting have
> particularly bad performance.
> 
> I would suggest you add timing logs to your Search code so that you get one
> log line per search executed telling you:
> 
> 1) the time of day the search was executed
> 2) the total time taken by the Searcher.search(Query) call
> 3) the Query.toString() of the search.
> 4) the Hits.length() of the result.
> 5) any tracking information to help you identify where the search came
>    from (ie: canned search from a category listing page, user entered
>    freeform text, your RSS feed generator, etc...)
> 
> This will help you determine:
> 
>  a) is there a common element to the structure of queries that take more
>     than a certain amount of time?

In general, disjunctions (truncations, fuzzy queries) are slow, and
conjunctions (required terms, filters) are faster.

>  b) are the slow queries clustered by time of day? is anything else
>     happening on that box during that time?
>  c) are the "slow" queries all resulting in a high number of Hits?
>  d) are the slow searches all originating from a single source? (ie: are
>     the queries needed by category listing pages all really slow) can
>     they be re-implemented differently?
>  e) is there anything else the slow queries have in common?

I think your case is CPU bound, so you have a few options:
- use more CPUs,
- get in touch with the 'power' users (via the logs as suggested by Chris)
  and find out if there are simple measures you can take to help performance
  for them. For example, replacing a range query that is used repeatedly with
  a cached filter can be quite effective.
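
The idea behind a cached filter can be sketched without any Lucene classes:
compute the set of matching document numbers for a range once, keep it as a
bit set, and reuse it for every later query that needs the same range. The
class name RangeFilterCache, the string key, and the Supplier that does the
expensive computation are all hypothetical names for this illustration, and
this is not Lucene's own filter API.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical cache of "documents inside this range" bit sets. */
final class RangeFilterCache {
    private final Map<String, BitSet> cache = new ConcurrentHashMap<>();

    /**
     * Returns the bit set for rangeKey, running compute only on first use.
     * Callers must treat the returned BitSet as read-only, since it is shared.
     */
    BitSet bitsFor(String rangeKey, Supplier<BitSet> compute) {
        return cache.computeIfAbsent(rangeKey, k -> compute.get());
    }

    /** Drop everything, e.g. after the index is reopened and doc numbers may differ. */
    void clear() {
        cache.clear();
    }
}
```

The saving comes from skipping the enumeration of every term in the range on
each search; the cost is the memory held by one bit per document per cached
range.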

Regards,
Paul Elschot




Re: Search performance under high load

Posted by Chris Hostetter <ho...@fucit.org>.
: Queries: The query strings are of highly differing complexity, from
: simple x:y to long queries involving conjunctions, disjunctions and
: wildcard queries.
:
: 90% of the queries run brilliantly. Problem is that 10% of the queries
: (simple or not) take a long time, on average more than 10 seconds,
: sometimes several minutes.

without knowing the nature of the queries, these numbers are not outside
the realm of possibility.  there have been examples on the list in the
last few days of how BooleanQueries constructed with deep nesting have
particularly bad performance.

I would suggest you add timing logs to your Search code so that you get one
log line per search executed telling you:

1) the time of day the search was executed
2) the total time taken by the Searcher.search(Query) call
3) the Query.toString() of the search.
4) the Hits.length() of the result.
5) any tracking information to help you identify where the search came
   from (ie: canned search from a category listing page, user entered
   freeform text, your RSS feed generator, etc...)

This will help you determine:

 a) is there a common element to the structure of queries that take more
    than a certain amount of time?
 b) are the slow queries clustered by time of day? is anything else
    happening on that box during that time?
 c) are the "slow" queries all resulting in a high number of Hits?
 d) are the slow searches all originating from a single source? (ie: are
    the queries needed by category listing pages all really slow) can
    they be re-implemented differently?
 e) is there anything else the slow queries have in common?
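
A minimal sketch of one such log line per search, covering items 1) to 5)
above. The names SearchTimer, Search and timeAndFormat are hypothetical, and
the search itself is passed in as a small callback that returns the hit
count, so nothing here depends on Lucene classes. In real code the callback
would wrap the Searcher.search(Query) call and return Hits.length(), and
queryText would be the Query.toString() value.

```java
import java.time.Instant;

/** Hypothetical helper: times one search and formats one log line for it. */
final class SearchTimer {
    /** Stand-in for the real search call; returns the number of hits. */
    interface Search {
        int run() throws Exception;
    }

    /**
     * Runs the search and returns a single line with: time of execution,
     * elapsed milliseconds, hit count, originating source, and query text.
     */
    static String timeAndFormat(String source, String queryText, Search search)
            throws Exception {
        Instant startedAt = Instant.now();     // 1) when the search was executed
        long start = System.nanoTime();
        int hitCount = search.run();           // 4) number of hits
        long elapsedMs = (System.nanoTime() - start) / 1_000_000; // 2) total time
        return startedAt
                + " elapsedMs=" + elapsedMs
                + " hits=" + hitCount
                + " source=" + source          // 5) tracking information
                + " query=" + queryText;       // 3) the query string
    }
}
```

Putting the query text last keeps the line easy to parse even when the query
itself contains spaces.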



-Hoss

