You are viewing a plain text version of this content. The canonical link for it is here.

Posted to solr-user@lucene.apache.org by Herman Kiefus <he...@angieslist.com> on 2011/10/12 16:12:04 UTC

Extreme QTime

We service about 25K of each particular query type per hour per server.  QTime *averages* less than a second; however, there always a few (1-10) whose QTimes go way above (10 - 500 seconds) the average.  If I harvest these queries from the log an re-execute them they of course execute sub-second.  Why are some of these queries running long?

My first thought was perhaps these queries were occurring subsequent to replication commits, which happen every 10 minutes; however, there seem to be no clustering of these events around a 10 minute periodic cycle.  (Given that I have not established any appropriate warming queries, this seemed a logical conclusion).

My next though was to compare the times that these queries executed versus what I see in the log file (grep SEVERE...)  But I found nothing to correlate.

Do you folks have any ideas?

Re: Extreme QTime

Posted by Erick Erickson <er...@gmail.com>.

The first suspect is garbage collection. There
are various tools you can use to monitor GC and
see if those times correlate. There's a great
intro here:
http://www.lucidimagination.com/blog/2011/03/27/garbage-collection-bootcamp-1-0/

But a word of caution when re-executing the
queries. It's *possible* that you are getting
cached responses when you re-execute
the slow ones. Try with a slightly different
query (i.e. change one of the query terms
to something else).

The second thing that could be the culprit
is term caching, especially if you're
faceting or sorting. Try running your
problematical queries after a fresh
restart of Solr. If this is a problem, you
need to think about autowarming and
first/newSearcher options.

Best
Erick

On Wed, Oct 12, 2011 at 10:12 AM, Herman Kiefus <he...@angieslist.com> wrote:
> We service about 25K of each particular query type per hour per server.  QTime *averages* less than a second; however, there always a few (1-10) whose QTimes go way above (10 - 500 seconds) the average.  If I harvest these queries from the log an re-execute them they of course execute sub-second.  Why are some of these queries running long?
>
> My first thought was perhaps these queries were occurring subsequent to replication commits, which happen every 10 minutes; however, there seem to be no clustering of these events around a 10 minute periodic cycle.  (Given that I have not established any appropriate warming queries, this seemed a logical conclusion).
>
> My next though was to compare the times that these queries executed versus what I see in the log file (grep SEVERE...)  But I found nothing to correlate.
>
> Do you folks have any ideas?
>

Re: Extreme QTime

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Hi Herman,

This is very likely due to JVM Garbage Collection, which can stop the whole app if not tuned correctly.  We just did this tuning it for a client whose JVM would lock up like that for minutes at the time.

If you don't have Solr & JVM monitoring, you should consider getting - see http://sematext.com/spm/solr-performance-monitoring/index.html for a service that can monitor your Solr, JVM, OS, etc.

If it is indeed GC that's causing this, you'll want to adjust sizes of different memory regions in the JVM (eden, survivor spaces, old gen), possibly pick a better garbage collector, etc.  There is no exact recipe because each app is different.

Otis
----

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


----- Original Message -----
> From: Herman Kiefus <he...@angieslist.com>
> To: "solr-user@lucene.apache.org" <so...@lucene.apache.org>
> Cc: 
> Sent: Wednesday, October 12, 2011 10:12 AM
> Subject: Extreme QTime
> 
> We service about 25K of each particular query type per hour per server.  QTime 
> *averages* less than a second; however, there always a few (1-10) whose QTimes 
> go way above (10 - 500 seconds) the average.  If I harvest these queries from 
> the log an re-execute them they of course execute sub-second.  Why are some of 
> these queries running long?
> 
> My first thought was perhaps these queries were occurring subsequent to 
> replication commits, which happen every 10 minutes; however, there seem to be no 
> clustering of these events around a 10 minute periodic cycle.  (Given that I 
> have not established any appropriate warming queries, this seemed a logical 
> conclusion).
> 
> My next though was to compare the times that these queries executed versus what 
> I see in the log file (grep SEVERE...)  But I found nothing to correlate.
> 
> Do you folks have any ideas?
>