Posted to solr-user@lucene.apache.org by KNitin <ni...@gmail.com> on 2014/02/20 22:58:02 UTC

Tweaking Solr Query Result Cache

Hello,

  I have a 4-node cluster running SolrCloud 4.3.1. I have a few large
collections sharded 8 ways across all 4 nodes (2 shards per node).
Each shard of the large collections is around 600-700 MB and contains
around 250K+ documents.

Currently the size of the query result cache is around 512. We have a few
jobs that run tail queries on these collections. While these queries run,
the cache hit ratio drops to 0 and CPU spikes at the same time. Latencies
are on the order of seconds in that case. I verified that GC behavior is
normal (it is not what is consuming the CPU).

My questions are:


   1. Is it good practice to vary the Query Result Cache size based on
   the size of the collection (large collections get a large cache)?
   2. If most of your queries are tail queries, what is a good way to make
   your cache usage effective (higher hit ratio)?
   3. If, say, all your queries miss the cache, is it OK for the CPU to
   spike (to 90+%)?
   4. Is there a recommended shard size (number of docs, size) to use? A
   few of my collections are 100-200 MB and the large ones are on the
   order of 800 MB-1 GB.

Thanks a lot in advance
Nitin

Re: Tweaking Solr Query Result Cache

Posted by KNitin <ni...@gmail.com>.
Thanks, Erick. Turning off the query cache and sharding more aggressively
helped bring down the latencies.


On Thu, Feb 20, 2014 at 5:07 PM, Erick Erickson <er...@gmail.com> wrote:

> What you _do_ want to do is add replicas so you distribute the CPU
> load across a bunch of machines.
>
> The QueryResultCache isn't very useful unless you have multiple queries
> that
> 1> reference the _exact_ same query: q, fq, sorting and all
> 2> don't page very far.
>
> This cache really only holds the document (internal Lucene) IDs for a
> "window" of hits. So say your window (configured in solrconfig.xml) is
> set to 50. For each of the query keys, 50 IDs are stored. Next time that
> exact query comes in, and _assuming_ start+rows < 50, you'll get the IDs
> from the cache and not much action occurs. The design intent here is to
> satisfy a few pages of results.
>
> If you mean by "tail queries" that there is very little repetition of
> queries, then why bother with a cache at all? If the hit ratio is going
> towards 0 it's not doing you enough good to matter.
>
>
> FWIW,
> Erick

Re: Tweaking Solr Query Result Cache

Posted by Erick Erickson <er...@gmail.com>.
What you _do_ want to do is add replicas so you distribute the CPU
load across a bunch of machines.

The QueryResultCache isn't very useful unless you have multiple queries that
1> reference the _exact_ same query: q, fq, sorting and all
2> don't page very far.

This cache really only holds the document (internal Lucene) IDs for a
"window" of hits. So say your window (configured in solrconfig.xml) is set
to 50. For each of the query keys, 50 IDs are stored. Next time that exact
query comes in, and _assuming_ start+rows < 50, you'll get the IDs from the
cache and not much action occurs. The design intent here is to satisfy a
few pages of results.
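
The window behavior described above can be sketched with a toy model. This
is not Solr's implementation: the class name ToyQueryResultCache, the
fake_index_search stand-in, and the example query strings are all
hypothetical, and the LRU eviction and 512-entry limit are assumptions made
for illustration. The window corresponds to the queryResultWindowSize
setting in solrconfig.xml. The sketch treats a request as answerable from
the cache when start+rows fits within the stored window.

```python
from collections import OrderedDict


class ToyQueryResultCache:
    """Toy model: maps an exact query key to the first `window` doc IDs."""

    def __init__(self, max_entries=512, window=50):
        self.max_entries = max_entries  # assumed entry limit
        self.window = window            # assumed window size
        self.entries = OrderedDict()    # key -> list of doc IDs, in LRU order
        self.hits = 0
        self.misses = 0

    def search(self, q, fq, sort, start, rows, run_query):
        # The key is the exact combination of query, filters, and sort.
        key = (q, tuple(fq), sort)
        cached = self.entries.get(key)
        if cached is not None and start + rows <= self.window:
            # Same key and the page fits in the window: no index work needed.
            self.hits += 1
            self.entries.move_to_end(key)
            return cached[start:start + rows]
        # Otherwise do the expensive search and remember the first window.
        self.misses += 1
        ids = run_query(q, fq, sort, max(self.window, start + rows))
        self.entries[key] = ids[:self.window]
        self.entries.move_to_end(key)
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict least recently used
        return ids[start:start + rows]


def fake_index_search(q, fq, sort, limit):
    # Hypothetical stand-in for the real search: every query matches
    # docs 0..999 in order.
    return list(range(1000))[:limit]


cache = ToyQueryResultCache()
args = ("title:solr", ["type:doc"])
cache.search(*args, "score desc", 0, 10, fake_index_search)    # miss: first time
cache.search(*args, "score desc", 10, 10, fake_index_search)   # hit: page 2
cache.search(*args, "date desc", 0, 10, fake_index_search)     # miss: new sort
cache.search(*args, "score desc", 100, 10, fake_index_search)  # miss: past window
print(cache.hits, cache.misses)
```

Only the second request is served from the cache; changing the sort or
paging past the window falls through to the index in this model.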

If you mean by "tail queries" that there is very little repetition of
queries, then why bother with a cache at all? If the hit ratio is going
towards 0 it's not doing you enough good to matter.
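
To make the point about repetition concrete, here is a small sketch that
replays two made-up query streams through an LRU cache and reports the hit
ratio. The function name hit_ratio, the workloads, and the LRU policy are
assumptions for illustration only; nothing here measures a real Solr
instance.

```python
from collections import OrderedDict


def hit_ratio(queries, cache_size=512):
    """Replay a stream of query keys through an LRU cache; return hits/total."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for key in queries:
        total += 1
        if key in cache:
            hits += 1
            cache.move_to_end(key)
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return hits / total if total else 0.0


# Hypothetical workloads of 10,000 requests each.
repeating = [f"q{i % 100}" for i in range(10_000)]  # 100 distinct queries
tail_only = [f"q{i}" for i in range(10_000)]        # every query is new

print(hit_ratio(repeating))                       # close to 1
print(hit_ratio(tail_only))                       # 0.0
print(hit_ratio(tail_only, cache_size=100_000))   # still 0.0
```

With no repeated keys, the hit ratio stays at 0 no matter how large the
cache is made, so growing the cache for a pure tail workload only costs
memory.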


FWIW,
Erick

