Posted to solr-user@lucene.apache.org by Jeff Wartes <jw...@whitepages.com> on 2014/07/22 01:37:53 UTC

SolrCloud extended warmup support

I’d like to ensure an extended warmup is done on each SolrCloud node prior to that node serving traffic.
I can do certain things prior to starting Solr, such as pumping the index dir through /dev/null to pre-warm the filesystem cache, and post-start I can use the ping handler with a health check file to prevent the node from entering the clients’ load balancer until I’m ready.
What I seem to be missing is control over when a node starts participating in queries sent to the other nodes.

I can, of course, add firstSearcher queries to solrconfig.xml, which I assume (and fervently hope!) run before a node registers itself in ZK clusterstate.json as ready for work, but that doesn’t scale well if I want the initial warmup to run thousands of queries, or to run them with some parallelism. I’m storing solrconfig.xml in ZK, so I’m sensitive to its size.
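
One way to run a large warmup set with some parallelism, without growing solrconfig.xml, is an external script. The following is a minimal sketch, not something proposed in this thread: the function names, the worker count, and the core URL layout are assumptions. It sends each query with distrib=false so only the local core is exercised. It only helps on the load balancer side; it does not control when other nodes start sending sub-requests to this node.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen


def build_warmup_url(core_url, params):
    """Build a select URL that stays on the local core (distrib=false)."""
    query = dict(params, distrib="false", wt="json")
    return core_url.rstrip("/") + "/select?" + urlencode(query)


def fetch(url):
    """Issue one request, discard the body, and return the HTTP status."""
    with urlopen(url, timeout=30) as response:
        response.read()
        return response.status


def safe_call(fetch_fn, url):
    """Return the status from fetch_fn, or None if the request failed."""
    try:
        return fetch_fn(url)
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return None


def run_warmup(core_url, queries, workers=8, fetch_fn=fetch):
    """Send every warmup query in parallel; return how many did not get a 200."""
    urls = [build_warmup_url(core_url, q) for q in queries]
    failures = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for status in pool.map(lambda u: safe_call(fetch_fn, u), urls):
            if status != 200:
                failures += 1
    return failures
```

A wrapper could call run_warmup("http://localhost:8983/solr/collection1", queries) with a list of parameter dicts (the URL is a placeholder) and create the health check file only when it returns 0.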

Any ideas, or corrections to my assumptions?

Thanks.

Re: SolrCloud extended warmup support

Posted by Jeff Wartes <jw...@whitepages.com>.

It’s a command like this just prior to jetty startup:

find -L <solrhome dir> -type f -exec cat {} > /dev/null \;
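
For anyone who prefers a script over find, roughly the same read-everything pass can be sketched in Python. This is an illustration under assumptions, not the command actually used: the function name and chunk size are made up, and whether the OS keeps the pages cached still depends on available memory.

```python
import os


def warm_page_cache(root, chunk_size=1024 * 1024):
    """Read every regular file under root and return the total bytes read."""
    total = 0
    # followlinks=True mirrors the -L flag of the find command above.
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=True):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks and special files
            with open(path, "rb") as handle:
                while True:
                    chunk = handle.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
    return total
```

The returned byte count gives a quick sanity check that the whole index was actually read rather than skipped.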


On 7/24/14, 2:11 PM, "Toke Eskildsen" <te...@statsbiblioteket.dk> wrote:

>Jeff Wartes [jwartes@whitepages.com] wrote:
>> Well, I’m not sure what to say. I’ve been observing a noticeable latency
>> decrease over the first few thousand queries.
>
>How exactly do you get the index files fully cached? The cp-command will
>(at least for some systems) happily skip copying if the destination is
>/dev/null. One way to ensure caching is to cat all the files to
>/dev/null.
>
>- Toke Eskildsen

RE: SolrCloud extended warmup support

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.

Jeff Wartes [jwartes@whitepages.com] wrote:
> Well, I’m not sure what to say. I’ve been observing a noticeable latency
> decrease over the first few thousand queries.

How exactly do you get the index files fully cached? The cp-command will (at least for some systems) happily skip copying if the destination is /dev/null. One way to ensure caching is to cat all the files to /dev/null.

- Toke Eskildsen

Re: SolrCloud extended warmup support

Posted by Erick Erickson <er...@gmail.com>.

Hmmmm, well _I_ don't know what to say then....

This is puzzling. How much of a latency difference are you seeing?

It'd be interesting to see what happens if you experiment with
only going to a single shard (add &distrib=false to the query). Each
cache is local to the shard, so it's vaguely possible that you're
seeing queries hit different shards and in aggregate reduce your
total latency. But I'm really shooting in the dark here.
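
The single-shard experiment could be scripted along these lines. This is a rough sketch: the helper names and the core URL are assumptions, and only the distrib=false parameter comes from the suggestion above.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(core_url, params, distributed=True):
    """Build a select URL; add distrib=false to stay on a single shard."""
    query = dict(params)
    if not distributed:
        query["distrib"] = "false"
    return core_url.rstrip("/") + "/select?" + urlencode(query)


def timed_fetch(url, opener=urlopen):
    """Fetch the URL, discard the body, and return elapsed milliseconds."""
    start = time.perf_counter()
    with opener(url) as response:
        response.read()
    return (time.perf_counter() - start) * 1000.0
```

Running the same query list through timed_fetch twice, once with distributed=True and once with distributed=False, would show whether the slow early queries come from one core or from the fan-out.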

Best,
Erick


On Mon, Jul 21, 2014 at 5:57 PM, Erick Erickson <er...@gmail.com>
wrote:

> I've never seen it necessary to run "thousands of queries"
> to warm Solr. Usually less than a dozen will work fine. My
> challenge would be for you to measure performance differences
> on queries after running, say, 12 well-chosen queries as
> opposed to hundreds/thousands. I bet that if
> 1> you search across all the relevant fields, you'll fill up the
>      low-level caches for those fields.
> 2> you facet on all the fields you intend to facet on.
> 3> you sort on all the fields you intend to sort on.
> 4> you specify some filter queries. This is fuzzy since
>      it really depends on you being able to predict what
>      those will be for firstSearcher. Things like "in the
>      last day/week/month" can be pre-configured, but
>      others you won't get. BTW, here's a blog about
>      why "in the last day" fq clauses can be tricky.
>    http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/
>
> that you'll pretty much nail warmup and be fine. Note that
> you can do all the faceting on a single query. Specifying
> the primary, secondary & etc. sorts will fill those caches.
>
> Best,
> Erick
>
>
> On Mon, Jul 21, 2014 at 5:07 PM, Jeff Wartes <jw...@whitepages.com>
> wrote:
>
>>
>> On 7/21/14, 4:50 PM, "Shawn Heisey" <so...@elyograg.org> wrote:
>>
>> >On 7/21/2014 5:37 PM, Jeff Wartes wrote:
>> >> I’d like to ensure an extended warmup is done on each SolrCloud node
>> >>prior to that node serving traffic.
>> >> I can do certain things prior to starting Solr, such as pump the index
>> >>dir through /dev/null to pre-warm the filesystem cache, and post-start I
>> >>can use the ping handler with a health check file to prevent the node
>> >>from entering the clients load balancer until I’m ready.
>> >> What I seem to be missing is control over when a node starts
>> >>participating in queries sent to the other nodes.
>> >>
>> >> I can, of course, add solrconfig.xml firstSearcher queries, which I
>> >>assume (and fervently hope!) happens before a node registers itself in
>> >>ZK clusterstate.json as ready for work, but that doesn’t scale so well
>> >>if I want that initial warmup to run thousands of queries, or run them
>> >>with some parallelism. I’m storing solrconfig.xml in ZK, so I’m sensitive
>> >>to the size.
>> >>
>> >> Any ideas, or corrections to my assumptions?
>> >
>> >I think that firstSearcher/newSearcher (and making sure useColdSearcher
>> >is set to false) is going to be the only way you can do this in a way
>> >that's compatible with SolrCloud.  If you were doing manual distributed
>> >search without SolrCloud, you'd have more options available.
>> >
>> >If useColdSearcher is set to false, that should keep *everything* from
>> >using the searcher until the warmup has finished.  I cannot be certain
>> >that this is the case, but I have some reasonable confidence that this
>> >is how it works.  If you find that it doesn't behave this way, I'd call
>> >it a bug.
>> >
>> >Thanks,
>> >Shawn
>>
>>
>> Thanks for the quick reply. Since distributed search latency is the max of
>> the shard sub-requests, I’m trying my best to minimize any spikes in
>> cluster latency due to node restarts.
>> I double-checked useColdSearcher was false, but the doc says this means
>> requests “block until the first searcher is done warming”, which
>> translates pretty clearly to “latency spike”. The more I think about it,
>> the more worried I am that a node might indeed register itself in
>> live_nodes and get distributed requests before it’s got a searcher to work
>> with. *Especially* if I have lots of serial firstSearcher queries.
>>
>> I’ll look through the code myself tomorrow, but if anyone can help
>> confirm/deny the order of operations here, I’d appreciate it.
>>
>>
>

Re: SolrCloud extended warmup support

Posted by Jeff Wartes <jw...@whitepages.com>.

Well, I’m not sure what to say. I’ve been observing a noticeable latency
decrease over the first few thousand queries. I’m not doing anything too
tricky either. Same exact query pattern, only one fq, always on the same
field, no faceting. The only potential suspects that occur to me could be
that it’s a large index (although properly sharded to fit in system
memory), and that it’s doing geo filtering & ordering.
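
To put a number on "a noticeable latency decrease", one option is to bucket the per-query timings in arrival order and compare medians. This is a small sketch with an assumed function name and window size, not a measurement from this cluster.

```python
from statistics import median


def windowed_medians(latencies_ms, window):
    """Return the median latency of each consecutive window of queries."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        median(latencies_ms[start:start + window])
        for start in range(0, len(latencies_ms), window)
    ]
```

With a window of, say, 500 queries, a falling sequence of medians that flattens out would indicate roughly how many queries the warmup really takes.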

Since I don’t have a good mechanism for many queries, I’ll probably just
do a few queries in firstSearcher for now and cross my fingers, but I’m
not optimistic.

For what it’s worth, I did verify that a replica doesn’t make itself
available to other nodes until after the firstSearcher queries are
completed.




Re: SolrCloud extended warmup support

Posted by Erick Erickson <er...@gmail.com>.

I've never seen it necessary to run "thousands of queries"
to warm Solr. Usually less than a dozen will work fine. My
challenge would be for you to measure performance differences
on queries after running, say, 12 well-chosen queries as
opposed to hundreds/thousands. I bet that if
1> you search across all the relevant fields, you'll fill up the
     low-level caches for those fields.
2> you facet on all the fields you intend to facet on.
3> you sort on all the fields you intend to sort on.
4> you specify some filter queries. This is fuzzy since
     it really depends on you being able to predict what
     those will be for firstSearcher. Things like "in the
     last day/week/month" can be pre-configured, but
     others you won't get. BTW, here's a blog about
     why "in the last day" fq clauses can be tricky.
   http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/

that you'll pretty much nail warmup and be fine. Note that
you can do all the faceting on a single query. Specifying
the primary, secondary & etc. sorts will fill those caches.
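
As an illustration of the four points above, a firstSearcher listener in solrconfig.xml might look roughly like this. It is a sketch only: every field name (name, city, category, price, timestamp) and query value is hypothetical, and the date range is rounded with /DAY on the assumption that an unrounded NOW would produce a different fq on every request.

```xml
<!-- Goes inside the <query> section of solrconfig.xml. Field names are hypothetical. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- 1 and 3: search the relevant fields, with primary and secondary sorts -->
    <lst>
      <str name="q">name:smith OR city:seattle</str>
      <str name="sort">price asc, name desc</str>
    </lst>
    <!-- 2: all facet fields can go on a single query -->
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.field">category</str>
      <str name="facet.field">city</str>
    </lst>
    <!-- 4: a predictable filter query, rounded so the cached entry can be reused -->
    <lst>
      <str name="q">*:*</str>
      <str name="fq">timestamp:[NOW/DAY-7DAYS TO NOW/DAY+1DAY]</str>
    </lst>
  </arr>
</listener>
```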

Best,
Erick



Re: SolrCloud extended warmup support

Posted by Jeff Wartes <jw...@whitepages.com>.

On 7/21/14, 4:50 PM, "Shawn Heisey" <so...@elyograg.org> wrote:

>On 7/21/2014 5:37 PM, Jeff Wartes wrote:
>> I’d like to ensure an extended warmup is done on each SolrCloud node
>>prior to that node serving traffic.
>> I can do certain things prior to starting Solr, such as pump the index
>>dir through /dev/null to pre-warm the filesystem cache, and post-start I
>>can use the ping handler with a health check file to prevent the node
>>from entering the clients load balancer until I’m ready.
>> What I seem to be missing is control over when a node starts
>>participating in queries sent to the other nodes.
>> 
>> I can, of course, add solrconfig.xml firstSearcher queries, which I
>>assume (and fervently hope!) happens before a node registers itself in
>>ZK clusterstate.json as ready for work, but that doesn’t scale so well
>>if I want that initial warmup to run thousands of queries, or run them
>>with some parallelism. I’m storing solrconfig.xml in ZK, so I’m sensitive
>>to the size.
>> 
>> Any ideas, or corrections to my assumptions?
>
>I think that firstSearcher/newSearcher (and making sure useColdSearcher
>is set to false) is going to be the only way you can do this in a way
>that's compatible with SolrCloud.  If you were doing manual distributed
>search without SolrCloud, you'd have more options available.
>
>If useColdSearcher is set to false, that should keep *everything* from
>using the searcher until the warmup has finished.  I cannot be certain
>that this is the case, but I have some reasonable confidence that this
>is how it works.  If you find that it doesn't behave this way, I'd call
>it a bug.
>
>Thanks,
>Shawn

Thanks for the quick reply. Since distributed search latency is the max of
the shard sub-requests, I’m trying my best to minimize any spikes in
cluster latency due to node restarts.
I double-checked useColdSearcher was false, but the doc says this means
requests “block until the first searcher is done warming”, which
translates pretty clearly to “latency spike”. The more I think about it,
the more worried I am that a node might indeed register itself in
live_nodes and get distributed requests before it’s got a searcher to work
with. *Especially* if I have lots of serial firstSearcher queries.

I’ll look through the code myself tomorrow, but if anyone can help
confirm/deny the order of operations here, I’d appreciate it.
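
The "max of the shard sub-requests" point can be made concrete with a toy calculation. The function name and the numbers are made up for illustration, and merge overhead on the aggregating node is ignored.

```python
def distributed_latency_ms(shard_latencies_ms):
    """Approximate a distributed request's latency as its slowest shard."""
    if not shard_latencies_ms:
        raise ValueError("at least one shard latency is required")
    return max(shard_latencies_ms)


# Three warm shards and one shard on a freshly restarted, cold node.
warm_and_cold = [12, 15, 11, 240]
print(distributed_latency_ms(warm_and_cold))  # the cold shard dominates
```

One cold replica is enough to slow every distributed request that touches it, which is why the order of registration versus warmup matters.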

Re: SolrCloud extended warmup support

Posted by Shawn Heisey <so...@elyograg.org>.

On 7/21/2014 5:37 PM, Jeff Wartes wrote:
> I’d like to ensure an extended warmup is done on each SolrCloud node prior to that node serving traffic.
> I can do certain things prior to starting Solr, such as pump the index dir through /dev/null to pre-warm the filesystem cache, and post-start I can use the ping handler with a health check file to prevent the node from entering the clients load balancer until I’m ready.
> What I seem to be missing is control over when a node starts participating in queries sent to the other nodes.
> 
> I can, of course, add solrconfig.xml firstSearcher queries, which I assume (and fervently hope!) happens before a node registers itself in ZK clusterstate.json as ready for work, but that doesn’t scale so well if I want that initial warmup to run thousands of queries, or run them with some parallelism. I’m storing solrconfig.xml in ZK, so I’m sensitive to the size.
> 
> Any ideas, or corrections to my assumptions?

I think that firstSearcher/newSearcher (and making sure useColdSearcher
is set to false) is going to be the only way you can do this in a way
that's compatible with SolrCloud.  If you were doing manual distributed
search without SolrCloud, you'd have more options available.

If useColdSearcher is set to false, that should keep *everything* from
using the searcher until the warmup has finished.  I cannot be certain
that this is the case, but I have some reasonable confidence that this
is how it works.  If you find that it doesn't behave this way, I'd call
it a bug.
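
For reference, the settings discussed here live in the query section of solrconfig.xml. A minimal sketch follows; the warming query and its sort field are hypothetical.

```xml
<query>
  <!-- Requests wait for the first searcher to finish warming
       instead of being served by a cold searcher. -->
  <useColdSearcher>false</useColdSearcher>

  <!-- newSearcher fires when a new searcher is opened after a commit. -->
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="sort">price asc</str>
      </lst>
    </arr>
  </listener>
</query>
```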

Thanks,
Shawn