Posted to solr-user@lucene.apache.org by Rich Mayfield <ma...@gmail.com> on 2014/04/15 18:15:17 UTC

Race condition in Leader Election

I see something similar: with ~1000 shards, both nodes spend a LOT of time working through the leader election process, roughly 30 minutes.

I am wondering the same thing. If I force all leaders onto one node, shut down both nodes, start the node that holds all of the leaders first, and then start the other node, I think the startup sequence would be much faster.

Does that sound reasonable? And if so, is there a way to trigger the leader election process without taking the time to unload and recreate the shards?
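
To decide which node to start first under the plan above, it helps to know how the leaders are currently distributed. The sketch below counts leaders per node from a parsed cluster-state document. It assumes the input has the layout of a Collections API CLUSTERSTATUS response (cluster -> collections -> shards -> replicas, with "leader" reported as the string "true"); the function names are hypothetical, and that call may not exist in older Solr releases.

```python
from collections import Counter


def leaders_per_node(cluster_status):
    """Count how many shard leaders each node currently hosts.

    `cluster_status` is assumed to be the parsed JSON body of a
    CLUSTERSTATUS response; missing keys are treated as empty.
    """
    counts = Counter()
    collections = cluster_status.get("cluster", {}).get("collections", {})
    for collection in collections.values():
        for shard in collection.get("shards", {}).values():
            for replica in shard.get("replicas", {}).values():
                # The leader flag is assumed to arrive as the string "true".
                if str(replica.get("leader", "false")).lower() == "true":
                    counts[replica.get("node_name", "unknown")] += 1
    return counts


def startup_order(cluster_status):
    """Return node names, the node hosting the most leaders first."""
    return [node for node, _ in leaders_per_node(cluster_status).most_common()]
```

The first entry returned by startup_order would be the node to bring up first. Whether that actually shortens the election time is the open question in this thread.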

> Hi
> 
> When restarting a node in SolrCloud, I run into scenarios where both
> replicas for a shard get into the "recovering" state and never come up, causing
> the error "No servers hosting this shard". To fix this, I either unload one
> core or restart one of the nodes again so that one of them becomes the
> leader.
> 
> Is there a way to "force" leader election for a shard in SolrCloud? Is
> there a way to break ties automatically (without restarting nodes) to make
> a node the leader for the shard?
> 
> 
> Thanks
> Nitin
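
A stuck shard of the kind described in the quoted message can be spotted from cluster state before clients hit "No servers hosting this shard". The sketch below lists shards with no active leader and builds a request URL for the Collections API FORCELEADER action. Two assumptions: the input follows the CLUSTERSTATUS response layout, and FORCELEADER only exists in later Solr releases (5.4 and up), so it is not available on the 4.x versions current when this thread was written. The function names and the base URL are hypothetical.

```python
from urllib.parse import urlencode


def leaderless_shards(cluster_status):
    """Return (collection, shard) pairs that have no active leader.

    `cluster_status` is assumed to be a parsed CLUSTERSTATUS response.
    """
    stuck = []
    collections = cluster_status.get("cluster", {}).get("collections", {})
    for coll_name, collection in collections.items():
        for shard_name, shard in collection.get("shards", {}).items():
            has_leader = any(
                str(r.get("leader", "false")).lower() == "true"
                and r.get("state") == "active"
                for r in shard.get("replicas", {}).values()
            )
            if not has_leader:
                stuck.append((coll_name, shard_name))
    return stuck


def force_leader_url(base_url, collection, shard):
    """Build (but do not send) a FORCELEADER request URL."""
    query = urlencode(
        {"action": "FORCELEADER", "collection": collection, "shard": shard}
    )
    return f"{base_url.rstrip('/')}/admin/collections?{query}"
```

The URL is only constructed here, not requested, so the list can be reviewed before anything is changed on the cluster.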

Re: Race condition in Leader Election

Posted by Mark Miller <ma...@gmail.com>.
We have to fix that then.

-- 
Mark Miller
about.me/markrmiller
