Posted to solr-user@lucene.apache.org by Tom Evans <te...@googlemail.com> on 2016/04/05 12:15:11 UTC

SolrCloud no leader for collection

Hi all, I have an 8 node SolrCloud 5.5 cluster with 11 collections,
most of them in a 1 shard x 8 replicas configuration. We have 5 ZK
nodes.

During the night, we attempted to reindex one of the larger
collections. We reindex by pushing json docs to the update handler
from a number of processes. It seemed this overwhelmed the servers,
and caused all of the collections to fail and end up in either a down
or a recovering state, often with no leader.

Restarting and rebooting the servers brought a lot of the collections
back online, but we are left with a few collections for which all the
nodes hosting the replicas are up, yet each replica reports as either
"active" or "down", and there is no leader.

Trying to force a leader election has no effect; it keeps choosing a
leader that is in the "down" state. Removing all the nodes that are in
the "down" state and forcing a leader election also has no effect.


Any ideas? The only viable option I see is to create a new collection,
index it, then remove the old collection and alias the new one in its
place.

Cheers

Tom

Re: SolrCloud no leader for collection

Posted by Jeff Wartes <jw...@whitepages.com>.

I recall I had some luck fixing a leader-less shard (after a ZK quorum failure) by forcibly removing the records for the down-state replicas from the leader election list, and then forcing an election.
The ZK path looks like collections/<collection>/leader_elect/shardX/election. Usually you’ll find that the down-state replica that keeps getting elected is the first entry. Delete that entry, then try the force-election Collections API command again.
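
The steps above could be sketched roughly as follows. This is only an illustration and not a tested recovery tool. It assumes that the children of the election znode are named like <sessionId>-<coreNodeName>-n_<sequence>, that ZooKeeper is used without a chroot (hence the leading slash on the path), and that the force-election command is the Collections API FORCELEADER action. The function names, the collection name "products" and the host in the usage note are made up for the example. The actual deletion would be done with whatever ZooKeeper client you already use; it is not shown here.

```python
from urllib.parse import urlencode


def election_path(collection, shard):
    # Path quoted in the reply above; the leading slash assumes no ZK chroot.
    return f"/collections/{collection}/leader_elect/{shard}/election"


def election_nodes_to_delete(election_children, down_core_nodes):
    """Pick the election entries that belong to down-state replicas.

    Assumption: each child is named '<sessionId>-<coreNodeName>-n_<sequence>'.
    Entries that do not match that shape are left alone.
    """
    doomed = []
    for name in election_children:
        prefix, sep, seq = name.rpartition("-n_")
        if not sep or not seq.isdigit():
            continue  # unexpected name, do not touch it
        if any(prefix.endswith("-" + core) for core in down_core_nodes):
            doomed.append((int(seq), name))
    # Lowest sequence first: that is the entry that would win the election.
    return [name for _, name in sorted(doomed)]


def force_leader_url(solr_base_url, collection, shard):
    # Builds the Collections API request; it does not send it.
    query = urlencode(
        {"action": "FORCELEADER", "collection": collection, "shard": shard}
    )
    return f"{solr_base_url}/admin/collections?{query}"
```

For example, election_nodes_to_delete(children, {"core_node1"}) returns the entries to remove under election_path("products", "shard1"), and force_leader_url("http://localhost:8983/solr", "products", "shard1") gives the URL to request once they are gone. Keeping the selection logic separate from the ZooKeeper calls makes it easy to print and review the list before deleting anything.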
