Posted to dev@lucene.apache.org by "Ishan Chattopadhyaya (JIRA)" <ji...@apache.org> on 2015/11/04 05:32:27 UTC

[jira] [Commented] (SOLR-7989) Down replica elected leader

    [ https://issues.apache.org/jira/browse/SOLR-7989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988858#comment-14988858 ] 

Ishan Chattopadhyaya commented on SOLR-7989:
--------------------------------------------

Finally understood why this is happening. 

When a DOWN replica is elected leader, the ElectionContext's runLeaderProcess() ends by trying to publish both the new leader and the replica's new state (now ACTIVE) in a single message. For example:
{noformat}
{
  "operation":"leader",
  "shard":"shard1",
  "collection":"forceleader_test_collection",
  "base_url":"http://127.0.0.1:36501/sz_sie",
  "core":"forceleader_test_collection_shard1_replica2",
  "state":"active"}
{noformat}

However, the OverseerAction for the LEADER operation doesn't actually apply the "state" that was passed in. So, although the replica gets elected leader, its state stays DOWN in the cluster state.
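
To make the failure mode concrete, here is a minimal, self-contained sketch. It is not Solr code: the class LeaderStateSketch, the Replica type, and both handler methods are hypothetical stand-ins, and the second handler is only one possible way to address the problem, not necessarily what the eventual patch will do.

```java
import java.util.HashMap;
import java.util.Map;

public class LeaderStateSketch {

    /** Hypothetical stand-in for one replica's entry in the cluster state. */
    static final class Replica {
        String state = "down";   // the replica starts out DOWN
        boolean leader = false;
    }

    /** Builds a message shaped like the "leader" message quoted above (trimmed). */
    static Map<String, String> leaderMessage() {
        Map<String, String> message = new HashMap<>();
        message.put("operation", "leader");
        message.put("shard", "shard1");
        message.put("core", "forceleader_test_collection_shard1_replica2");
        message.put("state", "active");
        return message;
    }

    /** Models the reported behaviour: the leader flag is set, "state" is ignored. */
    static void applyLeaderIgnoringState(Replica replica, Map<String, String> message) {
        replica.leader = true;
    }

    /** One possible fix (an assumption): also apply "state" when the message carries it. */
    static void applyLeaderWithState(Replica replica, Map<String, String> message) {
        replica.leader = true;
        String newState = message.get("state");
        if (newState != null) {
            replica.state = newState;
        }
    }

    public static void main(String[] args) {
        Replica buggy = new Replica();
        applyLeaderIgnoringState(buggy, leaderMessage());
        System.out.println("ignoring state: leader=" + buggy.leader + " state=" + buggy.state);

        Replica fixed = new Replica();
        applyLeaderWithState(fixed, leaderMessage());
        System.out.println("applying state: leader=" + fixed.leader + " state=" + fixed.state);
    }
}
```

With the first handler the replica ends up as a leader whose state is still "down", which matches the symptom in this issue; with the second it ends up as an "active" leader.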

I'll raise a patch for this soon.

> Down replica elected leader
> ---------------------------
>
>                 Key: SOLR-7989
>                 URL: https://issues.apache.org/jira/browse/SOLR-7989
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Ishan Chattopadhyaya
>            Priority: Minor
>         Attachments: DownLeaderTest.java
>
>
> It is possible that a down replica gets elected as a leader, and that it stays down after the election.
> Here's how I hit upon this:
> * There are 3 replicas: leader, notleader0, notleader1
> * Introduce a network partition to isolate notleader0 and notleader1 from leader (leader puts these two in LIR via zk).
> * Kill leader, remove partition. Now leader is dead, and both notleader0 and notleader1 are down. There is no leader.
> * Remove LIR znodes in zk.
> * Wait a while, and a (flawed?) leader election happens.
> * Finally, the state is such that one of notleader0 or notleader1 (which were down before) becomes leader, but stays down.



