Posted to dev@lucene.apache.org by "Erick Erickson (JIRA)" <ji...@apache.org> on 2014/12/08 15:26:12 UTC

[jira] [Commented] (SOLR-6691) REBALANCELEADERS needs to change the leader election queue.

    [ https://issues.apache.org/jira/browse/SOLR-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237904#comment-14237904 ] 

Erick Erickson commented on SOLR-6691:
--------------------------------------

[~noble.paul] Here's my promised note.

The code for figuring out who's "the guy in front" seems to have a problem in my case. I'm pinging here first, since every other time I've had problems in this area it's been a self-inflicted wound....

But this time I _swear_ I have some evidence....

Since the sort falls back on session ID when two nodes have the same sequence number, their relative order "depends". Note that since some of my tests run on a single Solr instance and just rearrange shard leadership, I can have identical sessions, but the principle is the same for the Overseer. 

So let's say core_node2 joins at the head. Depending on its session, it may sort before or after the other node with sequence 0000001. This may never really be a problem for the Overseer election, though. Can a node rejoin at the head without _also_ getting a new session ID that's greater than every other one in the election queue? If it can't, then the rejoining node will _always_ sort after the other node with the same sequence number, and this case will never occur. But for shard leader election on a single node hosting, say, 6 replicas, it definitely happens.
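To make the tie-break concrete, here is a minimal Java sketch of an election-order comparator. It assumes (this is not taken from the Solr source) that nodes are ordered by the numeric suffix after `n_`, and that ties fall back to a plain lexicographic comparison of the whole node name, which starts with the session ID. The names `ElectionSort`, `seq`, and `ELECTION_ORDER` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ElectionSort {
    // Hypothetical helper: parse the numeric suffix after the last "n_".
    static int seq(String node) {
        return Integer.parseInt(node.substring(node.lastIndexOf("n_") + 2));
    }

    // Assumed ordering: sequence number first; on a tie, compare the full
    // node name as a string, which effectively compares session IDs.
    static final Comparator<String> ELECTION_ORDER =
            Comparator.<String>comparingInt(ElectionSort::seq)
                    .thenComparing(Comparator.<String>naturalOrder());

    public static void main(String[] args) {
        List<String> nodes = new ArrayList<>(Arrays.asList(
                "session3-core2-n_0000001",
                "session12-core3-n_0000001",
                "session1-core1-n_0000000"));
        nodes.sort(ELECTION_ORDER);
        // Prints: [session1-core1-n_0000000, session12-core3-n_0000001, session3-core2-n_0000001]
        System.out.println(nodes);
    }
}
```

Under these assumptions `session12` sorts ahead of `session3`, because the tie-break compares characters, not numbers, so which of two equal-sequence nodes comes first depends entirely on the session strings.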

Anyway, if core_node2 rejoins at head, it can look like either of these:

session1-core1-n_0000000
session2-core2-n_0000001
session3-core3-n_0000001
session4-core4-n_0000002

or

session1-core1-n_0000000
session12-core3-n_0000001
session3-core2-n_0000001
session4-core4-n_0000002

The problem here is that the LeaderElector code finds the index of the first node whose sequence number is _greater_ than the current one, then backs up two. For core2 that scan stops at core4 (index 3) in both listings, so it ends up at index 1. In the first case that's core2 itself, so it'll watch itself. In the second case it's core3, so it'll watch core3 as it should.
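The lookup can be reproduced with a small, hypothetical model (not the actual `LeaderElector` source): scan the already-sorted queue for the first node whose sequence number is greater than ours, then step back two. `GuyInFront`, `guyInFront`, and `seq` are illustrative names, and the sketch assumes the caller has already ruled out the case where this node is the leader.

```java
import java.util.Arrays;
import java.util.List;

public class GuyInFront {
    // Hypothetical helper: parse the numeric suffix after the last "n_".
    static int seq(String node) {
        return Integer.parseInt(node.substring(node.lastIndexOf("n_") + 2));
    }

    // Find the first node with a sequence number greater than mySeq,
    // then back up two. Assumes `sorted` is in election order and that
    // this node is not the leader (otherwise i - 2 would be negative).
    static String guyInFront(List<String> sorted, int mySeq) {
        int i = 1;
        while (i < sorted.size() && seq(sorted.get(i)) <= mySeq) {
            i++;
        }
        return sorted.get(i - 2);
    }

    public static void main(String[] args) {
        List<String> case1 = Arrays.asList(
                "session1-core1-n_0000000",
                "session2-core2-n_0000001",
                "session3-core3-n_0000001",
                "session4-core4-n_0000002");
        List<String> case2 = Arrays.asList(
                "session1-core1-n_0000000",
                "session12-core3-n_0000001",
                "session3-core2-n_0000001",
                "session4-core4-n_0000002");
        // core2 has sequence 1 in both orderings.
        System.out.println(guyInFront(case1, 1)); // session2-core2-n_0000001 (itself)
        System.out.println(guyInFront(case2, 1)); // session12-core3-n_0000001
    }
}
```

Both calls pass the same sequence number; only the tie-break order differs, and that alone decides whether core2 ends up watching itself.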

I've got what I think is a solution, but I have to beat it to death for a while first. At this point I'm just asking whether this analysis is sound.

> REBALANCELEADERS needs to change the leader election queue.
> -----------------------------------------------------------
>
>                 Key: SOLR-6691
>                 URL: https://issues.apache.org/jira/browse/SOLR-6691
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>
> The original code (SOLR-6517) assumed that changes in the clusterstate after issuing a command to the overseer to change the leader indicated that the leader was successfully changed. Fortunately, Noble clued me in that this isn't the case and that the potential leader needs to insert itself in the leader election queue before triggering the change leader command.
> Inserting themselves in the front of the queue should probably happen in BALANCESHARDUNIQUE when the preferredLeader property is assigned as well.
> [~noble.paul] Do evil things happen if a node joins at the head but it's _already_ in the queue? These ephemeral nodes in the queue are watching each other. So if node1 is the leader you have
> node1 <- node2 <- node3 <- node4
> where <- means "watches".
> Now, if node3 puts itself at the head of the list, you have
> {code}
> node1 <- node2
>       <- node3 <- node4
> {code}
> I _think_ when I was looking at this it all "just worked". 
> 1> node 1 goes down. Nodes 2 and 3 duke it out, but there's code to ensure that node3 becomes the leader and node2 inserts itself at the end so it's watching node 4.
> 2> node 2 goes down, nobody gets notified and it doesn't matter.
> 3> node 3 goes down, node 4 gets notified and starts watching node 2 by inserting itself at the end of the list.
> 4> node 4 goes down, nobody gets notified and it doesn't matter.
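The notification behavior in the four cases above follows directly from who watches whom. Here is a minimal sketch that models each watch as an entry in a hypothetical watcher-to-watched map for the `node1 <- node2`, `node1 <- node3 <- node4` state, and reports which nodes a failure would notify. It only models notification, not the re-election or re-queuing that follows.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WatchChain {
    // Hypothetical model: watcher -> node it watches, after node3 joins at the head.
    //   node1 <- node2
    //         <- node3 <- node4
    static Map<String, String> watches() {
        Map<String, String> w = new LinkedHashMap<>();
        w.put("node2", "node1");
        w.put("node3", "node1");
        w.put("node4", "node3");
        return w;
    }

    // Every node whose watch target is `down` gets notified when it disappears.
    static List<String> notified(Map<String, String> w, String down) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : w.entrySet()) {
            if (e.getValue().equals(down)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> w = watches();
        for (String down : new String[] {"node1", "node2", "node3", "node4"}) {
            System.out.println(down + " goes down -> notified: " + notified(w, down));
        }
    }
}
```

Running it reports `[node2, node3]` for node1, `[node4]` for node3, and empty lists for node2 and node4, which lines up with cases 1> through 4>.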



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
