Posted to issues@ratis.apache.org by "runzhiwang (Jira)" <ji...@apache.org> on 2020/12/25 03:17:00 UTC

[jira] [Commented] (RATIS-1265) Fix leader election with priority too slow

    [ https://issues.apache.org/jira/browse/RATIS-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254716#comment-17254716 ] 

runzhiwang commented on RATIS-1265:
-----------------------------------

[~szetszwo] I think that when the server with the highest priority, i.e. s0, rejects a vote request from another server such as s1, s0 should become a candidate and call askForVote immediately. Because s0 has already rejected s1's vote request, s1 cannot win the election, so s0 can become the leader as soon as possible. What do you think?
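
A minimal sketch of this idea, assuming a server rejects any candidate whose priority is lower than its own. The class PriorityVoteSketch, the Action enum, the onRequestVote method and the priority values are hypothetical names for illustration, not Ratis code, and the term and log checks of a real vote handler are left out:

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch, not Ratis code: models only the priority part of vote handling.
public class PriorityVoteSketch {
  public enum Action { GRANT, REJECT, REJECT_AND_START_ELECTION }

  // Assumption: a server rejects any candidate whose priority is lower than its own.
  // The term and log checks that a real Raft vote handler performs are omitted.
  public static Action onRequestVote(String self, String candidate,
      Map<String, Integer> priorities) {
    int selfPriority = priorities.get(self);
    int candidatePriority = priorities.get(candidate);
    if (candidatePriority >= selfPriority) {
      return Action.GRANT;
    }
    int highest = Collections.max(priorities.values());
    // Proposed change: after rejecting, the highest-priority server does not wait
    // for its own election timeout; it becomes a candidate right away.
    return selfPriority == highest ? Action.REJECT_AND_START_ELECTION : Action.REJECT;
  }

  public static void main(String[] args) {
    // Assumed priorities: s0 is the highest, s1 and s2 are equal.
    Map<String, Integer> priorities = Map.of("s0", 2, "s1", 1, "s2", 1);
    System.out.println("s0 handling s1's request: " + onRequestVote("s0", "s1", priorities));
    System.out.println("s2 handling s1's request: " + onRequestVote("s2", "s1", priorities));
  }
}
```

With this rule, the first rejection issued by s0 is also the trigger for s0's own election, so the outcome no longer depends on which server's election timeout fires first.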

> Fix leader election with priority too slow
> ------------------------------------------
>
>                 Key: RATIS-1265
>                 URL: https://issues.apache.org/jira/browse/RATIS-1265
>             Project: Ratis
>          Issue Type: Sub-task
>            Reporter: runzhiwang
>            Assignee: runzhiwang
>            Priority: Major
>         Attachments: leader_election_slow
>
>
> As the attached log shows, there are 3 servers: s0, s1, and s2, with s2 as the leader. We then give s0 the highest priority, so s2 will call yieldLeaderToHigherPriorityPeer(s0) once s0's log catches up. In yieldLeaderToHigherPriorityPeer, s2 steps down.
> But when s2 steps down, which server requests votes first is almost random. If s0 does not request votes within a short time, the leader election takes a long time.
> As the attached log shows, elections happened 8 times over 14 seconds, but s0 only tried to start a leader election on the 6th attempt, and it failed to get the leadership.
> {code:java}
> 2020-12-25 10:11:34,995  s1: start s1@group-241716F733F8-LeaderElection2  failed because s0 rejected
> 2020-12-25 10:11:37,228  s2: start s2@group-241716F733F8-LeaderElection3  failed because s0 rejected
> 2020-12-25 10:11:39,345  s1: start s1@group-241716F733F8-LeaderElection4  failed because s0 rejected
> 2020-12-25 10:11:41,600  s1: start s1@group-241716F733F8-LeaderElection5  failed because s0 rejected
> 2020-12-25 10:11:43,710  s2: start s2@group-241716F733F8-LeaderElection6  failed because s0 rejected
> 2020-12-25 10:11:46,248  s0: start s0@group-241716F733F8-LeaderElection7  failed because s1 started an election 200ms later and s1's vote request arrived at s2 before s0's, so s1 voted for itself and rejected s0 at 2020-12-25 10:11:47,267, and s2 voted for s1 at 2020-12-25 10:11:46,469 and rejected s0 at 2020-12-25 10:11:47,267
> 2020-12-25 10:11:46,461  s1: start s1@group-241716F733F8-LeaderElection8  failed because s0 rejected
> 2020-12-25 10:11:48,597  s2: start s2@group-241716F733F8-LeaderElection9  failed because s0 rejected
> {code}
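
The figures quoted above can be checked from the log with a small sketch. The class ElectionLogSummary and its methods are hypothetical helpers written for this illustration; the timestamps and the order of servers are copied from the log lines:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;

// Hypothetical helper for reading the log excerpt; not part of Ratis.
public class ElectionLogSummary {
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

  // Milliseconds between two log timestamps.
  public static long spanMillis(String first, String last) {
    return Duration.between(
        LocalDateTime.parse(first, FORMAT), LocalDateTime.parse(last, FORMAT)).toMillis();
  }

  // 1-based position of the first election started by the given server, or 0 if none.
  public static int firstAttempt(List<String> starters, String server) {
    return starters.indexOf(server) + 1;
  }

  public static void main(String[] args) {
    // Servers that started LeaderElection2 through LeaderElection9, in log order.
    List<String> starters = List.of("s1", "s2", "s1", "s1", "s2", "s0", "s1", "s2");
    long millis = spanMillis("2020-12-25 10:11:34,995", "2020-12-25 10:11:48,597");
    System.out.println("elections: " + starters.size());
    System.out.println("ms between first and last election start: " + millis);
    System.out.println("first attempt by s0: " + firstAttempt(starters, "s0"));
  }
}
```

The span measured here runs from the start of the first election to the start of the last one, so it is slightly shorter than the total time the elections took.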



--
This message was sent by Atlassian Jira
(v8.3.4#803005)