Posted to issues@kudu.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/02/04 23:21:00 UTC

[jira] [Commented] (KUDU-2155) Disarm failure detector during an election

    [ https://issues.apache.org/jira/browse/KUDU-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030204#comment-17030204 ] 

ASF subversion and git services commented on KUDU-2155:
-------------------------------------------------------

Commit b32283d2e5ce3d88e4f6afdeedf1c616721cce3a in kudu's branch refs/heads/master from Adar Dembo
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=b32283d ]

KUDU-2155: disable failure detector around elections

This is a more complete fix for KUDU-2149: it disables the failure
detector completely around a leader election.

There are several changes to make this happen:
1. The FD is changed to use a one-shot timer, which automatically disables
   itself upon firing.
2. Because all elections are guaranteed to reach DoElectionCallback, that's
   where we reenable the FD.
3. We provide a special case for pre-elections where FD reenabling is
   deferred until after the subsequent real election finishes.

I'm still not convinced this is the cleanest approach, but it seems to work.
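
The three changes listed above can be illustrated with a small simulation.
This is only a sketch under stated assumptions, not the code from the commit:
OneShotFailureDetector, Replica, StartElection, and the manual Tick() clock
are hypothetical names invented for illustration. Only DoElectionCallback is
a name taken from the commit message, and its signature here is assumed.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical one-shot failure detector driven by a manual clock.
// A real detector would use a timer thread; Tick() stands in for elapsed time.
class OneShotFailureDetector {
 public:
  OneShotFailureDetector(int timeout_ticks, std::function<void()> on_failure)
      : timeout_ticks_(timeout_ticks), on_failure_(std::move(on_failure)) {}

  // (Re)arm the detector with a fresh timeout.
  void Enable() { remaining_ = timeout_ticks_; armed_ = true; }
  bool armed() const { return armed_; }

  void Tick() {
    if (!armed_ || --remaining_ > 0) return;
    armed_ = false;  // Change 1: one-shot, so disarm before firing.
    on_failure_();
  }

 private:
  int timeout_ticks_;
  std::function<void()> on_failure_;
  int remaining_ = 0;
  bool armed_ = false;
};

// Hypothetical replica that starts a pre-election when the detector fires.
class Replica {
 public:
  Replica() : fd_(3, [this] { StartElection(/*pre_election=*/true); }) {
    fd_.Enable();
  }
  Replica(const Replica&) = delete;  // The callback captures `this`.
  Replica& operator=(const Replica&) = delete;

  void Tick() { fd_.Tick(); }

  // Assumed to be reached by every election, as the commit message states.
  void DoElectionCallback(bool won) {
    if (current_is_pre_election_ && won) {
      // Change 3: defer re-enabling until the real election finishes.
      StartElection(/*pre_election=*/false);
      return;
    }
    fd_.Enable();  // Change 2: re-enable once the election is over.
  }

  int elections_started() const { return elections_started_; }
  bool fd_armed() const { return fd_.armed(); }

 private:
  void StartElection(bool pre_election) {
    current_is_pre_election_ = pre_election;
    ++elections_started_;  // Counts pre-elections and real elections alike.
  }

  OneShotFailureDetector fd_;
  bool current_is_pre_election_ = false;
  int elections_started_ = 0;
};

// Counts how many elections start while the first one is still in flight.
int ElectionsStartedDuringSlowElection() {
  Replica r;
  for (int i = 0; i < 3; ++i) r.Tick();   // Timeout expires; FD fires once.
  for (int i = 0; i < 10; ++i) r.Tick();  // Election is slow; FD stays silent.
  return r.elections_started();
}
```

In this sketch, calling ElectionsStartedDuringSlowElection() shows that no
second election "stacks" on top of the first, because the detector stays
disarmed until DoElectionCallback runs.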

Change-Id: Idcd311cee028c48e908f290d60c474e8a4557d97
Reviewed-on: http://gerrit.cloudera.org:8080/8134
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <as...@cloudera.com>
Reviewed-by: Andrew Wong <aw...@cloudera.com>


> Disarm failure detector during an election
> ------------------------------------------
>
>                 Key: KUDU-2155
>                 URL: https://issues.apache.org/jira/browse/KUDU-2155
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus
>    Affects Versions: 1.6.0
>            Reporter: Adar Dembo
>            Assignee: Adar Dembo
>            Priority: Major
>
> KUDU-2149 uncovered an issue where a change in failure detector semantics could lead to election "stacking". It was clear that the failure detector should be disabled during an election, but the fix was tailored to minimize risk and thus be eligible for backporting to a 1.5.x point release.
> The better fix would be to completely disable the failure detector during an election. This JIRA tracks that improvement.


