Posted to issues@kudu.apache.org by "David Alves (JIRA)" <ji...@apache.org> on 2016/09/30 21:58:20 UTC
[jira] [Comment Edited] (KUDU-1194) consensus: Allow abort of uncommittable config change ops

    [ https://issues.apache.org/jira/browse/KUDU-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15537191#comment-15537191 ] 

David Alves edited comment on KUDU-1194 at 9/30/16 9:58 PM:
------------------------------------------------------------

My guess is that a server could get into this state in the following sequence:

Replicas A,B,C are part of a config, A is leader
Config change to remove C
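(iirc, a config change takes effect as soon as it is appended to the log, before it is committed)
B fails
Now the active config is {A, B}, so committing the entry needs acknowledgements from both A and B. C's acknowledgement no longer counts and B isn't available, so the cluster is stuck.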
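
To make the quorum arithmetic concrete, here is a minimal Python sketch. The has_majority helper is hypothetical, not Kudu code. It counts votes the way the scenario above assumes: only acknowledgements from members of the active config count, and a strict majority is required.

```python
def has_majority(config, acks):
    """True if the replicas in `acks` form a strict majority of `config`.

    Acknowledgements from replicas outside `config` are ignored.
    """
    votes = len(set(config) & set(acks))
    return votes > len(config) // 2


old_config = {"A", "B", "C"}
# Assumption (per the note above): "remove C" takes effect as soon as it is
# appended, so the leader already counts votes against {A, B}.
active_config = {"A", "B"}
# B has failed; only the leader A and the removed replica C can acknowledge.
acks = {"A", "C"}

print(has_majority(old_config, acks))     # True: 2 of 3
print(has_majority(active_config, acks))  # False: 1 of 2, C's ack is ignored
```

Under the old config the entry would commit, but under the config it installs it cannot, which is exactly the stuck state described.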

Was looking at LogCabin for this one. It seems that they always write to the log (no truncation) and keep an in-memory doubly linked list of configurations that they can move forward/back through.
Relevant pieces are here:
https://github.com/logcabin/logcabin/blob/master/Server/RaftConsensus.h#L687
https://github.com/logcabin/logcabin/blob/master/Server/RaftConsensus.cc#L1595
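
A minimal Python sketch of that idea, assuming a class layout of our own: ConfigNode, ConfigHistory, move_forward and move_back_to are hypothetical names, not LogCabin's actual types. Each config-change entry adds a node at the tail (the active config), and abandoning an entry steps the tail back to the previous config without touching older nodes. What exactly triggers a step back is an assumption here.

```python
class ConfigNode:
    """One Raft config in the history, linked to its neighbours."""

    def __init__(self, index, members, prev=None):
        self.index = index  # log index of the config-change entry
        self.members = frozenset(members)
        self.prev = prev
        self.next = None


class ConfigHistory:
    """Doubly linked list of configs; the tail is the active config."""

    def __init__(self, index, members):
        self.head = self.tail = ConfigNode(index, members)

    def move_forward(self, index, members):
        # A new config-change entry was appended: it becomes active at once.
        node = ConfigNode(index, members, prev=self.tail)
        self.tail.next = node
        self.tail = node

    def move_back_to(self, index):
        # Step back to the newest config whose entry is at or before `index`,
        # e.g. when a later config-change entry is abandoned.
        while self.tail.index > index and self.tail.prev is not None:
            self.tail = self.tail.prev
            self.tail.next = None

    def active(self):
        return self.tail.members


history = ConfigHistory(1, {"A", "B", "C"})
history.move_forward(5, {"A", "B"})  # "remove C" appended at index 5
print(sorted(history.active()))      # ['A', 'B']
history.move_back_to(4)              # abandon the entry at index 5
print(sorted(history.active()))      # ['A', 'B', 'C']
```

Keeping the older configs linked means rolling back a config change is a pointer move rather than a re-read of the log.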

It seems like an "easy" workaround for this is to make sure that we always have one extra server (which I think Todd already suggested):
Replicas A,B,C are part of a config, A is leader
Config change to add D
"Add D" committed.
Config change to remove C
B fails
For the A,B,D config (the final one), the votes of A and D are enough for commitment, so progress is guaranteed.

Of course, we would still be stuck if both B _and_ D failed, but that seems much less likely.
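
The same vote counting can check the workaround step by step. This is a minimal sketch with hypothetical helpers (has_majority, can_commit); it assumes every live member of the active config acknowledges the pending entry.

```python
def has_majority(config, acks):
    """True if the replicas in `acks` form a strict majority of `config`."""
    return len(config & acks) > len(config) // 2


def can_commit(active_config, failed):
    """Can the pending entry commit if every live member of the active
    config acknowledges it?"""
    live = active_config - failed
    return has_majority(active_config, live)


# Without the extra server: "remove C" leaves {A, B} active.
print(can_commit({"A", "B"}, failed={"B"}))            # False

# Workaround: "add D" commits first (needs 3 of {A, B, C, D}),
# then "remove C" leaves {A, B, D} active.
print(can_commit({"A", "B", "C", "D"}, failed=set()))  # True
print(can_commit({"A", "B", "D"}, failed={"B"}))       # True: A and D
print(can_commit({"A", "B", "D"}, failed={"B", "D"}))  # False: stuck again
```

The extra server turns a single failure during the removal from fatal into tolerable, while a double failure still blocks progress.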


> consensus: Allow abort of uncommittable config change ops
> ---------------------------------------------------------
>
>                 Key: KUDU-1194
>                 URL: https://issues.apache.org/jira/browse/KUDU-1194
>             Project: Kudu
>          Issue Type: Improvement
>            Reporter: Mike Percy
>            Assignee: Mike Percy
>            Priority: Critical
>
> Wanted to capture a few thoughts about manually fixing broken configs or automatically rolling back bad config changes. This isn't a fully baked design, just wanted to jot down some initial thoughts.
> A general way to (attempt to) abort uncommitted ops is to truncate the Raft log on the leader (and replace the op with a NO_OP or something similar).
> Some thoughts on recovering from "bad" configs:
> * We may hit a situation where there is an in-progress config change operation that will be impossible to commit due to a majority of the nodes in the "target" config being permanently dead. If the leader is still alive, we can provide a timeout on these ops or a way to explicitly (via RPC) abort them by truncating the log.
> * If no leader is alive, and it's impossible to elect one, then we could write an "unsafe" tool only for emergency use that could do something evil like make the follower think that the tool is the new leader and append an unsafe change-config op to the follower's log.
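
The leader-side abort from the first bullet could look roughly like the following Python sketch. Everything here is hypothetical (LeaderLog, abort_stale_config_change, the op-type strings). It only illustrates "truncate from the uncommitted config change and put a NO_OP at its index" under a timeout. It does not revert the active config or replicate the NO_OP. An explicit abort RPC could call the same method with timeout_s=0.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    index: int
    op_type: str        # hypothetical: "WRITE", "CHANGE_CONFIG", "NO_OP"
    appended_at: float  # seconds, from whatever clock the caller uses


class LeaderLog:
    def __init__(self):
        self.entries = []
        self.commit_index = 0  # entries with index <= commit_index are committed

    def append(self, op_type, now):
        index = len(self.entries) + 1
        self.entries.append(LogEntry(index, op_type, now))
        return index

    def abort_stale_config_change(self, timeout_s, now):
        """If the oldest uncommitted config change has been pending for at
        least `timeout_s`, truncate the log from it onward and append a NO_OP
        at the same index. Returns True if an op was aborted."""
        for pos, entry in enumerate(self.entries):
            if entry.index <= self.commit_index or entry.op_type != "CHANGE_CONFIG":
                continue
            if now - entry.appended_at < timeout_s:
                return False
            del self.entries[pos:]  # drops the config change and everything after it
            self.entries.append(LogEntry(entry.index, "NO_OP", now))
            return True
        return False


log = LeaderLog()
log.append("WRITE", now=0.0)
log.commit_index = 1
log.append("CHANGE_CONFIG", now=1.0)  # uncommittable: majority of target is dead
log.append("WRITE", now=2.0)
print(log.abort_stale_config_change(timeout_s=10.0, now=5.0))   # False
print(log.abort_stale_config_change(timeout_s=10.0, now=20.0))  # True
print([e.op_type for e in log.entries])  # ['WRITE', 'NO_OP']
```

Truncating from the config change onward also discards any later uncommitted entries, which is the cost of this general "truncate and replace" approach.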



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)