Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2017/03/10 20:13:04 UTC

[jira] [Updated] (KUDU-1194) consensus: Allow abort of uncommittable config change ops

     [ https://issues.apache.org/jira/browse/KUDU-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated KUDU-1194:
------------------------------
    Component/s: consensus

> consensus: Allow abort of uncommittable config change ops
> ---------------------------------------------------------
>
>                 Key: KUDU-1194
>                 URL: https://issues.apache.org/jira/browse/KUDU-1194
>             Project: Kudu
>          Issue Type: Improvement
>          Components: consensus
>            Reporter: Mike Percy
>            Assignee: Mike Percy
>            Priority: Critical
>
> Wanted to capture a few thoughts about manually fixing broken configs or automatically rolling back bad config changes. This isn't a fully baked design; I just wanted to jot down some initial thoughts.
> A general way to (attempt to) abort uncommitted ops is to truncate the Raft log on the leader (and replace the op with a NO_OP or something similar).
> Some thoughts on recovering from "bad" configs:
> * We may hit a situation where there is an in-progress config change operation that will be impossible to commit due to a majority of the nodes in the "target" config being permanently dead. If the leader is still alive, we can provide a timeout on these ops or a way to explicitly (via RPC) abort them by truncating the log.
> * If no leader is alive, and it's impossible to elect one, then we could write an "unsafe" tool only for emergency use that could do something evil like make the follower think that the tool is the new leader and append an unsafe change-config op to the follower's log.
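
To make the first idea above concrete, here is a rough Python sketch of "truncate the leader's log and replace the op with a NO_OP", plus a timeout-driven variant. It is a toy in-memory model, not Kudu code: LeaderLog, LogEntry, abort_uncommitted_from, and abort_if_timed_out are all hypothetical names made up for illustration, and the sketch ignores replication to followers and any Raft safety argument entirely.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NO_OP = "NO_OP"
CHANGE_CONFIG = "CHANGE_CONFIG"


@dataclass
class LogEntry:
    term: int
    index: int                       # 1-based position in the log
    op_type: str
    payload: Optional[tuple] = None  # e.g. the proposed peer set


@dataclass
class LeaderLog:
    """Toy stand-in for a leader's Raft log (hypothetical, not Kudu's log)."""
    term: int
    entries: List[LogEntry] = field(default_factory=list)
    committed_index: int = 0

    def append(self, op_type, payload=None):
        entry = LogEntry(self.term, len(self.entries) + 1, op_type, payload)
        self.entries.append(entry)
        return entry

    def abort_uncommitted_from(self, index):
        """Drop every entry at or after `index` and append a NO_OP instead.

        Committed entries are never touched. Returns the dropped entries.
        """
        if index <= self.committed_index:
            raise ValueError("cannot abort a committed op")
        if index > len(self.entries):
            raise IndexError("no op at that index")
        aborted = self.entries[index - 1:]
        del self.entries[index - 1:]
        self.append(NO_OP)  # lands at the same index the aborted op held
        return aborted


def abort_if_timed_out(log, index, appended_at, now, timeout_s):
    """Abort a pending config change once it has been uncommitted too long.

    Times are passed in explicitly (seconds) to keep the sketch deterministic.
    Returns the aborted entries, or an empty list if nothing was aborted.
    """
    entry = log.entries[index - 1]
    pending = index > log.committed_index
    if entry.op_type == CHANGE_CONFIG and pending and now - appended_at >= timeout_s:
        return log.abort_uncommitted_from(index)
    return []
```

An explicit abort RPC would presumably call something like abort_uncommitted_from directly, while the timeout path would call abort_if_timed_out periodically. The "unsafe" emergency tool from the second bullet is deliberately not sketched here, since it depends on details of the consensus implementation that the description leaves open.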



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)