You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Dinesh Bhat (Code Review)" <ge...@cloudera.org> on 2017/03/03 22:24:23 UTC

[kudu-CR] WIP KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

Dinesh Bhat has posted comments on this change.

Change subject: WIP KUDU-1330: Add a tool to unsafely recover from loss of majority replicas
......................................................................


Patch Set 2:

(12 comments)

I am still addressing some refactoring/testing related to David's earlier feedbacks, so this update has addressed only Mike's review comments so far.

http://gerrit.cloudera.org:8080/#/c/6066/2//COMMIT_MSG
Commit Message:

PS2, Line 13: perticularly
> particularly
Done


Line 22: a) The API/tool acts as a fake leader mimicking the actual leader the node
> As mentioned in comments on raft_consensus.cc the tool should not impersona
I will probably post an update here after a quick testing, is the concern here that we may be picking an old leader if we rely on survivor's committed config's 'voted_for' field ? Even if we did, since we are bumping up the term, we know that this node is going to be elected as leader (assuming dead nodes not coming back).


Line 26: a) Assumption is that, the dead nodes are not coming back with a higher term,
> s/a/b/
Actually, by 'retained' I meant the the elected leader after election triggered as a result of config change in step a). I reworded this now.


Line 43: 2) Add a test case for when the node has a pending config change,
> +1 on this
I hadn't updated this list after my last commit, pls note that some of the test cases are already added, I updated the commit msg now to reflect current state.


http://gerrit.cloudera.org:8080/#/c/6066/2/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS2, Line 1559: client_cb
> No need to plumb in a callback if this is not an asynchronous method
Done


PS2, Line 1560: error_code
> Would be nice to use this
Done


Line 1600:   consensus_req.set_caller_uuid(leader_uuid);
> This should be the client UUID or a special value to indicate that this was
Do you have a concern on this approach ? Whether survivor is a single follower or a single leader, both of them would have had voted for the this leader. This approach works in both use cases.  I thought this is a simpler idea than using a new/fake uuid from tool. Note that whole change assumes that the dead nodes aren't coming back while this is in progress.


Line 1601:   consensus_req.set_caller_term(current_term);
> Use current term + 1
I didn't bump this to single leader use case, because the existing leader steps down and bails out from pre-election here without committing a new config: https://github.com/apache/kudu/blob/master/src/kudu/consensus/raft_consensus.cc#L401
However, I need to debug little more what other changes are needed if we were to bump up the term for single leader survivor. Do you think it's necessary to bump up the term for single leader survivor ?


Line 1603:   consensus_req.set_committed_index(last_op_id.index() + 1);
> Don't increment the committed index; leave it at whatever value it previous
Done


PS2, Line 1613: last_op_id.index() + 1
> This is repeated a few times in this function. Let's extract a variable to 
Done


PS2, Line 1615: time_manager_->GetSafeTime().value()
> Hmm. I would like David's input on this
I will poke David here, I don't exactly recall why I had to set here, perhaps due a CHECK failure down the line in UpdateReplica. It somewhere has a check for timestamp on a consensus request.


http://gerrit.cloudera.org:8080/#/c/6066/2/src/kudu/consensus/time_manager.cc
File src/kudu/consensus/time_manager.cc:

Line 121
> I don't think you need to remove this
In the case of single leader leader propgating the new config, he calls UpdateReplica on himself, hence I had to remove couple of DCHECKs as part of that. I also removed one on consensus_queue.cc


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes