You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Will Berkeley (Code Review)" <ge...@cloudera.org> on 2018/08/16 23:59:49 UTC

[kudu-CR] [WIP] KUDU-2245 Graceful leadership transfer

Will Berkeley has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11251


Change subject: [WIP] KUDU-2245 Graceful leadership transfer
......................................................................

[WIP] KUDU-2245 Graceful leadership transfer

I will write more here.
Need to dist-test loop the patch.
Expect update soon.

Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tools/tool_replica_util.cc
M src/kudu/tools/tool_replica_util.h
M src/kudu/tserver/tablet_service.cc
16 files changed, 531 insertions(+), 41 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/11251/1
-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>

[kudu-CR] [WIP] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: [WIP] KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS4: 
> As discussed offline, it's worth adding some negative testcases as well:
I've added all the tests you suggested, plus one to exercise having multiple transfer periods and abrupt stepdowns happening.



-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 07 Sep 2018 21:18:10 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Fengling Wang, Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11251

to look at the new patch set (#12).

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................

KUDU-2245 Graceful leadership transfer

This patch implements graceful leadership transfer, as described in the
original Raft thesis. It has the following steps:

1. An admin client sends a request to the tablet leader for it to
   transfer leadership. The client can indicate a specific voter that it
   wants to become the leader, or it can allow the current leader to
   choose its successor.
2. The leader receives the request and beings a leader transfer period.
   During a leader transfer period, the leader does not accept writes or
   config change requests. This allows followers to catch up to the
   leader. A background timer expires the transfer period after one
   election timeout, since clients should be able to ride over
   interruptions in service lasting at least that long.
3. During the transfer period, the leader continues to update peers.
   When it receives a response from a peer, it checks if that peer is
   a voter and fully caught up to the leader's log. If it is, and if it
   is the designated successor if one was provided, the leader signals
   the peer to start an election, which it should win. If no eligible
   successor appears, the transfer period expires and the leader resumes
   normal operation.

This is an improvement over the current leader step down method, which
causes the leader to simply relinquish leadership and snooze its
election timer for an extra long period, so another voter will likely
become leader. Leadership transfer should usually be much faster and it
allows the client to select the new leader among current voters.
However, note that it does not provide strictly better guarantees- it is
still possible that leadership will not be transferred.

I ran TestRepeatLeaderStepDown and TestGracefulLeaderStepDown 1000 times
and 200 times each, in debug and TSAN modes, with 4 stress threads, and
saw no failures.

Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tools/tool_replica_util.cc
M src/kudu/tools/tool_replica_util.h
M src/kudu/tserver/tablet_service.cc
17 files changed, 817 insertions(+), 57 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/11251/12
-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 12
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 18: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11251/17/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/11251/17/src/kudu/tools/kudu-admin-test.cc@1571
PS17, Line 1571:     stdout.clear();
> (facepalm)
Thanks, I didn't see that part.



-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 18
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 18 Oct 2018 00:33:51 +0000
Gerrit-HasComments: Yes

[kudu-CR] [WIP] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: [WIP] KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 4:

(17 comments)

I think I hit most of the small stuff. I need to come back later and add some more tests.

http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus.proto
File src/kudu/consensus/consensus.proto:

http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus.proto@462
PS2, Line 462: Do not set in ABRUPT mode.
> nit: Does it mean it's not effective in ABRUPT mode (i.e. ignored)? 
No. It means the request is rejected as invalid.


http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_peers.h
File src/kudu/consensus/consensus_peers.h:

http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_peers.h@82
PS2, Line 82: does not count as a pending request
> Not sure I understand this.  Did you mean pending Raft config change?
This class normally can only handle one request outstanding at a time. I meant that the StartElection does not count as that one request. I updated the comment to make it clearer.


http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_peers.h@223
PS2, Line 223:   virtual void StartTabletCopy(const StartTabletCopyRequestPB* request,
> warning: parameter 'request' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_peers.h@224
PS2, Line 224:                                     StartTabletCopyResponsePB* response,
> warning: parameter 'response' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_peers.cc
File src/kudu/consensus/consensus_peers.cc:

http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_peers.cc@293
PS2, Line 293: &
> nit: is it even needed?  I would expect we don't need to capture anything f
Done


http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_peers.cc@501
PS2, Line 501:                                  const rpc::ResponseCallback& callback) {
> warning: parameter 'callback' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_queue.h
File src/kudu/consensus/consensus_queue.h:

http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_queue.h@556
PS2, Line 556:   mutable simple_spinlock queue_lock_; // TODO: rename
> warning: missing username/bug in TODO [google-readability-todo]
Done


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/consensus_queue.cc@1040
PS4, Line 1040: TrackedPeer* peer
> Maybe, a const reference instead of a pointer?
Done


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/consensus_queue.cc@1455
PS4, Line 1455: Unable to notify RaftConsensus of peer to promote.
> This messages seems to be outdated.
Done


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/peer_manager.h
File src/kudu/consensus/peer_manager.h:

http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/peer_manager.h@66
PS4, Line 66: void
> Why not to return Status ?  That signature would be more robust, and caller
Because the current caller won't do anything useful with the status. It can always be changed in the future if that changes. The method itself logs its reason for failure if it fails.


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/raft_consensus.cc@608
PS4, Line 608: BeginLeaderTransferPeriodUnlocked(new_leader_uuid);
> Is it worth verifying that new_leader_uuid != peer_uuid() prior to kicking 
Yes. We also verify that on the client side and short-circuit in that case.


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/raft_consensus.cc@984
PS4, Line 984:     LOG_WITH_PREFIX_UNLOCKED(INFO) << "Not signalling peer " << peer_uuid
             :                                    << "to start an election: it's not a voter "
             :                                    << "in the active config.";
> Add 'return;' after logging ?
Done


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/raft_consensus.cc@990
PS4, Line 990: peer_manager_->StartElection(peer_uuid);
> What if StartElection() returns non-OK status?
Leadership transfer failed. StartElection will log the reason why. This is already after we responded to the RPC (transfer is async) so we can't send an error back.


http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/tools/kudu-admin-test.cc@1205
PS2, Line 1205:                                   tablet_id_, /*index=*/1));
> warning: argument name 'index' in comment does not match parameter name 'mi
Done


http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/tools/kudu-admin-test.cc@1252
PS2, Line 1252:                                   tablet_id_, /*index=*/1));
> warning: argument name 'index' in comment does not match parameter name 'mi
Done


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/tools/tool_replica_util.h
File src/kudu/tools/tool_replica_util.h:

http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/tools/tool_replica_util.h@57
PS4, Line 57: that the replica with UUID 'leader_uuid' step down.
> nit: the replica with UUID 'leader_uuid' to step down.
Hrm, my phrasing sounds more grammatical to my ear, but I don't have a precise argument for why.


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/tserver/tablet_service.cc@1146
PS4, Line 1146: uuid
> nit: UUID
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 31 Aug 2018 18:30:52 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Fengling Wang, Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11251

to look at the new patch set (#8).

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................

KUDU-2245 Graceful leadership transfer

This patch implements graceful leadership transfer, as described in the
original Raft thesis. It has the following steps:

1. An admin client sends a request to the tablet leader for it to
   transfer leadership. The client can indicate a specific voter that it
   wants to become the leader, or it can allow the current leader to
   choose its successor.
2. The leader receives the request and beings a leader transfer period.
   During a leader transfer period, the leader does not accept writes or
   config change requests. This allows followers to catch up to the
   leader. A background timer expires the transfer period after one
   election timeout, since clients should be able to ride over
   interruptions in service lasting at least that long.
3. During the transfer period, the leader continues to update peers.
   When it receives a response from a peer, it checks if that peer is
   a voter and fully caught up to the leader's log. If it is, and if it
   is the designated successor if one was provided, the leader signals
   the peer to start an election, which it should win. If no eligible
   successor appears, the transfer period expires and the leader resumes
   normal operation.

This is an improvement over the current leader step down method, which
causes the leader to simply relinquish leadership and snooze its
election timer for an extra long period, so another voter will likely
become leader. Leadership transfer should usually be much faster and it
allows the client to select the new leader among current voters.
However, note that it does not provide strictly better guarantees- it is
still possible that leadership will not be transferred.

I ran TestRepeatLeaderStepDown and TestGracefulLeaderStepDown 1000 times
and 200 times each, in debug and TSAN modes, with 4 stress threads, and
saw no failures.

Still WIP because I want to
* Run some dist-test loops of the rebalancer tests, which now use
  graceful leadership transfer.
* Add a test or two for bad cases where the leadership transfer period
  should expire.
* Quantify how much faster leadership transfer is than abrupt stepdown,
  at least in a lab environment.

Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tools/tool_replica_util.cc
M src/kudu/tools/tool_replica_util.h
M src/kudu/tserver/tablet_service.cc
17 files changed, 815 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/11251/8
-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 8
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 15:

(5 comments)

The "signature" of WARN_NOT_OK is WARN_NOT_OK(Status s, string prefix) so I switched around a little how the error messages are constructed but it's the same idea of pushing the error handling up one level.

http://gerrit.cloudera.org:8080/#/c/11251/15/src/kudu/consensus/peer_manager.cc
File src/kudu/consensus/peer_manager.cc:

http://gerrit.cloudera.org:8080/#/c/11251/15/src/kudu/consensus/peer_manager.cc@110
PS15, Line 110: void PeerManager::StartElection(const std::string& uuid) {
> Seems like this function should return a Status so we can decide how to han
Done


http://gerrit.cloudera.org:8080/#/c/11251/15/src/kudu/consensus/peer_manager.cc@119
PS15, Line 119:     return;
> how about:
Done


http://gerrit.cloudera.org:8080/#/c/11251/15/src/kudu/consensus/peer_manager.cc@121
PS15, Line 121:   Status s = peer->StartElection();
> how about;
Done


http://gerrit.cloudera.org:8080/#/c/11251/15/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

http://gerrit.cloudera.org:8080/#/c/11251/15/src/kudu/consensus/raft_consensus.cc@991
PS15, Line 991:   peer_manager_->StartElection(peer_uuid);
> can we do the warning here?
Done


http://gerrit.cloudera.org:8080/#/c/11251/15/src/kudu/consensus/raft_consensus.cc@991
PS15, Line 991:   peer_manager_->StartElection(peer_uuid);
> i mean something like WARN_NOT_OK here
(thumbsup)



-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 15
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 16 Oct 2018 18:41:53 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 16:

I doubled the timeout in TestSimultaneousLeaderTransferAndAbruptStepdown and ran the test 1000 times under both ASAN and TSAN using dist-test. No failures.


-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 16
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 16 Oct 2018 22:58:39 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 17: Verified+1

Unrelated flakes.


-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 17
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 17 Oct 2018 00:37:46 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 10:

(9 comments)

I tried doing a bit of a refactor on the CheckMoveComplete code but maybe I do not understand it as well as I thought. Maybe we should chat about it tomorrow?

http://gerrit.cloudera.org:8080/#/c/11251/10/src/kudu/tools/tool_replica_util.cc
File src/kudu/tools/tool_replica_util.cc:

http://gerrit.cloudera.org:8080/#/c/11251/10/src/kudu/tools/tool_replica_util.cc@a198
PS10, Line 198: 
0. 'from_ts_uuid' is X (in the failure logs, X = 51a2892fa5c1471fb173f2dda0d7d11a.


http://gerrit.cloudera.org:8080/#/c/11251/10/src/kudu/tools/tool_replica_util.cc@a206
PS10, Line 206: 
              : 
              : 
              : 
              : 
1. We find out the tablet leader and build a proxy to it.

In the failure case, 'from_ts_uuid' equals 'leader_uuid' equals X. 'proxy' is an RPC proxy to X.


http://gerrit.cloudera.org:8080/#/c/11251/10/src/kudu/tools/tool_replica_util.cc@a259
PS10, Line 259: 
              : 
              : 
              : 
              : 
              : 
              : 
3. We do a graceful stepdown of TS with UUID 'leader_uuid', which is X. Note that 'proxy' continues to point to X.


http://gerrit.cloudera.org:8080/#/c/11251/10/src/kudu/tools/tool_replica_util.cc@a282
PS10, Line 282: 
              : 
              : 
4. We re-find the current leader and store it in 'leader_uuid', shadowing 'leader_uuid'. If leadership has already changed and client detects this, 'leader_uuid' is not X anymore. It's now, say, Y. Note that 'proxy' is still a proxy to X.

In the failure logs, Y = d56040e17c444e06b1924246d25f1263.


http://gerrit.cloudera.org:8080/#/c/11251/10/src/kudu/tools/tool_replica_util.cc@a285
PS10, Line 285: 
5. This evaluates to true, since 'from_ts_uuid' is X, the original leader, but now 'leader_uuid' is Y.


http://gerrit.cloudera.org:8080/#/c/11251/10/src/kudu/tools/tool_replica_util.cc@a290
PS10, Line 290: 
6. Uh oh. 'proxy' is a proxy to X, but 'leader_uuid', which determines the destination UUID of the GetConsensus RPC, is Y. Thus the RPC fails with

    Invalid argument: GetConsensusState: Wrong destination UUID requested. Local UUID: X. Requested UUID: Y


http://gerrit.cloudera.org:8080/#/c/11251/10/src/kudu/tools/tool_replica_util.cc@296
PS10, Line 296:     // We should consult the leader for the most up-to-date consensus state.
              :     string new_leader_uuid;
              :     HostPort new_leader_hp;
              :     RETURN_NOT_OK(GetTabletLeader(client, tablet_id, &new_leader_uuid, &new_leader_hp));
> I think this is racy, regardless of the way how leadership changes -- grace
Right, we rely on retries by the caller to handle that sort of thing. 313 is downstream of this leader determination and can't replace it, as I understand the code. We have to find the new leader else we won't know if we've successfully moved leadership from 'from_ts_uuid' (in the case where that's necessary).


http://gerrit.cloudera.org:8080/#/c/11251/10/src/kudu/tools/tool_replica_util.cc@305
PS10, Line 305: leader_uuid
> If leader changed and new leader UUID is detected, why still to keep this o
This is a mistake. We want to check if the current leader's uuid after step down is the same as the 'from_ts_uuid'. Thanks for finding it.


http://gerrit.cloudera.org:8080/#/c/11251/10/src/kudu/tools/tool_replica_util.cc@310
PS10, Line 310: proxy
> I think this code is executed when no DoLeaderStepDown() was called above (
Maybe some naming cleanup  or a bit of refactoring would help? 'from_ts_uuid != leader_uuid' on L305 is saying that (in the case when the leadership started on 'from_ts_uuid') leadership has changed to some other peer.

I added some numbered comments to the original code to better explain what I think is going on, and I'll make an attempt to make it clearer what's going on here, at least as I understand it.



-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 10
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 24 Sep 2018 08:18:42 +0000
Gerrit-HasComments: Yes

[kudu-CR] [WIP] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Fengling Wang, Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11251

to look at the new patch set (#3).

Change subject: [WIP] KUDU-2245 Graceful leadership transfer
......................................................................

[WIP] KUDU-2245 Graceful leadership transfer

This patch implements graceful leadership transfer, as described in the
original Raft thesis. It has the following steps:

1. An admin client sends a request to the tablet leader for it to
   transfer leadership. The client can indicate a specific voter that it
   wants to become the leader, or it can allow the current leader to
   choose its successor.
2. The leader receives the request and beings a leader transfer period.
   During a leader transfer period, the leader does not accept writes or
   config change requests. This allows followers to catch up to the
   leader. A background timer expires the transfer period after one
   election timeout, since clients should be able to ride over
   interruptions in service lasting at least that long.
3. During the transfer period, the leader continues to update peers.
   When it receives a response from a peer, it checks if that peer is
   a voter and fully caught up to the leader's log. If it is, and if it
   is the designated successor if one was provided, the leader signals
   the peer to start an election, which it should win. If no eligible
   successor appears, the transfer period expires and the leader resumes
   normal operation.

This is an improvement over the current leader step down method, which
causes the leader to simply relinquish leadership and snooze its
election timer for an extra long period, so another voter will likely
become leader. Leadership transfer should usually be much faster and it
allows the client to select the new leader among current voters.
However, note that it does not provide strictly better guarantees- it is
still possible that leadership will not be transferred.

I ran TestRepeatLeaderStepDown and TestGracefulLeaderStepDown 1000 times
and 200 times each, in debug and TSAN modes, with 4 stress threads, and
saw no failures.

Still WIP because I want to
* Run some dist-test loops of the rebalancer tests, which now use
  graceful leadership transfer.
* Add a test or two for bad cases where the leadership transfer period
  should expire.
* Quantify how much faster leadership transfer is than abrupt stepdown,
  at least in a lab environment.

Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tools/tool_replica_util.cc
M src/kudu/tools/tool_replica_util.h
M src/kudu/tserver/tablet_service.cc
16 files changed, 530 insertions(+), 45 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/11251/3
-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus.proto
File src/kudu/consensus/consensus.proto:

http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus.proto@463
PS7, Line 463:   optional bytes new_leader_uuid = 4;
Why do we need this? Wouldn't it be better for the leader to always simply choose the first voter that catches up?



-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 7
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 20 Sep 2018 23:33:40 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 13:

(13 comments)

PeerMessageQueue::DetermineIfCaughtUp is called for every peer on update, and one condition under which it exits early during a leadership transfer period is when that period has a designated successor and the peer isn't it. But if there's no designated successor the function continues, so if there's no designated successor the first peer to be identified as caught up will be transferred leadership.

http://gerrit.cloudera.org:8080/#/c/11251/13//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11251/13//COMMIT_MSG@16
PS13, Line 16: beings
> begins
Done


http://gerrit.cloudera.org:8080/#/c/11251/13//COMMIT_MSG@42
PS13, Line 42: Still WIP
> Still still WIP?
Negatory.


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/consensus_peers.cc
File src/kudu/consensus/consensus_peers.cc:

http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/consensus_peers.cc@501
PS13, Line 501:                                  const rpc::ResponseCallback& /*callback*/) {
              :   controller->set_timeout(MonoDelta::FromMilliseconds(FLAGS_consensus_rpc_timeout_ms));
              :   consensus_proxy_->RunLeaderElection(*request, response, controller);
> we are mixing async and sync calls here but ignoring the output from the sy
Yes it's meant to be sync. The result of the call is handled by the caller, who owns the parameter.


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/consensus_queue.h
File src/kudu/consensus/consensus_queue.h:

http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/consensus_queue.h@556
PS13, Line 556: TODO(unknown)
> TODO(todd)
Done


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/consensus_queue.cc@1071
PS13, Line 1071: DVLOG
> how about just VLOG
Done


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/consensus_queue.cc@1451
PS13, Line 1451:   WARN_NOT_OK(raft_pool_observers_token_->SubmitClosure(
> add: DCHECK(queue_lock_.is_locked());
Done


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/raft_consensus.cc@598
PS13, Line 598: new_leader_uuid.get()
> nit: it's more idiomatic to use *new_leader_uuid with optionals vs. get(), 
Done


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/raft_consensus.cc@601
PS13, Line 601:       return Status::OK();
> Is there a use case where we want to support this? Otherwise it seems like 
I don't think it's illegal- the tool moves leadership to a requested peer. If that peer is already leader, that's good, not an error. It also makes coding tests harder because the "oops I was already leader so I gave you an error but everything is actually fine" case would need to be handled.


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/raft_consensus.cc@608
PS13, Line 608:       resp->mutable_error()->set_code(TabletServerErrorPB::UNKNOWN_ERROR);
              :       StatusToPB(Status::InvalidArgument(msg), resp->mutable_error()->mutable_status());
              :       // We return OK so that the tablet service won't overwrite the error code.
              :       return Status::OK();
> why not just return Status::InvalidArgument in this case and let TabletServ
Done


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/raft_consensus.cc@614
PS13, Line 614:   BeginLeaderTransferPeriodUnlocked(new_leader_uuid);
> Is there something that prevents us from clobbering an existing leadership 
Nope, the new one replaces the old one. I guess it might be nice to only allow one to occur at once (like your code change below suggests).


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/raft_consensus.cc@622
PS13, Line 622: transfer_period_timer_->Snooze()
> What is the use case for this?
None once we only allow one LT at a time.


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/raft_consensus.cc@631
PS13, Line 631:   leader_transfer_in_progress_.Store(true, kMemOrderAcquire);
> how about:
Done


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/tools/kudu-admin-test.cc@1448
PS13, Line 1448: TEST_F(AdminCliTest, TestSimultaneousLeaderTransferAndAbruptStepdown) {
> This test could use an explanation -- I'm a little confused about the purpo
Explanation added.



-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 13
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 15 Oct 2018 17:20:53 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has removed Kudu Jenkins from this change.  ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 17
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11251/15/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

http://gerrit.cloudera.org:8080/#/c/11251/15/src/kudu/consensus/raft_consensus.cc@991
PS15, Line 991:   peer_manager_->StartElection(peer_uuid);
> can we do the warning here?
i mean something like WARN_NOT_OK here



-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 15
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 16 Oct 2018 05:11:41 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 8:

I think this uncovered a bug in CheckCompleteMove that was hidden by abrupt leader transfer because it is much slower than graceful transfer. I added comments to the new patch set explaining.


-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 8
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 21 Sep 2018 20:05:50 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Fengling Wang, Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11251

to look at the new patch set (#9).

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................

KUDU-2245 Graceful leadership transfer

This patch implements graceful leadership transfer, as described in the
original Raft thesis. It has the following steps:

1. An admin client sends a request to the tablet leader for it to
   transfer leadership. The client can indicate a specific voter that it
   wants to become the leader, or it can allow the current leader to
   choose its successor.
2. The leader receives the request and beings a leader transfer period.
   During a leader transfer period, the leader does not accept writes or
   config change requests. This allows followers to catch up to the
   leader. A background timer expires the transfer period after one
   election timeout, since clients should be able to ride over
   interruptions in service lasting at least that long.
3. During the transfer period, the leader continues to update peers.
   When it receives a response from a peer, it checks if that peer is
   a voter and fully caught up to the leader's log. If it is, and if it
   is the designated successor if one was provided, the leader signals
   the peer to start an election, which it should win. If no eligible
   successor appears, the transfer period expires and the leader resumes
   normal operation.

This is an improvement over the current leader step down method, which
causes the leader to simply relinquish leadership and snooze its
election timer for an extra long period, so another voter will likely
become leader. Leadership transfer should usually be much faster and it
allows the client to select the new leader among current voters.
However, note that it does not provide strictly better guarantees- it is
still possible that leadership will not be transferred.

I ran TestRepeatLeaderStepDown and TestGracefulLeaderStepDown 1000 times
and 200 times each, in debug and TSAN modes, with 4 stress threads, and
saw no failures.

Still WIP because I want to
* Run some dist-test loops of the rebalancer tests, which now use
  graceful leadership transfer.
* Add a test or two for bad cases where the leadership transfer period
  should expire.
* Quantify how much faster leadership transfer is than abrupt stepdown,
  at least in a lab environment.

Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tools/tool_replica_util.cc
M src/kudu/tools/tool_replica_util.h
M src/kudu/tserver/tablet_service.cc
17 files changed, 826 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/11251/9
-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 9
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Fengling Wang, Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11251

to look at the new patch set (#17).

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................

KUDU-2245 Graceful leadership transfer

This patch implements graceful leadership transfer, as described in the
original Raft thesis. It has the following steps:

1. An admin client sends a request to the tablet leader for it to
   transfer leadership. The client can indicate a specific voter that it
   wants to become the leader, or it can allow the current leader to
   choose its successor.
2. The leader receives the request and begins a leader transfer period.
   During a leader transfer period, the leader does not accept writes or
   config change requests. This allows followers to catch up to the
   leader. A background timer expires the transfer period after one
   election timeout, since clients should be able to ride over
   interruptions in service lasting at least that long. If another
   request to transfer leadership is received during a transfer period,
   it will be rejected.
3. During the transfer period, the leader continues to update peers.
   When it receives a response from a peer, it checks if that peer is
   a voter and fully caught up to the leader's log. If it is, and if it
   is the designated successor if one was provided, the leader signals
   the peer to start an election, which it should win. If no eligible
   successor appears, the transfer period expires and the leader resumes
   normal operation.

This is an improvement over the current leader step down method, which
causes the leader to simply relinquish leadership and snooze its
election timer for an extra long period, so another voter will likely
become leader. Leadership transfer should usually be much faster and it
allows the client to select the new leader among current voters.
However, note that it does not provide strictly better guarantees- it is
still possible that leadership will not be transferred.

I ran TestRepeatLeaderStepDown and TestGracefulLeaderStepDown 1000 times
and 200 times each, in debug and TSAN modes, with 4 stress threads, and
saw no failures.

Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tools/tool_replica_util.cc
M src/kudu/tools/tool_replica_util.h
M src/kudu/tserver/tablet_service.cc
17 files changed, 829 insertions(+), 67 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/11251/17
-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 17
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 7:

(30 comments)

http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h
File src/kudu/consensus/consensus-test-util.h:

http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h@230
PS7, Line 230:   virtual void StartElection(const RunLeaderElectionRequestPB* request,
> warning: parameter 'request' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h@231
PS7, Line 231:                              RunLeaderElectionResponsePB* response,
> warning: parameter 'response' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h@232
PS7, Line 232:                              rpc::RpcController* controller,
> warning: parameter 'controller' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h@233
PS7, Line 233:                              const rpc::ResponseCallback& callback) override {}
> warning: parameter 'callback' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h@299
PS7, Line 299:   virtual void StartElection(const RunLeaderElectionRequestPB* request,
> warning: parameter 'request' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h@299
PS7, Line 299: virtual 
> nit: drop 'virtual' since 'override' is used.
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h@300
PS7, Line 300:                              RunLeaderElectionResponsePB* response,
> warning: parameter 'response' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h@301
PS7, Line 301:                              rpc::RpcController* controller,
> warning: parameter 'controller' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h@302
PS7, Line 302:                              const rpc::ResponseCallback& callback) override {}
> warning: parameter 'callback' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h@355
PS7, Line 355: virtual 
> nit: drop
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h@355
PS7, Line 355:   virtual void StartElection(const RunLeaderElectionRequestPB* requestoverride ,
> warning: parameter 'requestoverride' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h@356
PS7, Line 356:                              RunLeaderElectionResponsePB* response,
> warning: parameter 'response' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h@357
PS7, Line 357:                              rpc::RpcController* controller,
> warning: parameter 'controller' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h@358
PS7, Line 358:                              const rpc::ResponseCallback& callback) override {}
> warning: parameter 'callback' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h@362
PS7, Line 362:                                          rpc::RpcController* controller,
> warning: parameter 'controller' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h@482
PS7, Line 482:   virtual void StartElection(const RunLeaderElectionRequestPB* request,
> warning: parameter 'request' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h@482
PS7, Line 482: virtual 
> nit: drop
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h@483
PS7, Line 483:                              RunLeaderElectionResponsePB* response,
> warning: parameter 'response' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h@484
PS7, Line 484:                              rpc::RpcController* controller,
> warning: parameter 'controller' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h@485
PS7, Line 485:                              const rpc::ResponseCallback& callback) override {}
> warning: parameter 'callback' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h@489
PS7, Line 489:                                          rpc::RpcController* controller,
> warning: parameter 'controller' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus.proto
File src/kudu/consensus/consensus.proto:

http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus.proto@463
PS7, Line 463:   optional bytes new_leader_uuid = 4;
> Why do we need this? Wouldn't it be better for the leader to always simply 
It's useful for leadership rebalancing and decommissioning to be able to pick a server.


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus_peers.h
File src/kudu/consensus/consensus_peers.h:

http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus_peers.h@226
PS7, Line 226:                                     rpc::RpcController* controller,
> warning: parameter 'controller' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus_peers.h@227
PS7, Line 227:                                     const rpc::ResponseCallback& callback) {
> warning: parameter 'callback' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus_queue.h
File src/kudu/consensus/consensus_queue.h:

http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus_queue.h@362
PS7, Line 362: boost:none
> boost::none
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus_queue.h@365
PS7, Line 365: WatchForSuccessor
> nit: maybe, rename to BeginWatchForSuccessor() to pair with the EndWatchFor
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus_queue.h@559
PS7, Line 559: designated_successor_
> designated_successor_uuid_ ?
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus_queue.h@593
PS7, Line 593:   virtual void NotifyPeerToStartElection(const std::string& peer_uuid) = 0;
> Add documentation.
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/integration-tests/raft_consensus-itest.cc@3054
PS7, Line 3054: 
              : 
> nit: two extra lines
Done


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/tools/kudu-admin-test.cc@1514
PS7, Line 1514: , { }
> nit: drop
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 7
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 21 Sep 2018 16:17:43 +0000
Gerrit-HasComments: Yes

[kudu-CR] [WIP] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Fengling Wang, Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11251

to look at the new patch set (#4).

Change subject: [WIP] KUDU-2245 Graceful leadership transfer
......................................................................

[WIP] KUDU-2245 Graceful leadership transfer

This patch implements graceful leadership transfer, as described in the
original Raft thesis. It has the following steps:

1. An admin client sends a request to the tablet leader for it to
   transfer leadership. The client can indicate a specific voter that it
   wants to become the leader, or it can allow the current leader to
   choose its successor.
2. The leader receives the request and beings a leader transfer period.
   During a leader transfer period, the leader does not accept writes or
   config change requests. This allows followers to catch up to the
   leader. A background timer expires the transfer period after one
   election timeout, since clients should be able to ride over
   interruptions in service lasting at least that long.
3. During the transfer period, the leader continues to update peers.
   When it receives a response from a peer, it checks if that peer is
   a voter and fully caught up to the leader's log. If it is, and if it
   is the designated successor if one was provided, the leader signals
   the peer to start an election, which it should win. If no eligible
   successor appears, the transfer period expires and the leader resumes
   normal operation.

This is an improvement over the current leader step down method, which
causes the leader to simply relinquish leadership and snooze its
election timer for an extra long period, so another voter will likely
become leader. Leadership transfer should usually be much faster and it
allows the client to select the new leader among current voters.
However, note that it does not provide strictly better guarantees- it is
still possible that leadership will not be transferred.

I ran TestRepeatLeaderStepDown and TestGracefulLeaderStepDown 1000 times
and 200 times each, in debug and TSAN modes, with 4 stress threads, and
saw no failures.

Still WIP because I want to
* Run some dist-test loops of the rebalancer tests, which now use
  graceful leadership transfer.
* Add a test or two for bad cases where the leadership transfer period
  should expire.
* Quantify how much faster leadership transfer is than abrupt stepdown,
  at least in a lab environment.

Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tools/tool_replica_util.cc
M src/kudu/tools/tool_replica_util.h
M src/kudu/tserver/tablet_service.cc
16 files changed, 531 insertions(+), 47 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/11251/4
-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/consensus_peers.cc
File src/kudu/consensus/consensus_peers.cc:

http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/consensus_peers.cc@501
PS13, Line 501:   controller->set_timeout(MonoDelta::FromMilliseconds(FLAGS_consensus_rpc_timeout_ms));
              :   consensus_proxy_->RunLeaderElection(*request, response, controller);
              : }
> Yes it's meant to be sync. The result of the call is handled by the caller,
How about:

  Status RpcPeerProxy::StartElection(const RunLeaderElectionRequestPB* request,
                                     RunLeaderElectionResponsePB* response,
                                     rpc::RpcController* controller) {
    controller->set_timeout(MonoDelta::FromMilliseconds(FLAGS_consensus_rpc_timeout_ms));
    return consensus_proxy_->RunLeaderElection(*request, response, controller);
  }

And while you're at it, mind renaming StartTabletCopy to StartTabletCopyAsync?



-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 14
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 15 Oct 2018 21:16:54 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 7:

(14 comments)

Few more nits.

http://gerrit.cloudera.org:8080/#/c/11251/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11251/7//COMMIT_MSG@10
PS7, Line 10: original Raft thesis
I think a reference (like https://raft.github.io/raft.pdf) would not hurt.


http://gerrit.cloudera.org:8080/#/c/11251/7//COMMIT_MSG@16
PS7, Line 16: beings
begins

Maybe replace with 'starts' ?


http://gerrit.cloudera.org:8080/#/c/11251/7//COMMIT_MSG@33
PS7, Line 33: Leadership transfer should usually be much faster and it
            : allows the client to select the new leader among current voters.
Did you do any measurements in that regard?  If yes, I think it would be nice to mention the summary of your findings in this commit message.


http://gerrit.cloudera.org:8080/#/c/11251/7//COMMIT_MSG@42
PS7, Line 42: WIP
Is it still WIP patch?  The WIP tag in the first line is gone.  Should I assume those 3 items below are still TODO or already done?


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h
File src/kudu/consensus/consensus-test-util.h:

http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h@299
PS7, Line 299: virtual 
nit: drop 'virtual' since 'override' is used.


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h@355
PS7, Line 355: virtual 
nit: drop


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus-test-util.h@482
PS7, Line 482: virtual 
nit: drop


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus_queue.h
File src/kudu/consensus/consensus_queue.h:

http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus_queue.h@362
PS7, Line 362: boost:none
boost::none


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus_queue.h@365
PS7, Line 365: WatchForSuccessor
nit: maybe, rename to BeginWatchForSuccessor() to pair with the EndWatchForSuccessor() below.


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus_queue.h@559
PS7, Line 559: designated_successor_
designated_successor_uuid_ ?


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/consensus/consensus_queue.h@593
PS7, Line 593:   virtual void NotifyPeerToStartElection(const std::string& peer_uuid) = 0;
Add documentation.


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/integration-tests/raft_consensus-itest.cc@3001
PS7, Line 3001: TEST_F(RaftConsensusITest, TestLeaderTransferWhenFollowerFallsBehindLeaderGC) {
Nice test.  How long does it run?  If more than 5 seconds, consider adding

  if (!AllowSlowTests()) {
    LOG(WARNING) << "test is skipped; set KUDU_ALLOW_SLOW_TESTS=1 to run";
    return;
  }


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/integration-tests/raft_consensus-itest.cc@3054
PS7, Line 3054: 
              : 
nit: two extra lines


http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/tools/kudu-admin-test.cc@1514
PS7, Line 1514: , { }
nit: drop



-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 7
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 12 Sep 2018 06:42:29 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11251/17/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/11251/17/src/kudu/tools/kudu-admin-test.cc@1571
PS17, Line 1571:     stdout.clear();
> This seems to be the only test we have for non-designated graceful leader s
didn't see a reply posted yet, not sure if you saw this comment



-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 18
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 18 Oct 2018 00:13:19 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................

KUDU-2245 Graceful leadership transfer

This patch implements graceful leadership transfer, as described in the
original Raft thesis. It has the following steps:

1. An admin client sends a request to the tablet leader for it to
   transfer leadership. The client can indicate a specific voter that it
   wants to become the leader, or it can allow the current leader to
   choose its successor.
2. The leader receives the request and begins a leader transfer period.
   During a leader transfer period, the leader does not accept writes or
   config change requests. This allows followers to catch up to the
   leader. A background timer expires the transfer period after one
   election timeout, since clients should be able to ride over
   interruptions in service lasting at least that long. If another
   request to transfer leadership is received during a transfer period,
   it will be rejected.
3. During the transfer period, the leader continues to update peers.
   When it receives a response from a peer, it checks if that peer is
   a voter and fully caught up to the leader's log. If it is, and if it
   is the designated successor if one was provided, the leader signals
   the peer to start an election, which it should win. If no eligible
   successor appears, the transfer period expires and the leader resumes
   normal operation.

This is an improvement over the current leader step down method, which
causes the leader to simply relinquish leadership and snooze its
election timer for an extra long period, so another voter will likely
become leader. Leadership transfer should usually be much faster and it
allows the client to select the new leader among current voters.
However, note that it does not provide strictly better guarantees- it is
still possible that leadership will not be transferred.

I ran TestRepeatLeaderStepDown and TestGracefulLeaderStepDown 1000 times
and 200 times each, in debug and TSAN modes, with 4 stress threads, and
saw no failures.

Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Reviewed-on: http://gerrit.cloudera.org:8080/11251
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy <mp...@apache.org>
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tools/tool_replica_util.cc
M src/kudu/tools/tool_replica_util.h
M src/kudu/tserver/tablet_service.cc
17 files changed, 839 insertions(+), 67 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Mike Percy: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 19
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Fengling Wang, Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11251

to look at the new patch set (#7).

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................

KUDU-2245 Graceful leadership transfer

This patch implements graceful leadership transfer, as described in the
original Raft thesis. It has the following steps:

1. An admin client sends a request to the tablet leader for it to
   transfer leadership. The client can indicate a specific voter that it
   wants to become the leader, or it can allow the current leader to
   choose its successor.
2. The leader receives the request and beings a leader transfer period.
   During a leader transfer period, the leader does not accept writes or
   config change requests. This allows followers to catch up to the
   leader. A background timer expires the transfer period after one
   election timeout, since clients should be able to ride over
   interruptions in service lasting at least that long.
3. During the transfer period, the leader continues to update peers.
   When it receives a response from a peer, it checks if that peer is
   a voter and fully caught up to the leader's log. If it is, and if it
   is the designated successor if one was provided, the leader signals
   the peer to start an election, which it should win. If no eligible
   successor appears, the transfer period expires and the leader resumes
   normal operation.

This is an improvement over the current leader step down method, which
causes the leader to simply relinquish leadership and snooze its
election timer for an extra long period, so another voter will likely
become leader. Leadership transfer should usually be much faster and it
allows the client to select the new leader among current voters.
However, note that it does not provide strictly better guarantees- it is
still possible that leadership will not be transferred.

I ran TestRepeatLeaderStepDown and TestGracefulLeaderStepDown 1000 times
and 200 times each, in debug and TSAN modes, with 4 stress threads, and
saw no failures.

Still WIP because I want to
* Run some dist-test loops of the rebalancer tests, which now use
  graceful leadership transfer.
* Add a test or two for bad cases where the leadership transfer period
  should expire.
* Quantify how much faster leadership transfer is than abrupt stepdown,
  at least in a lab environment.

Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tools/tool_replica_util.cc
M src/kudu/tools/tool_replica_util.h
M src/kudu/tserver/tablet_service.cc
17 files changed, 804 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/11251/7
-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 7
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 17:

Posted a fix for a simple issue that caused the one failure: https://gerrit.cloudera.org/#/c/11705/.


-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 17
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 17 Oct 2018 01:10:56 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 15: Verified+1

TSAN failures are some kind of isolate failure to download files error in unrealted test, and the release failure also looks unrelated as it's RPC layer.


-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 15
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 16 Oct 2018 01:51:28 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Fengling Wang, Tidy Bot, Mike Percy, Alexey Serbin, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11251

to look at the new patch set (#16).

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................

KUDU-2245 Graceful leadership transfer

This patch implements graceful leadership transfer, as described in the
original Raft thesis. It has the following steps:

1. An admin client sends a request to the tablet leader for it to
   transfer leadership. The client can indicate a specific voter that it
   wants to become the leader, or it can allow the current leader to
   choose its successor.
2. The leader receives the request and begins a leader transfer period.
   During a leader transfer period, the leader does not accept writes or
   config change requests. This allows followers to catch up to the
   leader. A background timer expires the transfer period after one
   election timeout, since clients should be able to ride over
   interruptions in service lasting at least that long. If another
   request to transfer leadership is received during a transfer period,
   it will be rejected.
3. During the transfer period, the leader continues to update peers.
   When it receives a response from a peer, it checks if that peer is
   a voter and fully caught up to the leader's log. If it is, and if it
   is the designated successor if one was provided, the leader signals
   the peer to start an election, which it should win. If no eligible
   successor appears, the transfer period expires and the leader resumes
   normal operation.

This is an improvement over the current leader step down method, which
causes the leader to simply relinquish leadership and snooze its
election timer for an extra long period, so another voter will likely
become leader. Leadership transfer should usually be much faster and it
allows the client to select the new leader among current voters.
However, note that it does not provide strictly better guarantees- it is
still possible that leadership will not be transferred.

I ran TestRepeatLeaderStepDown and TestGracefulLeaderStepDown 1000 times
and 200 times each, in debug and TSAN modes, with 4 stress threads, and
saw no failures.

Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tools/tool_replica_util.cc
M src/kudu/tools/tool_replica_util.h
M src/kudu/tserver/tablet_service.cc
17 files changed, 829 insertions(+), 67 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/11251/16
-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 16
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 10:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/11251/8/src/kudu/consensus/consensus-test-util.h
File src/kudu/consensus/consensus-test-util.h:

http://gerrit.cloudera.org:8080/#/c/11251/8/src/kudu/consensus/consensus-test-util.h@355
PS8, Line 355:   virtual void StartElection(const RunLeaderElectionRequestPB* /*request*/,
> warning: parameter 'requestoverride' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/8/src/kudu/consensus/consensus-test-util.h@356
PS8, Line 356:                              RunLeaderElectionResponsePB* /*response*/,
> warning: parameter 'response' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/8/src/kudu/consensus/consensus-test-util.h@357
PS8, Line 357:                              rpc::RpcController* /*controller*/,
> warning: parameter 'controller' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/8/src/kudu/consensus/consensus-test-util.h@358
PS8, Line 358:                              const rpc::ResponseCallback& /*callback*/) override {}
> warning: parameter 'callback' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/11251/10/src/kudu/tools/tool_replica_util.cc
File src/kudu/tools/tool_replica_util.cc:

http://gerrit.cloudera.org:8080/#/c/11251/10/src/kudu/tools/tool_replica_util.cc@310
PS10, Line 310: proxy
'proxy' was constructed at L149, with old leader info. If leadership had actually changed, and the client had had time to realize it, then this RPC would be sent to the wrong server, leading to the test failure seen in the precommit.

Likely, the client always returned stale info when stepdown was abrupt, because there would have been no new leader for the failure period + the time of an election, but with graceful stepdown the period was reduced to just the time of an election.



-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 10
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 21 Sep 2018 20:22:12 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11251/10/src/kudu/tools/tool_replica_util.cc
File src/kudu/tools/tool_replica_util.cc:

http://gerrit.cloudera.org:8080/#/c/11251/10/src/kudu/tools/tool_replica_util.cc@310
PS10, Line 310: proxy
> Maybe some naming cleanup  or a bit of refactoring would help? 'from_ts_uui
Yes, that was it -- it seems to be the reason behind my confusion.  Indeed, the leader would change as per L284 in the  original code.



-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 10
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 24 Sep 2018 17:59:58 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Fengling Wang, Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11251

to look at the new patch set (#15).

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................

KUDU-2245 Graceful leadership transfer

This patch implements graceful leadership transfer, as described in the
original Raft thesis. It has the following steps:

1. An admin client sends a request to the tablet leader for it to
   transfer leadership. The client can indicate a specific voter that it
   wants to become the leader, or it can allow the current leader to
   choose its successor.
2. The leader receives the request and begins a leader transfer period.
   During a leader transfer period, the leader does not accept writes or
   config change requests. This allows followers to catch up to the
   leader. A background timer expires the transfer period after one
   election timeout, since clients should be able to ride over
   interruptions in service lasting at least that long. If another
   request to transfer leadership is received during a transfer period,
   it will be rejected.
3. During the transfer period, the leader continues to update peers.
   When it receives a response from a peer, it checks if that peer is
   a voter and fully caught up to the leader's log. If it is, and if it
   is the designated successor if one was provided, the leader signals
   the peer to start an election, which it should win. If no eligible
   successor appears, the transfer period expires and the leader resumes
   normal operation.

This is an improvement over the current leader step down method, which
causes the leader to simply relinquish leadership and snooze its
election timer for an extra long period, so another voter will likely
become leader. Leadership transfer should usually be much faster and it
allows the client to select the new leader among current voters.
However, note that it does not provide strictly better guarantees- it is
still possible that leadership will not be transferred.

I ran TestRepeatLeaderStepDown and TestGracefulLeaderStepDown 1000 times
and 200 times each, in debug and TSAN modes, with 4 stress threads, and
saw no failures.

Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tools/tool_replica_util.cc
M src/kudu/tools/tool_replica_util.h
M src/kudu/tserver/tablet_service.cc
17 files changed, 833 insertions(+), 67 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/11251/15
-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 15
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

http://gerrit.cloudera.org:8080/#/c/11251/7/src/kudu/integration-tests/raft_consensus-itest.cc@3001
PS7, Line 3001: TEST_F(RaftConsensusITest, TestLeaderTransferWhenFollowerFallsBehindLeaderGC) {
> Nice test.  How long does it run?  If more than 5 seconds, consider adding
Seems it takes about 6 seconds locally in DEBUG mode. Done.



-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 7
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 21 Sep 2018 16:57:10 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Fengling Wang, Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11251

to look at the new patch set (#14).

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................

KUDU-2245 Graceful leadership transfer

This patch implements graceful leadership transfer, as described in the
original Raft thesis. It has the following steps:

1. An admin client sends a request to the tablet leader for it to
   transfer leadership. The client can indicate a specific voter that it
   wants to become the leader, or it can allow the current leader to
   choose its successor.
2. The leader receives the request and begins a leader transfer period.
   During a leader transfer period, the leader does not accept writes or
   config change requests. This allows followers to catch up to the
   leader. A background timer expires the transfer period after one
   election timeout, since clients should be able to ride over
   interruptions in service lasting at least that long. If another
   request to transfer leadership is received during a transfer period,
   it will be rejected.
3. During the transfer period, the leader continues to update peers.
   When it receives a response from a peer, it checks if that peer is
   a voter and fully caught up to the leader's log. If it is, and if it
   is the designated successor if one was provided, the leader signals
   the peer to start an election, which it should win. If no eligible
   successor appears, the transfer period expires and the leader resumes
   normal operation.

This is an improvement over the current leader step down method, which
causes the leader to simply relinquish leadership and snooze its
election timer for an extra long period, so another voter will likely
become leader. Leadership transfer should usually be much faster and it
allows the client to select the new leader among current voters.
However, note that it does not provide strictly better guarantees- it is
still possible that leadership will not be transferred.

I ran TestRepeatLeaderStepDown and TestGracefulLeaderStepDown 1000 times
and 200 times each, in debug and TSAN modes, with 4 stress threads, and
saw no failures.

Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tools/tool_replica_util.cc
M src/kudu/tools/tool_replica_util.h
M src/kudu/tserver/tablet_service.cc
17 files changed, 813 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/11251/14
-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 14
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Fengling Wang, Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11251

to look at the new patch set (#13).

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................

KUDU-2245 Graceful leadership transfer

This patch implements graceful leadership transfer, as described in the
original Raft thesis. It has the following steps:

1. An admin client sends a request to the tablet leader for it to
   transfer leadership. The client can indicate a specific voter that it
   wants to become the leader, or it can allow the current leader to
   choose its successor.
2. The leader receives the request and beings a leader transfer period.
   During a leader transfer period, the leader does not accept writes or
   config change requests. This allows followers to catch up to the
   leader. A background timer expires the transfer period after one
   election timeout, since clients should be able to ride over
   interruptions in service lasting at least that long.
3. During the transfer period, the leader continues to update peers.
   When it receives a response from a peer, it checks if that peer is
   a voter and fully caught up to the leader's log. If it is, and if it
   is the designated successor if one was provided, the leader signals
   the peer to start an election, which it should win. If no eligible
   successor appears, the transfer period expires and the leader resumes
   normal operation.

This is an improvement over the current leader step down method, which
causes the leader to simply relinquish leadership and snooze its
election timer for an extra long period, so another voter will likely
become leader. Leadership transfer should usually be much faster and it
allows the client to select the new leader among current voters.
However, note that it does not provide strictly better guarantees- it is
still possible that leadership will not be transferred.

I ran TestRepeatLeaderStepDown and TestGracefulLeaderStepDown 1000 times
and 200 times each, in debug and TSAN modes, with 4 stress threads, and
saw no failures.

Still WIP because I want to
* Run some dist-test loops of the rebalancer tests, which now use
  graceful leadership transfer.
* Add a test or two for bad cases where the leadership transfer period
  should expire.
* Quantify how much faster leadership transfer is than abrupt stepdown,
  at least in a lab environment.

Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tools/tool_replica_util.cc
M src/kudu/tools/tool_replica_util.h
M src/kudu/tserver/tablet_service.cc
17 files changed, 816 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/11251/13
-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 13
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 13: Verified+1

Precommit failure is KUDU-2576.


-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 13
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 01 Oct 2018 20:07:12 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 7:

Not sure why it failed...it's hard to track that in the logs. I did submit a patch to make it easier to see why tool commands failed: https://gerrit.cloudera.org/#/c/11411/.


-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 7
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 10 Sep 2018 19:50:35 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 11:

> (9 comments)
 > 
 > I tried doing a bit of a refactor on the CheckMoveComplete code but
 > maybe I do not understand it as well as I thought. Maybe we should
 > chat about it tomorrow?

After looking at the step-by-step explanation I think it makes sense, yep.

As we discussed offline, it makes sense to separate that refactoring into its own patch.  That will be easier to track and review, especially given the fact that it's a bug.


-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 11
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 24 Sep 2018 17:58:10 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 13:

(13 comments)

I don't see a place where we lazily choose the new leader UUID if it's not specified in the graceful stepdown call. I think we should allow the leader to choose who gets to take over if it's not specified.

http://gerrit.cloudera.org:8080/#/c/11251/13//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11251/13//COMMIT_MSG@16
PS13, Line 16: beings
begins


http://gerrit.cloudera.org:8080/#/c/11251/13//COMMIT_MSG@42
PS13, Line 42: Still WIP
Still still WIP?


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/consensus_peers.cc
File src/kudu/consensus/consensus_peers.cc:

http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/consensus_peers.cc@501
PS13, Line 501:                                  const rpc::ResponseCallback& /*callback*/) {
              :   controller->set_timeout(MonoDelta::FromMilliseconds(FLAGS_consensus_rpc_timeout_ms));
              :   consensus_proxy_->RunLeaderElection(*request, response, controller);
we are mixing async and sync calls here but ignoring the output from the sync call -- seems wrong


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/consensus_queue.h
File src/kudu/consensus/consensus_queue.h:

http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/consensus_queue.h@556
PS13, Line 556: TODO(unknown)
TODO(todd)


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/consensus_queue.cc@1071
PS13, Line 1071: DVLOG
how about just VLOG


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/consensus_queue.cc@1451
PS13, Line 1451:   WARN_NOT_OK(raft_pool_observers_token_->SubmitClosure(
add: DCHECK(queue_lock_.is_locked());


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/raft_consensus.cc@598
PS13, Line 598: new_leader_uuid.get()
nit: it's more idiomatic to use *new_leader_uuid with optionals vs. get(), here and below


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/raft_consensus.cc@601
PS13, Line 601:       return Status::OK();
Is there a use case where we want to support this? Otherwise it seems like an illegal request and should be rejected.


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/raft_consensus.cc@608
PS13, Line 608:       resp->mutable_error()->set_code(TabletServerErrorPB::UNKNOWN_ERROR);
              :       StatusToPB(Status::InvalidArgument(msg), resp->mutable_error()->mutable_status());
              :       // We return OK so that the tablet service won't overwrite the error code.
              :       return Status::OK();
why not just return Status::InvalidArgument in this case and let TabletService take care of the rest?


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/raft_consensus.cc@614
PS13, Line 614:   BeginLeaderTransferPeriodUnlocked(new_leader_uuid);
Is there something that prevents us from clobbering an existing leadership transfer period?


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/raft_consensus.cc@622
PS13, Line 622: transfer_period_timer_->Snooze()
What is the use case for this?


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/raft_consensus.cc@631
PS13, Line 631:   leader_transfer_in_progress_.Store(true, kMemOrderAcquire);
how about:

  if (!leader_transfer_in_progresss_.compare_exchange_strong(false, true)) {
    return Status::IllegalArgument(Substitute("leadership transfer for $0 already in progress", tablet_id_));
  }


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/tools/kudu-admin-test.cc@1448
PS13, Line 1448: TEST_F(AdminCliTest, TestSimultaneousLeaderTransferAndAbruptStepdown) {
This test could use an explanation -- I'm a little confused about the purpose of this test and the expected end result.



-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 13
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 12 Oct 2018 21:07:54 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Fengling Wang, Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11251

to look at the new patch set (#11).

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................

KUDU-2245 Graceful leadership transfer

This patch implements graceful leadership transfer, as described in the
original Raft thesis. It has the following steps:

1. An admin client sends a request to the tablet leader for it to
   transfer leadership. The client can indicate a specific voter that it
   wants to become the leader, or it can allow the current leader to
   choose its successor.
2. The leader receives the request and beings a leader transfer period.
   During a leader transfer period, the leader does not accept writes or
   config change requests. This allows followers to catch up to the
   leader. A background timer expires the transfer period after one
   election timeout, since clients should be able to ride over
   interruptions in service lasting at least that long.
3. During the transfer period, the leader continues to update peers.
   When it receives a response from a peer, it checks if that peer is
   a voter and fully caught up to the leader's log. If it is, and if it
   is the designated successor if one was provided, the leader signals
   the peer to start an election, which it should win. If no eligible
   successor appears, the transfer period expires and the leader resumes
   normal operation.

This is an improvement over the current leader step down method, which
causes the leader to simply relinquish leadership and snooze its
election timer for an extra long period, so another voter will likely
become leader. Leadership transfer should usually be much faster and it
allows the client to select the new leader among current voters.
However, note that it does not provide strictly better guarantees- it is
still possible that leadership will not be transferred.

I ran TestRepeatLeaderStepDown and TestGracefulLeaderStepDown 1000 times
and 200 times each, in debug and TSAN modes, with 4 stress threads, and
saw no failures.

Still WIP because I want to
* Run some dist-test loops of the rebalancer tests, which now use
  graceful leadership transfer.
* Add a test or two for bad cases where the leadership transfer period
  should expire.
* Quantify how much faster leadership transfer is than abrupt stepdown,
  at least in a lab environment.

Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tools/tool_replica_util.cc
M src/kudu/tools/tool_replica_util.h
M src/kudu/tserver/tablet_service.cc
17 files changed, 859 insertions(+), 74 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/11251/11
-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 11
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has removed Kudu Jenkins from this change.  ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 15
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [WIP] KUDU-2245 Graceful leadership transfer

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: [WIP] KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 4:

(14 comments)

Just skimming through.  I'm going to take another look tomorrow.

http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus.proto
File src/kudu/consensus/consensus.proto:

http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus.proto@462
PS2, Line 462: Do not set in ABRUPT mode.
nit: Does it mean it's not effective in ABRUPT mode (i.e. ignored)? 

Maybe, change to

'Ignored in ABRUPT mode, do not set.'


http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_peers.h
File src/kudu/consensus/consensus_peers.h:

http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_peers.h@82
PS2, Line 82: does not count as a pending request
Not sure I understand this.  Did you mean pending Raft config change?


http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_peers.cc
File src/kudu/consensus/consensus_peers.cc:

http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_peers.cc@293
PS2, Line 293: &
nit: is it even needed?  I would expect we don't need to capture anything for that.


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/consensus_queue.cc@1040
PS4, Line 1040: TrackedPeer* peer
Maybe, a const reference instead of a pointer?


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/consensus_queue.cc@1455
PS4, Line 1455: Unable to notify RaftConsensus of peer to promote.
This messages seems to be outdated.


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/peer_manager.h
File src/kudu/consensus/peer_manager.h:

http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/peer_manager.h@66
PS4, Line 66: void
Why not to return Status ?  That signature would be more robust, and callers, if needed, might explicitly ignore the result status.


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/raft_consensus.cc@608
PS4, Line 608: BeginLeaderTransferPeriodUnlocked(new_leader_uuid);
Is it worth verifying that new_leader_uuid != peer_uuid() prior to kicking off the process?


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/raft_consensus.cc@984
PS4, Line 984:     LOG_WITH_PREFIX_UNLOCKED(INFO) << "Not signalling peer " << peer_uuid
             :                                    << "to start an election: it's not a voter "
             :                                    << "in the active config.";
Add 'return;' after logging ?


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/raft_consensus.cc@990
PS4, Line 990: peer_manager_->StartElection(peer_uuid);
What if StartElection() returns non-OK status?


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS4: 
As discussed offline, it's worth adding some negative testcases as well:

* attempt transferring leadership to a replica that has just been evicted from the config
* attempt transferring leadership to non-voter replica
* attempt transferring leadership to a replica which fell behind WAL segment GC threshold

Also, what it the behavior when asking a leader during transfer period to abruptly step down?


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/tools/kudu-admin-test.cc@1282
PS4, Line 1282: ASSERT_TRUE(s.ok()) << s.ToString()
Here and elsewhere in this patch: if something goes wrong, it helps a lot to have the stderr printed out as well.


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/tools/kudu-admin-test.cc@1288
PS4, Line 1288:     ASSERT_OK(GetLeaderReplicaWithRetries(tablet_id_, &master_observed_leader));
              :     ASSERT_TRUE(ContainsKey(possible_new_leaders, master_observed_leader->uuid()));
Shouldn't RunKuduTool be under this ASSERT_EVENTUALLY as well?  Don't you expect it to be flaky a bit otherwise?


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/tools/tool_replica_util.h
File src/kudu/tools/tool_replica_util.h:

http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/tools/tool_replica_util.h@57
PS4, Line 57: that the replica with UUID 'leader_uuid' step down.
nit: the replica with UUID 'leader_uuid' to step down.


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/tserver/tablet_service.cc@1146
PS4, Line 1146: uuid
nit: UUID



-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Wed, 29 Aug 2018 05:09:18 +0000
Gerrit-HasComments: Yes

[kudu-CR] [WIP] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Fengling Wang, Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11251

to look at the new patch set (#5).

Change subject: [WIP] KUDU-2245 Graceful leadership transfer
......................................................................

[WIP] KUDU-2245 Graceful leadership transfer

This patch implements graceful leadership transfer, as described in the
original Raft thesis. It has the following steps:

1. An admin client sends a request to the tablet leader for it to
   transfer leadership. The client can indicate a specific voter that it
   wants to become the leader, or it can allow the current leader to
   choose its successor.
2. The leader receives the request and beings a leader transfer period.
   During a leader transfer period, the leader does not accept writes or
   config change requests. This allows followers to catch up to the
   leader. A background timer expires the transfer period after one
   election timeout, since clients should be able to ride over
   interruptions in service lasting at least that long.
3. During the transfer period, the leader continues to update peers.
   When it receives a response from a peer, it checks if that peer is
   a voter and fully caught up to the leader's log. If it is, and if it
   is the designated successor if one was provided, the leader signals
   the peer to start an election, which it should win. If no eligible
   successor appears, the transfer period expires and the leader resumes
   normal operation.

This is an improvement over the current leader step down method, which
causes the leader to simply relinquish leadership and snooze its
election timer for an extra long period, so another voter will likely
become leader. Leadership transfer should usually be much faster and it
allows the client to select the new leader among current voters.
However, note that it does not provide strictly better guarantees- it is
still possible that leadership will not be transferred.

I ran TestRepeatLeaderStepDown and TestGracefulLeaderStepDown 1000 times
and 200 times each, in debug and TSAN modes, with 4 stress threads, and
saw no failures.

Still WIP because I want to
* Run some dist-test loops of the rebalancer tests, which now use
  graceful leadership transfer.
* Add a test or two for bad cases where the leadership transfer period
  should expire.
* Quantify how much faster leadership transfer is than abrupt stepdown,
  at least in a lab environment.

Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tools/tool_replica_util.cc
M src/kudu/tools/tool_replica_util.h
M src/kudu/tserver/tablet_service.cc
17 files changed, 804 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/11251/5
-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 5
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 17:

(5 comments)

Finished going through the tests, this all looks good but I wonder if we are missing coverage in the graceful, non-designated case.

http://gerrit.cloudera.org:8080/#/c/11251/17/src/kudu/consensus/consensus_queue.h
File src/kudu/consensus/consensus_queue.h:

http://gerrit.cloudera.org:8080/#/c/11251/17/src/kudu/consensus/consensus_queue.h@468
PS17, Line 468:   // Determine if 'peer' is fully caught up to the leader. If it is, notify
              :   // observers.
              :   void DetermineIfCaughtUp(const TrackedPeer& peer, const ConsensusStatusPB& status);
Let's rename this method to something like TransferLeadershipIfNeeded() and have the doc comment explain its full responsibilities, such as:

  // If there is a graceful leadership change underway, notify queue observers
  // to initiate leadership transfer to the specified peer under the following
  // conditions:
  // * 'peer' has fully caught up to the leader
  // * 'peer' is the designated successor, or no successor was designated
  void TransferLeadershipIfNeeded(const TrackedPeer& peer, const ConsensusStatusPB& status);


http://gerrit.cloudera.org:8080/#/c/11251/17/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

http://gerrit.cloudera.org:8080/#/c/11251/17/src/kudu/integration-tests/raft_consensus-itest.cc@3024
PS17, Line 3024: TEST_F(RaftConsensusITest, TestLeaderTransferWhenFollowerFallsBehindLeaderGC) {
please add a test comment with something like: // Designating graceful leadership transfer to a follower that cannot catch up should eventually fail.


http://gerrit.cloudera.org:8080/#/c/11251/17/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/11251/17/src/kudu/tools/kudu-admin-test.cc@1315
PS17, Line 1315: TEST_F(AdminCliTest, TestLeaderTransferToEvictedPeer) {
please add a test comment: // Leader should reject requests to transfer leadership to a non-member of the config.


http://gerrit.cloudera.org:8080/#/c/11251/17/src/kudu/tools/kudu-admin-test.cc@1377
PS17, Line 1377: TEST_F(AdminCliTest, TestLeaderTransferToNonVoter) {
please add a test comment: // Leader should reject requests to transfer leadership to a non-voter of the config.


http://gerrit.cloudera.org:8080/#/c/11251/17/src/kudu/tools/kudu-admin-test.cc@1571
PS17, Line 1571:       "leader_step_down",
This seems to be the only test we have for non-designated graceful leader step down. Can we add an assertion to check that the leader actually changed, in the graceful case, or add a test for non-designated graceful stepdown?



-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 17
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 17 Oct 2018 19:45:25 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Fengling Wang, Tidy Bot, Mike Percy, Alexey Serbin, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11251

to look at the new patch set (#18).

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................

KUDU-2245 Graceful leadership transfer

This patch implements graceful leadership transfer, as described in the
original Raft thesis. It has the following steps:

1. An admin client sends a request to the tablet leader for it to
   transfer leadership. The client can indicate a specific voter that it
   wants to become the leader, or it can allow the current leader to
   choose its successor.
2. The leader receives the request and begins a leader transfer period.
   During a leader transfer period, the leader does not accept writes or
   config change requests. This allows followers to catch up to the
   leader. A background timer expires the transfer period after one
   election timeout, since clients should be able to ride over
   interruptions in service lasting at least that long. If another
   request to transfer leadership is received during a transfer period,
   it will be rejected.
3. During the transfer period, the leader continues to update peers.
   When it receives a response from a peer, it checks if that peer is
   a voter and fully caught up to the leader's log. If it is, and if it
   is the designated successor if one was provided, the leader signals
   the peer to start an election, which it should win. If no eligible
   successor appears, the transfer period expires and the leader resumes
   normal operation.

This is an improvement over the current leader step down method, which
causes the leader to simply relinquish leadership and snooze its
election timer for an extra long period, so another voter will likely
become leader. Leadership transfer should usually be much faster and it
allows the client to select the new leader among current voters.
However, note that it does not provide strictly better guarantees- it is
still possible that leadership will not be transferred.

I ran TestRepeatLeaderStepDown and TestGracefulLeaderStepDown 1000 times
and 200 times each, in debug and TSAN modes, with 4 stress threads, and
saw no failures.

Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tools/tool_replica_util.cc
M src/kudu/tools/tool_replica_util.h
M src/kudu/tserver/tablet_service.cc
17 files changed, 839 insertions(+), 67 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/11251/18
-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 18
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 17:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/11251/17/src/kudu/consensus/consensus_queue.h
File src/kudu/consensus/consensus_queue.h:

http://gerrit.cloudera.org:8080/#/c/11251/17/src/kudu/consensus/consensus_queue.h@468
PS17, Line 468:   // Determine if 'peer' is fully caught up to the leader. If it is, notify
              :   // observers.
              :   void DetermineIfCaughtUp(const TrackedPeer& peer, const ConsensusStatusPB& status);
> Let's rename this method to something like TransferLeadershipIfNeeded() and
Done


http://gerrit.cloudera.org:8080/#/c/11251/17/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

http://gerrit.cloudera.org:8080/#/c/11251/17/src/kudu/integration-tests/raft_consensus-itest.cc@3024
PS17, Line 3024: TEST_F(RaftConsensusITest, TestLeaderTransferWhenFollowerFallsBehindLeaderGC) {
> please add a test comment with something like: // Designating graceful lead
Done


http://gerrit.cloudera.org:8080/#/c/11251/17/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/11251/17/src/kudu/tools/kudu-admin-test.cc@1571
PS17, Line 1571:       "leader_step_down",
> didn't see a reply posted yet, not sure if you saw this comment
(facepalm)


http://gerrit.cloudera.org:8080/#/c/11251/17/src/kudu/tools/kudu-admin-test.cc@1571
PS17, Line 1571:       "leader_step_down",
> This seems to be the only test we have for non-designated graceful leader s
Graceful stepdown does not guarantee leadership transfer any more than abrupt stepdown does- if no peer catches up during the transfer period, the leader will resume its normal operation.

See L1297 for a test of graceful transfer without designating a successor. In that case, failure detection is disabled and there are no other writes, so leader transfer should succeed almost surely within multi-second timeouts even with TSAN / stress thread slowdowns.



-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 17
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 18 Oct 2018 00:17:26 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 10:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11251/10/src/kudu/tools/tool_replica_util.cc
File src/kudu/tools/tool_replica_util.cc:

http://gerrit.cloudera.org:8080/#/c/11251/10/src/kudu/tools/tool_replica_util.cc@296
PS10, Line 296:     // We should consult the leader for the most up-to-date consensus state.
              :     string new_leader_uuid;
              :     HostPort new_leader_hp;
              :     RETURN_NOT_OK(GetTabletLeader(client, tablet_id, &new_leader_uuid, &new_leader_hp));
I think this is racy, regardless of the way how leadership changes -- gracefully or abrupt.  Why not to rely on the logic at line 313 instead?


http://gerrit.cloudera.org:8080/#/c/11251/10/src/kudu/tools/tool_replica_util.cc@305
PS10, Line 305: leader_uuid
If leader changed and new leader UUID is detected, why still to keep this old leader uuid in the if() clause?


http://gerrit.cloudera.org:8080/#/c/11251/10/src/kudu/tools/tool_replica_util.cc@310
PS10, Line 310: proxy
> 'proxy' was constructed at L149, with old leader info. If leadership had ac
I think this code is executed when no DoLeaderStepDown() was called above (see from_ts_uuid != leader_uuid).  I'm not sure that analysis of pre-commit failure is correct.

Also, if the leadership has changed, wouldn't the verification below  return Status::Incomplete?



-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 10
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 24 Sep 2018 02:12:39 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Fengling Wang, Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11251

to look at the new patch set (#10).

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................

KUDU-2245 Graceful leadership transfer

This patch implements graceful leadership transfer, as described in the
original Raft thesis. It has the following steps:

1. An admin client sends a request to the tablet leader for it to
   transfer leadership. The client can indicate a specific voter that it
   wants to become the leader, or it can allow the current leader to
   choose its successor.
2. The leader receives the request and beings a leader transfer period.
   During a leader transfer period, the leader does not accept writes or
   config change requests. This allows followers to catch up to the
   leader. A background timer expires the transfer period after one
   election timeout, since clients should be able to ride over
   interruptions in service lasting at least that long.
3. During the transfer period, the leader continues to update peers.
   When it receives a response from a peer, it checks if that peer is
   a voter and fully caught up to the leader's log. If it is, and if it
   is the designated successor if one was provided, the leader signals
   the peer to start an election, which it should win. If no eligible
   successor appears, the transfer period expires and the leader resumes
   normal operation.

This is an improvement over the current leader step down method, which
causes the leader to simply relinquish leadership and snooze its
election timer for an extra long period, so another voter will likely
become leader. Leadership transfer should usually be much faster and it
allows the client to select the new leader among current voters.
However, note that it does not provide strictly better guarantees- it is
still possible that leadership will not be transferred.

I ran TestRepeatLeaderStepDown and TestGracefulLeaderStepDown 1000 times
and 200 times each, in debug and TSAN modes, with 4 stress threads, and
saw no failures.

Still WIP because I want to
* Run some dist-test loops of the rebalancer tests, which now use
  graceful leadership transfer.
* Add a test or two for bad cases where the leadership transfer period
  should expire.
* Quantify how much faster leadership transfer is than abrupt stepdown,
  at least in a lab environment.

Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tools/tool_replica_util.cc
M src/kudu/tools/tool_replica_util.h
M src/kudu/tserver/tablet_service.cc
17 files changed, 826 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/11251/10
-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 10
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 16:

Hm, I had tested with a lot of runs on TSAN and didn't see a problem with write timeouts in the abrupt stepdown + leader transfer patch but the precommit hit three write timeouts in 3 attempts. It's expected that it could take a while for a write to go through but it should go through eventually...so maybe the timeout needs to be higher.


-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 16
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 16 Oct 2018 19:48:15 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Fengling Wang, Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11251

to look at the new patch set (#6).

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................

KUDU-2245 Graceful leadership transfer

This patch implements graceful leadership transfer, as described in the
original Raft thesis. It has the following steps:

1. An admin client sends a request to the tablet leader for it to
   transfer leadership. The client can indicate a specific voter that it
   wants to become the leader, or it can allow the current leader to
   choose its successor.
2. The leader receives the request and beings a leader transfer period.
   During a leader transfer period, the leader does not accept writes or
   config change requests. This allows followers to catch up to the
   leader. A background timer expires the transfer period after one
   election timeout, since clients should be able to ride over
   interruptions in service lasting at least that long.
3. During the transfer period, the leader continues to update peers.
   When it receives a response from a peer, it checks if that peer is
   a voter and fully caught up to the leader's log. If it is, and if it
   is the designated successor if one was provided, the leader signals
   the peer to start an election, which it should win. If no eligible
   successor appears, the transfer period expires and the leader resumes
   normal operation.

This is an improvement over the current leader step down method, which
causes the leader to simply relinquish leadership and snooze its
election timer for an extra long period, so another voter will likely
become leader. Leadership transfer should usually be much faster and it
allows the client to select the new leader among current voters.
However, note that it does not provide strictly better guarantees- it is
still possible that leadership will not be transferred.

I ran TestRepeatLeaderStepDown and TestGracefulLeaderStepDown 1000 times
and 200 times each, in debug and TSAN modes, with 4 stress threads, and
saw no failures.

Still WIP because I want to
* Run some dist-test loops of the rebalancer tests, which now use
  graceful leadership transfer.
* Add a test or two for bad cases where the leadership transfer period
  should expire.
* Quantify how much faster leadership transfer is than abrupt stepdown,
  at least in a lab environment.

Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tools/tool_replica_util.cc
M src/kudu/tools/tool_replica_util.h
M src/kudu/tserver/tablet_service.cc
17 files changed, 804 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/11251/6
-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 6
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has removed a vote on this change.

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 13
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [WIP] KUDU-2245 Graceful leadership transfer

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Fengling Wang, Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11251

to look at the new patch set (#2).

Change subject: [WIP] KUDU-2245 Graceful leadership transfer
......................................................................

[WIP] KUDU-2245 Graceful leadership transfer

This patch implements graceful leadership transfer, as described in the
original Raft thesis. It has the following steps:

1. An admin client sends a request to the tablet leader for it to
   transfer leadership. The client can indicate a specific voter that it
   wants to become the leader, or it can allow the current leader to
   choose its successor.
2. The leader receives the request and beings a leader transfer period.
   During a leader transfer period, the leader does not accept writes or
   config change requests. This allows followers to catch up to the
   leader. A background timer expires the transfer period after one
   election timeout, since clients should be able to ride over
   interruptions in service lasting at least that long.
3. During the transfer period, the leader continues to update peers.
   When it receives a response from a peer, it checks if that peer is
   a voter and fully caught up to the leader's log. If it is, and if it
   is the designated successor if one was provided, the leader signals
   the peer to start an election, which it should win. If no eligible
   successor appears, the transfer period expires and the leader resumes
   normal operation.

This is an improvement over the current leader step down method, which
causes the leader to simply relinquish leadership and snooze its
election timer for an extra long period, so another voter will likely
become leader. Leadership transfer should usually be much faster and it
allows the client to select the new leader among current voters.
However, note that it does not provide strictly better guarantees- it is
still possible that leadership will not be transferred.

I ran TestRepeatLeaderStepDown and TestGracefulLeaderStepDown 1000 times
and 200 times each, in debug and TSAN modes, with 4 stress threads, and
saw no failures.

Still WIP because I want to
* Run some dist-test loops of the rebalancer tests, which now use
  graceful leadership transfer.
* Add a test or two for bad cases where the leadership transfer period
  should expire.
* Quantify how much faster leadership transfer is than abrupt stepdown,
  at least in a lab environment.

Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tools/tool_replica_util.cc
M src/kudu/tools/tool_replica_util.h
M src/kudu/tserver/tablet_service.cc
16 files changed, 530 insertions(+), 45 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/11251/2
-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-2245 Graceful leadership transfer

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 15:

(4 comments)

Looks good just a plumbing nitpick now

http://gerrit.cloudera.org:8080/#/c/11251/15/src/kudu/consensus/peer_manager.cc
File src/kudu/consensus/peer_manager.cc:

http://gerrit.cloudera.org:8080/#/c/11251/15/src/kudu/consensus/peer_manager.cc@110
PS15, Line 110: void PeerManager::StartElection(const std::string& uuid) {
Seems like this function should return a Status so we can decide how to handle errors one level up.


http://gerrit.cloudera.org:8080/#/c/11251/15/src/kudu/consensus/peer_manager.cc@119
PS15, Line 119:     return;
how about:

  return Status::NotFound(Substitute("Unable to start an election on unknown peer $0", uuid));


http://gerrit.cloudera.org:8080/#/c/11251/15/src/kudu/consensus/peer_manager.cc@121
PS15, Line 121:   Status s = peer->StartElection();
how about;

  RETURN_NOT_OK_PREPEND(peer->StartElection(), Substitute("Failed to start election on peer $0", uuid));
  return Status::OK();


http://gerrit.cloudera.org:8080/#/c/11251/15/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

http://gerrit.cloudera.org:8080/#/c/11251/15/src/kudu/consensus/raft_consensus.cc@991
PS15, Line 991:   peer_manager_->StartElection(peer_uuid);
can we do the warning here?



-- 
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 15
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 16 Oct 2018 05:11:05 +0000
Gerrit-HasComments: Yes