You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2019/01/31 08:04:07 UTC

[kudu-CR] [tools] KUDU-2557 fix flake in TwoConcurrentRebalancers test

Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12324


Change subject: [tools] KUDU-2557 fix flake in TwoConcurrentRebalancers test
......................................................................

[tools] KUDU-2557 fix flake in TwoConcurrentRebalancers test

This patch addresses KUDU-2557, fixing an issue where a tablet
Raft configuration with the source replica being a leader
eventually gets an extra voter replica due to multiple rebalancer
tools running concurrently.

For example, once a tablet with target replication factor of 3
got the following configuration, the rebalancers tool could not
make any forward progress:

  A (VL) REPLACE=true
  B (V)
  C (V)
  D (V)

This patch introduces a new semantic for a move operation
during intra-location rebalancing, where empty (or absent)
UUID of the target tserver means 'remove the source replica'.
In addition, an extra logic has been introduced into the
CheckCompleteMove() utility function to handle that new move
semantic described above.  With that, when the rebalancer
observes a tablet's configuration with more voter replicas
than the target replication factor, it schedules an operation
to remove the source replica.  These new provisions allows for
progress even in case of tablet configurations pictures above.

Change-Id: If5e28f4b644b1549549f50a322c1d7ef376122d3
---
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/rebalancer.cc
M src/kudu/tools/tool_replica_util.cc
5 files changed, 71 insertions(+), 28 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/12324/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12324
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If5e28f4b644b1549549f50a322c1d7ef376122d3
Gerrit-Change-Number: 12324
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] [tools] KUDU-2557 fix flake in TwoConcurrentRebalancers test

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/12324 )

Change subject: [tools] KUDU-2557 fix flake in TwoConcurrentRebalancers test
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12324/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12324/2//COMMIT_MSG@28
PS2, Line 28: With that, when the rebalancer observes a tablet's
            : configuration with more voter replicas than the tablet's target
            : replication factor, it schedules an operation to remove the source
            : replica, which is perfectly aligned with the nature of the greedy
            : rebalancing algorithm.
> To check my understanding: normally, in 3-4-3, the master would remove a re
Right -- that was a corner case when the leader replica marked with REPLACE=true.  So, for those cases the rebalancer has a dedicated logic to keep an eye on that and make the leader step down when the newly added non-voter at the target tserver becomes a voter.


http://gerrit.cloudera.org:8080/#/c/12324/2/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/12324/2/src/kudu/tools/kudu-tool-test.cc@397
PS2, Line 397: <int64_t>
> What's the purpose of this?
Ah, for some reason the original version failed to compile on my laptop.  Not sure what triggered that (I have some other approach addressing this issue).  Let me check whether it's actually needed with the current code.



-- 
To view, visit http://gerrit.cloudera.org:8080/12324
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If5e28f4b644b1549549f50a322c1d7ef376122d3
Gerrit-Change-Number: 12324
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 31 Jan 2019 18:45:18 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] KUDU-2557 fix flake in TwoConcurrentRebalancers test

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/12324 )

Change subject: [tools] KUDU-2557 fix flake in TwoConcurrentRebalancers test
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12324/2/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/12324/2/src/kudu/tools/kudu-tool-test.cc@397
PS2, Line 397: <int64_t>
> Ah, for some reason the original version failed to compile on my laptop.  N
Yep, just confirmed that without that change the compilation fails.  Basically, the issue comes from using 'long' for 0 ('0L' (using 'long long', i.e. '0LL' would help as well).



-- 
To view, visit http://gerrit.cloudera.org:8080/12324
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If5e28f4b644b1549549f50a322c1d7ef376122d3
Gerrit-Change-Number: 12324
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 31 Jan 2019 19:07:44 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] KUDU-2557 fix flake in TwoConcurrentRebalancers test

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/12324 )

Change subject: [tools] KUDU-2557 fix flake in TwoConcurrentRebalancers test
......................................................................


Patch Set 2: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12324/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12324/2//COMMIT_MSG@28
PS2, Line 28: With that, when the rebalancer observes a tablet's
            : configuration with more voter replicas than the tablet's target
            : replication factor, it schedules an operation to remove the source
            : replica, which is perfectly aligned with the nature of the greedy
            : rebalancing algorithm.
To check my understanding: normally, in 3-4-3, the master would remove a replica marked replace. But because the leader is marked as replace, the tool has to do it?


http://gerrit.cloudera.org:8080/#/c/12324/2/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/12324/2/src/kudu/tools/kudu-tool-test.cc@397
PS2, Line 397: <int64_t>
What's the purpose of this?



-- 
To view, visit http://gerrit.cloudera.org:8080/12324
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If5e28f4b644b1549549f50a322c1d7ef376122d3
Gerrit-Change-Number: 12324
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 31 Jan 2019 18:38:18 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] KUDU-2557 fix flake in TwoConcurrentRebalancers test

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12324

to look at the new patch set (#2).

Change subject: [tools] KUDU-2557 fix flake in TwoConcurrentRebalancers test
......................................................................

[tools] KUDU-2557 fix flake in TwoConcurrentRebalancers test

This patch addresses KUDU-2557, fixing an no-progress issue
if a tablet's Raft configuration with the source replica being
a leader eventually gets an extra voter replica due to multiple
rebalancer tools running concurrently.

Prior to this patch, once a tablet with the target replication factor
of 3 had gotten the following configuration, neither of concurrent
rebalancer tools could make any forward progress:

  A (VL) REPLACE=true
  B (V)
  C (V)
  D (V)

This patch introduces a new semantic for a move operation
for the intra-location rebalancing, where an empty (or absent)
UUID of the target tserver means 'just remove the source replica'.
Also, an extra logic has been introduced into the CheckCompleteMove()
utility function to handle the new replica move semantic described
above.  With that, when the rebalancer observes a tablet's
configuration with more voter replicas than the tablet's target
replication factor, it schedules an operation to remove the source
replica, which is perfectly aligned with the nature of the greedy
rebalancing algorithm.

These new provisions help to resolve the no-progress issue
described above.

Change-Id: If5e28f4b644b1549549f50a322c1d7ef376122d3
---
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/rebalancer.cc
M src/kudu/tools/tool_replica_util.cc
5 files changed, 71 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/12324/2
-- 
To view, visit http://gerrit.cloudera.org:8080/12324
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If5e28f4b644b1549549f50a322c1d7ef376122d3
Gerrit-Change-Number: 12324
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] KUDU-2557 fix flake in TwoConcurrentRebalancers test

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12324 )

Change subject: [tools] KUDU-2557 fix flake in TwoConcurrentRebalancers test
......................................................................

[tools] KUDU-2557 fix flake in TwoConcurrentRebalancers test

This patch addresses KUDU-2557, fixing an no-progress issue
if a tablet's Raft configuration with the source replica being
a leader eventually gets an extra voter replica due to multiple
rebalancer tools running concurrently.

Prior to this patch, once a tablet with the target replication factor
of 3 had gotten the following configuration, neither of concurrent
rebalancer tools could make any forward progress:

  A (VL) REPLACE=true
  B (V)
  C (V)
  D (V)

This patch introduces a new semantic for a move operation
for the intra-location rebalancing, where an empty (or absent)
UUID of the target tserver means 'just remove the source replica'.
Also, an extra logic has been introduced into the CheckCompleteMove()
utility function to handle the new replica move semantic described
above.  With that, when the rebalancer observes a tablet's
configuration with more voter replicas than the tablet's target
replication factor, it schedules an operation to remove the source
replica, which is perfectly aligned with the nature of the greedy
rebalancing algorithm.

These new provisions help to resolve the no-progress issue
described above.

Change-Id: If5e28f4b644b1549549f50a322c1d7ef376122d3
Reviewed-on: http://gerrit.cloudera.org:8080/12324
Tested-by: Kudu Jenkins
Reviewed-by: Will Berkeley <wd...@gmail.com>
---
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/rebalancer.cc
M src/kudu/tools/tool_replica_util.cc
5 files changed, 71 insertions(+), 28 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Will Berkeley: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/12324
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: If5e28f4b644b1549549f50a322c1d7ef376122d3
Gerrit-Change-Number: 12324
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>