You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2018/05/14 22:44:30 UTC

[kudu-CR] WIP [tools] first draft of rebalancing in kudu CLI

Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10399


Change subject: WIP [tools] first draft of rebalancing in kudu CLI
......................................................................

WIP [tools] first draft of rebalancing in kudu CLI

WIP:
  * needs integration tests with a la ksck-remote
  * address several TODOs

Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
---
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/rebalance-algo.cc
A src/kudu/tools/rebalance-test.cc
A src/kudu/tools/rebalance.cc
A src/kudu/tools/rebalance.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_tablet.cc
7 files changed, 835 insertions(+), 58 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/10399/1
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Mike Percy, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10399

to look at the new patch set (#22).

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................

[tools] rebalancer in the kudu CLI tool

Introduced rebalancing tool as a part of the Kudu CLI tool.  To run
the rebalancer against a Kudu cluster:

  sudo -u kudu kudu cluster rebalance <master_addresses>

This changelist also contains unit tests to cover converting KsckResults
into ClusterBalanceInfo and integration tests to verify
the functionality of the rebalancer as a part of the Kudu CLI tool.

The code was tested against a 12-node cluster running the recent
Kudu 1.7 (3-4-3 replica management scheme) with about 12TB of data total
and a cluster running Kudu 1.4 (3-2-3 replica management scheme).

This tool works against Kudu clusters of version 1.4 and above.
It does not work against Kudu version 1.3 and earlier.

Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
---
M src/kudu/client/client.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-admin-test.cc
A src/kudu/tools/rebalance-test.cc
A src/kudu/tools/rebalancer.cc
A src/kudu/tools/rebalancer.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_replica_util.cc
A src/kudu/tools/tool_replica_util.h
12 files changed, 2,562 insertions(+), 427 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/10399/22
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 22
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10399 )

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................


Patch Set 23: Verified+1

Unrelated flake in OptionalSSL/TestRpc.TestClientConnectionMetrics/0


-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 23
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 07 Jun 2018 04:14:41 +0000
Gerrit-HasComments: No

[kudu-CR] WIP [tools] first draft of rebalancing in kudu CLI

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10399

to look at the new patch set (#10).

Change subject: WIP [tools] first draft of rebalancing in kudu CLI
......................................................................

WIP [tools] first draft of rebalancing in kudu CLI

WIP:
  * needs more of restructuring and clean-up

The code was tested against the flash cluster and 3-2-3 cluster
running CDH-5.12

Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
---
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-admin-test.cc
A src/kudu/tools/rebalance-test.cc
A src/kudu/tools/rebalance.cc
A src/kudu/tools/rebalance.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_replica_util.cc
A src/kudu/tools/tool_replica_util.h
11 files changed, 2,113 insertions(+), 416 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/10399/10
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 10
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Mike Percy, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10399

to look at the new patch set (#20).

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................

[tools] rebalancer in the kudu CLI tool

Introduced rebalancing tool as a part of the Kudu CLI tool.  To run
the rebalancer against a Kudu cluster:

  sudo -u kudu kudu cluster rebalance <master_addresses>

This changelist also contains unit tests to cover converting KsckResults
result into ClusterBalanceInfo and integration tests to verify
the functionality of the rebalancer as a part of the Kudu CLI tool.

The code was tested against the 12-node cluster running the recent
Kudu 1.7 (3-4-3 replica management scheme) with about 12TB of data total
and a cluster running Kudu 1.4 (3-2-3 replica management scheme).

Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
---
M src/kudu/client/client.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-admin-test.cc
A src/kudu/tools/rebalance-test.cc
A src/kudu/tools/rebalancer.cc
A src/kudu/tools/rebalancer.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_replica_util.cc
A src/kudu/tools/tool_replica_util.h
12 files changed, 2,554 insertions(+), 419 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/10399/20
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 20
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10399 )

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................


Patch Set 20:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/client/client.cc
File src/kudu/client/client.cc:

http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/client/client.cc@548
PS20, Line 548: resp.tablet_locations_size() == 0
> I don't recall the details, but the definite thing I remember was the error
Ah, now I see -- you just need to pass non-existent tablet_id as a parameter, and then you'll get that error.  I think reporting back Status::NotFound() is more idiomatic than Status::IllegalState().



-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 20
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 06 Jun 2018 17:30:58 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP [tools] first draft of rebalancing in kudu CLI

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10399

to look at the new patch set (#6).

Change subject: WIP [tools] first draft of rebalancing in kudu CLI
......................................................................

WIP [tools] first draft of rebalancing in kudu CLI

WIP:
  * adapt to run faster in 3-2-3 re-replication scheme
  * test against older cluster using 3-2-3 re-replication scheme
  * needs integration tests with a la ksck-remote
  * needs a bit of restructuring and clean-up

The code was tested against the flash cluster.

Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
---
M src/kudu/tools/CMakeLists.txt
A src/kudu/tools/rebalance-test.cc
A src/kudu/tools/rebalance.cc
A src/kudu/tools/rebalance.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_tablet.cc
6 files changed, 1,337 insertions(+), 101 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/10399/6
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10399 )

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................


Patch Set 19:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10399/19/src/kudu/tools/rebalancer.cc
File src/kudu/tools/rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/10399/19/src/kudu/tools/rebalancer.cc@276
PS19, Line 276: &is_timed_out,
              :                                                             &reset_ops
typo: these should be swapped.  Probably, the signature of this method should be changed.



-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 19
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 05 Jun 2018 03:44:26 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10399

to look at the new patch set (#11).

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................

[tools] rebalancer in the kudu CLI tool

Introduced rebalancing tool as a part of the Kudu CLI tool.  To run
the rebalancer against a Kudu cluster:

  sudo -u kudu kudu cluster rebalance <cluster_rpc_endpoints>

This changelist also contains unit tests to cover converting KsckResults
result into ClusterBalanceInfo and integration tests to verify
the functionality of the rebalancer as a part of the Kudu CLI tool.

The code was tested against the flash cluster and a 3-2-3 cluster
running CDH5.12.x

Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
---
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-admin-test.cc
A src/kudu/tools/rebalance-test.cc
A src/kudu/tools/rebalance.cc
A src/kudu/tools/rebalance.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_replica_util.cc
A src/kudu/tools/tool_replica_util.h
11 files changed, 2,113 insertions(+), 416 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/10399/11
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 11
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Mike Percy, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10399

to look at the new patch set (#23).

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................

[tools] rebalancer in the kudu CLI tool

Introduced rebalancing tool as a part of the Kudu CLI tool.  To run
the rebalancer against a Kudu cluster:

  sudo -u kudu kudu cluster rebalance <master_addresses>

This changelist also contains unit tests to cover converting KsckResults
into ClusterBalanceInfo and integration tests to verify
the functionality of the rebalancer as a part of the Kudu CLI tool.

The code was tested against a 12-node cluster running the recent
Kudu 1.7 (3-4-3 replica management scheme) with about 12TB of data total
and a cluster running Kudu 1.4 (3-2-3 replica management scheme).

This tool works against Kudu clusters of version 1.4 and above.
It does not work against Kudu version 1.3 and earlier.

Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
---
M src/kudu/client/client.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-admin-test.cc
A src/kudu/tools/rebalance-test.cc
A src/kudu/tools/rebalancer.cc
A src/kudu/tools/rebalancer.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_replica_util.cc
A src/kudu/tools/tool_replica_util.h
12 files changed, 2,570 insertions(+), 427 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/10399/23
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 23
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] WIP [tools] first draft of rebalancing in kudu CLI

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10399

to look at the new patch set (#8).

Change subject: WIP [tools] first draft of rebalancing in kudu CLI
......................................................................

WIP [tools] first draft of rebalancing in kudu CLI

WIP:
  * needs integration tests with a la ksck-remote
  * needs more of restructuring and clean-up

The code was tested against the flash cluster and 3-2-3 cluster
running CDH-5.12

Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
---
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-admin-test.cc
A src/kudu/tools/rebalance-test.cc
A src/kudu/tools/rebalance.cc
A src/kudu/tools/rebalance.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_replica_util.cc
A src/kudu/tools/tool_replica_util.h
11 files changed, 1,918 insertions(+), 410 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/10399/8
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 8
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Mike Percy, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10399

to look at the new patch set (#21).

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................

[tools] rebalancer in the kudu CLI tool

Introduced rebalancing tool as a part of the Kudu CLI tool.  To run
the rebalancer against a Kudu cluster:

  sudo -u kudu kudu cluster rebalance <master_addresses>

This changelist also contains unit tests to cover converting KsckResults
into ClusterBalanceInfo and integration tests to verify
the functionality of the rebalancer as a part of the Kudu CLI tool.

The code was tested against a 12-node cluster running the recent
Kudu 1.7 (3-4-3 replica management scheme) with about 12TB of data total
and a cluster running Kudu 1.4 (3-2-3 replica management scheme).

This tool works against Kudu clusters of version 1.4 and above.
It does not work against Kudu version 1.3 and earlier.

Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
---
M src/kudu/client/client.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-admin-test.cc
A src/kudu/tools/rebalance-test.cc
A src/kudu/tools/rebalancer.cc
A src/kudu/tools/rebalancer.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_replica_util.cc
A src/kudu/tools/tool_replica_util.h
12 files changed, 2,559 insertions(+), 419 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/10399/21
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 21
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Mike Percy, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10399

to look at the new patch set (#13).

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................

[tools] rebalancer in the kudu CLI tool

Introduced rebalancing tool as a part of the Kudu CLI tool.  To run
the rebalancer against a Kudu cluster:

  sudo -u kudu kudu cluster rebalance <master_addresses>

This changelist also contains unit tests to cover converting KsckResults
result into ClusterBalanceInfo and integration tests to verify
the functionality of the rebalancer as a part of the Kudu CLI tool.

The code was tested against the 12-node cluster running the recent
Kudu 1.7 (3-4-3 replica management scheme) with about 12TB of data total
and a cluster running Kudu 1.4 (3-2-3 replica management scheme).

Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
---
M src/kudu/client/client.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-admin-test.cc
A src/kudu/tools/rebalance-test.cc
A src/kudu/tools/rebalance.cc
A src/kudu/tools/rebalance.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_replica_util.cc
A src/kudu/tools/tool_replica_util.h
12 files changed, 2,272 insertions(+), 420 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/10399/13
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 13
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Mike Percy, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10399

to look at the new patch set (#12).

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................

[tools] rebalancer in the kudu CLI tool

Introduced rebalancing tool as a part of the Kudu CLI tool.  To run
the rebalancer against a Kudu cluster:

  sudo -u kudu kudu cluster rebalance <cluster_rpc_endpoints>

This changelist also contains unit tests to cover converting KsckResults
result into ClusterBalanceInfo and integration tests to verify
the functionality of the rebalancer as a part of the Kudu CLI tool.

The code was tested against the flash cluster and a 3-2-3 cluster
running CDH5.12.x

Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
---
M src/kudu/client/client.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-admin-test.cc
A src/kudu/tools/rebalance-test.cc
A src/kudu/tools/rebalance.cc
A src/kudu/tools/rebalance.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_replica_util.cc
A src/kudu/tools/tool_replica_util.h
12 files changed, 2,158 insertions(+), 419 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/10399/12
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 12
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] WIP [tools] first draft of rebalancing in kudu CLI

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10399 )

Change subject: WIP [tools] first draft of rebalancing in kudu CLI
......................................................................


Patch Set 4:

(12 comments)

> (21 comments)
 > 
 > Some random comments. I didn't look at rebalancer.cc since IIRC
 > you've already changed it a lot while testing compared to what's
 > posted here.

Thank you for the review.  Yes, I updated the implementation and planning to post the new version later tonight.  I'm playing with the newer version at flash cluster, fixing issues when I see any.

http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance-test.cc
File src/kudu/tools/rebalance-test.cc:

http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance-test.cc@140
PS4, Line 140: Convert
> nit: I think you mean "Test converting..."
Done


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance-test.cc@165
PS4, Line 165: "tablet_a_0"
> These should all be the same for one tablet.
Done


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance.h
File src/kudu/tools/rebalance.h:

http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance.h@59
PS4, Line 59: UUID
> UUIDs
Done


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance.h@59
PS4, Line 59: is output into
> nit: This wording leaves it unclear to me whether it replaces the contents 
Done


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance.h@60
PS4, Line 60:  found
> nit: are found
Done


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance.h@60
PS4, Line 60: is
> nit: will be
Done


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance.h@60
PS4, Line 60: the 'tablet_ids' is empty
> OK this makes it clear the results are populated into 'tablet_ids'.
Done


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance.h@60
PS4, Line 60: the
> nit: remove
Done


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/tool_action_cluster.cc
File src/kudu/tools/tool_action_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/tool_action_cluster.cc@62
PS4, Line 62: perfectly
> nit: I don't like this extra word because I don't think it means anything- 
Done


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/tool_action_cluster.cc@64
PS4, Line 64: max_moves_per_step
> Is this parameter also secretly the max number of moves that can be going o
This simple implementation runs step by step, and yes -- this parameter controls the maximum number of moves that can be done in one  step.  I also added 'max_moves_per_server' parameter.


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/tool_action_cluster.cc@119
PS4, Line 119: LOG(INFO) << "rebalancing is complete: no moves left";
> nit: We should straight print this rather than log it.
Done


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/tool_action_tablet.cc
File src/kudu/tools/tool_action_tablet.cc:

http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/tool_action_tablet.cc@413
PS4, Line 413:   
> Should be 4 spaces, not 6.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 21 May 2018 01:01:56 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP [tools] first draft of rebalancing in kudu CLI

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10399

to look at the new patch set (#5).

Change subject: WIP [tools] first draft of rebalancing in kudu CLI
......................................................................

WIP [tools] first draft of rebalancing in kudu CLI

WIP:
  * needs integration tests with a la ksck-remote
  * address several TODOs

The code was tested against the flash cluster.

Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
---
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/rebalance-algo.cc
A src/kudu/tools/rebalance-test.cc
A src/kudu/tools/rebalance.cc
A src/kudu/tools/rebalance.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_tablet.cc
7 files changed, 1,242 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/10399/5
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10399 )

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................


Patch Set 20: Code-Review+1

(13 comments)

A couple of nits and a couple of questions but overall I think it looks good. Great job making it easier to understand!

http://gerrit.cloudera.org:8080/#/c/10399/20//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10399/20//COMMIT_MSG@15
PS20, Line 15: result
nit: extra word


http://gerrit.cloudera.org:8080/#/c/10399/20//COMMIT_MSG@18
PS20, Line 18: the
nit: a


http://gerrit.cloudera.org:8080/#/c/10399/20//COMMIT_MSG@21
PS20, Line 21: 
We should also cover the limitations (if any) this tool has working against older versions.


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/client/client.cc
File src/kudu/client/client.cc:

http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/client/client.cc@548
PS20, Line 548: resp.tablet_locations_size() == 0
This is funny. How did this come up? Looking at the master code, we return a NotFound error in the RPC if the tablet is unknown. The only other ways I see for no replicas to be returned is for the tablet to have no replicas assigned, which would be odd, or for a VOTER_REPLICA filter to be applied with all replicas non-voters, which would also be odd.


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/kudu-admin-test.cc@1353
PS20, Line 1353: ASSERT_FALSE(s.ok());
nit: ASSERT_TRUE on the sort of status it is. Here It looks like that is IllegalState.


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/kudu-admin-test.cc@1383
PS20, Line 1383: constexpr auto kTserverUnresponsiveMs = 5000;
nit: Can we lower this and speed up the test? Does it make things flaky if it's lowered?


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/kudu-admin-test.cc@1403
PS20, Line 1403: 3
nit: (kRepFactor + 1)


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/kudu-admin-test.cc@1427
PS20, Line 1427: if (kRepFactor > 1)
Ah, makes sense to me that writes can't replicate during replica movement when RF = 1 under 3-2-3. However, shouldn't it work during 3-4-3 replica movement?

nit: Also, could you update the comments in this test to say that a workload isn't run during 3-2-3 replica movement with RF = 1 because the tablet is unavailable during the move?


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.h
File src/kudu/tools/rebalancer.h:

http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.h@78
PS20, Line 78: size_t max_state_syncs_in_a_row
This parameter seems like a nice safety valve but also hard to use. How long would it take for 1000 max state syncs to happen? Do we need an idle timeout instead? So, the semantics would be "if you haven't been able to do a move for X seconds, give up as being unable to make progress".


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.cc
File src/kudu/tools/rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.cc@243
PS20, Line 243: resync_state = false;
The rsync happens in GetNextMoves but that's not obvious from reading this function. Could you note somewhere that we resync in GetNextMoves?


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.cc@539
PS20, Line 539:  
nit: are


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.cc@545
PS20, Line 545: // UUIDs of tablets of the selected table at the source tserver which replicas
              :   // are non-leaders.
Howabout "Tablet ids of replicas on the source tserver that are non-leaders."


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.cc@548
PS20, Line 548: // UUIDs of tablets of the selected table at the source tserver which replicas
              :   // are leaders.
"Tablet ids of replicas on the source tserver that are leaders."



-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 20
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 05 Jun 2018 18:45:57 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed Kudu Jenkins from this change.  ( http://gerrit.cloudera.org:8080/10399 )

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 11
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10399 )

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................


Patch Set 15:

(17 comments)

I am still working through this patch, but I thought I'd leave some partial feedback.

http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.h
File src/kudu/tools/rebalance.h:

PS15: 
nit: consider naming this rebalancer.h since it has the Rebalancer class


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.h@129
PS15, Line 129: results for the next step
nit: using the term "step" here is confusing because it sounds like it's for the next run of RunStep()


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.h@152
PS15, Line 152:       std::unordered_map<std::string, std::set<size_t>>& op_indices,
mutable ref violates style guide


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.h@171
PS15, Line 171:   bool UpdateMovesInProgress(const MonoTime& deadline, bool* timed_out);
Needs doc, including the API contract


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.h@181
PS15, Line 181: Helper containers used to run rebalancing.
This comment is very vague. Please add info about key -> value for the containers and what they are tracking for all the helper containers including the op count stuff


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.cc
File src/kudu/tools/rebalance.cc:

http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.cc@53
PS15, Line 53: false
true seems like a more desirable default in a 3-4-3 cluster.


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.cc@56
PS15, Line 56: rebalance_output_replica_distribution_details
can we just make this a --verbose or -v argument to the kudu cluster rebalance command?


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.cc@238
PS15, Line 238:     if (clear_scheduled_moves) {
A comment here would be nice to indicate when we want to clear scheduled moves


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.cc@239
PS15, Line 239: loger
typo


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.cc@250
PS15, Line 250: DCHECK
nit: DCHECK_EQ("", FindNonUniqueTablet(replica_moves)) would print the offending tablet if this ever triggered


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.cc@479
PS15, Line 479: RunStep
I think RunStep() might be better named GetNextMoveSet() or NextMoves() or NextBatch() or NextBatchOfMoves() ... because it's not actually executing the moves.


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.cc@479
PS15, Line 479: pending_moves
why pass pending_moves into this function instead of just directly referencing scheduled_moves_ from within it?


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.cc@559
PS15, Line 559: constraint: it's better not to move leader replica
Can you clarify whether this means we prefer to move other replicas before the leader, or whether this means we will never move a leader?


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.cc@660
PS15, Line 660:     src_op_indices_.emplace(elem.ts_uuid_from, set<size_t>()).first->
              :         second.emplace(i);
I'm having trouble understanding why you are adding i to the set here.


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/tool_action_cluster.cc
File src/kudu/tools/tool_action_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/tool_action_cluster.cc@212
PS15, Line 212: rebalance_
nit: consider removing the rebalance_ prefix from these command-line flags, since they are only exposed when you type "kudu cluster rebalance" anyway.


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/tool_replica_util.h
File src/kudu/tools/tool_replica_util.h:

http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/tool_replica_util.h@56
PS15, Line 56: Make
nit: s/Make/Request/


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/tool_replica_util.h@78
PS15, Line 78: // can be null.
nit: doc the contract for completion_status and Status return from this function.

Why do we have separate variables for is_complete and completion_status?



-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 15
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 01 Jun 2018 01:02:44 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed Kudu Jenkins from this change.  ( http://gerrit.cloudera.org:8080/10399 )

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 22
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] WIP [tools] first draft of rebalancing in kudu CLI

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10399

to look at the new patch set (#9).

Change subject: WIP [tools] first draft of rebalancing in kudu CLI
......................................................................

WIP [tools] first draft of rebalancing in kudu CLI

WIP:
  * needs more of restructuring and clean-up

The code was tested against the flash cluster and 3-2-3 cluster
running CDH-5.12

Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
---
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-admin-test.cc
A src/kudu/tools/rebalance-test.cc
A src/kudu/tools/rebalance.cc
A src/kudu/tools/rebalance.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_replica_util.cc
A src/kudu/tools/tool_replica_util.h
11 files changed, 2,095 insertions(+), 416 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/10399/9
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 9
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10399 )

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................


Patch Set 11: Verified+1

Unrelated flake in MasterFailoverTest.TestMasterPermanentFailure


-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 11
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 29 May 2018 16:07:34 +0000
Gerrit-HasComments: No

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed Kudu Jenkins from this change.  ( http://gerrit.cloudera.org:8080/10399 )

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 20
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Mike Percy, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10399

to look at the new patch set (#16).

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................

[tools] rebalancer in the kudu CLI tool

Introduced rebalancing tool as a part of the Kudu CLI tool.  To run
the rebalancer against a Kudu cluster:

  sudo -u kudu kudu cluster rebalance <master_addresses>

This changelist also contains unit tests to cover converting KsckResults
result into ClusterBalanceInfo and integration tests to verify
the functionality of the rebalancer as a part of the Kudu CLI tool.

The code was tested against the 12-node cluster running the recent
Kudu 1.7 (3-4-3 replica management scheme) with about 12TB of data total
and a cluster running Kudu 1.4 (3-2-3 replica management scheme).

Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
---
M src/kudu/client/client.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-admin-test.cc
A src/kudu/tools/rebalance-test.cc
A src/kudu/tools/rebalance.cc
A src/kudu/tools/rebalance.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_replica_util.cc
A src/kudu/tools/tool_replica_util.h
12 files changed, 2,297 insertions(+), 419 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/10399/16
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 16
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Mike Percy, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10399

to look at the new patch set (#14).

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................

[tools] rebalancer in the kudu CLI tool

Introduced rebalancing tool as a part of the Kudu CLI tool.  To run
the rebalancer against a Kudu cluster:

  sudo -u kudu kudu cluster rebalance <master_addresses>

This changelist also contains unit tests to cover converting KsckResults
result into ClusterBalanceInfo and integration tests to verify
the functionality of the rebalancer as a part of the Kudu CLI tool.

The code was tested against the 12-node cluster running the recent
Kudu 1.7 (3-4-3 replica management scheme) with about 12TB of data total
and a cluster running Kudu 1.4 (3-2-3 replica management scheme).

Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
---
M src/kudu/client/client.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-admin-test.cc
A src/kudu/tools/rebalance-test.cc
A src/kudu/tools/rebalance.cc
A src/kudu/tools/rebalance.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_replica_util.cc
A src/kudu/tools/tool_replica_util.h
12 files changed, 2,271 insertions(+), 419 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/10399/14
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 14
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] WIP [tools] first draft of rebalancing in kudu CLI

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10399

to look at the new patch set (#7).

Change subject: WIP [tools] first draft of rebalancing in kudu CLI
......................................................................

WIP [tools] first draft of rebalancing in kudu CLI

WIP:
  * adapt to run faster in 3-2-3 re-replication scheme
  * test against older cluster using 3-2-3 re-replication scheme
  * needs integration tests with a la ksck-remote
  * needs a bit of restructuring and clean-up

The code was tested against the flash cluster.

Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
---
M src/kudu/tools/CMakeLists.txt
A src/kudu/tools/rebalance-test.cc
A src/kudu/tools/rebalance.cc
A src/kudu/tools/rebalance.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_tablet.cc
6 files changed, 1,371 insertions(+), 101 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/10399/7
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 7
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] WIP [tools] first draft of rebalancing in kudu CLI

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10399 )

Change subject: WIP [tools] first draft of rebalancing in kudu CLI
......................................................................


Patch Set 5:

> Uploaded patch set 5.

Will, I haven't addressed some of your comments yet.  However, I updated the patch to include the rebalancing logic which works pretty good for the flash cluster.  That's just one alternative.  I have another, pure ksck-based rebalancing approach in the making.

I think I'll address the rest of your feedback tomorrow morning.

 > Uploaded patch set 5.


-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 21 May 2018 05:00:01 +0000
Gerrit-HasComments: No

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10399 )

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................


Patch Set 23:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/10399/23/src/kudu/tools/rebalancer.cc
File src/kudu/tools/rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/10399/23/src/kudu/tools/rebalancer.cc@127
PS23, Line 127: ave
> nit: avg
Done


http://gerrit.cloudera.org:8080/#/c/10399/23/src/kudu/tools/rebalancer.cc@170
PS23, Line 170: ave
> nit: avg
Done


http://gerrit.cloudera.org:8080/#/c/10399/23/src/kudu/tools/tool_replica_util.h
File src/kudu/tools/tool_replica_util.h:

http://gerrit.cloudera.org:8080/#/c/10399/23/src/kudu/tools/tool_replica_util.h@84
PS23, Line 84: // The
> nit: funny line wrap.
Done


http://gerrit.cloudera.org:8080/#/c/10399/23/src/kudu/tools/tool_replica_util.cc
File src/kudu/tools/tool_replica_util.cc:

http://gerrit.cloudera.org:8080/#/c/10399/23/src/kudu/tools/tool_replica_util.cc@195
PS23, Line 195: IsMoveComplete
> There's one funny thing about this function, which is that in 3-4-3 mode it
Indeed.  I renamed this function into CheckCompleteMove() and added corresponding comment.


http://gerrit.cloudera.org:8080/#/c/10399/23/src/kudu/tools/tool_replica_util.cc@360
PS23, Line 360:  
> nit: extra space.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 23
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 07 Jun 2018 21:11:51 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed Kudu Jenkins from this change.  ( http://gerrit.cloudera.org:8080/10399 )

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 23
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] WIP [tools] first draft of rebalancing in kudu CLI

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10399 )

Change subject: WIP [tools] first draft of rebalancing in kudu CLI
......................................................................


Patch Set 7:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/rebalance-test.cc
File src/kudu/tools/rebalance-test.cc:

http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/rebalance-test.cc@277
PS7, Line 277: ARRAYSIZE
> nit: From reading the comment at kudu/gutil/macros.h:116, we should prefer 
Done


http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/rebalance.h
File src/kudu/tools/rebalance.h:

http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/rebalance.h@38
PS7, Line 38: ReplicaMoveInfo
> Can you relate this struct to TableReplicaMove in rebalance-algo.{cc, h}? I
Done


http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/rebalance.h@44
PS7, Line 44: MovesInProgress
> Could you add a comment explaining what the string key is?
Done


http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/rebalance.h@54
PS7, Line 54: if
            :   // non-empty, that's the only tables which will be rebalanced;
> nit: I think this clause is redundant.
Done


http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/rebalance.h@56
PS7, Line 56: whole cluster
> nit: Can you be extra clear and say this means that all tables will be bala
Done


http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/rebalance.h@63
PS7, Line 63: rebalancing
> rebalancer
Done


http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/rebalance.h@67
PS7, Line 67: replica_move_infos
> Can you comment on what the parameters are for?
Done


http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/rebalance.h@70
PS7, Line 70: std::ostream*
> nit: While it goes against the GSG's rules, I think it's standard to use a 
Done


http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/rebalance.cc
File src/kudu/tools/rebalance.cc:

http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/rebalance.cc@48
PS7, Line 48: rebalance_move_single_replicas
> This is supposed to be a temporary flag, right?
Right.

I think it's worth to keep this around while there are Kudu clusters with older code where KUDU-2443 is not yet fixed.

This flag would not appear unless KUDU-2443 existed.


http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/tool_action_cluster.cc
File src/kudu/tools/tool_action_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/tool_action_cluster.cc@145
PS7, Line 145: const vector<string> master_addresses = Split(master_addresses_str, ",");
> nit: Swap this line and the one before it.
Done


http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/tool_action_tablet.cc
File src/kudu/tools/tool_action_tablet.cc:

http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/tool_action_tablet.cc@98
PS7, Line 98: MoveReplicaImpl
> Since this is now shared by the rebalancer and the move_tablet tool, I thin
The tool_action_common.cc does not contain any tablet-related code.  I'm not sure we want to move it there.

I think the fact that it used both in tablet and cluster actions does not necessarily means we want it in tool_action_common.cc.  Maybe, we should separate it into a file which is common between action_tablet and action_cluster?



-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 7
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Sat, 26 May 2018 07:27:58 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Mike Percy, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10399

to look at the new patch set (#18).

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................

[tools] rebalancer in the kudu CLI tool

Introduced rebalancing tool as a part of the Kudu CLI tool.  To run
the rebalancer against a Kudu cluster:

  sudo -u kudu kudu cluster rebalance <master_addresses>

This changelist also contains unit tests to cover converting KsckResults
result into ClusterBalanceInfo and integration tests to verify
the functionality of the rebalancer as a part of the Kudu CLI tool.

The code was tested against the 12-node cluster running the recent
Kudu 1.7 (3-4-3 replica management scheme) with about 12TB of data total
and a cluster running Kudu 1.4 (3-2-3 replica management scheme).

Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
---
M src/kudu/client/client.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-admin-test.cc
A src/kudu/tools/rebalance-test.cc
A src/kudu/tools/rebalancer.cc
A src/kudu/tools/rebalancer.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_replica_util.cc
A src/kudu/tools/tool_replica_util.h
12 files changed, 2,505 insertions(+), 419 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/10399/18
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 18
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Mike Percy, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10399

to look at the new patch set (#24).

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................

[tools] rebalancer in the kudu CLI tool

Introduced rebalancing tool as a part of the Kudu CLI tool.  To run
the rebalancer against a Kudu cluster:

  sudo -u kudu kudu cluster rebalance <master_addresses>

This changelist also contains unit tests to cover converting KsckResults
into ClusterBalanceInfo and integration tests to verify
the functionality of the rebalancer as a part of the Kudu CLI tool.

The code was tested against a 12-node cluster running the recent
Kudu 1.7 (3-4-3 replica management scheme) with about 12TB of data total
and a cluster running Kudu 1.4 (3-2-3 replica management scheme).

This tool works against Kudu clusters of version 1.4 and above.
It does not work against Kudu version 1.3 and earlier.

Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
---
M src/kudu/client/client.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-admin-test.cc
A src/kudu/tools/rebalance-test.cc
M src/kudu/tools/rebalance_algo-test.cc
M src/kudu/tools/rebalance_algo.cc
M src/kudu/tools/rebalance_algo.h
A src/kudu/tools/rebalancer.cc
A src/kudu/tools/rebalancer.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_replica_util.cc
A src/kudu/tools/tool_replica_util.h
15 files changed, 2,594 insertions(+), 434 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/10399/24
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 24
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10399 )

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................


Patch Set 23:

(5 comments)

One substantive comment and I think this is good to go.

http://gerrit.cloudera.org:8080/#/c/10399/23/src/kudu/tools/rebalancer.cc
File src/kudu/tools/rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/10399/23/src/kudu/tools/rebalancer.cc@127
PS23, Line 127: ave
nit: avg


http://gerrit.cloudera.org:8080/#/c/10399/23/src/kudu/tools/rebalancer.cc@170
PS23, Line 170: ave
nit: avg


http://gerrit.cloudera.org:8080/#/c/10399/23/src/kudu/tools/tool_replica_util.h
File src/kudu/tools/tool_replica_util.h:

http://gerrit.cloudera.org:8080/#/c/10399/23/src/kudu/tools/tool_replica_util.h@84
PS23, Line 84: // The
nit: funny line wrap.


http://gerrit.cloudera.org:8080/#/c/10399/23/src/kudu/tools/tool_replica_util.cc
File src/kudu/tools/tool_replica_util.cc:

http://gerrit.cloudera.org:8080/#/c/10399/23/src/kudu/tools/tool_replica_util.cc@195
PS23, Line 195: IsMoveComplete
There's one funny thing about this function, which is that in 3-4-3 mode it merely observes cluster state change, while in 3-2-3 mode it actually kicks off part of the mode. I think it either needs a name change or some kind of clear indication it's not just read-only w.r.t. the cluster when moving in 3-2-3 mode.


http://gerrit.cloudera.org:8080/#/c/10399/23/src/kudu/tools/tool_replica_util.cc@360
PS23, Line 360:  
nit: extra space.



-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 23
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 07 Jun 2018 16:41:39 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP [tools] first draft of rebalancing in kudu CLI

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed Kudu Jenkins from this change.  ( http://gerrit.cloudera.org:8080/10399 )

Change subject: WIP [tools] first draft of rebalancing in kudu CLI
......................................................................


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10399 )

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................


Patch Set 17:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.h
File src/kudu/tools/rebalance.h:

PS15: 
> nit: consider naming this rebalancer.h since it has the Rebalancer class
Done


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.h@129
PS15, Line 129: 
> nit: using the term "step" here is confusing because it sounds like it's fo
Done


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.h@152
PS15, Line 152: 
> mutable ref violates style guide
Done


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.h@171
PS15, Line 171: 
> Needs doc, including the API contract
Done


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.h@181
PS15, Line 181: 
> This comment is very vague. Please add info about key -> value for the cont
Done


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.cc
File src/kudu/tools/rebalance.cc:

http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.cc@53
PS15, Line 53: 
> true seems like a more desirable default in a 3-4-3 cluster.
We discussed it offline.  Yep, that will be a good idea for future releases, but for now it's safer to keep it 'false' by default because of KUDU-2443.  Another option would be 'auto', where the effective value for the flag is determined based on the version of Kudu master/tservers (the idea is to enable this for versions not affected by KUDU-2443).


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.cc@56
PS15, Line 56: 
> can we just make this a --verbose or -v argument to the kudu cluster rebala
I think --verbose flag is too generic and also it's already registered (flags are in global namespace, right)?


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.cc@238
PS15, Line 238: 
> A comment here would be nice to indicate when we want to clear scheduled mo
Done


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.cc@239
PS15, Line 239: 
> typo
Done


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.cc@250
PS15, Line 250: 
> nit: DCHECK_EQ("", FindNonUniqueTablet(replica_moves)) would print the offe
I updated this part, not it works a bit different.


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.cc@479
PS15, Line 479: 
> I think RunStep() might be better named GetNextMoveSet() or NextMoves() or 
It's a good idea.  I named it GetNextMoves().


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.cc@479
PS15, Line 479: 
> why pass pending_moves into this function instead of just directly referenc
That's because I was thinking about moving the helper structures into a separate sub-class, and the instance of that sub-class would be kept in the code of the Rebalancer::Run() function.


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.cc@559
PS15, Line 559: 
> Can you clarify whether this means we prefer to move other replicas before 
Done


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/rebalance.cc@660
PS15, Line 660: 
              : 
> I'm having trouble understanding why you are adding i to the set here.
That's for easier addressing suggested moves by the algorithm, by involved tablet servers.  It's used in the FindNextMove() method.  I'll add more comments on that: it looks pretty obscure, yes.


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/tool_action_cluster.cc
File src/kudu/tools/tool_action_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/tool_action_cluster.cc@212
PS15, Line 212: nfig move_
> nit: consider removing the rebalance_ prefix from these command-line flags,
Done


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/tool_replica_util.h
File src/kudu/tools/tool_replica_util.h:

http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/tool_replica_util.h@56
PS15, Line 56: 
> nit: s/Make/Request/
Done


http://gerrit.cloudera.org:8080/#/c/10399/15/src/kudu/tools/tool_replica_util.h@78
PS15, Line 78: // 'tablet_id'. Neither 'is_complete' nor 'completion_status' output parameter
> nit: doc the contract for completion_status and Status return from this fun
The 'is_complete' parameter specifies whether the operation is pending or completed.
The 'completion_status' is to specify what kind of completion is that: a success or a failure.



-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 17
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Sat, 02 Jun 2018 04:56:21 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP [tools] first draft of rebalancing in kudu CLI

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10399 )

Change subject: WIP [tools] first draft of rebalancing in kudu CLI
......................................................................


Patch Set 7:

(12 comments)

I know the code has changed a lot and is still changing so I didn't dig deep into it. I just made some nits.

On a higher level the logic seems spread out across tool_action_cluster, tool_action_tablet, and rebalancer.cc. I think the tool would benefit from a more cohesive organization. I think the responsibilities should be split by file as follows:

1. tool_action_cluster: straightforward code to parser arguments, print stats, and start the rebalancer.
2. rebalancer.cc: main "administrative" rebalancing logic- turning algo moves into replica moves, evaluating when a step is done/when to schedule more moves, deciding when to terminate due to an error or timeout or rebalancing is complete, reconciling cluster new cluster state with old before feeding to the algo, etc.
3. tool_action_common: refactored move code that handles the nitty gritty of moves, with an api like
StartMove()
IsMoveDone()
WaitForMoveToFinish()
It'll be used both by kudu tablet change_config move_replica and by kudu cluster rebalance.

http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/rebalance-test.cc
File src/kudu/tools/rebalance-test.cc:

http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/rebalance-test.cc@277
PS7, Line 277: ARRAYSIZE
nit: From reading the comment at kudu/gutil/macros.h:116, we should prefer arraysize(). We've got a mix of both throughout our codebase.


http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/rebalance.h
File src/kudu/tools/rebalance.h:

http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/rebalance.h@38
PS7, Line 38: ReplicaMoveInfo
Can you relate this struct to TableReplicaMove in rebalance-algo.{cc, h}? I think it's important subtext for other parts of this file.


http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/rebalance.h@44
PS7, Line 44: MovesInProgress
Could you add a comment explaining what the string key is?


http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/rebalance.h@54
PS7, Line 54: if
            :   // non-empty, that's the only tables which will be rebalanced;
nit: I think this clause is redundant.


http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/rebalance.h@56
PS7, Line 56: whole cluster
nit: Can you be extra clear and say this means that all tables will be balanced, and the cluster as a whole will be balanced.


http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/rebalance.h@63
PS7, Line 63: rebalancing
rebalancer


http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/rebalance.h@67
PS7, Line 67: replica_move_infos
Can you comment on what the parameters are for?


http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/rebalance.h@70
PS7, Line 70: std::ostream*
nit: While it goes against the GSG's rules, I think it's standard to use a ref to an ostream. The ksck stuff uses ostream& IIRC.


http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/rebalance.cc
File src/kudu/tools/rebalance.cc:

http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/rebalance.cc@48
PS7, Line 48: rebalance_move_single_replicas
This is supposed to be a temporary flag, right?


http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/tool_action_cluster.cc
File src/kudu/tools/tool_action_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/tool_action_cluster.cc@141
PS7, Line 141: RunRebalance
At the very least, this function looks too long. Can we break it up?

Also, in general I'd like to see more logic put in the Rebalancer class and less in the tool function...ideally the tool would just PrintStats, start the Rebalancer, and return when rebalancing is complete and PrintStats again.

This would make moving the Rebalancing code into the master very easy later on, so at that time we can get started integrating it with re-replication.


http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/tool_action_cluster.cc@145
PS7, Line 145: const vector<string> master_addresses = Split(master_addresses_str, ",");
nit: Swap this line and the one before it.


http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/tool_action_tablet.cc
File src/kudu/tools/tool_action_tablet.cc:

http://gerrit.cloudera.org:8080/#/c/10399/7/src/kudu/tools/tool_action_tablet.cc@98
PS7, Line 98: MoveReplicaImpl
Since this is now shared by the rebalancer and the move_tablet tool, I think it belongs in tool_action_common.



-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 7
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 24 May 2018 03:42:06 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10399 )

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................

[tools] rebalancer in the kudu CLI tool

Introduced rebalancing tool as a part of the Kudu CLI tool.  To run
the rebalancer against a Kudu cluster:

  sudo -u kudu kudu cluster rebalance <master_addresses>

This changelist also contains unit tests to cover converting KsckResults
into ClusterBalanceInfo and integration tests to verify
the functionality of the rebalancer as a part of the Kudu CLI tool.

The code was tested against a 12-node cluster running the recent
Kudu 1.7 (3-4-3 replica management scheme) with about 12TB of data total
and a cluster running Kudu 1.4 (3-2-3 replica management scheme).

This tool works against Kudu clusters of version 1.4 and above.
It does not work against Kudu version 1.3 and earlier.

Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Reviewed-on: http://gerrit.cloudera.org:8080/10399
Tested-by: Kudu Jenkins
Reviewed-by: Will Berkeley <wd...@gmail.com>
---
M src/kudu/client/client.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-admin-test.cc
A src/kudu/tools/rebalance-test.cc
M src/kudu/tools/rebalance_algo-test.cc
M src/kudu/tools/rebalance_algo.cc
M src/kudu/tools/rebalance_algo.h
A src/kudu/tools/rebalancer.cc
A src/kudu/tools/rebalancer.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_replica_util.cc
A src/kudu/tools/tool_replica_util.h
15 files changed, 2,594 insertions(+), 434 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Will Berkeley: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 25
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10399 )

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................


Patch Set 11:

(69 comments)

I didn't have time to through everything thoroughly but I'm posting some comments so far. They are mostly wording in comments.

My one overall suggestion is to try to break things up a little more. Some of the core functions are beefy and it's hard to review them because there is a lot to keep track of.

I'll try and produce more feedback on the trickier sections later tonight.

http://gerrit.cloudera.org:8080/#/c/10399/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10399/11//COMMIT_MSG@12
PS11, Line 12: <cluster_rpc_endpoints>
It's more common just to say <master addresses>.


http://gerrit.cloudera.org:8080/#/c/10399/11//COMMIT_MSG@18
PS11, Line 18: the flash cluster
"the flash cluster" won't mean much to most Kudu community members. Maybe just quote how many nodes and how much data the cluster had.


http://gerrit.cloudera.org:8080/#/c/10399/11//COMMIT_MSG@19
PS11, Line 19: CDH5.12.x
Mention this vendor's Kudu is based on upstream 1.4.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/CMakeLists.txt
File src/kudu/tools/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/CMakeLists.txt@87
PS11, Line 87: rebalance_algo.cc
             :   rebalance.cc
(Puts Adar hat on) nit: Need to swap these two for strict ascii sort order.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/CMakeLists.txt@92
PS11, Line 92: kudu_common
             :   ksck
nit: Order.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/CMakeLists.txt@173
PS11, Line 173: ADD_KUDU_TEST(rebalance_algo-test)
              : ADD_KUDU_TEST(rebalance-test)
nit: Swap for correct order.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1320
PS11, Line 1320: at
Extra word.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1320
PS11, Line 1320: the rebalancing
nit: the rebalancer


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1323
PS11, Line 1323: public ::testing::WithParamInterface<bool>
Can we can reuse the enum class Kudu1097?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1330
PS11, Line 1330: constexpr int kRepFactor = 3;
This doesn't look necessary.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1338
PS11, Line 1338: FLAGS_num_tablet_servers = kNumTservers;
Can set to 3 directly if we shut down the first tablet server rather than a random one.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1339
PS11, Line 1339: FLAGS_num_replicas = kRepFactor;
Isn't this redundant?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1343
PS11, Line 1343: HostPort ts_host_port;
               :   {
               :     Random r(SeedRandom());
               :     auto idx = r.Next() % kNumTservers;
               :     auto* ts = cluster_->tablet_server(idx);
               :     ts_host_port = ts->bound_rpc_hostport();
               :     ts->Shutdown();
               :   }
It's fine to shut down the TS at idx = 0 instead of picking a random one.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1365
PS11, Line 1365: A test to verify that rebalancing works for both 3-4-3 and 3-2-3 replica
Can you mention the test is also covering different replication factors?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1382
PS11, Line 1382: int
Why not auto?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1412
PS11, Line 1412: 3
kRepFactor + 1


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1427
PS11, Line 1427: Substitute("--table_num_replicas=$0", kRepFactor),
This shouldn't be needed because the table already exists.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1477
PS11, Line 1477: NO_FATALS(cluster_->AssertNoCrashes());
Can we run the ClusterVerifier as well?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h
File src/kudu/tools/rebalance.h:

http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@50
PS11, Line 50: (effectively Kudu cluster endpoints)
Seems unnecessary.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@53
PS11, Line 53: Tables to balance (names).
nit: Names of tables to balance.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@57
PS11, Line 57: at
nit: "on", or maybe "involving".


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@58
PS11, Line 58:  
the


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@58
PS11, Line 58: at
nit: "on a" or "involving a".


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@58
PS11, Line 58: mean
means a


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@59
PS11, Line 59: is at the
is located on the


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@65
PS11, Line 65: // Whether to move replicas of tablets with replication factor of one.
Why is this a special case?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@69
PS11, Line 69: // Information on particular tablet replica to move. The high-level
             :   // specification of rebalancing moves described by istance of TableReplicaMove
             :   // might be represented by instance of ReplicaMoveInfo, and in most cases
             :   // there will be multiple alternatives represented by ReplicaMoveInfo to
             :   // correspond TableReplicaMove.
I'd simplify a bit and say something like "Represents a concrete move of a replica from one tablet server to another. Formed logically from a TableReplicaMove by specifying a tablet for the move.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@74
PS11, Line 74: ReplicaMoveInfo
I think it'd be better just to call it ReplicaMove.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@83
PS11, Line 83: TIME_IS_UP,
nit: Does it make sense just to call it TIMED_OUT, a more typical descriptor?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@89
PS11, Line 89: // Constructor takes as a parameter configuration for the rebalancer.
nit: redundant.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@95
PS11, Line 95: when
nit: once


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@96
PS11, Line 96: an error occurred
if an error occurs


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@98
PS11, Line 98: Run
Can 'status' be null?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@101
PS11, Line 101: Builer
Builder


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@108
PS11, Line 108: moves_in_progress
Note what this parameter is for.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@115
PS11, Line 115: currently
              :   // pending replica moves. Those might left in such state from the previous
              :   // rebalancing step, and they will be completed eventually
              :   // (either succeeding or failing).
Just to clean it up a bit: "on pending moves, which may be scheduled or running after from the previous step ends."


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@120
PS11, Line 120: the
remove


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@120
PS11, Line 120: info
infos


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@121
PS11, Line 121: where the container
Replace with "which".


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@129
PS11, Line 129: If no suitable replicas are found,
              :   // 'tablet_ids' will be empty
Does this imply a NotFound or other non-OK status will be returned as well?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc
File src/kudu/tools/rebalance.cc:

http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@117
PS11, Line 117: l
Use L instead since this looks too much like 1 in a lot of fonts.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@128
PS11, Line 128: "Replica Count", "Value"
I'd make this "Statistic" and "Value".


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@129
PS11, Line 129: Minimum", to_string(min_replica_count)});
              :     summary.AddRow({"Maximum", to_string(max_replica_count)});
              :     summary.AddRow({"Average
Then add "Replica Count" to each statistic name, e.g. "Maximum Replica Count".


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@143
PS11, Line 143: {"Server ID", "Replica Count", "Server RPC end-point"}
I think "UUID", "Address", "Replica Count" is a better order.

Do you think we should sort somehow? By count?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@181
PS11, Line 181: "Table ID", "Replica Skew", "Replicas Total", "Table Name"
I think "Table Id", "Table Name", "Replica Count", "Replica Skew" makes more sense.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@184
PS11, Line 184: const auto it = table_info.find(table_id);
Shouldn't it be impossible to have a table in one map but not the other?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@204
PS11, Line 204: Run
This function is hard to digest. Any change you could break it up?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@275
PS11, Line 275: unordered_map<string, set<size_t>> src_op_indices;
              :     unordered_map<string, set<size_t>> dst_op_indices;
Should these be part of the instance's state, with the update functions as private methods?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@292
PS11, Line 292: #ifndef NDEBUG
              :     // This is to enforce the invariants which the algorithm below relies upon.
              :     // In short, it can re-order operations suggested by the high-level
              :     // algorithm. However, since RunStep() outputs only one move operation per
              :     // tablet in every batch and there is at most one replica of a tablet
              :     // per server, that's perfectly safe to do. The code below verifies that
              :     // the contract between this code and RunStep() is in force.
              :     unordered_set<string> tablets;
              :     for (const auto& move_info : replica_move_infos) {
              :       const auto& tablet_id = move_info.tablet_uuid;
              :       if (!tablets.emplace(tablet_id).second) {
              :         LOG(DFATAL) << Substitute(
              :             "$0: more than one move operations for tablet", tablet_id);
              :       }
              :     }
              : #endif // #ifndef NDEBUG
Just in the interest of shortening this function, can this be a small helper we can DCHECK or something.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@311
PS11, Line 311:     vector<size_t> op_indices_to_schedule;
              :       for (auto it = ts_per_op_count.begin(); op_indices_to_schedule.empty() &&
              :            it != ts_per_op_count.end() &&
              :            it->first < config_.max_moves_per_server; ++it) {
              :         const auto& uuid_0 = it->second;
              : 
              :         auto it_1 = it;
              :         ++it_1;
              :         for (; op_indices_to_schedule.empty() &&
              :              it_1 != ts_per_op_count.end() &&
              :              it_1->first < config_.max_moves_per_server; ++it_1) {
              :           const auto& uuid_1 = it_1->second;
              : 
              :           // Check for available operations where uuid_0, uuid_1 would be
              :           // source or destination servers correspondingly.
              :           {
              :             const auto it_src = src_op_indices.find(uuid_0);
              :             const auto it_dst = dst_op_indices.find(uuid_1);
              :             if (it_src != src_op_indices.end() &&
              :                 it_dst != dst_op_indices.end()) {
              :               set_intersection(it_src->second.begin(), it_src->second.end(),
              :                                it_dst->second.begin(), it_dst->second.end(),
              :                                back_inserter(op_indices_to_schedule));
              :             }
              :           }
              :           {
              :             const auto it_src = src_op_indices.find(uuid_1);
              :             const auto it_dst = dst_op_indices.find(uuid_0);
              :             if (it_src != src_op_indices.end() &&
              :                 it_dst != dst_op_indices.end()) {
              :               set_intersection(it_src->second.begin(), it_src->second.end(),
              :                                it_dst->second.begin(), it_dst->second.end(),
              :                                back_inserter(op_indices_to_schedule));
              :             }
              :           }
              :         }
This section also feels like it could be factored out into its own function.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@395
PS11, Line 395: // Check for the moves in progress.
Ditto.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@473
PS11, Line 473: WRONG_SERVER_UUID
What about UNAVAILABLE servers?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc
File src/kudu/tools/tool_action_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@74
PS11, Line 74: at
on


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@75
PS11, Line 75: ;
:, delete "where"


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@125
PS11, Line 125: master_addresses,
              :                          table_filters
If these are const, can we pass-by-value and move from them?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@130
PS11, Line 130: std::move(cfg));
I think this is redundant as we move from 'cfg' in the initializer list portion of the constructor.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@133
PS11, Line 133: ignore_result
Why not RETURN_NOT_OK?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@149
PS11, Line 149: DCHECK(false
Did you mean to add a message here too? Or take this out and return a bad status based on 'msg_result_status'?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@200
PS11, Line 200: by
for each


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h
File src/kudu/tools/tool_replica_util.h:

PS11: 
Would you mind adding one sentence comments to these functions to explain what they do. Also it's nice if you explain what the output parameters are and whether they are allowed to be null.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h@52
PS11, Line 52: bool* is_3_4_3_replication = nullptr
This seems like a funny parameter to have. Does it detect if the cluster is using 3-4-3 replication? It's strange to do that when retrieving the state of a single tablet.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h@66
PS11, Line 66: Completed
nit: complete.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h@68
PS11, Line 68: client::sp::shared_ptr<client::KuduClient>
Why not a cref?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h@72
PS11, Line 72: completed
nit: complete.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc
File src/kudu/tools/tool_replica_util.cc:

http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@83
PS11, Line 83: GetLastCommittedOpId
Looks like this function also exists as GetLastOpIdForReplica in cluster_itest_util.{h,cc}. Can we reuse that?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@145
PS11, Line 145: client->default_admin_operation_timeout()
This is the only reason we need the client, so I think we should pass in the timeout as a parameter instead.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@162
PS11, Line 162: GetTablet
Aside: I had to look up this method because it surprised me that we take ownership of the KuduTablet based on the name of the method.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@180
PS11, Line 180: , 
and the


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@184
PS11, Line 184: hapens
happens



-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 11
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 30 May 2018 00:25:36 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP [tools] first draft of rebalancing in kudu CLI

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10399

to look at the new patch set (#4).

Change subject: WIP [tools] first draft of rebalancing in kudu CLI
......................................................................

WIP [tools] first draft of rebalancing in kudu CLI

WIP:
  * needs integration tests with a la ksck-remote
  * address several TODOs
  * needs more testing against real cluster

Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
---
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/rebalance-algo.cc
A src/kudu/tools/rebalance-test.cc
A src/kudu/tools/rebalance.cc
A src/kudu/tools/rebalance.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_tablet.cc
7 files changed, 888 insertions(+), 65 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/10399/4
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10399 )

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................


Patch Set 20: Verified+1

unrelated flake in org.apache.kudu.spark.tools.TestImportExportFiles


-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 20
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 05 Jun 2018 15:34:03 +0000
Gerrit-HasComments: No

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Mike Percy, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10399

to look at the new patch set (#19).

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................

[tools] rebalancer in the kudu CLI tool

Introduced rebalancing tool as a part of the Kudu CLI tool.  To run
the rebalancer against a Kudu cluster:

  sudo -u kudu kudu cluster rebalance <master_addresses>

This changelist also contains unit tests to cover converting KsckResults
result into ClusterBalanceInfo and integration tests to verify
the functionality of the rebalancer as a part of the Kudu CLI tool.

The code was tested against the 12-node cluster running the recent
Kudu 1.7 (3-4-3 replica management scheme) with about 12TB of data total
and a cluster running Kudu 1.4 (3-2-3 replica management scheme).

Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
---
M src/kudu/client/client.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-admin-test.cc
A src/kudu/tools/rebalance-test.cc
A src/kudu/tools/rebalancer.cc
A src/kudu/tools/rebalancer.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_replica_util.cc
A src/kudu/tools/tool_replica_util.h
12 files changed, 2,526 insertions(+), 419 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/10399/19
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 19
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10399 )

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................


Patch Set 24: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 24
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 07 Jun 2018 21:43:40 +0000
Gerrit-HasComments: No

[kudu-CR] WIP [tools] first draft of rebalancing in kudu CLI

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10399 )

Change subject: WIP [tools] first draft of rebalancing in kudu CLI
......................................................................


Patch Set 4:

(21 comments)

Some random comments. I didn't look at rebalancer.cc since IIRC you've already changed it a lot while testing compared to what's posted here.

http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance-test.cc
File src/kudu/tools/rebalance-test.cc:

http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance-test.cc@140
PS4, Line 140: Convert
nit: I think you mean "Test converting..."


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance-test.cc@165
PS4, Line 165: "tablet_a_0"
These should all be the same for one tablet.


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance-test.cc@240
PS4, Line 240: { 2, { "table_c", 
Can we put things in order to match the table order in the KsckResultsInput?


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance.h
File src/kudu/tools/rebalance.h:

http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance.h@36
PS4, Line 36: ReplicaMoveInfo
Do you think we should synchronize the naming between this and TableReplicaMove? TabletReplicaMove? The former is a move of some replica of a table; the latter is a move of a uniquely-identified replica of a tablet. This 'engine' part must choose a specific ReplicaMoveInfo consistent with a TableReplicaMove handed to it by the algorithm.


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance.h@37
PS4, Line 37: tablet_uuid
We usually call it a tablet_id and not a tablet_uuid, just like we usually say ts_uuid and not ts_id. This is more me realizing this and scratching my chin about it than suggesting you change anything :)


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance.h@45
PS4, Line 45: step
What's one step?


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance.h@46
PS4, Line 46: max_moves
Also can you doc the parameters?


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance.h@49
PS4, Line 49: the stats
Can you say which stats are included? Also, where are we printing to? In my experience it's a good idea to have an 'out' parameter. It makes the method more flexible and easier to test.


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance.h@59
PS4, Line 59: UUID
UUIDs


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance.h@59
PS4, Line 59: is output into
nit: This wording leaves it unclear to me whether it replaces the contents of 'tablet_ids' with the output, or whether it appends it.


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance.h@60
PS4, Line 60:  found
nit: are found


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance.h@60
PS4, Line 60: the 'tablet_ids' is empty
OK this makes it clear the results are populated into 'tablet_ids'.


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance.h@60
PS4, Line 60: is
nit: will be


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/rebalance.h@60
PS4, Line 60: the
nit: remove


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/tool_action_cluster.cc
File src/kudu/tools/tool_action_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/tool_action_cluster.cc@62
PS4, Line 62: perfectly
nit: I don't like this extra word because I don't think it means anything- we don't actually have a distinction between "balanced" and "perfectly balanced".


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/tool_action_cluster.cc@64
PS4, Line 64: max_moves_per_step
Is this parameter also secretly the max number of moves that can be going on at one time (the size of the 'pool')? I think users would want to know how to limit that, more than the number of moves per step.


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/tool_action_cluster.cc@119
PS4, Line 119: LOG(INFO) << "rebalancing is complete: no moves left";
nit: We should straight print this rather than log it.


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/tool_action_cluster.cc@125
PS4, Line 125: "tablet $0: moving replica from $1 to $2"
As we talked about, I think a brief notation for a move, with the table name included, would be ideal:

    Table 'foo': T <tablet id>: TS <ts uuid> -> TS <td uuid>

We should also be careful to keep the output relatively easily parsed by  machine. Could be useful someday.


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/tool_action_cluster.cc@127
PS4, Line 127: LOG(INFO) << info_msg
I like using LOG for the timestamps, but other than that I usually think of tool output not being LOG(INFO) unless the user can kinda ignore it. The main output will go to stdout and the log messages to stderr, so they can be separated.


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/tool_action_cluster.cc@135
PS4, Line 135: LOG(INFO)
Likewise.


http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/tool_action_tablet.cc
File src/kudu/tools/tool_action_tablet.cc:

http://gerrit.cloudera.org:8080/#/c/10399/4/src/kudu/tools/tool_action_tablet.cc@413
PS4, Line 413:   
Should be 4 spaces, not 6.



-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Sun, 20 May 2018 07:24:30 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10399 )

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................


Patch Set 22: Verified+1

unrelated flake in FileCacheTest/1.TestHeavyReads


-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 22
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 06 Jun 2018 08:26:10 +0000
Gerrit-HasComments: No

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10399 )

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................


Patch Set 13:

(68 comments)

http://gerrit.cloudera.org:8080/#/c/10399/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10399/11//COMMIT_MSG@12
PS11, Line 12: <master_addresses>
> It's more common just to say <master addresses>.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11//COMMIT_MSG@18
PS11, Line 18: the 12-node clust
> "the flash cluster" won't mean much to most Kudu community members. Maybe j
Done


http://gerrit.cloudera.org:8080/#/c/10399/11//COMMIT_MSG@19
PS11, Line 19:  (3-4-3 r
> Mention this vendor's Kudu is based on upstream 1.4.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/CMakeLists.txt
File src/kudu/tools/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/CMakeLists.txt@87
PS11, Line 87: rebalance.cc
             :   rebalance_al
> (Puts Adar hat on) nit: Need to swap these two for strict ascii sort order.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/CMakeLists.txt@92
PS11, Line 92: ksck
             :   kudu
> nit: Order.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/CMakeLists.txt@173
PS11, Line 173: ADD_KUDU_TEST(rebalance-test)
              : ADD_KUDU_TEST(rebalance_algo-
> nit: Swap for correct order.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1320
PS11, Line 1320: 
> Extra word.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1320
PS11, Line 1320: ceStartCriteria
> nit: the rebalancer
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1323
PS11, Line 1323: 
> Can we can reuse the enum class Kudu1097?
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1330
PS11, Line 1330: };
> This doesn't look necessary.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1338
PS11, Line 1338: // Shutdown one of the tablet servers.
> Can set to 3 directly if we shut down the first tablet server rather than a
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1339
PS11, Line 1339: HostPort ts_host_port;
> Isn't this redundant?
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1343
PS11, Line 1343:   ts_host_port = ts->bound_rpc_hostport();
               :     ts->Shutdown();
               :   }
               : 
               :   string err;
               :   Status s = RunKuduTool({
               :     "cluster",
               :    
> It's fine to shut down the TS at idx = 0 instead of picking a random one.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1365
PS11, Line 1365:  public ::testing::WithParamInterface<std::tuple<int, Kudu1097>> {
> Can you mention the test is also covering different replication factors?
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1382
PS11, Line 1382: xpr
> Why not auto?
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1412
PS11, Line 1412: S
> kRepFactor + 1
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1427
PS11, Line 1427: r (auto i = 0; i < kNumTables; ++i) {
> This shouldn't be needed because the table already exists.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1477
PS11, Line 1477: for (auto& workload : workloads) {
> Can we run the ClusterVerifier as well?
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h
File src/kudu/tools/rebalance.h:

http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@50
PS11, Line 50: 
> Seems unnecessary.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@53
PS11, Line 53: fig(std::vector<std::strin
> nit: Names of tables to balance.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@57
PS11, Line 57: 
> nit: "on", or maybe "involving".
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@58
PS11, Line 58: 
> the
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@58
PS11, Line 58: 
> means a
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@58
PS11, Line 58: 
> nit: "on a" or "involving a".
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@59
PS11, Line 59: dpoints.
> is located on the
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@65
PS11, Line 65: 
> Why is this a special case?
That's because the code in rebalancer.cc has special provisions to avoid moving RF1 replicas.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@69
PS11, Line 69:   size_t max_moves_per_server;
             : 
             :     // Maximum run time, in seconds.
             :     int64_t max_run_time_sec;
             : 
> I'd simplify a bit and say something like "Represents a concrete move of a 
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@74
PS11, Line 74: ether to move r
> I think it'd be better just to call it ReplicaMove.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@83
PS11, Line 83: std::string
> nit: Does it make sense just to call it TIMED_OUT, a more typical descripto
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@89
PS11, Line 89:   TIMED_OUT,
> nit: redundant.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@95
PS11, Line 95: 
> nit: once
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@96
PS11, Line 96: 
> if an error occurs
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@98
PS11, Line 98: Pri
> Can 'status' be null?
Nope, it cannot.  I added corresponding comment.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@101
PS11, Line 101: ed or 
> Builder
Done. It's gone.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@108
PS11, Line 108: 
> Note what this parameter is for.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@115
PS11, Line 115: eter. The 'cbi'
              :   // output parameter cannot be null.
              :   Status KsckResultsToClusterBalanceInfo(
              :       const KsckResults& ksck_info,
> Just to clean it up a bit: "on pending moves, which may be scheduled or run
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@120
PS11, Line 120: 
> infos
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@120
PS11, Line 120: 
> remove
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@121
PS11, Line 121: 
> Replace with "which".
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@129
PS11, Line 129:  into 'replica_moves',
              :   // which will be empty if no 
> Does this imply a NotFound or other non-OK status will be returned as well?
Nope, in that case this method happily returns Status::OK().  Non-OK statuses are reserved for errors.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc
File src/kudu/tools/rebalance.cc:

http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@117
PS11, Line 117: 
> Use L instead since this looks too much like 1 in a lot of fonts.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@128
PS11, Line 128: plica_count = servers_lo
> I'd make this "Statistic" and "Value".
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@129
PS11, Line 129: replica_count = servers_load_info.rbegin()->first;
              :       const double ave_replica_count =
              :           1.0 * total_replic
> Then add "Replica Count" to each statistic name, e.g. "Maximum Replica Coun
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@143
PS11, Line 143: y : tserver_summaries) {
> I think "UUID", "Address", "Replica Count" is a better order.
They are sorted by count, I think.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@181
PS11, Line 181: 
> I think "Table Id", "Table Name", "Replica Count", "Replica Skew" makes mor
I would prefer the 'Table name' to be in the end of the list, otherwise the width of columns varies from run to run, and I don't like that.  Let me know if you feel strongly against that.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@204
PS11, Line 204: 
> This function is hard to digest. Any change you could break it up?
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@275
PS11, Line 275:     const auto& tablet_id = info.tablet_uuid;
              :         const auto& src_ts_uuid = info.ts_uuid_from;
> Should these be part of the instance's state, with the update functions as 
That's doable, yes.  Initially, everything was an assorted collection of functions.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@292
PS11, Line 292:           // Continue to the next round, populating op_indices_to_schedule with
              :           // appropriate moves and scheduling the operations.
              :           continue;
              :         }
              : 
              :         DCHECK(!s.ok());
              :         // The source replica is not found in the tablet's consensus config
              :         // or the tablet does not exit anymore. The replica might already
              :         // moved because of some other concurrent activity, e.g.
              :         // re-replication, another rebalancing session in progress, etc.
              :         LOG(INFO) << Substitute("tablet $0: $1 -> $2 move ignored: $3",
              :                                 tablet_id, src_ts_uuid, dst_ts_uuid,
              :                                 s.ToString());
              :         UpdateOnMoveScheduled(src_op_indices_, op_idx, info.ts_uuid_from, false);
              :         UpdateOnMoveScheduled(dst_op_indices_, op_idx, info.ts_uuid_to, false);
              :         // Re-evaluate t
> Just in the interest of shortening this function, can this be a small helpe
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@311
PS11, Line 311:     // Check for the moves in progress.
              :       clear_scheduled_moves = UpdateMovesInProgress(deadline, &timed_out);
              :       if (re_evaluate_cluster_state || clear_scheduled_moves) {
              :         break;
              :       }
              :       // Sleep a bit before going next cycle.
              :       SleepFor(MonoDelta::FromMilliseconds(200));
              :     }
              :   }
              : 
              :   *result_status = timed_out ? RunStatus::TIMED_OUT
              :                              : RunStatus::CLUSTER_IS_BALANCED;
              :   if (moves_count) {
              :     *moves_count = attempted_moves_count_;
              :   }
              : 
              :   return Status::OK();
              : }
              : 
              : // Transform the information on the cluster returned by ksck into
              : // ClusterBalanceInfo that could be consumed by the rebalancing algorithm,
              : // taking into account pending replica movement operations. The pending
              : // operations are evaluated against the state of the cluster in accordance with
              : // the ksck results, and if the replica movement operations are still in
              : // progress, then they are interpreted as successfully completed. The idea is to
              : // prevent the algorithm outputting the same moves again while some of the
              : // moves recommended at prior steps are still in progress.
              : Status Rebalancer::KsckResultsToClusterBalanceInfo(
              :     const KsckResults& ksck_info,
              :     const MovesInProgress& pending_moves,
              :     ClusterBalanceInfo* cbi) const {
              :   DCHECK(cbi);
              : 
              :   // tserver UUID --> total replica count of all table's tablets at the server
              :   typedef unordered_map<string, int32_t> TableReplicasAtServer;
              : 
> This section also feels like it could be factored out into its own function
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@395
PS11, Line 395:   LOG(INFO) << msg;
> Ditto.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@473
PS11, Line 473: 
> What about UNAVAILABLE servers?
Good catch, indeed.  The reason it's not handled is that initially there was an idea that this code would be not run against a tablet server which is down, however such servers might appear in the course of the run.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc
File src/kudu/tools/tool_action_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@74
PS11, Line 74: on
> on
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@75
PS11, Line 75: :
> :, delete "where"
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@125
PS11, Line 125: ebalancer::Config(
              :       std::move(master_addresses),
> If these are const, can we pass-by-value and move from them?
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@130
PS11, Line 130: ve_single_replic
> I think this is redundant as we move from 'cfg' in the initializer list por
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@133
PS11, Line 133: RETURN_NOT_OK
> Why not RETURN_NOT_OK?
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@149
PS11, Line 149: msg_result_s
> Did you mean to add a message here too? Or take this out and return a bad s
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@200
PS11, Line 200: le
> for each
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h
File src/kudu/tools/tool_replica_util.h:

PS11: 
> Would you mind adding one sentence comments to these functions to explain w
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h@52
PS11, Line 52: const MonoDelta& timeout,
> This seems like a funny parameter to have. Does it detect if the cluster is
Yes, this is what it is -- I just moved it from some other place.  Probably, we could improve the signature of this function in a separate changelist.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h@66
PS11, Line 66: ica is a 
> nit: complete.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h@68
PS11, Line 68: us GetTabletLeader(
> Why not a cref?
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h@72
PS11, Line 72:  leader_h
> nit: complete.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc
File src/kudu/tools/tool_replica_util.cc:

http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@83
PS11, Line 83: GetLastCommittedOpId
> Looks like this function also exists as GetLastOpIdForReplica in cluster_it
That one uses TServerDetails as a parameter.  Probably, it's possible to make GetLastOpIdForReplica() to use this function. I think it's better to address that separately.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@145
PS11, Line 145: 
> This is the only reason we need the client, so I think we should pass in th
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@162
PS11, Line 162: GetTablet
> Aside: I had to look up this method because it surprised me that we take ow
Yep, that's the part of the public API so I don't think it's easy to fix that.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@180
PS11, Line 180: 
> and the
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@184
PS11, Line 184: starts
> happens
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 13
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 31 May 2018 05:02:02 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Mike Percy, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10399

to look at the new patch set (#17).

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................

[tools] rebalancer in the kudu CLI tool

Introduced rebalancing tool as a part of the Kudu CLI tool.  To run
the rebalancer against a Kudu cluster:

  sudo -u kudu kudu cluster rebalance <master_addresses>

This changelist also contains unit tests to cover converting KsckResults
result into ClusterBalanceInfo and integration tests to verify
the functionality of the rebalancer as a part of the Kudu CLI tool.

The code was tested against the 12-node cluster running the recent
Kudu 1.7 (3-4-3 replica management scheme) with about 12TB of data total
and a cluster running Kudu 1.4 (3-2-3 replica management scheme).

Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
---
M src/kudu/client/client.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-admin-test.cc
A src/kudu/tools/rebalance-test.cc
A src/kudu/tools/rebalancer.cc
A src/kudu/tools/rebalancer.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_replica_util.cc
A src/kudu/tools/tool_replica_util.h
12 files changed, 2,473 insertions(+), 419 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/10399/17
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 17
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] WIP [tools] first draft of rebalancing in kudu CLI

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10399 )

Change subject: WIP [tools] first draft of rebalancing in kudu CLI
......................................................................


Patch Set 1: Verified+1

Unrelated flake in alter_table-randomized-test.xml.[failed-to-read]


-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Comment-Date: Tue, 15 May 2018 00:28:05 +0000
Gerrit-HasComments: No

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10399 )

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................


Patch Set 17: Verified+1

Unrelated flake in OptionalSSL/TestRpc.TestClientConnectionMetrics/0


-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 17
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Sat, 02 Jun 2018 05:31:18 +0000
Gerrit-HasComments: No

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed Kudu Jenkins from this change.  ( http://gerrit.cloudera.org:8080/10399 )

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 17
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] WIP [tools] first draft of rebalancing in kudu CLI

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10399

to look at the new patch set (#2).

Change subject: WIP [tools] first draft of rebalancing in kudu CLI
......................................................................

WIP [tools] first draft of rebalancing in kudu CLI

WIP:
  * needs integration tests with a la ksck-remote
  * address several TODOs
  * needs more testing against real cluster

Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
---
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/rebalance-algo.cc
A src/kudu/tools/rebalance-test.cc
A src/kudu/tools/rebalance.cc
A src/kudu/tools/rebalance.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_tablet.cc
7 files changed, 871 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/10399/2
-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] rebalancer in the kudu CLI tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10399 )

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................


Patch Set 22:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/10399/20//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10399/20//COMMIT_MSG@15
PS20, Line 15: into C
> nit: extra word
Done


http://gerrit.cloudera.org:8080/#/c/10399/20//COMMIT_MSG@18
PS20, Line 18: a 1
> nit: a
Done


http://gerrit.cloudera.org:8080/#/c/10399/20//COMMIT_MSG@21
PS20, Line 21: 
> We should also cover the limitations (if any) this tool has working against
Done


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/client/client.cc
File src/kudu/client/client.cc:

http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/client/client.cc@548
PS20, Line 548: resp.tablet_locations_size() == 0
> This is funny. How did this come up? Looking at the master code, we return 
I don't recall the details, but the definite thing I remember was the error coming from L552: "Expected only one tablet, but received 0".  I can dig a bit on that on the side, I think.


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/kudu-admin-test.cc@1353
PS20, Line 1353: ASSERT_TRUE(s.IsRunti
> nit: ASSERT_TRUE on the sort of status it is. Here It looks like that is Il
Yep, that makes sense (it should be just RuntimeError).


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/kudu-admin-test.cc@1383
PS20, Line 1383: constexpr auto kTserverUnresponsiveMs = 3000;
> nit: Can we lower this and speed up the test? Does it make things flaky if 
Something is fishy -- it's became more flakier.  I'll try to figure it out on the side, I think.


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/kudu-admin-test.cc@1403
PS20, Line 1403: (
> nit: (kRepFactor + 1)
Done


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/kudu-admin-test.cc@1427
PS20, Line 1427: // the tablet is un
> Ah, makes sense to me that writes can't replicate during replica movement w
Done


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.h
File src/kudu/tools/rebalancer.h:

http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.h@78
PS20, Line 78: // with the cluster or when som
> This parameter seems like a nice safety valve but also hard to use. How lon
It's a good observation.  Indeed, having time-based parameter makes much more sense here.  Thanks!


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.cc
File src/kudu/tools/rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.cc@243
PS20, Line 243: 
> The rsync happens in GetNextMoves but that's not obvious from reading this 
Done


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.cc@539
PS20, Line 539: 
> nit: are
Done


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.cc@545
PS20, Line 545: vector<string> tablet_uuids_dst;
              : 
> Howabout "Tablet ids of replicas on the source tserver that are non-leaders
That's much better, thanks.


http://gerrit.cloudera.org:8080/#/c/10399/20/src/kudu/tools/rebalancer.cc@548
PS20, Line 548:   if (tablet_summary.table_id != table_id) {
              :       continue;
> "Tablet ids of replicas on the source tserver that are leaders."
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 22
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 06 Jun 2018 08:25:44 +0000
Gerrit-HasComments: Yes