You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Yifan Zhang (Code Review)" <ge...@cloudera.org> on 2019/08/28 11:07:43 UTC

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Yifan Zhang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14154


Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................

KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

This patch add a '--blacklist_tservers' flag to rebalance tool.
Once specified blacklist_tservers, replicas on these servers would
be moved to other servers first and then run the rebalancing on
servers not in blacklist. It will be useful for server decommissioning.

Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
---
M src/kudu/rebalance/rebalance-test.cc
M src/kudu/rebalance/rebalance_algo-test.cc
M src/kudu/rebalance/rebalance_algo.cc
M src/kudu/rebalance/rebalance_algo.h
M src/kudu/rebalance/rebalancer.cc
M src/kudu/rebalance/rebalancer.h
M src/kudu/tools/rebalancer_tool-test.cc
M src/kudu/tools/rebalancer_tool.cc
M src/kudu/tools/rebalancer_tool.h
M src/kudu/tools/tool_action_cluster.cc
10 files changed, 965 insertions(+), 93 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14154/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 1
Gerrit-Owner: Yifan Zhang <ch...@163.com>

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14154 )

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................


Patch Set 12:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool-test.cc
File src/kudu/tools/rebalancer_tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool-test.cc@757
PS12, Line 757: IsRebalancingInProgress()
> It's quiet difficult to know which tablet server is in progress of copying 
OK, I just wanted to know whether you saw any flakiness in the test due to those conditions.  If no flakiness is observed, then it's all right then.


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool.cc
File src/kudu/tools/rebalancer_tool.cc:

http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool.cc@1243
PS12, Line 1243:     has_updates |= s.ok()
> I think the 'has_updates' variable indicates whether the statuses of in-pro
I think the re-synchronization of the cluster state happens only in case of an error, but otherwise the inner cycle of RebalancerTool::RunWith() ends if no updated were detected, and then next set of moves is required from algorithm.

Anyway, I agree this piece looks fishy and it seems there might be some bug here.  Could you please separate this change into its own patch?  That way it will be easier to reason about this and track related issues, if any.  Thanks!



-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 12
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hannah Nguyen <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Comment-Date: Tue, 19 Nov 2019 07:20:22 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Hannah Nguyen (Code Review)" <ge...@cloudera.org>.
Hannah Nguyen has posted comments on this change. ( http://gerrit.cloudera.org:8080/14154 )

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................


Patch Set 10:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/rebalance/rebalancer.h
File src/kudu/rebalance/rebalancer.h:

http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/rebalance/rebalancer.h@70
PS10, Line 70: If empty, run the rebalancing on
             :     // all tablet servers in the cluster.
Would it be beneficial to have the option to choose the former default behavior of the rebalancer tool to not do rebalancing if there are any or some specified number of unhealthy tservers?
What if there are enough healthy tservers to maintain the RF and run rebalancing, but there are also a significant number of unhealthy tservers? Could the rebalancer tool end up moving replicas that will make the cluster imbalanced, and have to be moved again, once the unhealthy tservers become available again?


http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/tools/rebalancer_tool.cc
File src/kudu/tools/rebalancer_tool.cc:

http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/tools/rebalancer_tool.cc@631
PS10, Line 631: CheckIgnoredServers
Is this function only called when move_replicas_from_ignored_tservers=true?

When move_replicas_from_ignored_tservers=false, Rebalancer::BuildClusterInfo() still populates ignored_tservers with unhealthy tservers. We should still verify that remaining_tservers_count >= max_replication_factor, even if not moving replicas from the ignored tservers.



-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 10
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hannah Nguyen <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Comment-Date: Thu, 24 Oct 2019 16:10:18 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14154 )

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/14154/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14154/1//COMMIT_MSG@9
PS1, Line 9: por
> adds
Done


http://gerrit.cloudera.org:8080/#/c/14154/1//COMMIT_MSG@9
PS1, Line 9: blet servers,
> the rebalancer tool
Done


http://gerrit.cloudera.org:8080/#/c/14154/1//COMMIT_MSG@12
PS1, Line 12: LI tool.
> I think moving this into a separate paragraph would make the commit message
Done


http://gerrit.cloudera.org:8080/#/c/14154/1//COMMIT_MSG@10
PS1, Line 10: this patch re-uses the '--ignored_tservers' flag and adds a
            : '--move_replicas_from_ignored_tservers' flag to the
            : `kudu rebalance cluster`
> I think it is easier to read if this sentence is split into a few, for exam
Done


http://gerrit.cloudera.org:8080/#/c/14154/1/src/kudu/rebalance/rebalancer.h
File src/kudu/rebalance/rebalancer.h:

http://gerrit.cloudera.org:8080/#/c/14154/1/src/kudu/rebalance/rebalancer.h@208
PS1, Line 208:   // involve moving replicas of tablets which are in 'scheduled_moves'. The
> Nit: should be "bool* is_healthy_move".
Done


http://gerrit.cloudera.org:8080/#/c/14154/1/src/kudu/rebalance/rebalancer.cc
File src/kudu/rebalance/rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/14154/1/src/kudu/rebalance/rebalancer.cc@110
PS1, Line 110:   const auto& table_id = move.table_id;
> Why is this necessary? Looks like every path out of the function swaps to t
Done


http://gerrit.cloudera.org:8080/#/c/14154/1/src/kudu/tools/tool_action_cluster.cc
File src/kudu/tools/tool_action_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/14154/1/src/kudu/tools/tool_action_cluster.cc@62
PS1, Line 62: DEFINE_string(ignored_tservers, "",
> +1
Done


http://gerrit.cloudera.org:8080/#/c/14154/1/src/kudu/tools/tool_action_cluster.cc@66
PS1, Line 66:               "as a part of the cluster as well as the replicas on them. "
> Alternatively, you could simply drop the 'If not specified part' -- it's as
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Comment-Date: Wed, 04 Sep 2019 12:26:51 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14154

to look at the new patch set (#4).

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................

KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Aims to support moving replicas from specific tablet servers,
this patch re-uses the '--ignored_tservers' flag and adds a
'--move_replicas_from_ignored_tservers' flag to the
`kudu rebalance cluster` CLI tool.

Once the flag '--ignored_tservers' is specified, and the flag
'--move_replicas_from_ignored_tservers' is specified false,
the given tablet servers are not considered as a part of the
cluster, their health state and replicas on them are ignored
by the rebalancer tool.

If '--move_replicas_from_ignored_tservers' is enabled,nhealthy
tservers in the given ignored tservers will also be ignored by
the rebalancer tool. But for healthy ignored tservers, replicas
on them would be moved to other tservers before starting the
rebalancing process, when running the rebalancing, these tservers
are not considered as a part of the cluster.

Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
---
M src/kudu/rebalance/rebalance-test.cc
M src/kudu/rebalance/rebalance_algo-test.cc
M src/kudu/rebalance/rebalance_algo.cc
M src/kudu/rebalance/rebalance_algo.h
M src/kudu/rebalance/rebalancer.cc
M src/kudu/rebalance/rebalancer.h
M src/kudu/tools/rebalancer_tool-test.cc
M src/kudu/tools/rebalancer_tool.cc
M src/kudu/tools/rebalancer_tool.h
M src/kudu/tools/tool_action_cluster.cc
10 files changed, 827 insertions(+), 67 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14154/4
-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 4
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, Hannah Nguyen, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14154

to look at the new patch set (#14).

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................

KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Aims to support moving replicas from specific tablet servers,
this patch re-uses the '--ignored_tservers' flag and adds a
'--move_replicas_from_ignored_tservers' flag to the
`kudu rebalance cluster` CLI tool.

Once the flag '--ignored_tservers' is specified, the given
tablet servers are not considered as a part of the cluster,
both their health state and replicas on them are ignored
by the rebalancer tool.

While if '--move_replicas_from_ignored_tservers' is enabled,
replicas on healthy ignored tservers would be moved to other
tservers first, and then running the rebalancing on the other
healthy tservers in the cluster.

Additionally, if we want to move replicas from some specified
tablet servers to other servers, the specified tablet servers
should be set into maintenance_mode first, otherwise the
rebalancer tool would not run.

Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
---
M src/kudu/rebalance/rebalance-test.cc
M src/kudu/rebalance/rebalance_algo.h
M src/kudu/rebalance/rebalancer.cc
M src/kudu/rebalance/rebalancer.h
M src/kudu/tools/rebalancer_tool-test.cc
M src/kudu/tools/rebalancer_tool.cc
M src/kudu/tools/rebalancer_tool.h
M src/kudu/tools/tool_action_cluster.cc
8 files changed, 806 insertions(+), 105 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14154/14
-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 14
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hannah Nguyen <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14154

to look at the new patch set (#8).

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................

KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Aims to support moving replicas from specific tablet servers,
this patch re-uses the '--ignored_tservers' flag and adds a
'--move_replicas_from_ignored_tservers' flag to the
`kudu rebalance cluster` CLI tool.

Once the flag '--ignored_tservers' is specified, the given
tablet servers are not considered as a part of the cluster,
both their health state and replicas on them are ignored
by the rebalancer tool.

While if '--move_replicas_from_ignored_tservers' is enabled,
replicas on healthy ignored tservers would be moved to other
tservers first, and then running the rebalancing on the other
tservers in the cluster.

Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
---
M src/kudu/rebalance/rebalance-test.cc
M src/kudu/rebalance/rebalance_algo.h
M src/kudu/rebalance/rebalancer.cc
M src/kudu/rebalance/rebalancer.h
M src/kudu/tools/rebalancer_tool-test.cc
M src/kudu/tools/rebalancer_tool.cc
M src/kudu/tools/rebalancer_tool.h
M src/kudu/tools/tool_action_cluster.cc
8 files changed, 699 insertions(+), 90 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14154/8
-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 8
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14154 )

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14154/1/src/kudu/tools/tool_action_cluster.cc
File src/kudu/tools/tool_action_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/14154/1/src/kudu/tools/tool_action_cluster.cc@62
PS1, Line 62: DEFINE_string(blacklist_tservers, "",
> Nit: how about "blacklisted_tservers" to be consistent with "ignored_tserve
Perhaps it's a good idea.

If 'ignored_tservers' are healthy, move replicas from these servers to other servers, if 'ignored_tservers' are unhealthy, remove these tservers and replicas on them from the cluster info and run the rebalancing on remaining servers.

I will try to make some modifications on this patch.



-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 1
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Comment-Date: Fri, 30 Aug 2019 09:11:27 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14154 )

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................


Patch Set 6:

> Patch Set 4:
> 
> I added a comment on the JIRA for more context.  I hope it's useful:
>  
> https://issues.apache.org/jira/browse/KUDU-2914?focusedCommentId=16941152&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16941152

According to the latest recommendations on the JIRA, the progress of moving replicas on a tablet server has been implemented by marking all replicas with REPLACE attribute. And before starting to move replicas, we'll check whether all specified tablet servers are set into maintenance mode first.

Please have a look at the new patch set:)


-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 6
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Comment-Date: Fri, 11 Oct 2019 12:19:35 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14154

to look at the new patch set (#2).

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................

KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Aims to support moving replicas from specific tablet servers,
this patch re-uses the '--ignored_tservers' flag and adds a
'--move_replicas_from_ignored_tservers' flag to the
`kudu rebalance cluster` CLI tool.

Once the flag '--ignored_tservers' is specified, and the flag
'--move_replicas_from_ignored_tservers' is specified false,
the given tablet servers are not considered as a part of the
cluster, their health state and replicas on them are ignored
by the rebalancer tool.

If '--move_replicas_from_ignored_tservers' is specified true,
unhealthy tservers in the given tservers will also be ignored by
the rebalancer tool. But for healthy ignored tservers, replicas
on them would be moved to other tservers before starting the
rebalancing process, when running the rebalancing, these tservers
are not considered as a part of the cluster.

Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
---
M src/kudu/rebalance/rebalance-test.cc
M src/kudu/rebalance/rebalance_algo-test.cc
M src/kudu/rebalance/rebalance_algo.cc
M src/kudu/rebalance/rebalance_algo.h
M src/kudu/rebalance/rebalancer.cc
M src/kudu/rebalance/rebalancer.h
M src/kudu/tools/rebalancer_tool-test.cc
M src/kudu/tools/rebalancer_tool.cc
M src/kudu/tools/rebalancer_tool.h
M src/kudu/tools/tool_action_cluster.cc
10 files changed, 936 insertions(+), 127 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14154/2
-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14154 )

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................


Patch Set 1:

(6 comments)

I think I'll take a look at the updated patch once the change suggested by Adar is implemented.

Thank you for your contribution!

http://gerrit.cloudera.org:8080/#/c/14154/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14154/1//COMMIT_MSG@9
PS1, Line 9: rebalance tool
the rebalancer tool

or

the `kudu cluster rebalance` CLI tool


http://gerrit.cloudera.org:8080/#/c/14154/1//COMMIT_MSG@9
PS1, Line 9: add
adds


http://gerrit.cloudera.org:8080/#/c/14154/1//COMMIT_MSG@12
PS1, Line 12: It will be useful for server decommissioning.
I think moving this into a separate paragraph would make the commit message more readable.


http://gerrit.cloudera.org:8080/#/c/14154/1//COMMIT_MSG@10
PS1, Line 10: Once specified blacklist_tservers, replicas on these servers would
            : be moved to other servers first and then run the rebalancing on
            : servers not in blacklist
I think it is easier to read if this sentence is split into a few, for example:


Use the '--blacklist_tservers' to specify a comma-separated list of tablet servers' UUIDs to move tablet replicas from.  Once the flag is specified, the tool first moves all replicas from the given tablet servers elsewhere before starting the rebalancing process.  When running the rebalancing, the specified tablet servers are not considered as a part of the cluster (i.e. they are effectively ignored).

UPDATE: once you transform this patch to re-use the '--ignored_tservers' flag, the description should be updated correspondingly, sure :)


http://gerrit.cloudera.org:8080/#/c/14154/1/src/kudu/tools/tool_action_cluster.cc
File src/kudu/tools/tool_action_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/14154/1/src/kudu/tools/tool_action_cluster.cc@62
PS1, Line 62: DEFINE_string(blacklist_tservers, "",
> Perhaps it's a good idea.
+1

Yep, I think that's a great idea.

Do you also want to add some control over what to do with replicas on tablet servers which are healthy, but it's desirable to ignore replicas at such tablet servers anyways?   

Probably, it's possible to represent that as extra boolean flag that controls whether to move tablet replicas from the specified tablet servers first.


http://gerrit.cloudera.org:8080/#/c/14154/1/src/kudu/tools/tool_action_cluster.cc@66
PS1, Line 66:               "tserver not in blacklist"
> Should end this sentence with a period and make sure there's a space before
Alternatively, you could simply drop the 'If not specified part' -- it's assumed the tool works with all the tablet servers in the cluster otherwise.



-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 1
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Comment-Date: Fri, 30 Aug 2019 23:56:01 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14154

to look at the new patch set (#5).

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................

KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Aims to support moving replicas from specific tablet servers,
this patch re-uses the '--ignored_tservers' flag and adds a
'--move_replicas_from_ignored_tservers' flag to the
`kudu rebalance cluster` CLI tool.

Once the flag '--ignored_tservers' is specified, and the flag
'--move_replicas_from_ignored_tservers' is specified false,
the given tablet servers are not considered as a part of the
cluster, their health state and replicas on them are ignored
by the rebalancer tool.

If '--move_replicas_from_ignored_tservers' is enabled, unhealthy
tservers in the given ignored tservers will also be ignored by
the rebalancer tool. But for healthy ignored tservers, replicas
on them would be moved to other tservers before starting the
rebalancing process, when running the rebalancing, these tservers
are not considered as a part of the cluster.

Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
---
M src/kudu/rebalance/rebalance-test.cc
M src/kudu/rebalance/rebalance_algo-test.cc
M src/kudu/rebalance/rebalance_algo.cc
M src/kudu/rebalance/rebalance_algo.h
M src/kudu/rebalance/rebalancer.cc
M src/kudu/rebalance/rebalancer.h
M src/kudu/tools/rebalancer_tool-test.cc
M src/kudu/tools/rebalancer_tool.cc
M src/kudu/tools/rebalancer_tool.h
M src/kudu/tools/tool_action_cluster.cc
10 files changed, 865 insertions(+), 68 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14154/5
-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 5
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14154

to look at the new patch set (#7).

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................

KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Aims to support moving replicas from specific tablet servers,
this patch re-uses the '--ignored_tservers' flag and adds a
'--move_replicas_from_ignored_tservers' flag to the
`kudu rebalance cluster` CLI tool.

Once the flag '--ignored_tservers' is specified, the given
tablet servers are not considered as a part of the cluster,
both their health state and replicas on them are ignored
by the rebalancer tool.

While if '--move_replicas_from_ignored_tservers' is enabled,
replicas on healthy ignored tservers would be moved to other
tservers first, and then running the rebalancing on the other
tservers in the cluster.

Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
---
M src/kudu/rebalance/rebalance-test.cc
M src/kudu/rebalance/rebalance_algo.h
M src/kudu/rebalance/rebalancer.cc
M src/kudu/rebalance/rebalancer.h
M src/kudu/tools/rebalancer_tool-test.cc
M src/kudu/tools/rebalancer_tool.cc
M src/kudu/tools/rebalancer_tool.h
M src/kudu/tools/tool_action_cluster.cc
8 files changed, 699 insertions(+), 90 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14154/7
-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 7
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, Hannah Nguyen, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14154

to look at the new patch set (#10).

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................

KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Aims to support moving replicas from specific tablet servers,
this patch re-uses the '--ignored_tservers' flag and adds a
'--move_replicas_from_ignored_tservers' flag to the
`kudu rebalance cluster` CLI tool.

Once the flag '--ignored_tservers' is specified, the given
tablet servers are not considered as a part of the cluster,
both their health state and replicas on them are ignored
by the rebalancer tool.

While if '--move_replicas_from_ignored_tservers' is enabled,
replicas on healthy ignored tservers would be moved to other
tservers first, and then running the rebalancing on the other
healthy tservers in the cluster.

Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
---
M src/kudu/rebalance/rebalance-test.cc
M src/kudu/rebalance/rebalance_algo.h
M src/kudu/rebalance/rebalancer.cc
M src/kudu/rebalance/rebalancer.h
M src/kudu/tools/rebalancer_tool-test.cc
M src/kudu/tools/rebalancer_tool.cc
M src/kudu/tools/rebalancer_tool.h
M src/kudu/tools/tool_action_cluster.cc
8 files changed, 708 insertions(+), 99 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14154/10
-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 10
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hannah Nguyen <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14154 )

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14154/2/src/kudu/rebalance/rebalancer.h
File src/kudu/rebalance/rebalancer.h:

http://gerrit.cloudera.org:8080/#/c/14154/2/src/kudu/rebalance/rebalancer.h@205
PS2, Line 205: a specific tabl
> nit: document whether it's possible to specify 'nullptr' as an argument whe
Done


http://gerrit.cloudera.org:8080/#/c/14154/2/src/kudu/rebalance/rebalancer.cc
File src/kudu/rebalance/rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/14154/2/src/kudu/rebalance/rebalancer.cc@157
PS2, Line 157:     // No healthy tablet of s
> This seems to be the only place where the output parameter is assigned.  Do
Done


http://gerrit.cloudera.org:8080/#/c/14154/2/src/kudu/tools/rebalancer_tool.cc
File src/kudu/tools/rebalancer_tool.cc:

http://gerrit.cloudera.org:8080/#/c/14154/2/src/kudu/tools/rebalancer_tool.cc@951
PS2, Line 951:     for (const auto& s : raw_info.tablet_summaries) {
             :       int num_voters = 0;
             :       for (const auto& rs : s.replicas) {
             :         if (rs.is_voter) {
             :           ++num_voters;
             :         }
             :       }
             :       const auto rf = FindOrDie(replication_factors_by_table, s.table_id);
             :       EmplaceOrDie(&extra_info_by_tablet_id,
             :                    s.id, TabletExtraInfo{rf, num_voters});
             :     }
             :   }
> Moving the replicas from the healthy ignored tservers and moving the replic
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 3
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Comment-Date: Sat, 14 Sep 2019 00:55:08 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, Hannah Nguyen, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14154

to look at the new patch set (#12).

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................

KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Aims to support moving replicas from specific tablet servers,
this patch re-uses the '--ignored_tservers' flag and adds a
'--move_replicas_from_ignored_tservers' flag to the
`kudu rebalance cluster` CLI tool.

Once the flag '--ignored_tservers' is specified, the given
tablet servers are not considered as a part of the cluster,
both their health state and replicas on them are ignored
by the rebalancer tool.

While if '--move_replicas_from_ignored_tservers' is enabled,
replicas on healthy ignored tservers would be moved to other
tservers first, and then running the rebalancing on the other
healthy tservers in the cluster.

Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
---
M src/kudu/rebalance/rebalance-test.cc
M src/kudu/rebalance/rebalance_algo.h
M src/kudu/rebalance/rebalancer.cc
M src/kudu/rebalance/rebalancer.h
M src/kudu/tools/rebalancer_tool-test.cc
M src/kudu/tools/rebalancer_tool.cc
M src/kudu/tools/rebalancer_tool.h
M src/kudu/tools/tool_action_cluster.cc
8 files changed, 824 insertions(+), 100 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14154/12
-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 12
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hannah Nguyen <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14154 )

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................


Patch Set 10:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/rebalance/rebalance-test.cc
File src/kudu/rebalance/rebalance-test.cc:

http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/rebalance/rebalance-test.cc@192
PS10, Line 192: bool HasSameContents(const multimap<int32_t, TableBalanceInfo>& lhs,
I think it's worth documenting the essence of what this function uses as the lhs-to-rhs comparision criteria.


http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/tools/rebalancer_tool-test.cc
File src/kudu/tools/rebalancer_tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/tools/rebalancer_tool-test.cc@305
PS10, Line 305:   bool is_343_scheme = true;
              :   const vector<string> kMasterFlags = {
              :     Substitute("--raft_prepare_replacement_before_eviction=$0", is_343_scheme),
              :   };
              :   const vector<string> kTserverFlags = {
              :     Substitute("--raft_prepare_replacement_before_eviction=$0", is_343_scheme),
              :   };
This is not necessary: newer Kudu clusters runs in 343 mode by default.


http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/tools/rebalancer_tool-test.cc@355
PS10, Line 355:       "--ignored_tservers=" +
              :       cluster_->tablet_server(0)->uuid(),
nit: join there two lines for better readability


http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/tools/rebalancer_tool-test.cc@360
PS10, Line 360:     ASSERT_STR_CONTAINS(out, Substitute("$0 | 0", cluster_->tablet_server(0)->uuid()))
              :         << "stderr: " << err;
Add a comment explaining that this line show that the number of tablet replicas at the ignored tserver is 0 after the rebalancing.


http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/tools/rebalancer_tool-test.cc@372
PS10, Line 372:   ts->Shutdown();
BTW, what happens if a tablet server, which is ignored and healthy in the beginning, goes down during the rebalancing process?  Maybe, add a test scenario to be explicit of what behavior is expected in such case.


http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/tools/rebalancer_tool.cc
File src/kudu/tools/rebalancer_tool.cc:

http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/tools/rebalancer_tool.cc@188
PS10, Line 188: on
nit: from


http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/tools/rebalancer_tool.cc@1294
PS10, Line 1294:   RETURN_NOT_OK(GetReplaceMoves(ci, raw_info, replica_moves));
               :   return Status::OK();
nit: shorten this to

return GetReplaceMoves(ci, raw_info, replica_moves);


http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/tools/rebalancer_tool.cc@1420
PS10, Line 1420: helathy
healthy


http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/tools/rebalancer_tool.cc@1421
PS10, Line 1421:   // maintenance(or decommision) mode.
nit: add a space before the parenthesis


http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/tools/rebalancer_tool.cc@1506
PS10, Line 1506:       ReplicaMove move = {tablets_info[i].tablet_id, elem.first, "", tablets_info[i].config_idx};
               :       result_moves.emplace_back(std::move(move));
nit: would the following work here

  result_moves.emplace_back(tablets_info[i].tablet_id, elem.first, "", tablets_info[i].config_idx);

?



-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 10
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hannah Nguyen <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Comment-Date: Fri, 25 Oct 2019 01:27:26 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14154

to look at the new patch set (#3).

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................

KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Aims to support moving replicas from specific tablet servers,
this patch re-uses the '--ignored_tservers' flag and adds a
'--move_replicas_from_ignored_tservers' flag to the
`kudu rebalance cluster` CLI tool.

Once the flag '--ignored_tservers' is specified, and the flag
'--move_replicas_from_ignored_tservers' is specified false,
the given tablet servers are not considered as a part of the
cluster, their health state and replicas on them are ignored
by the rebalancer tool.

If '--move_replicas_from_ignored_tservers' is specified true,
unhealthy tservers in the given tservers will also be ignored by
the rebalancer tool. But for healthy ignored tservers, replicas
on them would be moved to other tservers before starting the
rebalancing process, when running the rebalancing, these tservers
are not considered as a part of the cluster.

Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
---
M src/kudu/rebalance/rebalance-test.cc
M src/kudu/rebalance/rebalance_algo-test.cc
M src/kudu/rebalance/rebalance_algo.cc
M src/kudu/rebalance/rebalance_algo.h
M src/kudu/rebalance/rebalancer.cc
M src/kudu/rebalance/rebalancer.h
M src/kudu/tools/rebalancer_tool-test.cc
M src/kudu/tools/rebalancer_tool.cc
M src/kudu/tools/rebalancer_tool.h
M src/kudu/tools/tool_action_cluster.cc
10 files changed, 827 insertions(+), 67 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14154/3
-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 3
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14154 )

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................


Patch Set 13:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance-test.cc
File src/kudu/rebalance/rebalance-test.cc:

http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance-test.cc@156
PS12, Line 156:       }
> const ?
Done


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance-test.cc@157
PS12, Line 157: in()
> nit: fix the alignment
Done


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance-test.cc@197
PS12, Line 197:                       rhs.servers_by_total_replica_count);
              : }
> nit: is it possible to templatize bool HasSameContents(const ServersByCount
Done


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance-test.cc@271
PS12, Line 271: Empty
> I think this method by itself doesn't imply working on ignored or other sub
Done


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance-test.cc@293
PS12, Line 293:           { "tablet_a0", "table_a", { { "ts_0", true }, }, },
> I think it's worth adding a comment explaining the essence of the constrain
Done


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance-test.cc@296
PS12, Line 296:         },
> Add SCOPED_TRACE() to get more information on the context if ASSERT_FALSE()
Done


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance-test.cc@527
PS12, Line 527: 
> if moving replicas from
Done


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance_algo.h
File src/kudu/rebalance/rebalance_algo.h:

http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance_algo.h@92
PS12, Line 92: on
> nit: on
Done


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalancer.cc
File src/kudu/rebalance/rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalancer.cc@450
PS12, Line 450:       const int* replica_count = FindOrNull(tserver_replicas_count, ignored_tserver);
              :       if (!replica_count) {
> Would FindOrNull() work here?
Done


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool-test.cc
File src/kudu/tools/rebalancer_tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool-test.cc@273
PS12, Line 273:       "--ignored_tservers=" + JoinStrings(ignored_tservers, ","),
              :       "--move_replicas_from_ignored_tserv
> Join these two strings.
Done


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool-test.cc@701
PS12, Line 701:  // Wait for the catalog manager to unde
> If an ignored tablet server went down
Done


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool-test.cc@702
PS12, Line 702: 
> server
Done


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool-test.cc@706
PS12, Line 706: 
> nit: move this to the next line
Done


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool-test.cc@757
PS12, Line 757: ts()) {
> I think it might be important to know which tablet server is in progress of
It's quiet difficult to know which tablet server is in progress of copying data, through ksck tool we could only know some tablets are copying data (tablets contains 4 replicas and one of replicas is not running), but which replica would be removed is not sure.

The second case is similar to the case where the ignored tablet server goes down before run the rebalancing.


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool-test.cc@795
PS12, Line 795:   }
> Is there some significance behind starting the tablet server back?  If yes,
Done


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool.cc
File src/kudu/tools/rebalancer_tool.cc:

http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool.cc@639
PS12, Line 639: 
> healthy non-ignored
Done


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool.cc@1243
PS12, Line 1243:     auto s = CheckComplet
> I'm not sure this is correct by the semantics of the 'has_updates' variable
I think the 'has_updates' variable indicates whether the statuses of in-progress moves has been updated.
 
In RebalancerTool::RunWith(Runner* runner, RunStatus* result_status) method, if no updates, the rebalancer tool will re-synchronize the state of the cluster.  I think it's not necessary to re-synchronize the cluster state when the move has not been completed.


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/tool_action_cluster.cc
File src/kudu/tools/tool_action_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/tool_action_cluster.cc@131
PS12, Line 131: the source tablet
> the source tablet server
Done


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/tool_action_cluster.cc@132
PS12, Line 132:             "This setting is effective only if the '--ignored_tservers' flag "
> nit: add a space between 'flag' and '"'
Done


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/tool_action_cluster.cc@134
PS12, Line 134: then all ignored tablet servers must be placed into "
              :             "the 'maintenance mode'.");
> then all ignored tablet servers must be placed into the 'maintenance mode'
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 13
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hannah Nguyen <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Comment-Date: Fri, 15 Nov 2019 08:01:36 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14154 )

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................


Patch Set 5:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/rebalance/rebalancer.h
File src/kudu/rebalance/rebalancer.h:

http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/rebalance/rebalancer.h@72
PS4, Line 72:     // If not empty, specified tablet servers (including their health state
> nit: missing space
Done


http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/rebalance/rebalancer.h@101
PS4, Line 101: run
> running
Done


http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/rebalance/rebalancer.h@202
PS4, Line 202: .
> This part is no longer relevant once the return type changed from Status to
Done


http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/rebalancer_tool-test.cc
File src/kudu/tools/rebalancer_tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/rebalancer_tool-test.cc@239
PS4, Line 239: SafetyTe
> Safety?
Done


http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/rebalancer_tool-test.cc@245
PS4, Line 245: TEST_P(RebalanceStartSafetyTest, TooManyIgnoredTservers) {
> I think it's also necessary to add a scenario when the rebalancer tool actu
Done


http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/rebalancer_tool-test.cc@278
PS4, Line 278: MATCHES(err, "T
> nit: this constant is used only once; maybe replace it's usage with the str
Done


http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/rebalancer_tool-test.cc@299
PS4, Line 299:   {
> nit: an extra line
Done


http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/rebalancer_tool.h
File src/kudu/tools/rebalancer_tool.h:

http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/rebalancer_tool.h@327
PS4, Line 327: ab
> nit: about
Done


http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/rebalancer_tool.h@346
PS4, Line 346: the ignored to o
> nit: ignored
Done


http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/rebalancer_tool.cc
File src/kudu/tools/rebalancer_tool.cc:

http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/rebalancer_tool.cc@1014
PS4, Line 1014:         for (const auto& elem : server_uuids) {
              :           if (elem != move.to && !ContainsKey(cluster_info.healthy_ignored_tservers, elem)) {
              :             LOG(INFO) << Substitute("choose destination server: $0",elem);
              :             move.to = elem;
              :            
> How is it guaranteed that the amended move doesn't make the cluster even mo
A random 'move.to' server has been chosen here and the move would possibly make the cluster more imbalanced.

When we need to move a replica of table T from ignored server A to server B, maybe we can't move any replica because there are replicas of the same tablets on server B, and if we skip this move, next time the scheduled move will be same, so we need to find another 'move.to' server.

If the move has been fixed, we will break the loop and get next moves based on the move, so the final state is balanced.


http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/tool_action_cluster.cc
File src/kudu/tools/tool_action_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/tool_action_cluster.cc@130
PS4, Line 130: move 
> move
Done


http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/tool_action_cluster.cc@130
PS4, Line 130: ervers to other"
> servers
Done


http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/tool_action_cluster.cc@132
PS4, Line 132: This setting is effective only if the '--ignored_tservers' fla
> How about:
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 5
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Comment-Date: Tue, 01 Oct 2019 16:21:02 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14154 )

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................


Patch Set 7:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/14154/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14154/6//COMMIT_MSG@20
PS6, Line 20: other
            : tservers first, and then running the rebalancing on the other
            : tservers in the clu
> Why bother differentiating between unhealthy and healthy ignored servers? I
It is because the rebalancer tool only runs on healthy servers.


http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/rebalance/rebalance-test.cc
File src/kudu/rebalance/rebalance-test.cc:

http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/rebalance/rebalance-test.cc@157
PS6, Line 157:     auto get_values = [](pair<ServersByCountMap::const_iterator,
> nit: for local variables, we prefer snake_case, or sometimes for lambdas, w
Done


http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/rebalance/rebalance_algo.h
File src/kudu/rebalance/rebalance_algo.h:

http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/rebalance/rebalance_algo.h@93
PS6, Line 93: tservers_to_empty;
> nit: "healthy"
I modified it to "tservers_to_empty"


http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool-test.cc
File src/kudu/tools/rebalancer_tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool-test.cc@241
PS6, Line 241: ::testing::WithParamInterface<Kudu1097>
> Why bother testing the 3-2-3 replication? I don't think maintenance mode wi
I'll fix the test in the next patch set.


http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool.h
File src/kudu/tools/rebalancer_tool.h:

http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool.h@322
PS6, Line 322:                    co
> nit: Could you add a class-wide comment explaining what the IgnoredTservers
I added some comments for 'GetReplaceMoves' in each class.


http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool.cc
File src/kudu/tools/rebalancer_tool.cc:

http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool.cc@639
PS6, Line 639:   Substitute("Too many ignored servers; $0 healthy servers exist but $1 are require
> nit: for returned statuses, we generally just use a single line message. Al
Done


http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool.cc@1217
PS6, Line 1217:   LOG(INFO) << Su
> nit: this is trivially true based on the line above, so maybe remove it?
Done


http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool.cc@1302
PS6, Line 1302:   // desitnation servers for the move of the replica marked with
              :   // the REPLACE attribute is not known.
              :   // Load the least loaded (in terms of 
> nit: while I appreciate you taking care of this TODO, could you move this c
OK.


http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool.cc@1337
PS6, Line 1337:   const auto op_count = op_count_per_ts_[ts_uuid]++;
              :   const auto op_range = ts_per_op_count_.equal_range(op_count);
              :   bool ts_op_count_updated = false;
              :   for (auto it = op_range.first; it != op_range.second; ++it) {
              :     if (it->second == ts_uuid) {
              :       ts_per_op_count_.erase(it);
              :       ts_per_op_count_.emplace(op_count + 1, ts_uuid);
              :       ts_op_count_updated = true;
              :       break;
              :     }
              :   }
              :   D
> Do you know why the policy fixer didn't do this before?
This part has something to do with the TODO above, I'll move them all to a seperate patch.


http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool.cc@1421
PS6, Line 1421: d tserv
> nit: ignored
Done


http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool.cc@1482
PS6, Line 1482: sKey(ci.t
> nit: tserver ID
Done


http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool.cc@1506
PS6, Line 1506: et_id, elem.first, "", t
> nit: fix spacing
Done


http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/tool_action_cluster.cc
File src/kudu/tools/tool_action_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/tool_action_cluster.cc@131
PS6, Line 131:  If set true, "
             :             "the healthy ignored servers should be set into maintenance mode first. 
> While this is the most effective way to decommission a tserver, I don't thi
Agreed. But now this tool is stongly dependent on the 'maintenance mode'. If use this flag,  we should make sure that new replicas will not be placed on the specified tservers.



-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 7
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Comment-Date: Mon, 21 Oct 2019 13:05:43 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14154 )

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................


Patch Set 11:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/rebalance/rebalance-test.cc
File src/kudu/rebalance/rebalance-test.cc:

http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/rebalance/rebalance-test.cc@192
PS10, Line 192: // The order of the key-value pairs whose keys compare equivalent is the order
> I think it's worth documenting the essence of what this function uses as th
Done


http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/rebalance/rebalancer.h
File src/kudu/rebalance/rebalancer.h:

http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/rebalance/rebalancer.h@70
PS10, Line 70: If empty, run the rebalancing on
             :     // all tablet servers in the cluster 
> Would it be beneficial to have the option to choose the former default beha
Yes, if there are some unhealthy tservers in the cluster, and we didn't specify them in the ignored_tservers flag, the rebalancer tool would not run.


http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/tools/rebalancer_tool-test.cc
File src/kudu/tools/rebalancer_tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/tools/rebalancer_tool-test.cc@305
PS10, Line 305:   FLAGS_num_tablet_servers = 5;
              :   NO_FATALS(BuildAndStart());
              : 
              :   // Assign one ignored tserver and move_replicas_from_ignored_tservers=true
              :   // without setting it into maintenance mode.
              :   {
              :     
> This is not necessary: newer Kudu clusters runs in 343 mode by default.
Done


http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/tools/rebalancer_tool-test.cc@355
PS10, Line 355:         << "stderr: " << err;
              :   }
> nit: join there two lines for better readability
Done


http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/tools/rebalancer_tool-test.cc@360
PS10, Line 360:   // 'output_replica_distribution_details' are both enabled.
              :   auto* ts = cluster_->tablet
> Add a comment explaining that this line show that the number of tablet repl
Done


http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/tools/rebalancer_tool-test.cc@372
PS10, Line 372:       "--move_replicas_from_ignored_tservers",
> BTW, what happens if a tablet server, which is ignored and healthy in the b
The rebalancer tool would ignore the unhealthy state of the tablet server since it has been specified 'ignored', it would just log some information about the unhealthy server and exit normally. I have added a test.


http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/tools/rebalancer_tool.cc
File src/kudu/tools/rebalancer_tool.cc:

http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/tools/rebalancer_tool.cc@188
PS10, Line 188: fr
> nit: from
Done


http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/tools/rebalancer_tool.cc@631
PS10, Line 631:                    
> Is this function only called when move_replicas_from_ignored_tservers=true?
Yes, this function only called when move_replicas_from_ignored_tservers=true.


http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/tools/rebalancer_tool.cc@1294
PS10, Line 1294:   ClusterInfo ci;
               :   RETURN_NOT_OK(rebala
> nit: shorten this to
Done


http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/tools/rebalancer_tool.cc@1420
PS10, Line 1420: eplica 
> healthy
Done


http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/tools/rebalancer_tool.cc@1421
PS10, Line 1421:   // allow to run only when all healthy ignored tservers are in
> nit: add a space before the parenthesis
Done


http://gerrit.cloudera.org:8080/#/c/14154/10/src/kudu/tools/rebalancer_tool.cc@1506
PS10, Line 1506:             });
               : 
> nit: would the following work here
It would not work, ’ReplicaMove' was defined as a struct.



-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 11
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hannah Nguyen <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Comment-Date: Mon, 28 Oct 2019 11:35:47 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14154 )

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool-test.cc
File src/kudu/tools/rebalancer_tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool-test.cc@241
PS6, Line 241: ::testing::WithParamInterface<Kudu1097>
> I'll fix the test in the next patch set.
Done


http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool.cc
File src/kudu/tools/rebalancer_tool.cc:

http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool.cc@1302
PS6, Line 1302:   // Use pessimistic /2 limit for max_moves_per_servers_ since the
              :   // desitnation servers for the move of the replica marked with
              :   // the REPLACE attribute is not known.
> OK.
I moved this change to https://gerrit.cloudera.org/c/14519/



-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 9
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Comment-Date: Tue, 22 Oct 2019 11:21:47 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14154

to look at the new patch set (#6).

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................

KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Aims to support moving replicas from specific tablet servers,
this patch re-uses the '--ignored_tservers' flag and adds a
'--move_replicas_from_ignored_tservers' flag to the
`kudu rebalance cluster` CLI tool.

Once the flag '--ignored_tservers' is specified, and the flag
'--move_replicas_from_ignored_tservers' is specified false,
the given tablet servers are not considered as a part of the
cluster, their health state and replicas on them are ignored
by the rebalancer tool.

If '--move_replicas_from_ignored_tservers' is enabled, unhealthy
tservers in the given ignored tservers will also be ignored by
the rebalancer tool. But for healthy ignored tservers, replicas
on them would be moved to other tservers before starting the
rebalancing process, when running the rebalancing, these tservers
are not considered as a part of the cluster.

Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
---
M src/kudu/rebalance/rebalance-test.cc
M src/kudu/rebalance/rebalance_algo.h
M src/kudu/rebalance/rebalancer.cc
M src/kudu/rebalance/rebalancer.h
M src/kudu/tools/rebalancer_tool-test.cc
M src/kudu/tools/rebalancer_tool.cc
M src/kudu/tools/rebalancer_tool.h
M src/kudu/tools/tool_action_cluster.cc
8 files changed, 693 insertions(+), 90 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14154/6
-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 6
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14154 )

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................


Patch Set 15: Code-Review+2

Looks good to me.

Thank you for implementing this useful functionality!


-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 15
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hannah Nguyen <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Comment-Date: Thu, 21 Nov 2019 03:15:03 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, Hannah Nguyen, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14154

to look at the new patch set (#15).

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................

KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Aims to support moving replicas from specific tablet servers,
this patch re-uses the '--ignored_tservers' flag and adds a
'--move_replicas_from_ignored_tservers' flag to the
`kudu rebalance cluster` CLI tool.

Once the flag '--ignored_tservers' is specified, the given
tablet servers are not considered as a part of the cluster,
both their health state and replicas on them are ignored
by the rebalancer tool.

While if '--move_replicas_from_ignored_tservers' is enabled,
replicas on healthy ignored tservers would be moved to other
tservers first, and then running the rebalancing on the other
healthy tservers in the cluster.

Additionally, if we want to move replicas from some specified
tablet servers to other servers, the specified tablet servers
should be set into maintenance_mode first, otherwise the
rebalancer tool would not run.

Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
---
M src/kudu/rebalance/rebalance-test.cc
M src/kudu/rebalance/rebalance_algo.h
M src/kudu/rebalance/rebalancer.cc
M src/kudu/rebalance/rebalancer.h
M src/kudu/tools/rebalancer_tool-test.cc
M src/kudu/tools/rebalancer_tool.cc
M src/kudu/tools/rebalancer_tool.h
M src/kudu/tools/tool_action_cluster.cc
8 files changed, 804 insertions(+), 105 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14154/15
-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 15
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hannah Nguyen <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, Hannah Nguyen, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14154

to look at the new patch set (#13).

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................

KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Aims to support moving replicas from specific tablet servers,
this patch re-uses the '--ignored_tservers' flag and adds a
'--move_replicas_from_ignored_tservers' flag to the
`kudu rebalance cluster` CLI tool.

Once the flag '--ignored_tservers' is specified, the given
tablet servers are not considered as a part of the cluster,
both their health state and replicas on them are ignored
by the rebalancer tool.

While if '--move_replicas_from_ignored_tservers' is enabled,
replicas on healthy ignored tservers would be moved to other
tservers first, and then running the rebalancing on the other
healthy tservers in the cluster.

Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
---
M src/kudu/rebalance/rebalance-test.cc
M src/kudu/rebalance/rebalance_algo.h
M src/kudu/rebalance/rebalancer.cc
M src/kudu/rebalance/rebalancer.h
M src/kudu/tools/rebalancer_tool-test.cc
M src/kudu/tools/rebalancer_tool.cc
M src/kudu/tools/rebalancer_tool.h
M src/kudu/tools/tool_action_cluster.cc
8 files changed, 806 insertions(+), 105 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14154/13
-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 13
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hannah Nguyen <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14154 )

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................


Patch Set 12:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance-test.cc
File src/kudu/rebalance/rebalance-test.cc:

http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance-test.cc@156
PS12, Line 156:     auto get_values = [](pair<ServersByCountMap::const_iterator,
const ?


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance-test.cc@157
PS12, Line 157:     
nit: fix the alignment


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance-test.cc@197
PS12, Line 197: bool HasSameContents(const multimap<int32_t, TableBalanceInfo>& lhs,
              :                      const multimap<int32_t, TableBalanceInfo>& rhs) {
nit: is it possible to templatize bool HasSameContents(const ServersByCountMap&, const ServersByCountMap&) and this function so there would be less code duplication?


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance-test.cc@271
PS12, Line 271: ContainsIgnoredTserver
I think this method by itself doesn't imply working on ignored or other subset of tablet servers.  Maybe, renaming it to something like

  ContainsOneOf(...)

Also, move this method into the private section of this class since its signature and logic hints that it's very specific to this class and unlikely to be reused somewhere else.


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance-test.cc@293
PS12, Line 293:       ASSERT_FALSE(ContainsIgnoredTserver(ci.balance.servers_by_total_replica_count,
I think it's worth adding a comment explaining the essence of the constrains being checked below.   Yes, it's obvious what's going on, but it would be nice to explain why.


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance-test.cc@296
PS12, Line 296:         ASSERT_FALSE(ContainsIgnoredTserver(elem.second.servers_by_replica_count,
Add SCOPED_TRACE() to get more information on the context if ASSERT_FALSE() triggers.  Having TableBalanceInfo::tablet_id from 'elem' would be enough, I think.


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance-test.cc@527
PS12, Line 527: when movement of
if moving replicas from


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance_algo.h
File src/kudu/rebalance/rebalance_algo.h:

http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalance_algo.h@92
PS12, Line 92: of
nit: on


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalancer.cc
File src/kudu/rebalance/rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/rebalance/rebalancer.cc@450
PS12, Line 450:       auto it = tserver_replicas_count.find(ignored_tserver);
              :       if (it == tserver_replicas_count.end()) {
Would FindOrNull() work here?


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool-test.cc
File src/kudu/tools/rebalancer_tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool-test.cc@273
PS12, Line 273:       "--ignored_tservers=" +
              :       JoinStrings(ignored_tservers, ","),
Join these two strings.


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool-test.cc@701
PS12, Line 701: If detecting a ignored tserver went down
If an ignored tablet server went down


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool-test.cc@702
PS12, Line 702: servers
server


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool-test.cc@706
PS12, Line 706:  :
nit: move this to the next line


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool-test.cc@757
PS12, Line 757: IsRebalancingInProgress()
I think it might be important to know which tablet server is in progress of copying data (that's the why how this method works).

What if it's some other tablet server (not the ignored one) is in progress of copying data?  Or that's the server which is ignored?

In other words, do you know whether this test covers both cases below:

  * the ignored tablet server goes down during replica movement from it
  * the ignored tablet server goes down before rebalancer tries to move a replica from it

It would be nice to clarify on that.


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool-test.cc@795
PS12, Line 795:   ASSERT_OK(cluster_->tablet_server(ignored_tserver_idx)->Restart());
Is there some significance behind starting the tablet server back?  If yes, please a comment explaining why.  If not, maybe drop this line.


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool.cc
File src/kudu/tools/rebalancer_tool.cc:

http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool.cc@639
PS12, Line 639: healthy
healthy non-ignored


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/rebalancer_tool.cc@1243
PS12, Line 1243:     has_updates |= s.ok()
I'm not sure this is correct by the semantics of the 'has_updates' variable: if the method returned Status::OK(), that doesn't mean there has been an update.  For update to be there, it's necessary to have 'is_complete' to be 'true'.

What is the motivation for this change?


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/tool_action_cluster.cc
File src/kudu/tools/tool_action_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/tool_action_cluster.cc@131
PS12, Line 131: the tablet server
the source tablet server


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/tool_action_cluster.cc@132
PS12, Line 132:             "This setting is effective only if the '--ignored_tservers' flag"
nit: add a space between 'flag' and '"'


http://gerrit.cloudera.org:8080/#/c/14154/12/src/kudu/tools/tool_action_cluster.cc@134
PS12, Line 134: replicas shouldn't be placed on all healthy ignored servers "
              :             "automatically, e.g. set them into 'mainetance mode' or 'decommision mode'.
then all ignored tablet servers must be placed into the 'maintenance mode'



-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 12
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hannah Nguyen <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Comment-Date: Thu, 14 Nov 2019 02:19:02 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14154 )

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................


Patch Set 4:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/rebalance/rebalancer.h
File src/kudu/rebalance/rebalancer.h:

http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/rebalance/rebalancer.h@72
PS4, Line 72:     // If not empty, specified tablet servers(including their health state
nit: missing space


http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/rebalance/rebalancer.h@101
PS4, Line 101: run
running


http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/rebalance/rebalancer.h@202
PS4, Line 202:  with the result status of Status::OK()
This part is no longer relevant once the return type changed from Status to void, right?


http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/rebalancer_tool-test.cc
File src/kudu/tools/rebalancer_tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/rebalancer_tool-test.cc@239
PS4, Line 239: Security
Safety?


http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/rebalancer_tool-test.cc@245
PS4, Line 245: TEST_P(RebalanceStartSecurityTest, TooManyIgnoredTservers) {
I think it's also necessary to add a scenario when the rebalancer tool actually moves replicas from the ignored tservers.  Sub-scenarios should include cases when some of ignored tservers are up and running, while others are shut down.


http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/rebalancer_tool-test.cc@278
PS4, Line 278: err_msg_pattern
nit: this constant is used only once; maybe replace it's usage with the string literal as is at line 279?


http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/rebalancer_tool-test.cc@299
PS4, Line 299: 
nit: an extra line


http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/rebalancer_tool.h
File src/kudu/tools/rebalancer_tool.h:

http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/rebalancer_tool.h@327
PS4, Line 327: of
nit: about


http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/rebalancer_tool.h@346
PS4, Line 346: ignored_tservers
nit: ignored

That would be easier to read.


http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/rebalancer_tool.cc
File src/kudu/tools/rebalancer_tool.cc:

http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/rebalancer_tool.cc@1014
PS4, Line 1014:           if (elem != move.to && !ContainsKey(cluster_info.healthy_ignored_tservers, elem)) {
              :             LOG(INFO) << Substitute("choose destination server: $0",elem);
              :             move.to = elem;
              :             break;
              :           }
How is it guaranteed that the amended move doesn't make the cluster even more imbalanced than it was before this move?

Please add a comment about that.


http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/tool_action_cluster.cc
File src/kudu/tools/tool_action_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/tool_action_cluster.cc@130
PS4, Line 130: moves
move


http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/tool_action_cluster.cc@130
PS4, Line 130: 'ignored_tservers'
servers


http://gerrit.cloudera.org:8080/#/c/14154/4/src/kudu/tools/tool_action_cluster.cc@132
PS4, Line 132: This setting only works when 'ignored_tservers' are specified.
How about:

This setting is effective only if the '--ignored_tservers' flag is specified as well.



-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 4
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Comment-Date: Mon, 30 Sep 2019 07:56:59 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14154 )

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................


Patch Set 4:

I added a comment on the JIRA for more context.  I hope it's useful:
 
https://issues.apache.org/jira/browse/KUDU-2914?focusedCommentId=16941152&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16941152


-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 4
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Comment-Date: Mon, 30 Sep 2019 17:25:17 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14154 )

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................


Patch Set 6:

(13 comments)

I did a high-level look through the code, though I still need to understand the rebalancer's runner hierarchy more.

http://gerrit.cloudera.org:8080/#/c/14154/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14154/6//COMMIT_MSG@20
PS6, Line 20: unhealthy
            : tservers in the given ignored tservers will also be ignored by
            : the rebalancer tool
Why bother differentiating between unhealthy and healthy ignored servers? Is it because we expect the re-replication to happen for those tservers automatically? Could you document the reasoning somewhere in a comment?


http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/rebalance/rebalance-test.cc
File src/kudu/rebalance/rebalance-test.cc:

http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/rebalance/rebalance-test.cc@157
PS6, Line 157:     auto getValues = [](pair<ServersByCountMap::const_iterator,
nit: for local variables, we prefer snake_case, or sometimes for lambdas, we the same PascalCase we use for functions.


http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/rebalance/rebalance_algo.h
File src/kudu/rebalance/rebalance_algo.h:

http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/rebalance/rebalance_algo.h@93
PS6, Line 93:  helathy_ignored_tservers;
nit: "healthy"

Alternatively, rather than having the concept of "healthy" and "ignored", which are somewhat concepts of the CLI tool rather than the rebalancing algorithm, how about calling this "replica_counts_to_empty" that refer to the counts for tablet servers that the rebalancing should leave empty.


http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool-test.cc
File src/kudu/tools/rebalancer_tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool-test.cc@241
PS6, Line 241: ::testing::WithParamInterface<Kudu1097>
Why bother testing the 3-2-3 replication? I don't think maintenance mode will do much in this mode.


http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool.h
File src/kudu/tools/rebalancer_tool.h:

http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool.h@322
PS6, Line 322: IgnoredTserversRunner
nit: Could you add a class-wide comment explaining what the IgnoredTserversRunner does that the PolicyFixer doesn't? What is the final "balanced" state for each?


http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool.cc
File src/kudu/tools/rebalancer_tool.cc:

http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool.cc@639
PS6, Line 639: "Too many ignored tservers.\nThe number of remaining tservers must be at least $0."
nit: for returned statuses, we generally just use a single line message. Also maybe list the number of healthy servers? E.g.

Substitute("Too many ignored servers; $0 healthy servers exist but $1 are required", remaining_tservers_count, max_replication_factor)


http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool.cc@1217
PS6, Line 1217:   DCHECK(s.ok());
nit: this is trivially true based on the line above, so maybe remove it?


http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool.cc@1302
PS6, Line 1302:   // Use pessimistic /2 limit for max_moves_per_servers_ since the
              :   // desitnation servers for the move of the replica marked with
              :   // the REPLACE attribute is not known.
nit: while I appreciate you taking care of this TODO, could you move this change in behavior into a separate patch with targeted testing? If possible, it'd be nice to make sure each patch is focused.


http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool.cc@1337
PS6, Line 1337:   // update helper containers.
              :   const auto op_count = op_count_per_ts_[ts_uuid]++;
              :   const auto op_range = ts_per_op_count_.equal_range(op_count);
              :   bool ts_op_count_updated = false;
              :   for (auto it = op_range.first; it != op_range.second; ++it) {
              :     if (it->second == ts_uuid) {
              :       ts_per_op_count_.erase(it);
              :       ts_per_op_count_.emplace(op_count + 1, ts_uuid);
              :       ts_op_count_updated = true;
              :       break;
              :     }
              :   }
Do you know why the policy fixer didn't do this before?


http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool.cc@1421
PS6, Line 1421: ignoted
nit: ignored


http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool.cc@1482
PS6, Line 1482: tablet_id
nit: tserver ID


http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/rebalancer_tool.cc@1506
PS6, Line 1506:  i<max_moves_per_server_
nit: fix spacing


http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/tool_action_cluster.cc
File src/kudu/tools/tool_action_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/14154/6/src/kudu/tools/tool_action_cluster.cc@131
PS6, Line 131:  If set true, "
             :             "the healthy ignored servers should be set into maintenance mode first. 
While this is the most effective way to decommission a tserver, I don't think it should be a hard requirement of this tool. For instance, if in the future, we add a "decommissioning" state, I think we would want this flag to work as well.



-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 6
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Comment-Date: Sat, 19 Oct 2019 02:37:19 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14154

to look at the new patch set (#9).

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................

KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Aims to support moving replicas from specific tablet servers,
this patch re-uses the '--ignored_tservers' flag and adds a
'--move_replicas_from_ignored_tservers' flag to the
`kudu rebalance cluster` CLI tool.

Once the flag '--ignored_tservers' is specified, the given
tablet servers are not considered as a part of the cluster,
both their health state and replicas on them are ignored
by the rebalancer tool.

While if '--move_replicas_from_ignored_tservers' is enabled,
replicas on healthy ignored tservers would be moved to other
tservers first, and then running the rebalancing on the other
healthy tservers in the cluster.

Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
---
M src/kudu/rebalance/rebalance-test.cc
M src/kudu/rebalance/rebalance_algo.h
M src/kudu/rebalance/rebalancer.cc
M src/kudu/rebalance/rebalancer.h
M src/kudu/tools/rebalancer_tool-test.cc
M src/kudu/tools/rebalancer_tool.cc
M src/kudu/tools/rebalancer_tool.h
M src/kudu/tools/tool_action_cluster.cc
8 files changed, 717 insertions(+), 90 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14154/9
-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 9
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14154 )

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................

KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Aims to support moving replicas from specific tablet servers,
this patch re-uses the '--ignored_tservers' flag and adds a
'--move_replicas_from_ignored_tservers' flag to the
`kudu rebalance cluster` CLI tool.

Once the flag '--ignored_tservers' is specified, the given
tablet servers are not considered as a part of the cluster,
both their health state and replicas on them are ignored
by the rebalancer tool.

While if '--move_replicas_from_ignored_tservers' is enabled,
replicas on healthy ignored tservers would be moved to other
tservers first, and then running the rebalancing on the other
healthy tservers in the cluster.

Additionally, if we want to move replicas from some specified
tablet servers to other servers, the specified tablet servers
should be set into maintenance_mode first, otherwise the
rebalancer tool would not run.

Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Reviewed-on: http://gerrit.cloudera.org:8080/14154
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <as...@cloudera.com>
---
M src/kudu/rebalance/rebalance-test.cc
M src/kudu/rebalance/rebalance_algo.h
M src/kudu/rebalance/rebalancer.cc
M src/kudu/rebalance/rebalancer.h
M src/kudu/tools/rebalancer_tool-test.cc
M src/kudu/tools/rebalancer_tool.cc
M src/kudu/tools/rebalancer_tool.h
M src/kudu/tools/tool_action_cluster.cc
8 files changed, 804 insertions(+), 105 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Alexey Serbin: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 16
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hannah Nguyen <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, Hannah Nguyen, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14154

to look at the new patch set (#11).

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................

KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Aims to support moving replicas from specific tablet servers,
this patch re-uses the '--ignored_tservers' flag and adds a
'--move_replicas_from_ignored_tservers' flag to the
`kudu rebalance cluster` CLI tool.

Once the flag '--ignored_tservers' is specified, the given
tablet servers are not considered as a part of the cluster,
both their health state and replicas on them are ignored
by the rebalancer tool.

While if '--move_replicas_from_ignored_tservers' is enabled,
replicas on healthy ignored tservers would be moved to other
tservers first, and then running the rebalancing on the other
healthy tservers in the cluster.

Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
---
M src/kudu/rebalance/rebalance-test.cc
M src/kudu/rebalance/rebalance_algo.h
M src/kudu/rebalance/rebalancer.cc
M src/kudu/rebalance/rebalancer.h
M src/kudu/tools/rebalancer_tool-test.cc
M src/kudu/tools/rebalancer_tool.cc
M src/kudu/tools/rebalancer_tool.h
M src/kudu/tools/tool_action_cluster.cc
8 files changed, 825 insertions(+), 100 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14154/11
-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 11
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hannah Nguyen <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14154 )

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14154/2/src/kudu/tools/rebalancer_tool.cc
File src/kudu/tools/rebalancer_tool.cc:

http://gerrit.cloudera.org:8080/#/c/14154/2/src/kudu/tools/rebalancer_tool.cc@951
PS2, Line 951:   if (!cluster_info.healthy_ignored_tservers.empty()) {
             :     // Check if it is safe to remove ignored tservers from the cluster.
             :     int remaining_tservers_count = cluster_info.balance.servers_by_total_replica_count.size();
             :     if (remaining_tservers_count < max_replication_factor) {
             :       return Status::InvalidArgument(Substitute(
             :         "Too many ignored tservers.\nThe number of remaining tservers must be at least $0.",
             :         max_replication_factor));
             :     }
             :     RETURN_NOT_OK(algorithm()->MoveReplicasFromTservers(&cluster_info,
             :                                                         max_moves_per_server_ * 5,
             :                                                         &moves));
             :   }
> To simplify the matters, is it possible to move the replicas from the healt
Moving the replicas from the healthy ignored tservers and moving the replicas in the rebalancing progress are almost the same progress, the only difference is to get particular movements, so I just added an extra step to get optional movements. Maybe I should put those codes in another method and it will looks simpler.

I think the algorithm to get those optional movement should be different when running in one location and multiple locations, we should select different server in order to move fewer replicas in the next rebalancing progress. And the idea to get those optional movement is consistent with the existing algorithms. Compared with introducing another two algorithms, maybe implementing new features in the existing ones will be simpler.



-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Comment-Date: Fri, 06 Sep 2019 12:51:11 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14154 )

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................


Patch Set 6: Verified+1

The test failure seems unrelated.


-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 6
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Comment-Date: Fri, 18 Oct 2019 22:41:18 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14154 )

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14154/2/src/kudu/rebalance/rebalancer.h
File src/kudu/rebalance/rebalancer.h:

http://gerrit.cloudera.org:8080/#/c/14154/2/src/kudu/rebalance/rebalancer.h@205
PS2, Line 205: is_healthy_move
nit: document whether it's possible to specify 'nullptr' as an argument when calling this function.


http://gerrit.cloudera.org:8080/#/c/14154/2/src/kudu/rebalance/rebalancer.cc
File src/kudu/rebalance/rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/14154/2/src/kudu/rebalance/rebalancer.cc@157
PS2, Line 157:     *is_healthy_move = false;
This seems to be the only place where the output parameter is assigned.  Does it make to assign the 'is_healthy_move' parameter in other cases as well?


http://gerrit.cloudera.org:8080/#/c/14154/2/src/kudu/tools/rebalancer_tool.cc
File src/kudu/tools/rebalancer_tool.cc:

http://gerrit.cloudera.org:8080/#/c/14154/2/src/kudu/tools/rebalancer_tool.cc@951
PS2, Line 951:   if (!cluster_info.healthy_ignored_tservers.empty()) {
             :     // Check if it is safe to remove ignored tservers from the cluster.
             :     int remaining_tservers_count = cluster_info.balance.servers_by_total_replica_count.size();
             :     if (remaining_tservers_count < max_replication_factor) {
             :       return Status::InvalidArgument(Substitute(
             :         "Too many ignored tservers.\nThe number of remaining tservers must be at least $0.",
             :         max_replication_factor));
             :     }
             :     RETURN_NOT_OK(algorithm()->MoveReplicasFromTservers(&cluster_info,
             :                                                         max_moves_per_server_ * 5,
             :                                                         &moves));
             :   }
To simplify the matters, is it possible to move the replicas from the healthy ignored tservers before starting the rebalancing?  Basically, that would be an extra step in the beginning.

The optimal movements of those replicas might be output by some new specialized algo not necessarily the existing one.  The new algo could have the same interface, but different implementation for GetNextMoves() method.

I think doing it that way would provide separation of responsibilities and keep the code simpler.

What do you think?



-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Comment-Date: Fri, 06 Sep 2019 01:00:20 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14154 )

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................


Patch Set 1:

(4 comments)

Just passing through.

Alexey/Andrew, could you guys review this?

http://gerrit.cloudera.org:8080/#/c/14154/1/src/kudu/rebalance/rebalancer.h
File src/kudu/rebalance/rebalancer.h:

http://gerrit.cloudera.org:8080/#/c/14154/1/src/kudu/rebalance/rebalancer.h@208
PS1, Line 208:                              bool *is_healthy_move);
Nit: should be "bool* is_healthy_move".


http://gerrit.cloudera.org:8080/#/c/14154/1/src/kudu/rebalance/rebalancer.cc
File src/kudu/rebalance/rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/14154/1/src/kudu/rebalance/rebalancer.cc@110
PS1, Line 110:   tablet_ids->clear();
Why is this necessary? Looks like every path out of the function swaps to tablet_ids.

BTW, this function could be 'void' (or return tablet_ids); doesn't look like it ever fails.


http://gerrit.cloudera.org:8080/#/c/14154/1/src/kudu/tools/tool_action_cluster.cc
File src/kudu/tools/tool_action_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/14154/1/src/kudu/tools/tool_action_cluster.cc@62
PS1, Line 62: DEFINE_string(blacklist_tservers, "",
Nit: how about "blacklisted_tservers" to be consistent with "ignored_tservers"?


http://gerrit.cloudera.org:8080/#/c/14154/1/src/kudu/tools/tool_action_cluster.cc@66
PS1, Line 66:               "tserver not in blacklist"
Should end this sentence with a period and make sure there's a space before "If not specified..."



-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 1
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 30 Aug 2019 00:39:00 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has removed a vote on this change.

Change subject: KUDU-2914: Rebalance tool support moving replicas from some specific tablet servers
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/14154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I86cfb740030946c13db1a9ca63d241f4907d6c89
Gerrit-Change-Number: 14154
Gerrit-PatchSet: 6
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>