Posted to issues@kudu.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/01/05 03:53:00 UTC

[jira] [Commented] (KUDU-3346) Rebalance fails when trying to decommission tserver on a rack-aware cluster

    [ https://issues.apache.org/jira/browse/KUDU-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469005#comment-17469005 ] 

ASF subversion and git services commented on KUDU-3346:
-------------------------------------------------------

Commit 5ef0168cf0ae4471632d63cad223d7301f415982 in kudu's branch refs/heads/master from zhangyifan27
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=5ef0168 ]

KUDU-3346: fix rebalancer tool fails to run with '--ignored_tservers'

Prior to this patch, the validity of 'ignored_tservers' was checked in
'BuildClusterInfo', which led to a failure when 'raw_info' only contained
information about tservers in a specific location. This patch fixes it by
moving the parameter validity check into 'KsckResultsToClusterRawInfo', because
ksck results contain the original cluster information.
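The failure mode and the fix can be sketched in a few lines. This is a simplified model, not Kudu's code: `RawInfo`, `check_ignored_tservers`, `for_location`, and the two `build_per_location_*` functions are hypothetical names, and a cluster is reduced to a map from tserver UUID to location. It assumes, as the message above describes, that a rack-aware run works on a per-location subset of the tservers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawInfo:
    """Hypothetical, simplified cluster snapshot: tserver UUID -> location label."""
    tserver_locations: dict


def check_ignored_tservers(raw: RawInfo, ignored) -> None:
    """Raise if any ignored tserver UUID is absent from this snapshot."""
    unknown = sorted(set(ignored) - set(raw.tserver_locations))
    if unknown:
        raise ValueError(
            f"ignored tserver {unknown[0]} is not reported among known tservers")


def for_location(raw: RawInfo, location: str) -> RawInfo:
    """Keep only the tservers in one location, as a rack-aware run would."""
    return RawInfo({uuid: loc for uuid, loc in raw.tserver_locations.items()
                    if loc == location})


def build_per_location_buggy(raw: RawInfo, ignored) -> dict:
    """Pre-fix shape (hypothetical): validate after filtering to one location."""
    result = {}
    for location in sorted(set(raw.tserver_locations.values())):
        per_loc = for_location(raw, location)
        # Raises whenever an ignored tserver lives in a different location.
        check_ignored_tservers(per_loc, ignored)
        result[location] = per_loc
    return result


def build_per_location_fixed(raw: RawInfo, ignored) -> dict:
    """Post-fix shape (hypothetical): validate once against the full snapshot."""
    check_ignored_tservers(raw, ignored)
    return {location: for_location(raw, location)
            for location in sorted(set(raw.tserver_locations.values()))}
```

With a two-rack snapshot, `build_per_location_buggy(raw, {"ts-b1"})` raises as soon as it reaches the rack that does not contain `ts-b1`, even though the UUID is valid. `build_per_location_fixed` accepts it and still rejects a UUID that is missing from the whole cluster.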

I noticed that 'ClusterInfo::tservers_to_empty' does not need to be built in
'BuildClusterInfo', because this info is used only for printing the cluster's
stats and running IgnoredTserverRunner. This should be refactored in a
follow-up patch.

This patch adds a regression test for the issue and I also verified this fix on
a real cluster.

Change-Id: I1361f562f3e886077a79c3de8ea5fb2ebb8df6e9
Reviewed-on: http://gerrit.cloudera.org:8080/18114
Reviewed-by: Andrew Wong <aw...@cloudera.com>
Tested-by: Andrew Wong <aw...@cloudera.com>


> Rebalance fails when trying to decommission tserver on a rack-aware cluster
> ---------------------------------------------------------------------------
>
>                 Key: KUDU-3346
>                 URL: https://issues.apache.org/jira/browse/KUDU-3346
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 1.15.0
>            Reporter: Georgiana Ogrean
>            Assignee: YifanZhang
>            Priority: Major
>         Attachments: rebalance_ignored_tserver_1c.log.Z, rebalance_v1.log.Z
>
>
> When following the steps [in the docs|https://docs.cloudera.com/runtime/7.2.0/administering-kudu/topics/kudu-decommissioning-or-permanently-removing-tablet-server-from-cluster.html] for decommissioning a tserver, the rebalance job fails with:
> {code:java}
> Invalid argument: ignored tserver <tserver_uuid> is not reported among know tservers 
> {code}
> Steps followed:
> 1. Checked that ksck passes.
> 2. Put the tserver to be decommissioned in maintenance mode.
> {code:bash}
> sudo -u kudu kudu tserver state enter_maintenance $MASTER_ADDRESSES 5ae499b1b870419daabb0e8da90ef233
> {code}
> 3. Ran rebalance with the {{-ignored_tservers}} and {{-move_replicas_from_ignored_tservers}} flags.
> {code:bash}
> sudo -u kudu kudu cluster rebalance $MASTER_ADDRESSES -move_replicas_from_ignored_tservers -ignored_tservers=5ae499b1b870419daabb0e8da90ef233 -v=1
> {code}
> The logs for the rebalance command are attached.
>  


