Posted to issues@kudu.apache.org by "Georgiana Ogrean (Jira)" <ji...@apache.org> on 2021/12/24 01:57:00 UTC

[jira] [Commented] (KUDU-3346) Rebalance fails when trying to decommission tserver on a rack-aware cluster

    [ https://issues.apache.org/jira/browse/KUDU-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17464857#comment-17464857 ] 

Georgiana Ogrean commented on KUDU-3346:
----------------------------------------

In case it helps with getting to the bottom of this:

After noticing that some log lines appear twice for tservers in us-east-1c, e.g.
{code:java}
I1223 13:52:53.569551 11613 rebalancer.cc:305] found tserver ca2b022920654fd2aacd320adfe39148 at location '/us-east-1/us-east-1c'{code}
I tried putting a tserver in that region into maintenance mode and then running rebalance with the same flags. It fails with the same error as above. However, for the other two regions in our cluster, all it printed before failing was the *Locations load summary* table, whereas when ignoring a tserver in us-east-1c it also prints the *replica distribution summary* tables for that region (both per-server and per-table). I attached the rebalance log from a run in which a tserver in us-east-1c was ignored after being put in maintenance mode.

[^rebalance_ignored_tserver_1c.log.Z] 
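
For anyone who wants to check an uncompressed rebalance log for the repeated "found tserver" lines mentioned above, here is a minimal Python sketch. The regular expression is an assumption based only on the single log line quoted in this comment, so it may need adjusting for other Kudu versions. The function names are hypothetical and not part of Kudu.

```python
import re
from collections import Counter

# Assumed line format, taken from the one example quoted above:
#   ... rebalancer.cc:305] found tserver <32-hex-uuid> at location '<location>'
FOUND_TSERVER = re.compile(
    r"found tserver (?P<uuid>[0-9a-f]{32}) at location '(?P<location>[^']*)'"
)


def count_found_tservers(log_lines):
    """Count how often each (uuid, location) pair is logged as found."""
    counts = Counter()
    for line in log_lines:
        match = FOUND_TSERVER.search(line)
        if match:
            counts[(match.group("uuid"), match.group("location"))] += 1
    return counts


def duplicated_tservers(log_lines):
    """Return only the (uuid, location) pairs that are logged more than once."""
    return {
        key: n for key, n in count_found_tservers(log_lines).items() if n > 1
    }
```

Passing an open file object for the uncompressed log to duplicated_tservers() returns a dict keyed by (uuid, location), so grouping the result by location shows whether the repeats are limited to us-east-1c.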

 

> Rebalance fails when trying to decommission tserver on a rack-aware cluster
> ---------------------------------------------------------------------------
>
>                 Key: KUDU-3346
>                 URL: https://issues.apache.org/jira/browse/KUDU-3346
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 1.15.0
>            Reporter: Georgiana Ogrean
>            Priority: Major
>         Attachments: rebalance_ignored_tserver_1c.log.Z, rebalance_v1.log.Z
>
>
> When following the steps [in the docs|https://docs.cloudera.com/runtime/7.2.0/administering-kudu/topics/kudu-decommissioning-or-permanently-removing-tablet-server-from-cluster.html] for decommissioning a tserver, the rebalance job fails with:
> {code:java}
> Invalid argument: ignored tserver <tserver_uuid> is not reported among know tservers 
> {code}
> Steps followed:
> 1. Checked that ksck passes.
> 2. Put the tserver to be decommissioned in maintenance mode.
> {code:java}
> sudo -u kudu kudu tserver state enter_maintenance $MASTER_ADDRESSES 5ae499b1b870419daabb0e8da90ef233 {code}
> 3. Ran rebalance with {{-ignored_tservers}} and {{-move_replicas_from_ignored_tservers}} flags.
> {code:java}
> sudo -u kudu kudu cluster rebalance $MASTER_ADDRESSES -move_replicas_from_ignored_tservers -ignored_tservers=5ae499b1b870419daabb0e8da90ef233 -v=1{code}
> The logs for the rebalance command are attached.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)