Posted to issues@kudu.apache.org by "Grant Henke (Jira)" <ji...@apache.org> on 2020/06/03 15:39:00 UTC

[jira] [Updated] (KUDU-2918) Rebalancer can fail when a service queue is full

     [ https://issues.apache.org/jira/browse/KUDU-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke updated KUDU-2918:
------------------------------
    Labels: stability supportability  (was: )

> Rebalancer can fail when a service queue is full
> ------------------------------------------------
>
>                 Key: KUDU-2918
>                 URL: https://issues.apache.org/jira/browse/KUDU-2918
>             Project: Kudu
>          Issue Type: Bug
>          Components: CLI, ksck
>    Affects Versions: 1.11.0
>            Reporter: Adar Dembo
>            Priority: Major
>              Labels: stability, supportability
>
> The various low-level RPCs issued by ksck aren't retried if the corresponding service queues are full. These include GetConsensusState, GetStatus, and ListTablets.
> Without retries, ksck (and the rebalancer) can fail midway:
> {noformat}
> I0812 11:21:10.669682 42799 rebalancer.cc:831] tablet d729fb149e804696a0862adacb725d66: a0dca75bbbfb4de69616694834adf930 -> 24d0eb73b3c64a0f901ae092186b3439 move is abandoned: Remote error: Service unavailable: GetConsensusState request on kudu.consensus.ConsensusService from 10.17.182.15:50754 dropped due to backpressure. The service queue is full; it has 50 items.
> I0812 11:21:10.871894 42799 rebalancer.cc:239] re-synchronizing cluster state
> Illegal state: tablet server 0d88ff7360b74d1e81cd2ccd41fab8a5 (foo.bar.com:7050): unacceptable health status UNAVAILABLE
> {noformat}
> The helper classes in rpc/rpc.h may be useful here.
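
As an illustration of the retry behavior the description asks for, here is a minimal, self-contained sketch that retries a call when it is rejected because the service queue is full, with capped exponential backoff. It does not use Kudu's actual helpers in rpc/rpc.h; RpcResult, RetryOnBackpressure, and all parameters are hypothetical names introduced only for this example.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <string>
#include <thread>

// Hypothetical, simplified result type for illustration only.
// This is NOT kudu::Status and does not mirror Kudu's RPC error codes.
struct RpcResult {
  enum class Code { kOk, kServiceUnavailable, kOtherError };
  Code code = Code::kOk;
  std::string message;
  bool ok() const { return code == Code::kOk; }
};

// Calls `rpc` until it returns something other than kServiceUnavailable
// (standing in for "dropped due to backpressure"), or until `max_attempts`
// calls have been made. The delay between attempts doubles each time and
// is capped at `max_delay`. Errors other than kServiceUnavailable are
// returned immediately without retrying.
inline RpcResult RetryOnBackpressure(
    const std::function<RpcResult()>& rpc,
    int max_attempts,
    std::chrono::milliseconds initial_delay,
    std::chrono::milliseconds max_delay) {
  RpcResult result;
  result.code = RpcResult::Code::kOtherError;
  result.message = "no attempts were made";
  std::chrono::milliseconds delay = initial_delay;
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    result = rpc();
    if (result.code != RpcResult::Code::kServiceUnavailable) {
      return result;  // Success or a non-retriable error.
    }
    if (attempt == max_attempts) {
      break;  // Out of attempts; report the last backpressure error.
    }
    std::this_thread::sleep_for(delay);
    delay = std::min(delay * 2, max_delay);
  }
  return result;
}
```

A caller would wrap each low-level request (for example, the one that fetches consensus state) in a lambda and pass it to RetryOnBackpressure, so that a transiently full queue causes a short wait rather than an abandoned move. The attempt limit and delays are assumptions that would need tuning against the real service queue behavior.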



--
This message was sent by Atlassian Jira
(v8.3.4#803005)