Posted to issues@kudu.apache.org by "YifanZhang (Jira)" <ji...@apache.org> on 2021/12/04 01:35:00 UTC

[jira] [Resolved] (KUDU-3341) Catalog Manager should stop retrying DeleteTablet when receive WRONG_SERVER_UUID error

     [ https://issues.apache.org/jira/browse/KUDU-3341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

YifanZhang resolved KUDU-3341.
------------------------------
    Fix Version/s: 1.16.0
       Resolution: Fixed

> Catalog Manager should stop retrying DeleteTablet when receive WRONG_SERVER_UUID error
> --------------------------------------------------------------------------------------
>
>                 Key: KUDU-3341
>                 URL: https://issues.apache.org/jira/browse/KUDU-3341
>             Project: Kudu
>          Issue Type: Improvement
>          Components: master
>            Reporter: YifanZhang
>            Assignee: YifanZhang
>            Priority: Minor
>             Fix For: 1.16.0
>
>
> Sometimes a tablet server is shut down because of detected disk failures, and is later re-added to the cluster with all of its data cleared.
> Its replicas are re-replicated after {{--follower_unavailable_considered_failed_sec}} seconds. The master then sends DeleteTablet RPCs to this tserver, but receives either an RPC failure (the tserver is shut down) or a WRONG_SERVER_UUID error (the tserver started with a new UUID), and keeps retrying the deletion until {{--unresponsive_ts_rpc_timeout_ms}} (default 1 hour) elapses.
> Retrying is unnecessary when the master receives a WRONG_SERVER_UUID error, because the server UUID can only be corrected by restarting the tablet server. At that point full tablet reports are sent to the master, and any outdated replicas can finally be deleted.
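
The retry behavior requested above can be sketched roughly as follows. This is only an illustration of the decision, not the Catalog Manager's actual code: DeleteOutcome, RetryDecision, and DecideAfterAttempt are hypothetical names, and the timeout argument merely stands in for the value of --unresponsive_ts_rpc_timeout_ms.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical outcome of one DeleteTablet attempt. These names are
// illustrative only and are not Kudu's real types.
enum class DeleteOutcome { kOk, kRpcFailure, kWrongServerUuid };

// Hypothetical decision the master makes after an attempt.
enum class RetryDecision { kDone, kRetry, kGiveUp };

// Decide what to do after one DeleteTablet attempt.
//   elapsed: time since the first attempt.
//   timeout: stand-in for --unresponsive_ts_rpc_timeout_ms.
RetryDecision DecideAfterAttempt(DeleteOutcome outcome,
                                 std::chrono::milliseconds elapsed,
                                 std::chrono::milliseconds timeout) {
  assert(timeout >= std::chrono::milliseconds::zero());
  switch (outcome) {
    case DeleteOutcome::kOk:
      return RetryDecision::kDone;
    case DeleteOutcome::kWrongServerUuid:
      // The UUID mismatch only goes away when the tablet server restarts,
      // so retrying cannot succeed: stop immediately.
      return RetryDecision::kGiveUp;
    case DeleteOutcome::kRpcFailure:
      // The server may simply be down for now: keep retrying until the
      // timeout elapses.
      return elapsed < timeout ? RetryDecision::kRetry
                               : RetryDecision::kGiveUp;
  }
  return RetryDecision::kGiveUp;  // Unreachable; keeps compilers quiet.
}
```

With a one-hour timeout, a WRONG_SERVER_UUID result gives up on the first attempt, while an RPC failure keeps being retried until the hour has passed.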



--
This message was sent by Atlassian Jira
(v8.20.1#820001)