Posted to issues@kudu.apache.org by "Alexey Serbin (JIRA)" <ji...@apache.org> on 2018/03/22 09:18:00 UTC

[jira] [Updated] (KUDU-2367) Leader replica sometimes reports follower's health status as FAILED instead of FAILED_UNRECOVERABLE

     [ https://issues.apache.org/jira/browse/KUDU-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Serbin updated KUDU-2367:
--------------------------------
    Status: In Review  (was: Open)

> Leader replica sometimes reports follower's health status as FAILED instead of FAILED_UNRECOVERABLE
> ---------------------------------------------------------------------------------------------------
>
>                 Key: KUDU-2367
>                 URL: https://issues.apache.org/jira/browse/KUDU-2367
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.7.0, 1.8.0
>            Reporter: Alexey Serbin
>            Assignee: Alexey Serbin
>            Priority: Major
>
> If a leader tablet replica detects that a follower has fallen behind the WAL segment GC threshold after the unavailability interval (defined by the {{--follower_unavailable_considered_failed_sec}} flag) has already elapsed, it never reports the follower's status as FAILED_UNRECOVERABLE to the catalog manager and continues reporting FAILED instead.  In configurations where the tablet replication factor equals the total number of tablet servers in the cluster, the tablet then cannot be recovered automatically for a long time: the situation persists until a new leader is elected or the corresponding tablet servers are restarted.
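
The expected behaviour described in the report can be sketched roughly as follows. This is only an illustration in Python, not Kudu's actual code: the function name follower_health, its parameters, and the HealthStatus enum are hypothetical names made up for the example. It assumes the leader knows how long ago it last heard from the follower and whether the follower can still catch up from the leader's retained WAL segments, and that the unrecoverable condition should take precedence no matter when it is detected.

```python
from enum import Enum


class HealthStatus(Enum):
    # Hypothetical stand-in for the statuses named in the report.
    HEALTHY = "HEALTHY"
    FAILED = "FAILED"
    FAILED_UNRECOVERABLE = "FAILED_UNRECOVERABLE"


def follower_health(seconds_since_last_contact: float,
                    wal_catchup_possible: bool,
                    unavailable_considered_failed_sec: float) -> HealthStatus:
    """Return the status a leader would be expected to report for a follower.

    All names here are assumptions for illustration only.
    """
    # A follower that has fallen behind the WAL segment GC threshold cannot
    # catch up, so this check comes first, even if the follower was already
    # considered FAILED because the unavailability interval had elapsed.
    if not wal_catchup_possible:
        return HealthStatus.FAILED_UNRECOVERABLE
    # Otherwise the follower is FAILED once it has been out of contact for
    # longer than the unavailability interval.
    if seconds_since_last_contact > unavailable_considered_failed_sec:
        return HealthStatus.FAILED
    return HealthStatus.HEALTHY
```

With an interval of 300 seconds (an arbitrary value picked for the example), follower_health(600, False, 300) yields FAILED_UNRECOVERABLE, whereas the behaviour described in this issue corresponds to the leader continuing to report FAILED in that case.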



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)