
Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2017/01/26 20:20:24 UTC

[jira] [Commented] (KUDU-1731) Evict replicas that are alive but lagging

    [ https://issues.apache.org/jira/browse/KUDU-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840367#comment-15840367 ] 

Todd Lipcon commented on KUDU-1731:
-----------------------------------

[~mpercy] can you explain this one further? How is this different from what we do today, evicting a replica that has fallen so far behind that the entries it needs are no longer within the WAL retention?

Maybe this is more about doing the "3->4->3" type of config change, along with PRE-VOTER? I.e., if a node is slow (it seems to be falling farther and farther behind) but still alive, we could try to recruit the pre-voter _before_ evicting the slow node? Let's fill out this JIRA with more specifics, and/or link to a design doc that covers all of the various backlog items for improving re-replication.
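Here is a rough sketch of the "recruit first, evict later" idea. It assumes that each follower's lag is sampled periodically and measured as the leader's last log index minus the follower's last acknowledged index. It treats a follower as slow only if its lag stayed above a threshold and never shrank across the recent samples. FollowerStats, is_consistently_lagging, plan_replacement, and the thresholds are all hypothetical names for illustration, not Kudu APIs. The returned step strings only describe the intended order of the config changes, not real commands.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical per-follower stats; not a Kudu type.
@dataclass
class FollowerStats:
    uuid: str
    lag_samples: List[int]  # entries behind the leader, oldest sample first


def is_consistently_lagging(stats: FollowerStats, min_lag: int, window: int) -> bool:
    """Hypothetical heuristic: True if every one of the last `window` samples
    is at least `min_lag` entries behind and the lag never shrank between
    consecutive samples (i.e. the follower is not catching up)."""
    if window < 1:
        raise ValueError("window must be >= 1")
    recent = stats.lag_samples[-window:]
    if len(recent) < window:
        return False  # not enough history to judge
    if any(lag < min_lag for lag in recent):
        return False
    # Non-decreasing lag: the follower keeps falling behind (or stays stuck).
    return all(later >= earlier for earlier, later in zip(recent, recent[1:]))


def plan_replacement(lagging_uuid: str, new_uuid: str) -> List[str]:
    """One plausible 3->4->3 ordering (an assumption, not Kudu's actual code):
    the new replica joins as a non-voting PRE_VOTER, is promoted only after it
    has caught up, and only then is the lagging replica evicted."""
    return [
        f"ADD {new_uuid} as PRE_VOTER",
        f"WAIT until {new_uuid} has caught up",
        f"PROMOTE {new_uuid} to VOTER",
        f"REMOVE {lagging_uuid}",
    ]


if __name__ == "__main__":
    slow = FollowerStats("ts-3", [120, 450, 900, 1500])
    if is_consistently_lagging(slow, min_lag=100, window=4):
        for step in plan_replacement(slow.uuid, "ts-4"):
            print(step)
```

In this ordering, the config briefly has four voters after the promotion, so a majority is 3 of 4. The two healthy replicas plus the newly promoted one can still commit even if the slow node stays behind. Evicting the slow node first would instead leave a two-voter config with no fault tolerance while the replacement catches up.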

> Evict replicas that are alive but lagging
> -----------------------------------------
>
>                 Key: KUDU-1731
>                 URL: https://issues.apache.org/jira/browse/KUDU-1731
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus
>            Reporter: Mike Percy
>
> If a replica is consistently behind the other replicas, we may be able to detect that the node is slow and evict it. (Currently, under high write load, we very often degrade to all tablets having just two live replicas and one COPYING.)
> * In fact, this would also be useful for leaders that are significantly slower than their followers.
> * We should instead recruit a new replica and only evict the lagging one once the new one is online.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)