Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2017/10/05 17:38:00 UTC

[jira] [Resolved] (KUDU-1788) Raft UpdateConsensus retry behavior on timeout is counter-productive

     [ https://issues.apache.org/jira/browse/KUDU-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved KUDU-1788.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.6.0

Bumped the default timeout to 30 seconds for 1.6.

> Raft UpdateConsensus retry behavior on timeout is counter-productive
> --------------------------------------------------------------------
>
>                 Key: KUDU-1788
>                 URL: https://issues.apache.org/jira/browse/KUDU-1788
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus
>    Affects Versions: 1.1.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 1.6.0
>
>
> In a stress test, I've seen the following counter-productive behavior:
> - a leader is trying to send operations to a replica (e.g. a 10MB batch)
> - the network is constrained due to other activity, so sending 10MB may take >1sec
> - the request times out on the client side, likely while it was still in the process of sending the batch
> - when the server receives it, the request is likely to have already timed out while waiting in the queue. Or, the server will receive it and, upon processing, find that the ops are all duplicates from the previous attempt
> - the client has no idea whether the server received it or not, and thus keeps retrying the same batch (triggering the same timeout)
> This tends to be a "sticky"/cascading sort of state: after one such timeout, the follower will be lagging behind more, and the next batch will be larger (up to the configured max batch size). The client neither backs off nor increases its timeout, so it will basically just keep the network pipe full of useless redundant updates.
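
To illustrate the failure mode described above, here is a minimal sketch in Python (not Kudu code) that compares a fixed timeout with one that grows on each retry. Everything in it is an assumption made for illustration: the RetryPolicy class, its field names and default values, and the simplification that a batch always needs the same wall-clock time to cross the network. It is not how Kudu resolved the issue; the resolution above only raised the default timeout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetryPolicy:
    """Hypothetical retry policy; names and defaults are illustrative only."""
    base_timeout_s: float = 1.0    # timeout used for the first attempt
    max_timeout_s: float = 30.0    # upper bound on the per-attempt timeout
    base_backoff_s: float = 0.1    # pause before the first retry
    max_backoff_s: float = 5.0     # upper bound on the pause between retries
    multiplier: float = 2.0        # growth factor; 1.0 means "never adapt"

    def timeout_for(self, attempt: int) -> float:
        """Timeout for the given 0-based attempt, capped at max_timeout_s."""
        return min(self.max_timeout_s,
                   self.base_timeout_s * self.multiplier ** attempt)

    def backoff_before(self, attempt: int) -> float:
        """Pause before the given 0-based attempt; no pause before attempt 0."""
        if attempt == 0:
            return 0.0
        return min(self.max_backoff_s,
                   self.base_backoff_s * self.multiplier ** (attempt - 1))


def attempts_until_delivered(transfer_time_s: float,
                             policy: RetryPolicy,
                             max_attempts: int = 10) -> Optional[int]:
    """Return the 1-based attempt on which a send that needs transfer_time_s
    first fits inside its timeout, or None if it never does."""
    for attempt in range(max_attempts):
        if transfer_time_s <= policy.timeout_for(attempt):
            return attempt + 1
    return None


# A batch that needs 3 seconds on a congested link:
fixed = RetryPolicy(multiplier=1.0)   # same 1s timeout on every attempt
growing = RetryPolicy()               # 1s, 2s, 4s, ... capped at 30s

fixed_result = attempts_until_delivered(3.0, fixed)
growing_result = attempts_until_delivered(3.0, growing)
```

With these assumed numbers, fixed_result is None (every attempt hits the same 1 second timeout, so the retries only consume bandwidth), while growing_result is 3 (the third attempt has a 4 second timeout). A single, larger fixed timeout avoids the loop in the same way as long as it exceeds the transfer time.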



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)