Posted to issues@mesos.apache.org by "Artem Harutyunyan (JIRA)" <ji...@apache.org> on 2015/10/02 17:19:28 UTC

[jira] [Updated] (MESOS-770) Rate control and randomization of Replicated Log catching-up

     [ https://issues.apache.org/jira/browse/MESOS-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Harutyunyan updated MESOS-770:
------------------------------------
    Labels: mesosphere  (was: )

> Rate control and randomization of Replicated Log catching-up
> ------------------------------------------------------------
>
>                 Key: MESOS-770
>                 URL: https://issues.apache.org/jira/browse/MESOS-770
>             Project: Mesos
>          Issue Type: Improvement
>          Components: replicated log
>            Reporter: Yan Xu
>              Labels: mesosphere
>
> When the log is catching up, either while recovering or after a coordinator failover, the Paxos protocol is run on multiple positions (possibly the entire log).
> Currently the catch-up process is linear: one thread fills positions one by one. What prevents us from catching up all positions concurrently is that too much concurrency could hurt the network, and the problem may be exacerbated by contention between multiple recovering replicas and the coordinator.
> Rate control limits the number of positions on which a proposer (recoverer or coordinator) seeks consensus at a time. We can batch a number of positions each time.
> Randomly picking the positions in each batch reduces the chance that multiple proposers contend for the same position at the same time, which causes conflicts and retries.
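The batched, randomized catch-up proposed in the description can be sketched as follows. This is a minimal, hypothetical Python illustration, not the Mesos replicated log code (which is C++). The function `catch_up`, the `fill` callback standing in for one Paxos round on a single position, and the `batch_size` and `max_rounds` parameters are all assumptions made for illustration. For simplicity the sketch runs each batch sequentially, whereas a real proposer would run the positions in a batch concurrently.

```python
import random
from typing import Callable, Iterable, List


def catch_up(
    missing: Iterable[int],
    fill: Callable[[int], bool],
    batch_size: int,
    rng: random.Random,
    max_rounds: int = 1000,
) -> List[List[int]]:
    """Hypothetical sketch: fill missing log positions in bounded, random batches.

    missing:    log positions that still need a Paxos round.
    fill:       hypothetical callback that runs Paxos for one position and
                returns True on success, or False on a conflict needing a retry.
    batch_size: maximum number of positions in flight at once (the rate limit).
    max_rounds: safety bound so the sketch cannot loop forever.
    Returns the batches in the order they were issued.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    pending = set(missing)
    batches: List[List[int]] = []
    while pending:
        if len(batches) >= max_rounds:
            raise RuntimeError("catch-up did not converge")
        # Pick up to batch_size pending positions uniformly at random, so two
        # proposers working through the same gap are unlikely to collide.
        batch = rng.sample(sorted(pending), min(batch_size, len(pending)))
        batches.append(batch)
        # A real proposer would run these rounds concurrently; this sketch
        # calls fill() sequentially. Positions that fail stay pending.
        for position in batch:
            if fill(position):
                pending.discard(position)
    return batches
```

For example, `catch_up(range(10), lambda p: True, batch_size=3, rng=random.Random(0))` issues four batches of sizes 3, 3, 3 and 1. Sampling from the whole pending set, rather than taking the next `batch_size` positions in order, is what spreads concurrent proposers across the gap, while the batch bound keeps the load one proposer places on the network fixed no matter how many positions are missing.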



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)