
Posted to dev@mesos.apache.org by "Jie Yu (JIRA)" <ji...@apache.org> on 2013/11/23 09:23:39 UTC

[jira] [Commented] (MESOS-736) Support catch-up replicated log

    [ https://issues.apache.org/jira/browse/MESOS-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830618#comment-13830618 ] 

Jie Yu commented on MESOS-736:
------------------------------

The third patch is split into two patches:
(1) https://reviews.apache.org/r/15799/
(2) https://reviews.apache.org/r/15802/

Patch (1) introduces an intermediate version (called VERSION-I) so that, if anything goes wrong, we can roll back to the previous versions: VERSION-I still writes the old log format, although it can already recognize the new log format. We can bake this version for a period of time.

After patch (2) is applied, if anything goes wrong, we can roll back to VERSION-I.
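
To illustrate why the intermediate version makes each step reversible, here is a minimal sketch. It is not Mesos code: the format tags, the `encode`/`decode` helpers, and the version names "previous" and "patched-2" are hypothetical, and it assumes that patch (2) is the step that switches writing to the new log format.

```python
import json

OLD, NEW = "old", "new"  # hypothetical format tags, not the real log format


def encode(record: dict, fmt: str) -> str:
    """Serialize a record in one of the two made-up formats."""
    if fmt == OLD:
        return "v1|" + json.dumps(record, sort_keys=True)
    if fmt == NEW:
        return "v2|" + json.dumps({"payload": record}, sort_keys=True)
    raise ValueError(f"unknown format: {fmt}")


def decode(line: str, readable: set) -> dict:
    """Parse a line, failing if this version cannot read its format."""
    tag, body = line.split("|", 1)
    fmt = {"v1": OLD, "v2": NEW}[tag]
    if fmt not in readable:
        raise ValueError(f"unrecognized log format: {fmt}")
    data = json.loads(body)
    return data if fmt == OLD else data["payload"]


# version name -> (formats it can read, format it writes)
VERSIONS = {
    "previous": ({OLD}, OLD),
    "VERSION-I": ({OLD, NEW}, OLD),  # after patch (1): reads both, writes old
    "patched-2": ({OLD, NEW}, NEW),  # assumed behavior after patch (2)
}


def can_roll_back(current: str, target: str) -> bool:
    """True if `target` can read what `current` writes."""
    readable, _ = VERSIONS[target]
    _, written = VERSIONS[current]
    line = encode({"position": 1}, written)
    try:
        decode(line, readable)
        return True
    except ValueError:
        return False
```

Under these assumptions, `can_roll_back("VERSION-I", "previous")` and `can_roll_back("patched-2", "VERSION-I")` both hold, while a direct rollback from "patched-2" to "previous" does not, which is the gap VERSION-I is meant to cover.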

> Support catch-up replicated log
> -------------------------------
>
>                 Key: MESOS-736
>                 URL: https://issues.apache.org/jira/browse/MESOS-736
>             Project: Mesos
>          Issue Type: Improvement
>          Components: replicated log
>            Reporter: Jie Yu
>            Assignee: Jie Yu
>              Labels: twitter
>             Fix For: 0.16.0
>
>
> When a replica joins Paxos with an empty log, we don't allow this replica to fully participate in Paxos immediately. Instead, this replica is treated as a non-voting member, meaning that it will not reply to any requests from other replicas. It simply learns those log entries that have been agreed on and tries to catch up with the leader. When the catch-up process is done, we re-admit this replica to Paxos and allow it to vote.
> If we have a disk failure and want to swap a master machine, we can simply start the scheduler on a new machine with an empty log (and that's it).
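
The non-voting behavior described in the issue can be sketched roughly as follows. This is a simplified illustration, not the Mesos implementation: the `Replica` class, its method names, and the idea of handing learned entries in as a dictionary are all assumptions made for the example, and a real replica would learn entries through the Paxos protocol itself.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    log: dict = field(default_factory=dict)  # position -> agreed-on value
    voting: bool = True

    def handle_promise_request(self, proposal: int):
        """Reply to a Paxos request only if this replica is a voter."""
        if not self.voting:
            return None  # non-voting members do not reply
        return ("promise", proposal)

    def catch_up(self, learned: dict, end: int) -> None:
        """Copy agreed-on entries up to `end`; rejoin as a voter once complete."""
        for position in range(end + 1):
            if position not in self.log and position in learned:
                self.log[position] = learned[position]
        # Re-admit only when every position up to `end` has been learned.
        if all(position in self.log for position in range(end + 1)):
            self.voting = True


def new_empty_replica() -> Replica:
    """A replica that joins with an empty log starts as a non-voting member."""
    return Replica(log={}, voting=False)
```

In this sketch a replica created with `new_empty_replica()` returns `None` from `handle_promise_request` until `catch_up` has filled every position up to `end`, after which it answers requests like any other voter.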



--
This message was sent by Atlassian JIRA
(v6.1#6144)