Posted to issues@mesos.apache.org by "Neil Conway (JIRA)" <ji...@apache.org> on 2015/11/03 22:48:27 UTC

[jira] [Comment Edited] (MESOS-3280) Master fails to access replicated log after network partition

    [ https://issues.apache.org/jira/browse/MESOS-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988199#comment-14988199 ] 

Neil Conway edited comment on MESOS-3280 at 11/3/15 9:47 PM:
-------------------------------------------------------------

Fix merged in 82b6112cabc838f9bfa; it should be in 0.26.


was (Author: neilc):
Merged in 82b6112cabc838f9bfa.

> Master fails to access replicated log after network partition
> -------------------------------------------------------------
>
>                 Key: MESOS-3280
>                 URL: https://issues.apache.org/jira/browse/MESOS-3280
>             Project: Mesos
>          Issue Type: Bug
>          Components: master, replicated log
>    Affects Versions: 0.23.0
>         Environment: Zookeeper version 3.4.5--1
>            Reporter: Bernd Mathiske
>            Assignee: Neil Conway
>              Labels: mesosphere
>             Fix For: 0.26.0
>
>         Attachments: rep-log-race-cond-logs.tar.gz, rep-log-startup-race-test-1.patch
>
>
> In a 5-node cluster with 3 masters and 2 slaves, and ZK running on each node, forcing a network partition causes all the masters to apparently lose access to their replicated log. The leading master halts for unknown reasons, presumably related to replicated log access. The other masters fail to recover from the replicated log, also for unknown reasons. This could be caused by the ZK setup, but it might also be a Mesos bug.
> This was observed in a Chronos test drive scenario described in detail here:
> https://github.com/mesos/chronos/issues/511
> With setup instructions here:
> https://github.com/mesos/chronos/issues/508



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)