Posted to issues@mesos.apache.org by "Craig W (JIRA)" <ji...@apache.org> on 2015/06/25 12:27:04 UTC

[jira] [Created] (MESOS-2934) Mesos master crashes when quorum set to 4

Craig W created MESOS-2934:
------------------------------

             Summary: Mesos master crashes when quorum set to 4
                 Key: MESOS-2934
                 URL: https://issues.apache.org/jira/browse/MESOS-2934
             Project: Mesos
          Issue Type: Bug
          Components: master
    Affects Versions: 0.22.1
         Environment: CentOS 7
Java 1.7.0_55
            Reporter: Craig W


When deploying 5 Mesos masters with quorum set to 4, the masters start up but do not stay running. They exit and are restarted within a few seconds (Monit supervises the process), and this cycle repeats indefinitely.

The logs on the master look like this:

{noformat}
Received a recover response from a replica in EMPTY status
Received a recover response from a replica in EMPTY status
Replica in EMPTY status received a broadcasted recover request
Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins

Replica in EMPTY status received a broadcasted recover request
Received a recover response from a replica in EMPTY status
Received a recover response from a replica in EMPTY status
Replica in EMPTY status received a broadcasted recover 

The newly elected leader is master@<ip>:5050 with id 20150625-102436-748881418-5050-2157
Elected as the leading master!
Recovering from registrar
Recovering registrar
Unable to finish the recover protocol in 10secs, retrying
Unable to finish the recover protocol in 10secs, retrying
Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
{noformat}

When I change the quorum to 2 and run just 3 Mesos master processes, the cluster stays up without a hitch.
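
For comparison, here is a small sketch of the quorum arithmetic for the two setups above. It relies on the Mesos documentation's guidance that --quorum should be a majority of the masters, i.e. quorum > (number of masters)/2. The function names are hypothetical and not part of Mesos, and this is only arithmetic, not a diagnosis of the crash:

```python
def smallest_majority(masters: int) -> int:
    """Smallest quorum that is a strict majority of `masters` (illustrative helper)."""
    if masters < 1:
        raise ValueError("need at least one master")
    return masters // 2 + 1


def tolerated_failures(masters: int, quorum: int) -> int:
    """How many masters can be down while `quorum` of them are still reachable."""
    if not 1 <= quorum <= masters:
        raise ValueError("quorum must be between 1 and the number of masters")
    return masters - quorum


# The two configurations described in this report: (masters, quorum).
for masters, quorum in [(5, 4), (3, 2)]:
    print(
        f"masters={masters} quorum={quorum} "
        f"smallest_majority={smallest_majority(masters)} "
        f"tolerated_failures={tolerated_failures(masters, quorum)}"
    )
```

By this arithmetic the smallest majority for 5 masters is 3, so a quorum of 4 is larger than it needs to be and leaves room for only one master to be unavailable, while 3 masters with quorum 2 is exactly the smallest majority.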


