Posted to issues@mesos.apache.org by "Craig W (JIRA)" <ji...@apache.org> on 2015/06/25 16:02:04 UTC

[jira] [Updated] (MESOS-2934) Mesos master crashes when quorum set to 4

     [ https://issues.apache.org/jira/browse/MESOS-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig W updated MESOS-2934:
---------------------------
      Labels: documentaion  (was: )
    Priority: Minor  (was: Major)

> Mesos master crashes when quorum set to 4
> -----------------------------------------
>
>                 Key: MESOS-2934
>                 URL: https://issues.apache.org/jira/browse/MESOS-2934
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.22.1
>         Environment: CentOS 7
> Java 1.7.0_55
>            Reporter: Craig W
>            Priority: Minor
>              Labels: documentaion
>
> When deploying 5 Mesos masters with quorum set to 4, the masters start up but fail to stay running. Instead, they exit and Monit (which supervises the process) restarts them within a few seconds; this cycle repeats indefinitely.
> The logs on the master look like this:
> {noformat}
> Received a recover response from a replica in EMPTY status
> Received a recover response from a replica in EMPTY status
> Replica in EMPTY status received a broadcasted recover request
> Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
> Replica in EMPTY status received a broadcasted recover request
> Received a recover response from a replica in EMPTY status
> Received a recover response from a replica in EMPTY status
> Replica in EMPTY status received a broadcasted recover 
> The newly elected leader is master@<ip>:5050 with id 20150625-102436-748881418-5050-2157
> Elected as the leading master!
> Recovering from registrar
> Recovering registrar
> Unable to finish the recover protocol in 10secs, retrying
> Unable to finish the recover protocol in 10secs, retrying
> Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
> {noformat}
> When I change the quorum to 2 and run just 3 Mesos master processes, the cluster stays up without a hitch.
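The two configurations in the report can be compared with a short Python sketch. The Mesos configuration docs describe `--quorum` as needing to be a majority of the masters (quorum > number of masters / 2). Both 3 masters with quorum 2 and 5 masters with quorum 4 satisfy that rule, so the majority rule alone does not explain the restart loop. The sketch therefore also encodes an unverified assumption: that an empty replicated log only auto-initializes after hearing from 2 * quorum - 1 replicas in EMPTY status, which 5 masters cannot supply when quorum is 4. The helper names below are hypothetical and are not part of Mesos.

```python
# Hypothetical helpers (not part of Mesos) for reasoning about the
# replicated-log quorum of N Mesos masters.


def majority_quorum(num_masters: int) -> int:
    """Smallest strict majority of num_masters, i.e. floor(N/2) + 1."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def is_majority(num_masters: int, quorum: int) -> bool:
    """True if quorum > N/2 and quorum does not exceed N."""
    return num_masters / 2 < quorum <= num_masters


def fits_auto_init_assumption(num_masters: int, quorum: int) -> bool:
    """ASSUMPTION (unverified against the Mesos source): an empty
    replicated log only initializes itself once 2 * quorum - 1
    replicas answer in EMPTY status, so N must be at least that."""
    return 2 * quorum - 1 <= num_masters


if __name__ == "__main__":
    # The working setup (3/2), the smallest majority for 5 (5/3),
    # and the failing setup from this report (5/4).
    for n, q in [(3, 2), (5, 3), (5, 4)]:
        print(f"masters={n} quorum={q} majority={is_majority(n, q)} "
              f"recommended={majority_quorum(n)} "
              f"auto_init_ok={fits_auto_init_assumption(n, q)}")
```

Under that assumption, 5/4 is the only one of the three configurations that fails the check, which would be consistent with the "Failed to perform fetch within 1mins" loop; setting quorum to 3 (the smallest majority of 5) would be the first thing to try.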


