Posted to issues@mesos.apache.org by "Craig W (JIRA)" <ji...@apache.org> on 2015/06/25 16:02:04 UTC
[jira] [Updated] (MESOS-2934) Mesos master crashes when quorum set to 4
[ https://issues.apache.org/jira/browse/MESOS-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Craig W updated MESOS-2934:
---------------------------
Labels: documentaion (was: )
Priority: Minor (was: Major)
> Mesos master crashes when quorum set to 4
> -----------------------------------------
>
> Key: MESOS-2934
> URL: https://issues.apache.org/jira/browse/MESOS-2934
> Project: Mesos
> Issue Type: Bug
> Components: master
> Affects Versions: 0.22.1
> Environment: CentOS 7
> Java 1.7.0_55
> Reporter: Craig W
> Priority: Minor
> Labels: documentaion
>
> When deploying 5 Mesos masters with quorum set to 4, the masters start up but fail to stay running. Instead, they exit and restart within a few seconds (Monit supervises the process). This cycle continues indefinitely.
> The logs on the master look like this:
> {noformat}
> Received a recover response from a replica in EMPTY status
> Received a recover response from a replica in EMPTY status
> Replica in EMPTY status received a broadcasted recover request
> Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
> Replica in EMPTY status received a broadcasted recover request
> Received a recover response from a replica in EMPTY status
> Received a recover response from a replica in EMPTY status
> Replica in EMPTY status received a broadcasted recover
> The newly elected leader is master@<ip>:5050 with id 20150625-102436-748881418-5050-2157
> Elected as the leading master!
> Recovering from registrar
> Recovering registrar
> Unable to finish the recover protocol in 10secs, retrying
> Unable to finish the recover protocol in 10secs, retrying
> Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
> {noformat}
> When I change the quorum to 2 and run just 3 Mesos master processes, the cluster stays up without a hitch.
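
For reference, a minimal sketch of the quorum arithmetic behind the two setups described above. It assumes the usual sizing rule for replicated-log masters, where the quorum is a strict majority of the masters (so an odd cluster of N masters pairs with quorum = N // 2 + 1, i.e. N = 2 * quorum - 1). The function names are hypothetical and not part of Mesos, and this report does not confirm that the mismatch is what makes recovery time out.

```python
def majority_quorum(num_masters: int) -> int:
    """Smallest quorum that is a strict majority of num_masters."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def implied_cluster_size(quorum: int) -> int:
    """Odd cluster size N for which quorum == N // 2 + 1 (assumed rule)."""
    if quorum < 1:
        raise ValueError("quorum must be at least 1")
    return 2 * quorum - 1


# The two configurations from the report: (masters, configured quorum).
for masters, quorum in [(3, 2), (5, 4)]:
    print(
        f"masters={masters} quorum={quorum} "
        f"majority_quorum={majority_quorum(masters)} "
        f"implied_cluster_size={implied_cluster_size(quorum)}"
    )
```

Under this rule, 3 masters with quorum 2 is consistent, while quorum 4 corresponds to a 7-master cluster; for 5 masters the majority quorum would be 3.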
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)