Posted to issues@mesos.apache.org by "Craig W (JIRA)" <ji...@apache.org> on 2015/02/09 11:49:34 UTC

[jira] [Created] (MESOS-2329) Mesos master crashes after ZooKeeper session expires

Craig W created MESOS-2329:
------------------------------

             Summary: Mesos master crashes after ZooKeeper session expires
                 Key: MESOS-2329
                 URL: https://issues.apache.org/jira/browse/MESOS-2329
             Project: Mesos
          Issue Type: Bug
          Components: master
    Affects Versions: 0.21.1
         Environment: CentOS 6.5 (kernel 2.6.32-431), Java 1.7.0_55, ZooKeeper 3.4.5
            Reporter: Craig W


In a test environment I have experienced an issue where the Mesos master process crashes after its ZooKeeper session expires. The last messages in the INFO log file look like this:

{noformat}
group.cpp:418] Lost connection to ZooKeeper, attempting to reconnect ...
group.cpp:418] Lost connection to ZooKeeper, attempting to reconnect ...
group.cpp:313] Group process (group(4)@192.168.4.42:5050) reconnected to ZooKeeper
group.cpp:418] Lost connection to ZooKeeper, attempting to reconnect ...
group.cpp:790] Syncing group operations: queue size (joins, cancels datas) = (0, 0, 0)
group.cpp:418] Lost connection to ZooKeeper, attempting to reconnect ...
group.cpp:472] ZooKeeper session expired
detector.cpp:138] Detected a new leader: None
master.cpp:1263] The newly elected leader is None
{noformat}

In my environment, I had a single master, 3 slaves, and a single-node ZooKeeper ensemble.

Restarting the master process "fixes" the issue.
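
For anyone who wants to check whether their own master log shows the same sequence, here is a minimal sketch. It assumes the log text matches the excerpt above; the function name and the two patterns are illustrative only and are not part of Mesos. It only reports whether a "session expired" line is later followed by a "leader is None" line, and it says nothing about why the process exits.

```python
import re

# Patterns copied from the log excerpt in this report (assumed to be stable text).
EXPIRED = re.compile(r"ZooKeeper session expired")
NO_LEADER = re.compile(r"The newly elected leader is None")


def shows_expiry_then_no_leader(lines):
    """Return True if a 'session expired' line is later followed by a
    'newly elected leader is None' line."""
    expired_seen = False
    for line in lines:
        if EXPIRED.search(line):
            expired_seen = True
        elif expired_seen and NO_LEADER.search(line):
            return True
    return False


if __name__ == "__main__":
    # Sample lines taken from the report; a real run would read the INFO log file.
    sample = [
        "group.cpp:418] Lost connection to ZooKeeper, attempting to reconnect ...",
        "group.cpp:472] ZooKeeper session expired",
        "detector.cpp:138] Detected a new leader: None",
        "master.cpp:1263] The newly elected leader is None",
    ]
    print(shows_expiry_then_no_leader(sample))
```

To scan a real log, pass an open file object instead of the sample list, since the function accepts any iterable of lines.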



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)