Posted to issues@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2015/02/09 20:31:35 UTC

[jira] [Commented] (MESOS-2329) Mesos master crashes after ZooKeeper session expires

    [ https://issues.apache.org/jira/browse/MESOS-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312704#comment-14312704 ] 

Benjamin Mahler commented on MESOS-2329:
----------------------------------------

There is no crash in the logs you provided. Can you provide the full logs?

The leading master will commit suicide (deliberately exit) when it loses leadership. Was this master leading before the session expired? We recommend running the master under a process supervisor that restarts it upon any termination.
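For illustration only, a minimal restart loop in Python is sketched below. The `supervise` helper, its parameters, and the fixed backoff are hypothetical and not part of Mesos. In production a standard process supervisor (for example upstart on CentOS 6, supervisord, or monit) is the usual choice.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=None, backoff_secs=1.0):
    """Hypothetical helper: run `cmd`, restarting it every time it exits.

    Stops after `max_restarts` restarts (None means restart forever) and
    returns the list of exit codes observed, one per run.
    """
    exit_codes = []
    restarts = 0
    while True:
        # Block until the child process exits, for any reason.
        proc = subprocess.run(cmd)
        exit_codes.append(proc.returncode)
        if max_restarts is not None and restarts >= max_restarts:
            return exit_codes
        print(f"{cmd[0]} exited with code {proc.returncode}; restarting",
              file=sys.stderr)
        restarts += 1
        # Pause briefly so a process that fails immediately does not spin.
        time.sleep(backoff_secs)
```

Hypothetical usage, with a placeholder ZooKeeper URL and work directory: `supervise(["mesos-master", "--zk=zk://zk-host:2181/mesos", "--quorum=1", "--work_dir=/var/lib/mesos"])`. With `max_restarts=None` it restarts indefinitely, so a master that exits after losing leadership comes back up and contends for leadership again.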

> Mesos master crashes after ZooKeeper session expires
> ----------------------------------------------------
>
>                 Key: MESOS-2329
>                 URL: https://issues.apache.org/jira/browse/MESOS-2329
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.21.1
>         Environment: CentOS 6.5 (kernel 2.6.32-431), Java 1.7.0_55, ZooKeeper 3.4.5
>            Reporter: Craig W
>
> In a test environment I have experienced an issue where the Mesos Master process crashes after its ZooKeeper session expires. The last few messages in the INFO log file look like this:
> {noformat}
> group.cpp:418] Lost connection to ZooKeeper, attempting to reconnect ...
> group.cpp:418] Lost connection to ZooKeeper, attempting to reconnect ...
> group.cpp:313] Group process (group(4)@192.168.1.4:5050) reconnected to ZooKeeper
> group.cpp:418] Lost connection to ZooKeeper, attempting to reconnect ...
> group.cpp:790] Syncing group operations: queue size (joins, cancels datas) = (0, 0, 0)
> group.cpp:418] Lost connection to ZooKeeper, attempting to reconnect ...
> group.cpp:472] ZooKeeper session expired
> detector.cpp:138] Detected a new leader: None
> master.cpp:1263] The newly elected leader is None
> {noformat}
> In my environment, I had a single master, 7 slaves, and a single-node ZooKeeper ensemble.
> Restarting the master process "fixes" the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)