
Posted to issues@mesos.apache.org by "Steve Niemitz (JIRA)" <ji...@apache.org> on 2015/01/13 17:11:34 UTC

[jira] [Commented] (MESOS-2214) Mesos slave can't restart if running with --slave_subsystems=blkio or net_cls and checkpointing is enabled

    [ https://issues.apache.org/jira/browse/MESOS-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275460#comment-14275460 ] 

Steve Niemitz commented on MESOS-2214:
--------------------------------------

Is it possible that this happens because the slave was launched with only "--isolation=cgroups/cpu,cgroups/mem", which does not include blkio and net_cls?
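To make the hypothesis above concrete, here is a minimal Python sketch that computes which of the slave's subsystems are not managed by any enabled isolator. The isolator-to-subsystem mapping (cgroups/cpu covering cpu and cpuacct, cgroups/mem covering memory) and the function name uncontained_subsystems are assumptions for illustration, not Mesos code. Because a forked child starts in its parent's cgroups, task processes would, under this hypothesis, stay in the slave's own cgroup for every subsystem the function returns.

```python
from typing import Dict, FrozenSet, Iterable, Set

# ASSUMPTION: which cgroup subsystems each Mesos isolator manages per container.
# This mapping is illustrative only and is not taken from the Mesos source.
ISOLATOR_SUBSYSTEMS: Dict[str, FrozenSet[str]] = {
    "cgroups/cpu": frozenset({"cpu", "cpuacct"}),
    "cgroups/mem": frozenset({"memory"}),
}


def uncontained_subsystems(slave_subsystems: Iterable[str], isolation: str) -> Set[str]:
    """Return the --slave_subsystems entries that no --isolation entry covers.

    Tasks forked by the slave inherit its cgroup in these subsystems, so they
    would remain in the slave's own cgroup after the slave exits.
    """
    covered: Set[str] = set()
    for isolator in isolation.split(","):
        # Unknown isolators (e.g. posix/*) contribute no cgroup subsystems here.
        covered |= ISOLATOR_SUBSYSTEMS.get(isolator.strip(), frozenset())
    return set(slave_subsystems) - covered
```

With the flags from the report, uncontained_subsystems("memory,cpuacct,blkio,net_cls".split(","), "cgroups/cpu,cgroups/mem") returns {"blkio", "net_cls"}. That is consistent with the log below, where the move into cpuacct succeeds and the first failure is on net_cls.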

> Mesos slave can't restart if running with --slave_subsystems=blkio or net_cls and checkpointing is enabled
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-2214
>                 URL: https://issues.apache.org/jira/browse/MESOS-2214
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.21.0
>            Reporter: Steve Niemitz
>            Priority: Minor
>
> Steps to reproduce:
> - Enable checkpointing on the slave (on by default)
> - Enable checkpointing on the framework
> - Start the slave with --slave_subsystems=memory,cpuacct,blkio,net_cls
> - Ensure a task is running on the slave
> - Restart mesos-slave
> Doing so causes this error:
> I0113 15:38:46.600033 729216 detector.cpp:433] A new leading master (UPID=master@10.111.154.140:5050) is detected
> I0113 15:38:46.610535 729196 slave.cpp:189] Moving slave process into its own cgroup for subsystem: cpuacct
> I0113 15:38:46.618446 729196 slave.cpp:189] Moving slave process into its own cgroup for subsystem: net_cls
> A slave (or child process) is still running, please check the process(es) '{ 561866, 561880, 561924, 561977, 561978, 700306, 700319 }' listed in /sys/fs/cgroup/net_cls/mesos/slave/cgroups.proc
> Also, a smaller bug is that the error message is not logged with the logging system but instead printed to stderr.
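The error text suggests that the restarted slave inspects the process list of its own cgroup and refuses to continue if other processes are still listed there. Below is a minimal Python sketch of such a check, assuming the Linux cgroup v1 interface file cgroup.procs (one PID per line; the message above spells it cgroups.proc). The function names are hypothetical, and this is not the actual Mesos implementation.

```python
import os
from typing import List


def parse_procs(text: str) -> List[int]:
    """Parse cgroup.procs contents (one PID per line) into sorted, unique PIDs."""
    return sorted({int(token) for token in text.split()})


def foreign_pids(procs_text: str, self_pid: int) -> List[int]:
    """Return the PIDs listed in the cgroup other than the calling process."""
    return [pid for pid in parse_procs(procs_text) if pid != self_pid]


def check_slave_cgroup(procs_path: str) -> None:
    """Raise if any process besides this one is listed in procs_path.

    Hypothetical example path: /sys/fs/cgroup/net_cls/mesos/slave/cgroup.procs
    """
    with open(procs_path) as procs_file:
        others = foreign_pids(procs_file.read(), os.getpid())
    if others:
        pid_list = ", ".join(str(pid) for pid in others)
        raise RuntimeError(
            "A slave (or child process) is still running, please check the "
            f"process(es) '{{ {pid_list} }}' listed in {procs_path}"
        )
```

Raising an exception rather than printing leaves the caller free to route the message through its logging system instead of writing it straight to stderr, which is the smaller issue noted in the report.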



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)