Posted to issues@mesos.apache.org by "Jie Yu (JIRA)" <ji...@apache.org> on 2016/10/20 22:35:58 UTC

[jira] [Commented] (MESOS-6414) cgroups isolator cleanup failed when the hierarchy is cleanup by docker daemon

    [ https://issues.apache.org/jira/browse/MESOS-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15593240#comment-15593240 ] 

Jie Yu commented on MESOS-6414:
-------------------------------

I think we need to revisit the whole cgroups destroy path, given that multiple entities may mutate the same cgroup.

I think it makes sense that a process launched by Mesos wants to manipulate its own cgroup (e.g., sub-divide cgroups for tasks). However, I don't think it makes sense to allow a process on the agent to manipulate the same cgroup managed by Mesos. Even if Mesos supports that, the process on the agent might not tolerate it.

If we keep that in mind, I think the correct sequence should be:
1) Try to kill all processes in the cgroup (including all nested cgroups). This makes sure that the process that can manipulate nested cgroups goes away.
2) Try to remove all cgroups.
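
The two steps above could be sketched roughly as follows. This is only an illustration in Python, not the Mesos implementation: the function names (nested_cgroups, read_pids, kill_all, remove_all, destroy) are hypothetical, and it assumes a cgroup v1 style directory tree where each cgroup directory exposes a cgroup.procs file. A cgroup that has already been removed by another entity (e.g., the docker daemon) is treated as success rather than as an error.

```python
import os
import signal


def nested_cgroups(root):
    """Return root and every nested cgroup directory, children before parents."""
    # topdown=False yields subdirectories before the directory containing them.
    # If root is already gone, os.walk yields nothing.
    return [dirpath for dirpath, _, _ in os.walk(root, topdown=False)]


def read_pids(cgroup):
    """Read the PIDs listed in cgroup.procs; an already removed cgroup has none."""
    try:
        with open(os.path.join(cgroup, "cgroup.procs")) as f:
            return [int(line) for line in f if line.strip()]
    except FileNotFoundError:
        return []  # the cgroup was removed by another entity


def kill_all(root, kill=os.kill):
    """Step 1: try to kill all processes in the cgroup and its nested cgroups."""
    for cgroup in nested_cgroups(root):
        for pid in read_pids(cgroup):
            try:
                kill(pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # the process has already exited


def remove_all(root):
    """Step 2: try to remove all cgroups, deepest first."""
    for cgroup in nested_cgroups(root):
        try:
            os.rmdir(cgroup)
        except FileNotFoundError:
            pass  # removed concurrently; nothing left to do


def destroy(root, kill=os.kill):
    kill_all(root, kill)
    remove_all(root)
```

Note that this sketch does not wait for the killed processes to exit. On a real cgroup hierarchy, rmdir fails while a cgroup still contains tasks, so an actual implementation would need to wait or retry between the two steps.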

> cgroups isolator cleanup failed when the hierarchy is cleanup by docker daemon 
> -------------------------------------------------------------------------------
>
>                 Key: MESOS-6414
>                 URL: https://issues.apache.org/jira/browse/MESOS-6414
>             Project: Mesos
>          Issue Type: Bug
>          Components: cgroups
>            Reporter: Anindya Sinha
>            Assignee: Anindya Sinha
>            Priority: Minor
>              Labels: containerizer
>             Fix For: 1.2.0
>
>
> If we launch a docker container in the Mesos containerizer, a race may occur
> between the docker daemon and the Mesos containerizer during cgroups operations.
> For example, when a docker container running in the Mesos containerizer exits due to OOM,
> the Mesos containerizer destroys the following hierarchies:
> {code}
> /sys/fs/cgroup/freezer/mesos/<mesos-cgroup>/<docker-cgroup>
> /sys/fs/cgroup/freezer/mesos/<mesos-cgroup>
> {code}
> But the docker daemon may destroy 
> {code}
> /sys/fs/cgroup/freezer/mesos/<mesos-cgroup>/<docker-cgroup>
> {code}
> at the same time.
> If the docker daemon destroys the hierarchy first, then the Mesos containerizer
> fails during {{CgroupsIsolatorProcess::cleanup}} because it cannot find that hierarchy
> when destroying it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)