Posted to issues@mesos.apache.org by "Zhitao Li (JIRA)" <ji...@apache.org> on 2018/01/24 19:40:00 UTC

[jira] [Comment Edited] (MESOS-8480) Mesos returns high resource usage when killing a Docker task.

    [ https://issues.apache.org/jira/browse/MESOS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338114#comment-16338114 ] 

Zhitao Li edited comment on MESOS-8480 at 1/24/18 7:39 PM:
-----------------------------------------------------------

Will this also be cherry-picked to 1.5.0, since the RC is not finalized yet?


was (Author: zhitao):
Will this be also back ported to 1.5.0 since the RC is still not finalized yet?

> Mesos returns high resource usage when killing a Docker task.
> -------------------------------------------------------------
>
>                 Key: MESOS-8480
>                 URL: https://issues.apache.org/jira/browse/MESOS-8480
>             Project: Mesos
>          Issue Type: Bug
>          Components: cgroups
>            Reporter: Chun-Hung Hsiao
>            Assignee: Chun-Hung Hsiao
>            Priority: Major
>             Fix For: 1.3.2, 1.4.2, 1.6.0, 1.5.1
>
>         Attachments: test.cpp
>
>
> To get resource statistics for a Docker task, we first look up the cgroup path of each subsystem in {{/proc/<pid>/cgroup}} (taking the {{cpuacct}} subsystem as an example):
> {noformat}
> 9:cpuacct,cpu:/docker/66fbe67b64ad3a86c6e080e18578bc9e540e55ee0bdcae09c2e131a4264a3a3b
> {noformat}
> Then read {{/sys/fs/cgroup/cpuacct//docker/66fbe67b64ad3a86c6e080e18578bc9e540e55ee0bdcae09c2e131a4264a3a3b/cpuacct.stat}} to get the statistics:
> {noformat}
> user 4
> system 0
> {noformat}
> However, when a Docker container is being torn down, it seems that Docker or the operating system first moves the process to the root cgroup before actually killing it, making {{/proc/<pid>/cgroup}} look like the following:
> {noformat}
> 9:cpuacct,cpu:/
> {noformat}
> This makes a racy call to [{{cgroup::internal::cgroup()}}|https://github.com/apache/mesos/blob/master/src/linux/cgroups.cpp#L1935] return a single '/', which in turn makes [{{DockerContainerizerProcess::cgroupsStatistics()}}|https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L1991] read {{/sys/fs/cgroup/cpuacct///cpuacct.stat}}, which contains the statistics for the root cgroup:
> {noformat}
> user 228058750
> system 24506461
> {noformat}
> This can be reproduced by [^test.cpp] with the following command:
> {noformat}
> $ docker run --name sleep -d --rm alpine sleep 1000; ./test $(docker inspect sleep | jq .[].State.Pid) & sleep 1 && docker rm -f sleep
> ...
> Reading file '/proc/44224/cgroup'
> Reading file '/sys/fs/cgroup/cpuacct//docker/1d79a6c877e2af3081630aa57d23d853e6bd7d210dad28f897556bfea20bc9c1/cpuacct.stat'
> user 4
> system 0
> Reading file '/proc/44224/cgroup'
> Reading file '/sys/fs/cgroup/cpuacct///cpuacct.stat'
> user 228058750
> system 24506461
> Reading file '/proc/44224/cgroup'
> Reading file '/sys/fs/cgroup/cpuacct///cpuacct.stat'
> user 228058750
> system 24506461
> Failed to open file '/proc/44224/cgroup'
> sleep
> [2]-  Exit 1                  ./test $(docker inspect sleep | jq .[].State.Pid)
> {noformat}
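
The lookup described in the quoted report can be sketched as follows. This is an illustration only, not Mesos code and not necessarily the fix that was committed for MESOS-8480: the helper names cgroupFor() and statPath() are hypothetical, and the guard simply treats the root cgroup as "no usable cgroup" so that root-wide statistics are never attributed to a task. Note that a guard like this only narrows the race; the process can still be moved between reading /proc/<pid>/cgroup and reading cpuacct.stat.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of Mesos): given the contents of
// /proc/<pid>/cgroup, return the cgroup path attached to `subsystem`,
// or an empty string if the subsystem is not listed.
std::string cgroupFor(
    const std::string& procCgroup,
    const std::string& subsystem)
{
  std::istringstream lines(procCgroup);
  std::string line;

  while (std::getline(lines, line)) {
    // Each line has the form "<id>:<subsys>[,<subsys>...]:<path>".
    const std::string::size_type first = line.find(':');
    if (first == std::string::npos) {
      continue;
    }

    const std::string::size_type second = line.find(':', first + 1);
    if (second == std::string::npos) {
      continue;
    }

    std::istringstream subsystems(
        line.substr(first + 1, second - first - 1));
    std::string name;

    while (std::getline(subsystems, name, ',')) {
      if (name == subsystem) {
        return line.substr(second + 1);
      }
    }
  }

  return "";
}

// Hypothetical guard: build the path to cpuacct.stat, but return an
// empty string when the process is missing from the subsystem or has
// already been moved to the root cgroup ("/").
std::string statPath(
    const std::string& hierarchy,
    const std::string& procCgroup)
{
  const std::string cgroup = cgroupFor(procCgroup, "cpuacct");

  if (cgroup.empty() || cgroup == "/") {
    return "";
  }

  // Plain concatenation reproduces the double slash seen in the report.
  return hierarchy + "/" + cgroup + "/cpuacct.stat";
}
```

With the hierarchy "/sys/fs/cgroup/cpuacct", the line "9:cpuacct,cpu:/docker/<id>" yields "/sys/fs/cgroup/cpuacct//docker/<id>/cpuacct.stat", while "9:cpuacct,cpu:/" yields an empty string, so a caller can skip the sample instead of reporting the root cgroup's usage.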



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)