You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@mesos.apache.org by "Vinod Kone (JIRA)" <ji...@apache.org> on 2016/07/12 21:14:20 UTC
[jira] [Commented] (MESOS-5836) Memory cgroup leakage in 4.2, 4.4, 4.5 kernels

    [ https://issues.apache.org/jira/browse/MESOS-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373708#comment-15373708 ] 

Vinod Kone commented on MESOS-5836:
-----------------------------------

Thanks for the information in the ticket [~jgarcia4]! Have you also filed a ticket with the Docker project for the silent failure? I presume they would like to know as well.

> Memory cgroup leakage in 4.2, 4.4, 4.5 kernels
> ----------------------------------------------
>
>                 Key: MESOS-5836
>                 URL: https://issues.apache.org/jira/browse/MESOS-5836
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 0.28.1, 0.28.2, 1.0.0, 1.1.0
>            Reporter: John Garcia
>              Labels: mesosphere
>
> We've noticed an issue with kernel versions 4.2, 4.4, and 4.5 where memory cgroups are not cleaned up by the system. When the register fills up with 65336 cgroups, additional cgroups cannot be formed because there's no IDs for the new cgroup, and ENOSPC is returned. This is a concern for the Mesos project because no further containers can be created by Mesos in this state. We tested Docker 1.8.3, and Docker 1.8.3 will silently fail to build the memory cgroup, resulting in rogue containers that are memory-unbound.
> h3. Steps to reproduce:
> *NOTE: Mesos is not required to reproduce this issue*
> - Start a new instance using kernel 4.2, 4.4, or 4.5 (CoreOS 766-1010, Ubuntu 16.04) 
> - ssh to the machine
> - {{cat /proc/cgroups}} to determine the number of memory cgroups
> - Run several docker containers using the {{--memory}} or {{-m}} option to set a memory isolator, either in parallel or in series
> - Stop all containers
> - {{cat /proc/cgroups}} to review the number of memory cgroups and compare to previous run
> - Optional: Run 65,336 docker containers using memory isolation and then try to launch a Mesos container
> h3. Differential diagnosis:
> When the cgroup limit is exceeded, subsequent container terminations will draw the following error in {{dmesg}}:
> {code}idr_remove called for id=65536 which is not allocated.{code}
> Subsequent efforts to create a cgroup folder will fail:
> {code}/sys/fs/cgroup/memory/mesos $ df .
> Filesystem     1K-blocks  Used Available Use% Mounted on
> cgroup                 0     0         0    - /sys/fs/cgroup/memory
> /sys/fs/cgroup/memory/mesos $ sudo mkdir foo
> mkdir: cannot create directory 'foo': No space left on device{code}
> Subsequently launched Docker containers will fail to utilize memory isolation: {code}/sys/fs/cgroup/memory/mesos $ docker run -m 32m -d example/busybox sleep 10000
> ...
> /sys/fs/cgroup/memory/mesos $ docker ps | grep busybox
> 849c66081229        example/busybox                                                         "sleep 10000"            6 seconds ago       Up 4 seconds                                                                                    suspicious_mahavira
> /sys/fs/cgroup/memory/mesos $ find /sys/fs/cgroup -name "*849c66081229*"
> /sys/fs/cgroup/blkio/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
> /sys/fs/cgroup/freezer/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
> /sys/fs/cgroup/devices/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
> /sys/fs/cgroup/cpu,cpuacct/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
> /sys/fs/cgroup/cpuset/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
> /sys/fs/cgroup/net_cls,net_prio/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
> /sys/fs/cgroup/systemd/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
> /sys/fs/cgroup/memory/mesos $ {code}
> Mesos containerizer will fail with {{No space left on device}}:
> {code}E0707 20:17:29.091142 105665 slave.cpp:3802] Container 'ef5419cf-9d00-425a-a9ee-a848d330bfb2' for executor 'node-0_executor__42a4fafe-f64d-4b41-91d2-efc20a86a6a3' of framework d6ab251a-064a-46a0-a1c8-9ee559f3b44a-0023 failed to start: Failed to prepare isolator: Failed to create directory '/sys/fs/cgroup/memory/mesos/ef5419cf-9d00-425a-a9ee-a848d330bfb2': No space left on device
> {code}
> h3. Remediation
> Once a system is found to be affected, the following command can be used to drop all page caches, which allows the system to reap all of the old cgroups and return to normal operation.
> {code}echo 1 > /proc/sys/vm/drop_caches{code}
> We suspect that [patch 9184539|https://patchwork.kernel.org/patch/9184539/] could fix it, but we have not yet tested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)