Posted to issues@mesos.apache.org by "John Garcia (JIRA)" <ji...@apache.org> on 2016/07/12 18:21:20 UTC
[jira] [Created] (MESOS-5836) Cgroup Leakage in 4.2, 4.4, 4.5 kernels

John Garcia created MESOS-5836:
----------------------------------

             Summary: Cgroup Leakage in 4.2, 4.4, 4.5 kernels
                 Key: MESOS-5836
                 URL: https://issues.apache.org/jira/browse/MESOS-5836
             Project: Mesos
          Issue Type: Bug
          Components: containerization
    Affects Versions: 0.28.2, 0.28.1, 1.0.0, 1.1.0
            Reporter: John Garcia


We've noticed an issue with kernel versions 4.2, 4.4, and 4.5 where memory cgroups are not cleaned up by the system. Once 65336 memory cgroups have accumulated, no further cgroups can be created because no IDs remain, and ENOSPC is returned. This is a concern for the Mesos project because Mesos cannot create any further containers in this state, and Docker containers silently fail to set up the memory isolator, resulting in rogue containers with no memory limit.

h3. Steps to reproduce:
*NOTE: Mesos is not required to reproduce this issue*

- Start a new instance using kernel 4.2, 4.4, or 4.5 (CoreOS 766-1010, Ubuntu 16.04) 
- ssh to the machine
- {{cat /proc/cgroups}} to determine the number of memory cgroups
- Run several docker containers using the {{--memory}} or {{-m}} option to set a memory isolator, either in parallel or in series
- Stop all containers
- {{cat /proc/cgroups}} to review the number of memory cgroups and compare to previous run
- Optional: Run 65,336 docker containers using memory isolation and then try to launch a Mesos container
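
The before-and-after comparison in the steps above can be scripted. The following is a minimal sketch, assuming the usual four-column layout of {{/proc/cgroups}} (subsys_name, hierarchy, num_cgroups, enabled). The function name {{memcg_count}} and its optional file argument are illustrative and not part of any existing tool.

```shell
#!/bin/sh
# Print the num_cgroups column for the memory controller.
# $1: file to read (hypothetical parameter; defaults to /proc/cgroups).
memcg_count() {
    awk '$1 == "memory" { print $3 }' "${1:-/proc/cgroups}"
}

# Usage sketch:
#   before=$(memcg_count)
#   ... run and stop containers started with --memory ...
#   after=$(memcg_count)
#   echo "memory cgroups: before=$before after=$after"
```

On an affected kernel, the second reading would be expected to stay higher than the first after all containers have stopped.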

h3. Differential diagnosis:

When the cgroup limit is exceeded, subsequent container terminations produce the following error in {{dmesg}}:
{code}idr_remove called for id=65536 which is not allocated.{code}
Subsequent efforts to create a cgroup folder will fail:
{code}/sys/fs/cgroup/memory/mesos $ df .
Filesystem     1K-blocks  Used Available Use% Mounted on
cgroup                 0     0         0    - /sys/fs/cgroup/memory
/sys/fs/cgroup/memory/mesos $ sudo mkdir foo
mkdir: cannot create directory 'foo': No space left on device{code}
Subsequently launched Docker containers will fail to use memory isolation; note that the {{find}} output below contains no path under {{/sys/fs/cgroup/memory}}:
{code}/sys/fs/cgroup/memory/mesos $ docker run -m 32m -d 10.1.13.1:9000/montana/busybox sleep 10000

...

/sys/fs/cgroup/memory/mesos $ docker ps | grep busybox
849c66081229        example/busybox                                                         "sleep 10000"            6 seconds ago       Up 4 seconds                                                                                    suspicious_mahavira

/sys/fs/cgroup/memory/mesos $ find /sys/fs/cgroup -name "*849c66081229*"
/sys/fs/cgroup/blkio/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/freezer/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/devices/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/cpu,cpuacct/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/cpuset/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/net_cls,net_prio/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/systemd/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/memory/mesos $ {code}
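
The manual {{find}} check above can be wrapped in a small helper to spot containers that are running without a memory cgroup. This is a sketch only: the function name {{has_memory_cgroup}} is hypothetical, and the default mount point {{/sys/fs/cgroup/memory}} is assumed from the paths shown in this report.

```shell
#!/bin/sh
# Return 0 if a cgroup directory whose name contains the container ID exists
# under the memory hierarchy, non-zero otherwise.
# $1: container ID or ID prefix
# $2: memory cgroup mount point (assumed default: /sys/fs/cgroup/memory)
has_memory_cgroup() {
    root="${2:-/sys/fs/cgroup/memory}"
    [ -n "$(find "$root" -type d -name "*$1*" 2>/dev/null | head -n 1)" ]
}

# Usage sketch:
#   has_memory_cgroup 849c66081229 || echo "849c66081229 has no memory cgroup"
```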
The Mesos containerizer will fail with {{No space left on device}}:
{code}E0707 20:17:29.091142 105665 slave.cpp:3802] Container 'ef5419cf-9d00-425a-a9ee-a848d330bfb2' for executor 'node-0_executor__42a4fafe-f64d-4b41-91d2-efc20a86a6a3' of framework d6ab251a-064a-46a0-a1c8-9ee559f3b44a-0023 failed to start: Failed to prepare isolator: Failed to create directory '/sys/fs/cgroup/memory/mesos/ef5419cf-9d00-425a-a9ee-a848d330bfb2': No space left on device
{code}
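
To see how often an agent has hit this failure, its log can be searched for the message shown above. This is a rough sketch: the function name {{count_memcg_enospc}} is hypothetical, the log file location depends on the deployment and must be passed in, and the pattern is taken only from the single log line quoted in this report.

```shell
#!/bin/sh
# Count log lines reporting ENOSPC while creating a memory cgroup directory.
# $1: path to a Mesos agent log file (location is deployment-specific).
count_memcg_enospc() {
    grep -c "Failed to create directory '/sys/fs/cgroup/memory/[^']*': No space left on device" "$1"
}

# Usage sketch (the path is a placeholder, not a known default):
#   count_memcg_enospc /path/to/mesos-slave.INFO
```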

h3. Remediation

Once a system is found to be affected, the following command (run as root) drops the page cache, which allows the kernel to reap the old cgroups and return to normal operation.
{code}echo 1 > /proc/sys/vm/drop_caches{code}
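
As a stopgap, the command above could be run only when the memory cgroup count approaches the limit. The following is a minimal sketch under stated assumptions: the function name {{drop_caches_if_needed}}, the threshold, and the overridable file arguments are all hypothetical, and the count is read from the num_cgroups column of {{/proc/cgroups}}. Writing to {{drop_caches}} requires root.

```shell
#!/bin/sh
# Drop the page cache only when the memory cgroup count reaches a threshold.
# $1: threshold (hypothetical; pick a value below the point where creation fails)
# $2: cgroups file (defaults to /proc/cgroups)
# $3: drop_caches file (defaults to /proc/sys/vm/drop_caches; needs root)
drop_caches_if_needed() {
    count=$(awk '$1 == "memory" { print $3 }' "${2:-/proc/cgroups}")
    if [ "${count:-0}" -ge "$1" ]; then
        echo 1 > "${3:-/proc/sys/vm/drop_caches}"
        echo "dropped page cache at $count memory cgroups"
    else
        echo "no action at ${count:-0} memory cgroups"
    fi
}

# Usage sketch, for example from a root cron job:
#   drop_caches_if_needed 60000
```

The file arguments exist only so the logic can be exercised against sample files without touching the real system.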

We suspect that [patch 9184539|https://patchwork.kernel.org/patch/9184539/] could fix it, but we have not yet tested it.


