Posted to dev@mesos.apache.org by "Chris Lambert (JIRA)" <ji...@apache.org> on 2012/09/08 00:16:07 UTC

[jira] [Closed] (MESOS-47) Kill entire containers on OOM with LXC isolation module

     [ https://issues.apache.org/jira/browse/MESOS-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Lambert closed MESOS-47.
------------------------------

    Resolution: Won't Fix

Deprecated in favor of cgroups.
                
> Kill entire containers on OOM with LXC isolation module
> -------------------------------------------------------
>
>                 Key: MESOS-47
>                 URL: https://issues.apache.org/jira/browse/MESOS-47
>             Project: Mesos
>          Issue Type: Improvement
>          Components: isolation
>         Environment: Linux with container-based isolation
>            Reporter: Charles Reiss
>            Assignee: Benjamin Hindman
>              Labels: lxc
>
> When using the LXC isolation module, the kernel OOM killer kills a single victim process inside the container once the container exceeds its memory limit. If the container runs multiple processes, the remaining processes keep running without the killed one, which can cause hard-to-diagnose failures.
> Instead, Mesos should use the memory cgroup's oom_control feature to disable OOM kills (so that processes requesting memory block instead of being killed) and have the slave be notified of OOM events through an eventfd. When the slave receives an OOM notification on this eventfd, it should kill all processes in the over-limit executor's container.
> (These OOM events occur only when a container exceeds its hard memory limit. If Mesos overcommits memory in the future, it should place executors under an outer cgroup that has a hard memory limit and memory.use_hierarchy enabled, and listen for OOM events on that outer cgroup so they do not turn into global OOM kills. Mesos will then need code to determine which executors are exceeding their "soft" memory limits and to choose a victim executor.)
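
The oom_control + eventfd approach described in the issue can be sketched as follows. This is a minimal illustration, not Mesos code: it assumes a cgroup v1 memory controller, a per-executor cgroup directory whose location is hypothetical, and Python 3.10+ on Linux for os.eventfd. The file names (memory.oom_control, cgroup.event_control, cgroup.procs) are the kernel's cgroup v1 interface; every function name is a hypothetical helper.

```python
import os
import signal
from pathlib import Path

# Assumption: cgroup v1 memory controller; the executor's cgroup directory
# (e.g. /sys/fs/cgroup/memory/mesos/<executor>) is hypothetical.


def event_control_line(event_fd: int, oom_control_fd: int) -> str:
    """Line written to cgroup.event_control to subscribe to OOM events."""
    return f"{event_fd} {oom_control_fd}"


def parse_oom_control(text: str) -> dict:
    """Parse memory.oom_control, e.g. 'oom_kill_disable 1\\nunder_oom 0'."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.split()
            fields[key] = int(value)
    return fields


def parse_pids(text: str) -> list:
    """Parse cgroup.procs, which lists one PID per line."""
    return [int(tok) for tok in text.split()]


def disable_oom_killer(cgroup: Path) -> None:
    # With the OOM killer disabled, processes that hit the hard limit block
    # instead of one of them being killed by the kernel.
    (cgroup / "memory.oom_control").write_text("1")


def register_oom_eventfd(cgroup: Path) -> tuple:
    """Return (event_fd, oom_control_fd); event_fd becomes readable on OOM."""
    event_fd = os.eventfd(0)  # Linux-only, Python 3.10+
    oom_fd = os.open(cgroup / "memory.oom_control", os.O_RDONLY)
    (cgroup / "cgroup.event_control").write_text(event_control_line(event_fd, oom_fd))
    return event_fd, oom_fd


def kill_container(cgroup: Path, kill=os.kill) -> list:
    """Send SIGKILL to every process in the cgroup; return the PIDs signalled."""
    pids = parse_pids((cgroup / "cgroup.procs").read_text())
    for pid in pids:
        try:
            kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # process already exited
    return pids


def supervise(cgroup: Path) -> None:
    """Block until the container hits its hard limit, then kill all of it."""
    disable_oom_killer(cgroup)
    event_fd, oom_fd = register_oom_eventfd(cgroup)
    try:
        os.eventfd_read(event_fd)  # blocks until the kernel signals an event
        # The eventfd can also fire when the cgroup is removed, so re-check.
        state = parse_oom_control((cgroup / "memory.oom_control").read_text())
        if state.get("under_oom") == 1:
            kill_container(cgroup)
    finally:
        os.close(event_fd)
        os.close(oom_fd)
```

Killing every PID from a single read of cgroup.procs can race with processes that fork in between; a more careful version might freeze the cgroup first or repeat the kill until cgroup.procs is empty.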

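The overcommit scenario in the parenthetical leaves the victim policy open. The sketch below assumes one possible policy, choosing the executor whose usage exceeds its soft limit by the most bytes; that policy and all helper names are hypothetical. It also shows the outer cgroup setup using the cgroup v1 files memory.use_hierarchy, memory.limit_in_bytes, memory.soft_limit_in_bytes and memory.usage_in_bytes. OOM notifications for the outer cgroup would be registered through its cgroup.event_control in the same way as for a single container.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ExecutorMemory:
    name: str
    usage_bytes: int
    soft_limit_bytes: int

    @property
    def overage(self) -> int:
        """Bytes used beyond the soft limit (0 if within it)."""
        return max(0, self.usage_bytes - self.soft_limit_bytes)


def setup_outer_cgroup(outer: Path, hard_limit_bytes: int) -> None:
    # use_hierarchy must be enabled before any child cgroup is created, so
    # that children's usage is charged against the outer hard limit.
    (outer / "memory.use_hierarchy").write_text("1")
    (outer / "memory.limit_in_bytes").write_text(str(hard_limit_bytes))


def add_executor(outer: Path, name: str, soft_limit_bytes: int) -> Path:
    """Create a child cgroup for one executor with a soft memory limit."""
    child = outer / name
    child.mkdir()
    (child / "memory.soft_limit_in_bytes").write_text(str(soft_limit_bytes))
    return child


def read_executor(cgroup: Path) -> ExecutorMemory:
    usage = int((cgroup / "memory.usage_in_bytes").read_text())
    soft = int((cgroup / "memory.soft_limit_in_bytes").read_text())
    return ExecutorMemory(cgroup.name, usage, soft)


def choose_victim(executors):
    """Assumed policy: pick the executor furthest over its soft limit."""
    over = [e for e in executors if e.overage > 0]
    if not over:
        return None
    return max(over, key=lambda e: e.overage)
```

Other policies (for example, preferring the most recently launched executor) would fit the same structure by changing only choose_victim.
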
--
This message is automatically generated by JIRA.