Posted to dev@mesos.apache.org by "Charles Reiss (Created) (JIRA)" <ji...@apache.org> on 2011/10/26 22:47:32 UTC

[jira] [Created] (MESOS-47) Kill entire containers on OOM with LXC isolation module

Kill entire containers on OOM with LXC isolation module
-------------------------------------------------------

                 Key: MESOS-47
                 URL: https://issues.apache.org/jira/browse/MESOS-47
             Project: Mesos
          Issue Type: Improvement
          Components: isolation
         Environment: Linux with container-based isolation
            Reporter: Charles Reiss


When using the LXC isolation module, the kernel OOM killer kills a single victim process in the container when the container exceeds its memory limit. When the container holds multiple processes, this can cause hard-to-diagnose failures, since the rest of the container keeps running without the killed process.

Instead, Mesos should use the memory cgroup's oom_control feature to disable OOM kills, so that processes requesting memory block rather than being killed, and have the slave notified of OOM events through an eventfd. When the slave receives an OOM notification on this eventfd, it should kill all processes in the over-limit executor's container.
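
A rough sketch of the mechanism described above, not the Mesos implementation. It assumes the cgroup v1 memory controller, a cgroup directory path supplied by the caller, and Python 3.10+ on Linux for os.eventfd. All function names are hypothetical. Registration uses the cgroup v1 convention of writing "<event_fd> <fd of memory.oom_control>" to cgroup.event_control.

```python
import os
import signal


def parse_pids(text):
    """Parse the contents of a cgroup.procs file into a list of PIDs."""
    return [int(token) for token in text.split()]


def event_control_line(event_fd, control_fd):
    """cgroup v1 registration string: '<event_fd> <fd of the control file>'."""
    return f"{event_fd} {control_fd}"


def register_oom_listener(cgroup_dir):
    """Disable OOM kills for the cgroup and register for OOM notifications.

    Returns (event_fd, control_fd); the caller closes both when done.
    """
    event_fd = os.eventfd(0)  # assumption: Python 3.10+ on Linux
    control_path = os.path.join(cgroup_dir, "memory.oom_control")
    control_fd = os.open(control_path, os.O_RDONLY)
    with open(os.path.join(cgroup_dir, "cgroup.event_control"), "w") as f:
        f.write(event_control_line(event_fd, control_fd))
    # Writing 1 disables the OOM killer: tasks over the limit block instead.
    with open(control_path, "w") as f:
        f.write("1")
    return event_fd, control_fd


def kill_all(pids, kill=os.kill):
    """Send SIGKILL to every PID, skipping ones that have already exited."""
    killed = []
    for pid in pids:
        try:
            kill(pid, signal.SIGKILL)
            killed.append(pid)
        except ProcessLookupError:
            pass
    return killed


def handle_oom_event(cgroup_dir):
    """Block until the cgroup reports an OOM, then kill the whole container."""
    event_fd, control_fd = register_oom_listener(cgroup_dir)
    try:
        os.read(event_fd, 8)  # blocks until the kernel signals an OOM event
        with open(os.path.join(cgroup_dir, "cgroup.procs")) as f:
            return kill_all(parse_pids(f.read()))
    finally:
        os.close(event_fd)
        os.close(control_fd)
```

A single pass over cgroup.procs can miss processes forked while the kill is in progress, so a real implementation would probably re-read the file until it is empty.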

(These OOM events only happen when a container exceeds its hard memory limit. If Mesos overcommits memory in the future, it should have an outer cgroup with a hard memory limit and memory.use_hierarchy enabled on which to receive OOM events, so that they don't turn into global OOM kills. Mesos will then need code to determine which executors are exceeding their "soft" memory limits and to choose a victim executor.)
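
One possible shape for the victim selection mentioned above, as a sketch only. The issue does not specify a policy; picking the executor that exceeds its soft limit by the most bytes is an assumption made here for illustration, and ExecutorUsage and choose_victim are hypothetical names. With cgroup v1 the inputs could plausibly be read from memory.usage_in_bytes and memory.soft_limit_in_bytes in each executor's cgroup.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ExecutorUsage:
    executor_id: str       # hypothetical identifier for the executor
    usage_bytes: int       # current memory usage of the executor's cgroup
    soft_limit_bytes: int  # the executor's "soft" memory limit


def choose_victim(executors: Iterable[ExecutorUsage]) -> Optional[ExecutorUsage]:
    """Return the executor furthest over its soft limit, or None if none is over.

    Assumed policy: largest absolute overage wins.
    """
    over = [e for e in executors if e.usage_bytes > e.soft_limit_bytes]
    if not over:
        return None
    return max(over, key=lambda e: e.usage_bytes - e.soft_limit_bytes)
```

Other policies (largest relative overage, lowest priority, most recently started) would fit the same interface.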

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Closed] (MESOS-47) Kill entire containers on OOM with LXC isolation module

Posted by "Chris Lambert (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MESOS-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Lambert closed MESOS-47.
------------------------------

    Resolution: Won't Fix

Deprecated in favor of cgroups.
                
> Kill entire containers on OOM with LXC isolation module
> -------------------------------------------------------
>
>                 Key: MESOS-47
>                 URL: https://issues.apache.org/jira/browse/MESOS-47
>             Project: Mesos
>          Issue Type: Improvement
>          Components: isolation
>         Environment: Linux with container-based isolation
>            Reporter: Charles Reiss
>            Assignee: Benjamin Hindman
>              Labels: lxc


[jira] [Assigned] (MESOS-47) Kill entire containers on OOM with LXC isolation module

Posted by "Charles Reiss (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MESOS-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Charles Reiss reassigned MESOS-47:
----------------------------------

    Assignee: Benjamin Hindman

Ben has apparently started working on this.
                
