Posted to dev@mesos.apache.org by "Chris Lambert (JIRA)" <ji...@apache.org> on 2014/05/05 21:09:16 UTC

[jira] [Updated] (MESOS-287) have cgroups isolation module oversubscribe in order to killTask rather than OOM

     [ https://issues.apache.org/jira/browse/MESOS-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Lambert updated MESOS-287:
--------------------------------

    Sprint: Q2'14 Sprint 1

> have cgroups isolation module oversubscribe in order to killTask rather than OOM
> --------------------------------------------------------------------------------
>
>                 Key: MESOS-287
>                 URL: https://issues.apache.org/jira/browse/MESOS-287
>             Project: Mesos
>          Issue Type: Improvement
>          Components: isolation
>            Reporter: brian wickman
>
> Right now, if you set the cgroup memory limit to exactly the memory specified in the ExecutorInfo, you will probably get an OOM kill that is out of your control, leaving your application in a state that is hard to diagnose.
> Ideally, the cgroup memory limit would be set to (1 + epsilon) * the memory specified in the ExecutorInfo, where epsilon starts at something like 0.2 but varies continuously with the resources allocated on the box.  Epsilon may need to go to 0 as the box becomes heavily subscribed.
> The higher the epsilon, however, the better the chance of observing memory overuse before the OOM kill, which lets you invoke a killTask and gives the executor the opportunity to run cleanup routines and the like.
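
For illustration only, the proposal above could be sketched roughly as follows. This is not Mesos code: the function names, the linear decay of epsilon, and the use of an "allocated fraction" of the box's memory are all assumptions made for the example. The issue only says that epsilon starts near 0.2 and should shrink toward 0 as the box fills up.

```python
# Hypothetical sketch of the (1 + epsilon) headroom idea; none of these
# names come from Mesos, and the linear decay is an assumed policy.

def headroom_epsilon(allocated_fraction, max_epsilon=0.2):
    """Return epsilon for the given fraction of box memory already allocated.

    Assumed policy: epsilon is max_epsilon on an idle box and falls
    linearly to 0 when the box is fully allocated.
    """
    clamped = min(max(allocated_fraction, 0.0), 1.0)
    return max_epsilon * (1.0 - clamped)


def cgroup_limit_bytes(requested_bytes, allocated_fraction, max_epsilon=0.2):
    """Hard cgroup limit: the requested memory plus the epsilon headroom."""
    eps = headroom_epsilon(allocated_fraction, max_epsilon)
    return round(requested_bytes * (1.0 + eps))


def should_kill_task(usage_bytes, requested_bytes):
    """True once usage exceeds the request.

    Inside the headroom band the usage is above the request but still below
    the hard limit, so a killTask can be issued before the kernel OOM killer
    fires.
    """
    return usage_bytes > requested_bytes
```

With a 1000-byte request on an idle box, the hard limit would be about 1200 bytes, so usage between 1001 and 1200 bytes is the window in which overuse can be observed and the task killed cleanly. On a fully allocated box the window closes and the limit equals the request.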



--
This message was sent by Atlassian JIRA
(v6.2#6252)