Posted to dev@mesos.apache.org by Ben Mahler <be...@gmail.com> on 2013/10/02 22:05:59 UTC

Re: Review Request 14024: cgroup_isolator: Allow kernel to handle OOM

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14024/#review26625
-----------------------------------------------------------


Thanks David! Please mark as submitted.

- Ben Mahler


On Sept. 6, 2013, 11:05 p.m., David Mackey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14024/
> -----------------------------------------------------------
> 
> (Updated Sept. 6, 2013, 11:05 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Eric Biederman, and Vinod Kone.
> 
> 
> Bugs: MESOS-662
>     https://issues.apache.org/jira/browse/MESOS-662
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> I'm posting this partly as an RFC. I'm in favor of this approach but happy to have the discussion here.
> 
> The Mesos userspace OOM handler does not respect the practical
> restrictions that apply to it, given the states the kernel can be in
> when the OOM notification arrives. The result has been numerous
> deadlocks, because the Mesos OOM handler blocks on a lock that is held
> by the task it is trying to kill.
> 
> This patch does not try to fix the issues with the OOM handler. Instead,
> it hands over the job of OOM-killing to the kernel. The end result is
> very similar. The downside, compared to the approach it replaces, is
> that when the Mesos OOM handler reads the memory.stat values, they
> reflect the state after the OOM condition occurred. The "maximum
> usage" is still captured but the breakdown is lost. This exposes another
> weakness in the memcg implementation regarding page cache awareness.
> However, the reliability improvements outweigh the weakness in stats.
> 
> 
> Diffs
> -----
> 
>   src/linux/cgroups.hpp 5ee64d6 
>   src/linux/cgroups.cpp 813dcb3 
>   src/slave/cgroups_isolator.cpp a1f5b32 
> 
> Diff: https://reviews.apache.org/r/14024/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> David Mackey
> 
>
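
For anyone reading this thread without the diff open, here is a rough sketch of the mechanism the quoted description relies on. It is not the code from the patch. It assumes the cgroup v1 memory controller, where memory.oom_control reports a line of the form "oom_kill_disable 0" and writing "0" to that file leaves OOM-killing to the kernel. The names OomKiller, parseOomControl and enableKernelOomKiller are made up for illustration and do not come from src/linux/cgroups.hpp.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical type for this sketch: the state of the kernel OOM killer
// for one memory cgroup.
enum class OomKiller { Enabled, Disabled, Unknown };

// Parses the text of a cgroup v1 memory.oom_control file, e.g.
//   oom_kill_disable 0
//   under_oom 0
// The kernel OOM killer is enabled when oom_kill_disable is 0.
// Returns Unknown if the oom_kill_disable field is not present.
OomKiller parseOomControl(const std::string& contents)
{
  std::istringstream in(contents);
  std::string key;
  long value = 0;
  while (in >> key >> value) {
    if (key == "oom_kill_disable") {
      return value == 0 ? OomKiller::Enabled : OomKiller::Disabled;
    }
  }
  return OomKiller::Unknown;
}

// Hypothetical helper: asks the kernel to do the OOM-killing for the
// cgroup mounted at cgroupPath by writing "0" to memory.oom_control.
// Returns false if the file cannot be opened or written.
bool enableKernelOomKiller(const std::string& cgroupPath)
{
  assert(!cgroupPath.empty());
  std::ofstream out(cgroupPath + "/memory.oom_control");
  if (!out) {
    return false;
  }
  out << "0";
  out.flush();
  return out.good();
}
```

A caller would typically read memory.oom_control, pass the contents to parseOomControl, and call enableKernelOomKiller only when the result is Disabled. The userspace side can still listen for the OOM event to record the maximum usage, which is the part the description says is preserved.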