Posted to dev@mesos.apache.org by Ben Mahler <be...@gmail.com> on 2013/10/02 22:05:59 UTC
Re: Review Request 14024: cgroup_isolator: Allow kernel to handle OOM
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14024/#review26625
-----------------------------------------------------------
Thanks David! Please mark as submitted.
- Ben Mahler
On Sept. 6, 2013, 11:05 p.m., David Mackey wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14024/
> -----------------------------------------------------------
>
> (Updated Sept. 6, 2013, 11:05 p.m.)
>
>
> Review request for mesos, Benjamin Hindman, Ben Mahler, Eric Biederman, and Vinod Kone.
>
>
> Bugs: MESOS-662
> https://issues.apache.org/jira/browse/MESOS-662
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> I am posting this partly as an RFC. I'm in favor of this approach, but happy to have the discussion here.
>
> The Mesos userspace OOM handler does not respect the practical
> restrictions that apply to it, given the states the kernel can be in
> when the OOM notification arrives. The result has been numerous
> deadlocks, because the Mesos OOM handler blocks on a lock that is held
> by the task it is trying to kill.
>
> This patch does not try to fix the issues with the OOM handler. Instead,
> it hands the job of OOM-killing over to the kernel. The end result is
> very similar. The downside, compared to the approach it replaces, is
> that when the Mesos OOM handler reads memory.stat, the values reflect
> the state after the OOM condition occurred. The "maximum usage" is
> still captured, but the breakdown is lost. This exposes another
> weakness in the memcg implementation regarding page cache awareness.
> However, the reliability improvements outweigh the weakness in stats.
>
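For readers unfamiliar with the cgroup v1 memory controller, here is a minimal sketch of what "handing OOM-killing over to the kernel" could look like. It is not taken from the patch. It assumes the kernel OOM killer is re-enabled by writing "0" to memory.oom_control, and that the "maximum usage" mentioned above corresponds to memory.max_usage_in_bytes. The helper names (writeControl, readCounter, enableKernelOomKiller, readPeakUsage) are hypothetical and are not Mesos APIs.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper (not a Mesos API): write `value` to the control
// file `control` inside the cgroup directory `cgroup`.
// Returns false if the file cannot be opened or the write fails.
bool writeControl(const std::string& cgroup,
                  const std::string& control,
                  const std::string& value)
{
  assert(!cgroup.empty());
  std::ofstream file(cgroup + "/" + control);
  if (!file.is_open()) {
    return false;
  }
  file << value << std::endl;  // std::endl flushes, so errors surface here.
  return file.good();
}

// Hypothetical helper (not a Mesos API): read a single unsigned integer
// from a control file. Returns false if the file is missing or malformed.
bool readCounter(const std::string& cgroup,
                 const std::string& control,
                 unsigned long long* out)
{
  assert(out != nullptr);
  std::ifstream file(cgroup + "/" + control);
  unsigned long long value = 0;
  if (!(file >> value)) {
    return false;
  }
  *out = value;
  return true;
}

// Assumption: writing "0" to memory.oom_control clears oom_kill_disable,
// so the kernel itself kills a task in the cgroup when the limit is hit.
bool enableKernelOomKiller(const std::string& cgroup)
{
  return writeControl(cgroup, "memory.oom_control", "0");
}

// Assumption: the peak usage survives the OOM kill and can still be read
// afterwards, unlike the per-category breakdown in memory.stat.
bool readPeakUsage(const std::string& cgroup, unsigned long long* bytes)
{
  return readCounter(cgroup, "memory.max_usage_in_bytes", bytes);
}
```

A caller would pass the cgroup's directory under the memory hierarchy mount point, for example enableKernelOomKiller(path) when the cgroup is set up and readPeakUsage(path, &bytes) after an OOM notification. The path layout depends on how the hierarchy is mounted on the host.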
>
> Diffs
> -----
>
> src/linux/cgroups.hpp 5ee64d6
> src/linux/cgroups.cpp 813dcb3
> src/slave/cgroups_isolator.cpp a1f5b32
>
> Diff: https://reviews.apache.org/r/14024/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> David Mackey
>
>