Posted to user@mesos.apache.org by Shiyao Ma <i...@introo.me> on 2016/03/17 17:21:06 UTC

How to kill tasks when memory exceeds the cgroup limit?

Hi,


For the slave side:
export MESOS_RESOURCES='cpus:4;mem:180'
export MESOS_ISOLATION='cgroups/cpu,cgroups/mem'

On the framework side, it accepts the offer from the slave and launches
tasks with a memory spec smaller than what was offered.


However, the task *deliberately* allocates an arbitrarily large amount of
memory at runtime.

My assumption was that the slave would kill the task. However, it doesn't.

So here is my question: how does the slave handle a task whose runtime
memory exceeds the cgroup limit? Will any handlers be invoked?



Regards.

Re: How to kill tasks when memory exceeds the cgroup limit?

Posted by haosdent <ha...@gmail.com>.
Is the OOM killer enabled? You can check by running cat on the
memory.oom_control file.
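
A minimal sketch of that check, assuming cgroup v1 and its usual
"oom_kill_disable <0|1>" line in memory.oom_control. The function name is
made up for illustration, and the example path assumes the hierarchy is
mounted at /sys/fs/cgroup with a cgroups root of "mesos"; adjust both to
your setup.

```shell
#!/bin/sh
# Report whether the kernel OOM killer is enabled for a cgroup, given the
# path to its memory.oom_control file (cgroup v1 format assumed).
oom_killer_status() {
    control_file="$1"
    if [ ! -r "$control_file" ]; then
        echo "unknown"
        return 1
    fi
    # cgroup v1 exposes a line such as "oom_kill_disable 0";
    # 0 means the OOM killer is enabled, 1 means it is disabled.
    disabled=$(awk '$1 == "oom_kill_disable" { print $2 }' "$control_file")
    case "$disabled" in
        0) echo "enabled" ;;
        1) echo "disabled" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Example (the path is an assumption; substitute the real container ID):
# oom_killer_status /sys/fs/cgroup/memory/mesos/<container-id>/memory.oom_control
```

If this prints "disabled", a task that hits its limit is paused rather than
killed, which would match the behaviour described in the question.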




-- 
Best Regards,
Haosdent Huang

Re: How to kill tasks when memory exceeds the cgroup limit?

Posted by Dick Davies <di...@hellooperator.net>.
On 18 March 2016 at 20:58, Benjamin Mahler <bm...@apache.org> wrote:
> Interesting, why does it take down the slaves?

This was a good while back, but when swap got low, our slaves' kernel
OOM killer tended to mess things up.

> Because a lot of organizations run with swap disabled (e.g. for more
> deterministic performance), we originally did not set the swap limit at all.
> When we introduced the '--cgroups_limit_swap' flag we had to make it default
> to false initially in case any users were depending on the original behavior
> of no swap limit. Now that it's been available for some time, we can
> consider moving the default to true. This is actually reflected in the TODO
> alongside the flag:
>
> https://github.com/apache/mesos/blob/0.28.0/src/slave/flags.cpp#L331-L336
>
> Want to send a patch? We'd need to communicate this change to the default
> behavior in the CHANGELOG and specify how users can keep the original
> behaviour.

I'll see if I can get time. I'm just about to finish a consulting gig
and was going to take a break, so it might be an option.

Thanks for the explanation, I *knew* there'd be a reason :)


> Also, there's more we would need to do in the long term for use cases that
> desire swapping. The only support today is (1) no memory limits (2) memory
> limit and no swap limit (3) both memory and swap limits. You can imagine
> scenarios where users may want to control how much they're allowed to swap,
> or maybe we want to swap for non-latency sensitive containers. However, it's
> more complicated (the user and operator have to co-operate more, there are
> more ways to run things, etc), and so the general advice is to disable swap
> to keep things simple and deterministic.

Re: How to kill tasks when memory exceeds the cgroup limit?

Posted by Benjamin Mahler <bm...@apache.org>.
Interesting, why does it take down the slaves?

Because a lot of organizations run with swap disabled (e.g. for more
deterministic performance), we originally did not set the swap limit at
all. When we introduced the '--cgroups_limit_swap' flag we had to make it
default to false initially in case any users were depending on the original
behavior of no swap limit. Now that it's been available for some time, we
can consider moving the default to true. This is actually reflected in the
TODO alongside the flag:

https://github.com/apache/mesos/blob/0.28.0/src/slave/flags.cpp#L331-L336

Want to send a patch? We'd need to communicate this change to the default
behavior in the CHANGELOG and specify how users can keep the original
behavior.

Also, there's more we would need to do in the long term for use cases that
desire swapping. The only configurations supported today are (1) no memory
limits, (2) a memory limit and no swap limit, and (3) both memory and swap
limits. You can imagine scenarios where users may want to control how much
they're allowed to swap, or where we want to swap for non-latency-sensitive
containers. However, that is more complicated (the user and operator have to
co-operate more, there are more ways to run things, etc.), and so the general
advice is to disable swap to keep things simple and deterministic.
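
As a rough illustration of the three configurations above, the sketch below
inspects a cgroup v1 memory cgroup directory and reports which one applies.
It is not taken from Mesos: the function names, the output labels, and the
"unlimited" threshold are assumptions, as is the mapping of the swap limit
to memory.memsw.limit_in_bytes (a file that only exists when swap
accounting is enabled in the kernel).

```shell
#!/bin/sh
# Classify a cgroup v1 memory cgroup directory into one of the three
# configurations. Names and labels are illustrative.

# Assumption: the kernel reports "no limit" as a very large number, so any
# value at or above this threshold is treated as unlimited.
UNLIMITED_THRESHOLD=9000000000000000000

is_limited() {
    # Succeeds only if the file is readable and holds a value below the
    # threshold; a missing file counts as "not limited".
    [ -r "$1" ] || return 1
    value=$(cat "$1")
    [ "$value" -lt "$UNLIMITED_THRESHOLD" ]
}

memory_limit_mode() {
    cgroup_dir="$1"
    if ! is_limited "$cgroup_dir/memory.limit_in_bytes"; then
        echo "no memory limit"
    elif is_limited "$cgroup_dir/memory.memsw.limit_in_bytes"; then
        echo "memory and swap limits"
    else
        echo "memory limit, no swap limit"
    fi
}

# Example (the path is an assumption; substitute the real container ID):
# memory_limit_mode /sys/fs/cgroup/memory/mesos/<container-id>
```

With the 'mem:180' setting from the original post, a limited container
would be expected to show a memory limit of roughly 180 MB (188743680
bytes) or less, depending on what the task was given.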


Re: How to kill tasks when memory exceeds the cgroup limit?

Posted by Dick Davies <di...@hellooperator.net>.
Great!

I'm not really sure why Mesos even allows RSS limiting without VMEM
limiting; it takes down slaves like the Black Death if you accidentally
deploy a 'leaker'. I'm sure there's a use case I'm not seeing :)


Re: How to kill tasks when memory exceeds the cgroup limit?

Posted by Shiyao Ma <i...@introo.me>.
Thanks. The limit_swap works.

Re: How to kill tasks when memory exceeds the cgroup limit?

Posted by Dick Davies <di...@hellooperator.net>.
Last time I tried (not on the latest release) I also had to have cgroups
set to limit swap; otherwise, as soon as the process hit the RAM limit it
would just start to consume swap.

Try adding --cgroups_limit_swap to the slave's startup flags.
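
For a slave configured through environment variables, as in the original
post, a sketch of the same change might look like the following. The
MESOS_CGROUPS_LIMIT_SWAP name is an assumption that follows the
MESOS_-prefixed pattern used for the other two settings; verify it against
the flags documentation for your release.

```shell
# Slave environment from the original post.
export MESOS_RESOURCES='cpus:4;mem:180'
export MESOS_ISOLATION='cgroups/cpu,cgroups/mem'

# Assumption: environment equivalent of the --cgroups_limit_swap flag, so
# that the limit covers memory plus swap rather than RAM alone.
export MESOS_CGROUPS_LIMIT_SWAP=true
```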