Posted to dev@mesos.apache.org by Benjamin Hindman <be...@berkeley.edu> on 2012/09/21 04:01:11 UTC

Review Request: Updated cgroup freezer to retry after failed attempts (rather than just waiting indefinitely).

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7203/
-----------------------------------------------------------

Review request for mesos, Vinod Kone, Brian Wickman, and Jie Yu.


Description
-------


Diffs
-----

  src/linux/cgroups.cpp 4efd06e 

Diff: https://reviews.apache.org/r/7203/diff/


Testing
-------


Thanks,

Benjamin Hindman

Re: Review Request: Updated cgroup freezer to retry after failed attempts (rather than just waiting indefinitely).

Posted by Benjamin Mahler <bm...@twitter.com>.

Seems OK, though I'm surprised a FROZEN loop doesn't work.

It would be interesting to have some introspection on how many iterations
this takes in practice. I guess this could be done with some unix-fu on the
logs.

On Wed, Oct 3, 2012 at 9:31 AM, Benjamin Hindman <be...@berkeley.edu> wrote:

> I think the best we can do here is _try_ and send a SIGKILL to all
> processes (in R or T or S or whatever) after we write FROZEN to
> freezer.state and we find out everything is still in FREEZING. We'll
> continue to write FROZEN to freezer.state after the interval AND we'll
> continue to try sending SIGKILL. Hopefully these two mechanisms will
> _eventually_ get everything cleaned up.
>
> How does that sound?

Re: Review Request: Updated cgroup freezer to retry after failed attempts (rather than just waiting indefinitely).

Posted by Vinod Kone <vi...@twitter.com>.

sgtm

@vinodkone


On Wed, Oct 3, 2012 at 9:31 AM, Benjamin Hindman <be...@berkeley.edu> wrote:

> I think the best we can do here is _try_ and send a SIGKILL to all
> processes (in R or T or S or whatever) after we write FROZEN to
> freezer.state and we find out everything is still in FREEZING. We'll
> continue to write FROZEN to freezer.state after the interval AND we'll
> continue to try sending SIGKILL. Hopefully these two mechanisms will
> _eventually_ get everything cleaned up.
>
> How does that sound?

Re: Review Request: Updated cgroup freezer to retry after failed attempts (rather than just waiting indefinitely).

Posted by Benjamin Hindman <be...@berkeley.edu>.

I think the best we can do here is _try_ and send a SIGKILL to all
processes (in R or T or S or whatever) after we write FROZEN to
freezer.state and we find out everything is still in FREEZING. We'll
continue to write FROZEN to freezer.state after the interval AND we'll
continue to try sending SIGKILL. Hopefully these two mechanisms will
_eventually_ get everything cleaned up.

How does that sound?
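
For illustration, here is a minimal sketch of the loop proposed above. It is not the actual change to src/linux/cgroups.cpp. It assumes a cgroup v1 freezer hierarchy in which freezer.state and tasks sit directly in the cgroup directory. The names freezeWithRetry, readState and killTasks, and the attempts and interval parameters, are hypothetical.

```cpp
#include <signal.h>
#include <sys/types.h>

#include <chrono>
#include <fstream>
#include <string>
#include <thread>

// Hypothetical helper: read freezer.state ("THAWED", "FREEZING" or "FROZEN").
// Returns an empty string if the file cannot be read.
static std::string readState(const std::string& cgroup)
{
  std::ifstream in(cgroup + "/freezer.state");
  std::string state;
  in >> state;
  return state;
}

// Hypothetical helper: best-effort SIGKILL to every ID listed in `tasks`.
// This is not atomic: tasks are signalled one at a time and may already be
// gone, so errors from kill() are deliberately ignored.
static void killTasks(const std::string& cgroup)
{
  std::ifstream in(cgroup + "/tasks");
  long pid;
  while (in >> pid) {
    if (pid > 0) {  // Never pass 0 or a negative value to kill().
      ::kill(static_cast<pid_t>(pid), SIGKILL);
    }
  }
}

// Write FROZEN; while the cgroup does not report FROZEN, send SIGKILL to its
// tasks, wait `interval`, and retry. Returns true once FROZEN is read back.
bool freezeWithRetry(
    const std::string& cgroup,
    int attempts,
    std::chrono::milliseconds interval)
{
  for (int i = 0; i < attempts; ++i) {
    {
      std::ofstream out(cgroup + "/freezer.state");
      out << "FROZEN" << std::endl;
    }  // Closed here so the write is flushed before the state is read.

    if (readState(cgroup) == "FROZEN") {
      return true;
    }

    killTasks(cgroup);
    std::this_thread::sleep_for(interval);
  }
  return false;
}
```

A caller might use something like freezeWithRetry("/cgroup/mesos/executor", 10, std::chrono::milliseconds(100)), where the path is made up. The attempt cap is a choice of this sketch; the proposal itself keeps retrying after each interval.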




On Sat, Sep 22, 2012 at 10:09 AM, Jie Yu <yu...@gmail.com> wrote:

> Also, I don't understand what you mean here? Could you elaborate?
>
>
> If you have two running processes to kill, you cannot send a SIGKILL to them
> atomically. As a result, one process will (likely) be killed first while the
> other is still making progress (though only for a very short interval).
> That may cause unpredictable errors.
>
> - Jie

Re: Review Request: Updated cgroup freezer to retry after failed attempts (rather than just waiting indefinitely).

Posted by Jie Yu <yu...@gmail.com>.

>
> Also, I don't understand what you mean here? Could you elaborate?


If you have two running processes to kill, you cannot send a SIGKILL to them
atomically. As a result, one process will (likely) be killed first while the
other is still making progress (though only for a very short interval).
That may cause unpredictable errors.

- Jie


Re: Review Request: Updated cgroup freezer to retry after failed attempts (rather than just waiting indefinitely).

Posted by Vinod Kone <vi...@twitter.com>.

Thanks for digging up the kernel code, Jie! It's fascinating.


> Will that cause problems if there is more than one process in 'R',
> since the kill is not atomic?
>
>
Also, I don't understand what you mean here? Could you elaborate?


Vinod


> - Jie
>
> On Fri, Sep 21, 2012 at 9:10 PM, Benjamin Hindman <be...@berkeley.edu> wrote:
>
>>
>>
>> > On Sept. 21, 2012, 7 p.m., Vinod Kone wrote:
>> > > lgtm. i've a feeling we need to also do a force kill. but we can do
>> > > this after we see how brian's test pans out.
>>
>> I tried just setting FREEZING to the cgroup freezer.state manually and
>> that didn't seem to work. Meanwhile, I sent a SIGKILL to the process in the
>> cgroup still in R, and that got everything to cleanup. So I expect that
>> you're correct, and we'll also need to send explicit SIGKILLs to those
>> processes still in R (in fact, probably just to all processes still in the
>> cgroup). Review incoming.
>>
>>
>> - Benjamin
>>
>>
>> -----------------------------------------------------------
>>
>> This is an automatically generated e-mail. To reply, visit:
>> https://reviews.apache.org/r/7203/#review11794
>>
>> -----------------------------------------------------------
>>
>>
>> On Sept. 21, 2012, 2:02 a.m., Benjamin Hindman wrote:
>> >
>> > -----------------------------------------------------------
>> > This is an automatically generated e-mail. To reply, visit:
>> > https://reviews.apache.org/r/7203/
>> > -----------------------------------------------------------
>> >
>> > (Updated Sept. 21, 2012, 2:02 a.m.)
>>
>> >
>> >
>> > Review request for mesos, Vinod Kone, Brian Wickman, and Jie Yu.
>> >
>> >
>> > Description
>> > -------
>>
>> >
>> > See summary and
>> > http://www.kernel.org/doc/Documentation/cgroups/freezer-subsystem.txt:
>> >
>> > It's important to note that freezing can be incomplete. In that case we return
>> > EBUSY. This means that some tasks in the cgroup are busy doing something that
>> > prevents us from completely freezing the cgroup at this time. After EBUSY,
>> > the cgroup will remain partially frozen -- reflected by freezer.state reporting
>> > "FREEZING" when read. The state will remain "FREEZING" until one of these
>> > things happens:
>> >
>> >       1) Userspace cancels the freezing operation by writing "THAWED" to
>> >               the freezer.state file
>> >       2) Userspace retries the freezing operation by writing "FROZEN" to
>> >               the freezer.state file (writing "FREEZING" is not legal
>> >               and returns EINVAL)
>> >       3) The tasks that blocked the cgroup from entering the "FROZEN"
>> >               state disappear from the cgroup's set of tasks.
>> >
>> >
>> > Diffs
>> > -----
>> >
>> >   src/linux/cgroups.cpp 4efd06e
>> >
>> > Diff: https://reviews.apache.org/r/7203/diff/
>> >
>> >
>> > Testing
>> > -------
>> >
>> >
>> > Thanks,
>> >
>> > Benjamin Hindman
>> >
>> >
>>
>>
>

Re: Review Request: Updated cgroup freezer to retry after failed attempts (rather than just waiting indefinitely).

Posted by Benjamin Hindman <be...@berkeley.edu>.

>
> From brian's example, I think that might be related to some race
> conditions (due to many fork() ? e.g. process is added to the cgroup while
> at the same time the cgroup is being frozen.)
>

Even after the process is no longer forking (but is still running), writing
FROZEN to freezer.state didn't seem to do what we wanted. I had to
explicitly send SIGKILL to that process in order for the cgroup to get
frozen. Any reason you can see for that behavior?





> Let me know if you have any findings.
>
> - Jie

Re: Review Request: Updated cgroup freezer to retry after failed attempts (rather than just waiting indefinitely).

Posted by Jie Yu <yu...@gmail.com>.

So if a process is in 'R' (TASK_RUNNING) state, the fake signal should be
sent to the process, and should later be delivered before the process
returns to the user mode. As a result, the process should be able to enter
the FROZEN state within 1ms (the timer interrupt interval).
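
One way to check whether a task is still in 'R' is to read the state field of /proc/<pid>/stat. The sketch below assumes the usual "pid (comm) state ..." layout of that file; the function names parseTaskState and taskState are hypothetical and not part of Mesos.

```cpp
#include <fstream>
#include <string>

// Extract the state character ('R', 'S', 'D', 'T', 'Z', ...) from one line of
// /proc/<pid>/stat. The command name is wrapped in parentheses and may itself
// contain spaces or ')' characters, so search for the last ')'.
// Returns '\0' if the line does not look like a stat line.
char parseTaskState(const std::string& statLine)
{
  const std::string::size_type close = statLine.rfind(')');
  if (close == std::string::npos || close + 2 >= statLine.size()) {
    return '\0';
  }
  if (statLine[close + 1] != ' ') {
    return '\0';
  }
  return statLine[close + 2];  // Skip ") " to reach the state field.
}

// Read the state of a live task. Returns '\0' if the task has exited or its
// stat file cannot be read.
char taskState(long pid)
{
  std::ifstream in("/proc/" + std::to_string(pid) + "/stat");
  std::string line;
  if (!std::getline(in, line)) {
    return '\0';
  }
  return parseTaskState(line);
}
```

Polling taskState(pid) for each ID in the cgroup after writing FROZEN would show which tasks are still reported as running.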

From Brian's example, I think that might be related to some race conditions
(due to many fork() calls? e.g., a process is added to the cgroup while the
cgroup is being frozen).

Let me know if you have any findings.

- Jie

On Fri, Sep 21, 2012 at 10:03 PM, Jie Yu <yu...@gmail.com> wrote:

> Here is the kernel flow when a user echoes FROZEN to freezer.state to
> freeze a cgroup.
>
> Hopefully, this will be useful to you.
>
> (I am looking at the code of linux-2.6.39)
>
> 1) freezer_write(...) --> freezer_change_state(...)
> --> try_to_freeze_cgroup(...)  (kernel/cgroup_freezer.c)
>
> 2) try_to_freeze_cgroup(...) will iterate over all the tasks in the given
> cgroup:
>
>     ...
>     cgroup_iter_start(cgroup, &it);
>     while ((task = cgroup_iter_next(cgroup, &it))) {
>         if (!freeze_task(task, true))
>             continue;
>         if (frozen(task))
>             continue;
>         if (!freezing(task) && !freezer_should_skip(task))
>             num_cant_freeze_now++;
>     }
>     cgroup_iter_end(cgroup, &it);
>
>     return num_cant_freeze_now ? -EBUSY : 0;
>
> So, for each task in the cgroup, freeze_task(...) will be invoked.
>
> 3) freeze_task(p) (in kernel/freezer.c)
> This function sets a 'FREEZE' flag in process 'p' (set_freeze_flag(p)) and
> sends a fake signal to 'p' by invoking fake_signal_wake_up(p), which will
> also try to wake 'p' up (very important!).
>
> 4) fake_signal_wake_up(p) --> signal_wake_up(p, 0)
>
> 5) signal_wake_up(p, 0)  (kernel/signal.c)
>
>     set_tsk_thread_flag(p, TIF_SIGPENDING);
>     ...
>     if (!wake_up_state(p, TASK_INTERRUPTIBLE))
>         kick_process(p);
>
> First, the function sets the TIF_SIGPENDING flag in process 'p'. Then it
> wakes 'p' up to make sure that 'p' will try to handle the fake signal when
> it is about to return to user mode (the Linux kernel checks TIF_SIGPENDING
> for pending signals every time before it returns to user mode).
>
> 6) When p sees the fake pending signal, it will call do_signal(...)
> (arch/x86/kernel/signal.c).
> This function will call get_signal_to_deliver(...) (kernel/signal.c).
>
> 7) The first line of get_signal_to_deliver(...) calls try_to_freeze(...).
> If the FREEZE flag is set, the process enters a function called
> refrigerator(...) (in kernel/freezer.c), which marks the process as FROZEN,
> sets its own state to TASK_UNINTERRUPTIBLE, and calls schedule() to
> release the CPU.
>
> - Jie
>
> On Fri, Sep 21, 2012 at 9:29 PM, Jie Yu <yu...@gmail.com> wrote:
>
>> Ben,
>>
>> The retry does not work? The process remains in 'R' after you echo
>> "FROZEN" to freezer.state?
>>
>>> So I expect that you're correct, and we'll also need to send explicit
>>> SIGKILLs to those processes still in R (in fact, probably just to all
>>> processes still in the cgroup).
>>
>>
>> Will that cause problems if there is more than one process in 'R',
>> since the kill is not atomic?
>>
>> - Jie

Re: Review Request: Updated cgroup freezer to retry after failed attempts (rather than just waiting indefinitely).

Posted by Jie Yu <yu...@gmail.com>.

Here is the kernel flow when a user echoes FROZEN to freezer.state to freeze a
cgroup.

Hopefully, this will be useful to you.

(I am looking at the code of linux-2.6.39)

1) freezer_write(...) --> freezer_change_state(...)
--> try_to_freeze_cgroup(...)  (kernel/cgroup_freezer.c)

2) try_to_freeze_cgroup(...) iterates over all the tasks in the given cgroup:

    ...
    cgroup_iter_start(cgroup, &it);
    while ((task = cgroup_iter_next(cgroup, &it))) {
        if (!freeze_task(task, true))
            continue;
        if (frozen(task))
            continue;
        if (!freezing(task) && !freezer_should_skip(task))
            num_cant_freeze_now++;
    }
    cgroup_iter_end(cgroup, &it);

    return num_cant_freeze_now ? -EBUSY : 0;


So freeze_task(...) is invoked for each task in the cgroup.

3) freeze_task(p) (in kernel/freezer.c)
This function sets a 'FREEZE' flag on process 'p' (set_freeze_flag(p)) and
sends 'p' a fake signal by invoking fake_signal_wake_up(p), which also tries
to wake 'p' up (very important!)

4) fake_signal_wake_up(p) --> signal_wake_up(p, 0)

5) signal_wake_up(p, 0)  (kernel/signal.c)

    set_tsk_thread_flag(p, TIF_SIGPENDING);
    ...
    if (!wake_up_state(p, TASK_INTERRUPTIBLE))
        kick_process(p);


First, the function sets the TIF_SIGPENDING flag on process 'p'. Then it
wakes 'p' up to make sure that 'p' handles the fake signal when it is about
to return to user mode (the Linux kernel checks TIF_SIGPENDING for pending
signals every time before it returns to user mode).

6) When 'p' sees the fake pending signal, it calls do_signal(...)
(arch/x86/kernel/signal.c), which calls get_signal_to_deliver(...)
(kernel/signal.c).

7) The first line of get_signal_to_deliver(...) calls try_to_freeze(...).
If the FREEZE flag is set, the process enters a function called
refrigerator(...) (in kernel/freezer.c), which marks the process as FROZEN,
marks it as TASK_UNINTERRUPTIBLE, and calls schedule() to release the CPU.
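
The seven steps above can be mimicked with a toy user-space model in Python.
This is only a sketch for following the flow, not kernel code: the Task class,
its fields, and the reaches_signal_path switch are made up for illustration,
and the EBUSY check is simplified compared to the loop in step 2. It shows why
a task that never gets to the "return to user mode" check leaves the cgroup in
FREEZING.

```python
import errno
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """Hypothetical stand-in for a kernel task (not a real kernel structure)."""
    name: str
    reaches_signal_path: bool = True  # False models a task that never returns to user mode
    freeze_flag: bool = False         # models set_freeze_flag(p)
    sigpending: bool = False          # models TIF_SIGPENDING
    frozen: bool = False              # models the FROZEN mark set in refrigerator()


def freeze_task(task: Task) -> None:
    """Steps 3-5: set the freeze flag and post the fake signal."""
    task.freeze_flag = True
    task.sigpending = True


def return_to_user_mode(task: Task) -> None:
    """Steps 6-7: a task that sees the fake signal enters the refrigerator."""
    if task.reaches_signal_path and task.sigpending and task.freeze_flag:
        task.frozen = True


def try_to_freeze_cgroup(tasks: List[Task]) -> int:
    """Step 2, simplified: return 0 if every task froze, else -EBUSY."""
    for task in tasks:
        freeze_task(task)
        return_to_user_mode(task)
    cant_freeze_now = sum(1 for task in tasks if not task.frozen)
    return -errno.EBUSY if cant_freeze_now else 0
```

In this model, a list of tasks that all reach the signal path returns 0, while
a single task with reaches_signal_path=False makes the call return -EBUSY.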

- Jie

On Fri, Sep 21, 2012 at 9:29 PM, Jie Yu <yu...@gmail.com> wrote:

> Ben,
>
> The retry does not work? The process remains in 'R' after you echo
> "FROZEN" to freezer.state?
>
> So I expect that you're correct, and we'll also need to send explicit
>> SIGKILLs to those processes still in R (in fact, probably just to all
>> processes still in the cgroup).
>
>
> Will that cause potential problems if there are more than 1 process in 'R'
> because the kill is not atomic.
>
> - Jie
>
> On Fri, Sep 21, 2012 at 9:10 PM, Benjamin Hindman <be...@berkeley.edu>wrote:
>
>>
>>
>> > On Sept. 21, 2012, 7 p.m., Vinod Kone wrote:
>> > > lgtm. i've a feeling we need to also do a force kill. but we can do
>> this after we see how brian's test pans out.
>>
>> I tried just setting FREEZING to the cgroup freezer.state manually and
>> that didn't seem to work. Meanwhile, I sent a SIGKILL to the process in the
>> cgroup still in R, and that got everything to cleanup. So I expect that
>> you're correct, and we'll also need to send explicit SIGKILLs to those
>> processes still in R (in fact, probably just to all processes still in the
>> cgroup). Review incoming.
>>
>>
>> - Benjamin
>>
>>
>> -----------------------------------------------------------
>> This is an automatically generated e-mail. To reply, visit:
>> https://reviews.apache.org/r/7203/#review11794
>> -----------------------------------------------------------
>>
>>
>> On Sept. 21, 2012, 2:02 a.m., Benjamin Hindman wrote:
>> >
>> > -----------------------------------------------------------
>> > This is an automatically generated e-mail. To reply, visit:
>> > https://reviews.apache.org/r/7203/
>> > -----------------------------------------------------------
>> >
>> > (Updated Sept. 21, 2012, 2:02 a.m.)
>> >
>> >
>> > Review request for mesos, Vinod Kone, Brian Wickman, and Jie Yu.
>> >
>> >
>> > Description
>> > -------
>> >
>> > See summary and
>> http://www.kernel.org/doc/Documentation/cgroups/freezer-subsystem.txt:
>> >
>> > It's important to note that freezing can be incomplete. In that case we
>> return
>> > EBUSY. This means that some tasks in the cgroup are busy doing
>> something that
>> > prevents us from completely freezing the cgroup at this time. After
>> EBUSY,
>> > the cgroup will remain partially frozen -- reflected by freezer.state
>> reporting
>> > "FREEZING" when read. The state will remain "FREEZING" until one of
>> these
>> > things happens:
>> >
>> >       1) Userspace cancels the freezing operation by writing "THAWED" to
>> >               the freezer.state file
>> >       2) Userspace retries the freezing operation by writing "FROZEN" to
>> >               the freezer.state file (writing "FREEZING" is not legal
>> >               and returns EINVAL)
>> >       3) The tasks that blocked the cgroup from entering the "FROZEN"
>> >               state disappear from the cgroup's set of tasks.
>> >
>> >
>> > Diffs
>> > -----
>> >
>> >   src/linux/cgroups.cpp 4efd06e
>> >
>> > Diff: https://reviews.apache.org/r/7203/diff/
>> >
>> >
>> > Testing
>> > -------
>> >
>> >
>> > Thanks,
>> >
>> > Benjamin Hindman
>> >
>> >
>>
>>
>

Re: Review Request: Updated cgroup freezer to retry after failed attempts (rather than just waiting indefinitely).

Posted by Jie Yu <yu...@gmail.com>.

Ben,

Does the retry not work? Does the process remain in 'R' after you echo
"FROZEN" to freezer.state?

> So I expect that you're correct, and we'll also need to send explicit
> SIGKILLs to those processes still in R (in fact, probably just to all
> processes still in the cgroup).


Will that cause problems if more than one process is in 'R', given that the
kill is not atomic?

- Jie

On Fri, Sep 21, 2012 at 9:10 PM, Benjamin Hindman <be...@berkeley.edu> wrote:

>
>
> > On Sept. 21, 2012, 7 p.m., Vinod Kone wrote:
> > > lgtm. i've a feeling we need to also do a force kill. but we can do
> this after we see how brian's test pans out.
>
> I tried just setting FREEZING to the cgroup freezer.state manually and
> that didn't seem to work. Meanwhile, I sent a SIGKILL to the process in the
> cgroup still in R, and that got everything to cleanup. So I expect that
> you're correct, and we'll also need to send explicit SIGKILLs to those
> processes still in R (in fact, probably just to all processes still in the
> cgroup). Review incoming.
>
>
> - Benjamin
>
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7203/#review11794
> -----------------------------------------------------------
>
>
> On Sept. 21, 2012, 2:02 a.m., Benjamin Hindman wrote:
> >
> > -----------------------------------------------------------
> > This is an automatically generated e-mail. To reply, visit:
> > https://reviews.apache.org/r/7203/
> > -----------------------------------------------------------
> >
> > (Updated Sept. 21, 2012, 2:02 a.m.)
> >
> >
> > Review request for mesos, Vinod Kone, Brian Wickman, and Jie Yu.
> >
> >
> > Description
> > -------
> >
> > See summary and
> http://www.kernel.org/doc/Documentation/cgroups/freezer-subsystem.txt:
> >
> > It's important to note that freezing can be incomplete. In that case we
> return
> > EBUSY. This means that some tasks in the cgroup are busy doing something
> that
> > prevents us from completely freezing the cgroup at this time. After
> EBUSY,
> > the cgroup will remain partially frozen -- reflected by freezer.state
> reporting
> > "FREEZING" when read. The state will remain "FREEZING" until one of these
> > things happens:
> >
> >       1) Userspace cancels the freezing operation by writing "THAWED" to
> >               the freezer.state file
> >       2) Userspace retries the freezing operation by writing "FROZEN" to
> >               the freezer.state file (writing "FREEZING" is not legal
> >               and returns EINVAL)
> >       3) The tasks that blocked the cgroup from entering the "FROZEN"
> >               state disappear from the cgroup's set of tasks.
> >
> >
> > Diffs
> > -----
> >
> >   src/linux/cgroups.cpp 4efd06e
> >
> > Diff: https://reviews.apache.org/r/7203/diff/
> >
> >
> > Testing
> > -------
> >
> >
> > Thanks,
> >
> > Benjamin Hindman
> >
> >
>
>

Re: Review Request: Updated cgroup freezer to retry after failed attempts (rather than just waiting indefinitely).

Posted by Benjamin Hindman <be...@berkeley.edu>.


> On Sept. 21, 2012, 7 p.m., Vinod Kone wrote:
> > lgtm. i've a feeling we need to also do a force kill. but we can do this after we see how brian's test pans out.

I tried just setting FREEZING to the cgroup freezer.state manually and that didn't seem to work. Meanwhile, I sent a SIGKILL to the process in the cgroup still in R, and that got everything to clean up. So I expect that you're correct, and we'll also need to send explicit SIGKILLs to those processes still in R (in fact, probably just to all processes still in the cgroup). Review incoming.
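
A minimal Python sketch of the "SIGKILL everything still in the cgroup" idea, for illustration only. It is not the code in src/linux/cgroups.cpp. It assumes a cgroup v1 hierarchy where the cgroup directory has a cgroup.procs file with one PID per line; read_pids, kill_all, and the injectable kill parameter are hypothetical names.

```python
import os
import signal
from typing import Callable, List


def read_pids(cgroup_dir: str) -> List[int]:
    """Return the process IDs listed in <cgroup_dir>/cgroup.procs."""
    with open(os.path.join(cgroup_dir, "cgroup.procs")) as f:
        return [int(line) for line in f if line.strip()]


def kill_all(cgroup_dir: str,
             kill: Callable[[int, int], None] = os.kill) -> List[int]:
    """Send SIGKILL to every listed process; return the PIDs signalled.

    The kills are not atomic: processes are signalled one at a time, and a
    process may exit between the read and the kill.
    """
    signalled = []
    for pid in read_pids(cgroup_dir):
        try:
            kill(pid, signal.SIGKILL)
            signalled.append(pid)
        except ProcessLookupError:
            # The process already exited; nothing left to kill.
            continue
    return signalled
```

The kill parameter exists only so the loop can be exercised without signalling real processes; by default it is os.kill.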


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7203/#review11794
-----------------------------------------------------------


On Sept. 21, 2012, 2:02 a.m., Benjamin Hindman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7203/
> -----------------------------------------------------------
> 
> (Updated Sept. 21, 2012, 2:02 a.m.)
> 
> 
> Review request for mesos, Vinod Kone, Brian Wickman, and Jie Yu.
> 
> 
> Description
> -------
> 
> See summary and http://www.kernel.org/doc/Documentation/cgroups/freezer-subsystem.txt:
> 
> It's important to note that freezing can be incomplete. In that case we return
> EBUSY. This means that some tasks in the cgroup are busy doing something that
> prevents us from completely freezing the cgroup at this time. After EBUSY,
> the cgroup will remain partially frozen -- reflected by freezer.state reporting
> "FREEZING" when read. The state will remain "FREEZING" until one of these
> things happens:
> 
> 	1) Userspace cancels the freezing operation by writing "THAWED" to
> 		the freezer.state file
> 	2) Userspace retries the freezing operation by writing "FROZEN" to
> 		the freezer.state file (writing "FREEZING" is not legal
> 		and returns EINVAL)
> 	3) The tasks that blocked the cgroup from entering the "FROZEN"
> 		state disappear from the cgroup's set of tasks.
> 
> 
> Diffs
> -----
> 
>   src/linux/cgroups.cpp 4efd06e 
> 
> Diff: https://reviews.apache.org/r/7203/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>

Re: Review Request: Updated cgroup freezer to retry after failed attempts (rather than just waiting indefinitely).

Posted by Vinod Kone <vi...@gmail.com>.


> On Sept. 21, 2012, 7 p.m., Vinod Kone wrote:
> > lgtm. i've a feeling we need to also do a force kill. but we can do this after we see how brian's test pans out.
> 
> Benjamin Hindman wrote:
>     I tried just setting FREEZING to the cgroup freezer.state manually and that didn't seem to work. Meanwhile, I sent a SIGKILL to the process in the cgroup still in R, and that got everything to cleanup. So I expect that you're correct, and we'll also need to send explicit SIGKILLs to those processes still in R (in fact, probably just to all processes still in the cgroup). Review incoming.

You mean FROZEN, right?


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7203/#review11794
-----------------------------------------------------------



Re: Review Request: Updated cgroup freezer to retry after failed attempts (rather than just waiting indefinitely).

Posted by Vinod Kone <vi...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7203/#review11794
-----------------------------------------------------------

Ship it!


lgtm. i've a feeling we need to also do a force kill. but we can do this after we see how brian's test pans out.

- Vinod Kone



Re: Review Request: Updated cgroup freezer to retry after failed attempts (rather than just waiting indefinitely).

Posted by Brian Wickman <wi...@gmail.com>.


> On Sept. 21, 2012, 3:23 a.m., Jie Yu wrote:
> > Ben, I am just curious whether you have observed a case in which a retry is useful?
> > 
> > From my experience, if a cgroup stucks at FREEZING state (e.g. some process is in T or Z state), writing FROZEN to retry never brings the state to FROZEN.
> > 
> > If you do see a case that a retry is useful, let me know.
> 
> Benjamin Hindman wrote:
>     We've actually seen cases in which a process in the cgroup is still in R! It's possible that at the time the kernel could not freeze that process for whatever reason, and so retrying seems to be the only option (although, I hope that it's not the case that the process can never be frozen, which would seem like a pretty serious design issue).
> 
> Jie Yu wrote:
>     > We've actually seen cases in which a process in the cgroup is still in R!
>     
>     Maybe this is a kernel bug (race condition?) ;) from my understanding of the kernel code, this seems to be impossible...
>     
>     You can take a look at "kernel/cgroup_freezer.c"
>     
>     Probably you can start with the function "freezer_write(...)"
> 
> Benjamin Hindman wrote:
>     Hmm, so is the documentation out of date? The documentation makes me think that partially frozen cgroups are indeed possible and expected, and the user might need to try and freeze a cgroup multiple times (I attached the relevant snippet from the documentation in the review summary above).
> 
> Jie Yu wrote:
>     No, I am not saying that the doc is out-of-date. What I am trying to understand is why a process in "R" state cannot be frozen.
>     
>     I will take a look at the kernel code that you use, and let you know the possible explanation.
> 
> Benjamin Hindman wrote:
>     Sounds great, thanks! In the mean time, I'll commit this change and see if it fixes the issue.

I ran 50 tasks, each of which forked off 20 processes (where each process technically forked ~4 subprocesses).

The memory limit for the tasks was about 10% too low for start-up, but just about right for the steady state, which resulted in non-deterministic OOMing of tasks.  Eventually all of them were scheduled and ran fine, but only after about 200 tasks had been OOMed.  So we had a big sample set of OOM kills.  Of the ~200, 3 got stuck in this state.  The freezer froze _right_ in the middle of those 80 forks, and the cgroup was left in FREEZING state with only one process in R and the rest in D/Ds.
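
For checking which states the processes of a stuck cgroup are in, a small Python sketch could tally the state letter from each process's /proc/<pid>/stat line. The function names are hypothetical. Note that /proc reports only the bare state letter (R, D, ...), without the extra flags that ps appends (as in "Ds").

```python
def parse_state(stat_line):
    """Extract the one-letter state (R, S, D, T, Z, ...) from a /proc/<pid>/stat line.

    The second field (comm) is wrapped in parentheses and may itself contain
    spaces or parentheses, so take the first field after the last ')'.
    """
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def count_states(stat_lines):
    """Tally how many processes are in each state."""
    counts = {}
    for line in stat_lines:
        state = parse_state(line)
        counts[state] = counts.get(state, 0) + 1
    return counts
```

Feeding it the stat lines of every PID in the cgroup would show at a glance a pattern like the one described above: one process in R and the rest in D.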


- Brian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7203/#review11766
-----------------------------------------------------------



Re: Review Request: Updated cgroup freezer to retry after failed attempts (rather than just waiting indefinitely).

Posted by Jie Yu <yu...@gmail.com>.


> On Sept. 21, 2012, 3:23 a.m., Jie Yu wrote:
> > Ben, I am just curious whether you have observed a case in which a retry is useful?
> > 
> > From my experience, if a cgroup stucks at FREEZING state (e.g. some process is in T or Z state), writing FROZEN to retry never brings the state to FROZEN.
> > 
> > If you do see a case that a retry is useful, let me know.
> 
> Benjamin Hindman wrote:
>     We've actually seen cases in which a process in the cgroup is still in R! It's possible that at the time the kernel could not freeze that process for whatever reason, and so retrying seems to be the only option (although, I hope that it's not the case that the process can never be frozen, which would seem like a pretty serious design issue).

> We've actually seen cases in which a process in the cgroup is still in R!

Maybe this is a kernel bug (race condition?) ;) From my understanding of the kernel code, this seems to be impossible...

You can take a look at "kernel/cgroup_freezer.c"

Probably you can start with the function "freezer_write(...)"


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7203/#review11766
-----------------------------------------------------------



Re: Review Request: Updated cgroup freezer to retry after failed attempts (rather than just waiting indefinitely).

Posted by Jie Yu <yu...@gmail.com>.


> On Sept. 21, 2012, 3:23 a.m., Jie Yu wrote:
> > Ben, I am just curious whether you have observed a case in which a retry is useful?
> > 
> > From my experience, if a cgroup stucks at FREEZING state (e.g. some process is in T or Z state), writing FROZEN to retry never brings the state to FROZEN.
> > 
> > If you do see a case that a retry is useful, let me know.
> 
> Benjamin Hindman wrote:
>     We've actually seen cases in which a process in the cgroup is still in R! It's possible that at the time the kernel could not freeze that process for whatever reason, and so retrying seems to be the only option (although, I hope that it's not the case that the process can never be frozen, which would seem like a pretty serious design issue).
> 
> Jie Yu wrote:
>     > We've actually seen cases in which a process in the cgroup is still in R!
>     
>     Maybe this is a kernel bug (race condition?) ;) from my understanding of the kernel code, this seems to be impossible...
>     
>     You can take a look at "kernel/cgroup_freezer.c"
>     
>     Probably you can start with the function "freezer_write(...)"
> 
> Benjamin Hindman wrote:
>     Hmm, so is the documentation out of date? The documentation makes me think that partially frozen cgroups are indeed possible and expected, and the user might need to try and freeze a cgroup multiple times (I attached the relevant snippet from the documentation in the review summary above).

No, I am not saying that the doc is out-of-date. What I am trying to understand is why a process in "R" state cannot be frozen.

I will take a look at the kernel code that you use, and let you know the possible explanation.


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7203/#review11766
-----------------------------------------------------------



Re: Review Request: Updated cgroup freezer to retry after failed attempts (rather than just waiting indefinitely).

Posted by Benjamin Hindman <be...@berkeley.edu>.


> On Sept. 21, 2012, 3:23 a.m., Jie Yu wrote:
> > Ben, I am just curious whether you have observed a case in which a retry is useful?
> > 
> > From my experience, if a cgroup stucks at FREEZING state (e.g. some process is in T or Z state), writing FROZEN to retry never brings the state to FROZEN.
> > 
> > If you do see a case that a retry is useful, let me know.
> 
> Benjamin Hindman wrote:
>     We've actually seen cases in which a process in the cgroup is still in R! It's possible that at the time the kernel could not freeze that process for whatever reason, and so retrying seems to be the only option (although, I hope that it's not the case that the process can never be frozen, which would seem like a pretty serious design issue).
> 
> Jie Yu wrote:
>     > We've actually seen cases in which a process in the cgroup is still in R!
>     
>     Maybe this is a kernel bug (race condition?) ;) from my understanding of the kernel code, this seems to be impossible...
>     
>     You can take a look at "kernel/cgroup_freezer.c"
>     
>     Probably you can start with the function "freezer_write(...)"

Hmm, so is the documentation out of date? The documentation makes me think that partially frozen cgroups are indeed possible and expected, and the user might need to try and freeze a cgroup multiple times (I attached the relevant snippet from the documentation in the review summary above).
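
To illustrate what "try and freeze a cgroup multiple times" could look like from userspace, here is a hedged Python sketch, not the change under review. It assumes the freezer.state file described in the documentation snippet; the function names, the attempt count, the interval, and the injectable write/read/sleep parameters are all hypothetical.

```python
import errno
import os
import time


def write_state(cgroup_dir, value):
    """Write a value to <cgroup_dir>/freezer.state, tolerating EBUSY."""
    try:
        with open(os.path.join(cgroup_dir, "freezer.state"), "w") as f:
            f.write(value)
    except OSError as e:
        # Per the documentation snippet, an incomplete freeze returns EBUSY
        # and leaves the cgroup in FREEZING, so carry on and re-check.
        if e.errno != errno.EBUSY:
            raise


def read_state(cgroup_dir):
    """Return the current contents of <cgroup_dir>/freezer.state."""
    with open(os.path.join(cgroup_dir, "freezer.state")) as f:
        return f.read().strip()


def freeze_with_retry(cgroup_dir, attempts=5, interval=0.1,
                      write=write_state, read=read_state, sleep=time.sleep):
    """Retry writing FROZEN; return the number of writes used, or None."""
    for attempt in range(1, attempts + 1):
        write(cgroup_dir, "FROZEN")
        if read(cgroup_dir) == "FROZEN":
            return attempt
        sleep(interval)
    return None
```

Returning the attempt count makes it easy to log how many retries a freeze needed. A None result means the cgroup was still not FROZEN after the last attempt, which is the case where something stronger (such as killing the remaining processes) would be needed.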


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7203/#review11766
-----------------------------------------------------------



Re: Review Request: Updated cgroup freezer to retry after failed attempts (rather than just waiting indefinitely).

Posted by Benjamin Hindman <be...@berkeley.edu>.


> On Sept. 21, 2012, 3:23 a.m., Jie Yu wrote:
> > Ben, I am just curious whether you have observed a case in which a retry is useful?
> > 
> > From my experience, if a cgroup stucks at FREEZING state (e.g. some process is in T or Z state), writing FROZEN to retry never brings the state to FROZEN.
> > 
> > If you do see a case that a retry is useful, let me know.
> 
> Benjamin Hindman wrote:
>     We've actually seen cases in which a process in the cgroup is still in R! It's possible that at the time the kernel could not freeze that process for whatever reason, and so retrying seems to be the only option (although, I hope that it's not the case that the process can never be frozen, which would seem like a pretty serious design issue).
> 
> Jie Yu wrote:
>     > We've actually seen cases in which a process in the cgroup is still in R!
>     
>     Maybe this is a kernel bug (race condition?) ;) from my understanding of the kernel code, this seems to be impossible...
>     
>     You can take a look at "kernel/cgroup_freezer.c"
>     
>     Probably you can start with the function "freezer_write(...)"
> 
> Benjamin Hindman wrote:
>     Hmm, so is the documentation out of date? The documentation makes me think that partially frozen cgroups are indeed possible and expected, and the user might need to try and freeze a cgroup multiple times (I attached the relevant snippet from the documentation in the review summary above).
> 
> Jie Yu wrote:
>     No, I am not saying that the doc is out-of-date. What I am trying to understand is why a process in "R" state cannot be frozen.
>     
>     I will take a look at the kernel code that you use, and let you know the possible explanation.

Sounds great, thanks! In the meantime, I'll commit this change and see if it fixes the issue.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7203/#review11766
-----------------------------------------------------------



Re: Review Request: Updated cgroup freezer to retry after failed attempts (rather than just waiting indefinitely).

Posted by Benjamin Hindman <be...@berkeley.edu>.


> On Sept. 21, 2012, 3:23 a.m., Jie Yu wrote:
> > Ben, I am just curious whether you have observed a case in which a retry is useful?
> > 
> > From my experience, if a cgroup gets stuck in the FREEZING state (e.g. some process is in the T or Z state), writing FROZEN to retry never brings the state to FROZEN.
> > 
> > If you do see a case in which a retry is useful, let me know.

We've actually seen cases in which a process in the cgroup is still in R! It's possible that at the time the kernel could not freeze that process for whatever reason, and so retrying seems to be the only option (although, I hope that it's not the case that the process can never be frozen, which would seem like a pretty serious design issue).


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7203/#review11766
-----------------------------------------------------------



Re: Review Request: Updated cgroup freezer to retry after failed attempts (rather than just waiting indefinitely).

Posted by Jie Yu <yu...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7203/#review11766
-----------------------------------------------------------

Ship it!


Ben, I am just curious whether you have observed a case in which a retry is useful?

From my experience, if a cgroup gets stuck in the FREEZING state (e.g. some process is in the T or Z state), writing FROZEN to retry never brings the state to FROZEN.

If you do see a case in which a retry is useful, let me know.
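
To check which tasks are holding a cgroup in FREEZING, one option is to read each task's state letter (R, S, T, Z, ...) from /proc/<pid>/stat. The following is only a sketch and is not part of the patch under review: the helper names parseStatState and taskStates are hypothetical, and the path of the cgroup's tasks file is assumed to be supplied by the caller.

```cpp
#include <fstream>
#include <map>
#include <string>
#include <sys/types.h>

// Hypothetical helper: extract the one-letter process state from a
// /proc/<pid>/stat line. The state is the field right after the command
// name, which is wrapped in parentheses and may itself contain ')' or
// spaces, so we search for the LAST ')'. Returns '?' if the line is
// malformed.
char parseStatState(const std::string& statLine) {
  const std::string::size_type close = statLine.rfind(')');
  if (close == std::string::npos || close + 2 >= statLine.size()) {
    return '?';
  }
  return statLine[close + 2];
}

// Hypothetical helper: map every id listed in a cgroup's tasks file to
// its current state letter. Tasks that exit between the two reads are
// simply skipped.
std::map<pid_t, char> taskStates(const std::string& tasksPath) {
  std::map<pid_t, char> states;
  std::ifstream tasks(tasksPath);
  pid_t pid;
  while (tasks >> pid) {
    std::ifstream stat("/proc/" + std::to_string(pid) + "/stat");
    std::string line;
    if (std::getline(stat, line)) {
      states[pid] = parseStatState(line);
    }
  }
  return states;
}
```

Logging the result of taskStates whenever freezer.state still reads FREEZING would show whether the stuck tasks are in T or Z, or (as reported elsewhere in this thread) still in R.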

- Jie Yu



Re: Review Request: Updated cgroup freezer to retry after failed attempts (rather than just waiting indefinitely).

Posted by Benjamin Hindman <be...@berkeley.edu>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7203/
-----------------------------------------------------------

(Updated Sept. 21, 2012, 2:02 a.m.)


Review request for mesos, Vinod Kone, Brian Wickman, and Jie Yu.


Description (updated)
-------

See summary and http://www.kernel.org/doc/Documentation/cgroups/freezer-subsystem.txt:

It's important to note that freezing can be incomplete. In that case we return
EBUSY. This means that some tasks in the cgroup are busy doing something that
prevents us from completely freezing the cgroup at this time. After EBUSY,
the cgroup will remain partially frozen -- reflected by freezer.state reporting
"FREEZING" when read. The state will remain "FREEZING" until one of these
things happens:

	1) Userspace cancels the freezing operation by writing "THAWED" to
		the freezer.state file
	2) Userspace retries the freezing operation by writing "FROZEN" to
		the freezer.state file (writing "FREEZING" is not legal
		and returns EINVAL)
	3) The tasks that blocked the cgroup from entering the "FROZEN"
		state disappear from the cgroup's set of tasks.
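
The retry behaviour described in the excerpt above can be sketched as a bounded loop. This is a minimal illustration, not the actual change to src/linux/cgroups.cpp: the helper names (readState, writeState, freezeWithRetry), the attempt limit, and the retry interval are all assumptions made for the example, and the path to freezer.state is expected to be supplied by the caller.

```cpp
#include <chrono>
#include <fstream>
#include <string>
#include <thread>

// Hypothetical helper: read the first whitespace-delimited token of a
// file, e.g. "FREEZING" from freezer.state. Returns "" if the file
// cannot be read.
std::string readState(const std::string& path) {
  std::ifstream in(path);
  std::string state;
  in >> state;
  return state;
}

// Hypothetical helper: write a state string to freezer.state. Write
// errors (such as EBUSY on an incomplete freeze) are deliberately
// ignored, because the caller re-reads the state afterwards.
void writeState(const std::string& path, const std::string& state) {
  std::ofstream out(path);
  out << state << std::endl;
}

// Write "FROZEN", check the resulting state, and retry after `interval`
// while the cgroup has not reached FROZEN. Gives up after `maxAttempts`
// attempts and returns false.
bool freezeWithRetry(const std::string& path,
                     int maxAttempts,
                     std::chrono::milliseconds interval) {
  for (int attempt = 0; attempt < maxAttempts; ++attempt) {
    writeState(path, "FROZEN");
    if (readState(path) == "FROZEN") {
      return true;
    }
    std::this_thread::sleep_for(interval);
  }
  return false;
}
```

Bounding the number of attempts is a choice made for this sketch so that the caller can report a cgroup that never leaves FREEZING; a caller that prefers to retry indefinitely could loop on the return value instead.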


Diffs
-----

  src/linux/cgroups.cpp 4efd06e 

Diff: https://reviews.apache.org/r/7203/diff/


Testing
-------


Thanks,

Benjamin Hindman