Posted to dev@mesos.apache.org by Benjamin Hindman <be...@berkeley.edu> on 2012/10/03 18:31:24 UTC

Re: Review Request: Updated cgroup freezer to retry after failed attempts (rather than just waiting indefinitely).

>
> From Brian's example, I think that might be related to a race
> condition (due to the many fork() calls? e.g. a process is added to the
> cgroup at the same time the cgroup is being frozen).
>

Even after the process is no longer forking (but is still running), writing
FROZEN to freezer.state didn't seem to do what we wanted. I had to
explicitly send SIGKILL to that process in order for the cgroup to get
frozen. Any reason you can see for that behavior?





> Let me know if you have any findings.
>
> - Jie
>
>
> On Fri, Sep 21, 2012 at 10:03 PM, Jie Yu <yu...@gmail.com> wrote:
>
>> Here is the kernel flow when a user echoes FROZEN to freezer.state to
>> freeze a cgroup.
>>
>> Hopefully, this will be useful to you.
>>
>> (I am looking at the code of linux-2.6.39)
>>
>> 1) freezer_write(...) --> freezer_change_state(...)
>> --> try_to_freeze_cgroup(...)  (kernel/cgroup_freezer.c)
>>
>> 2) try_to_freeze_cgroup(...) will iterate over all the tasks in the given
>> cgroup:
>>
>>     ...
>>     cgroup_iter_start(cgroup, &it);
>>     while ((task = cgroup_iter_next(cgroup, &it))) {
>>         if (!freeze_task(task, true))
>>             continue;
>>         if (frozen(task))
>>             continue;
>>         if (!freezing(task) && !freezer_should_skip(task))
>>             num_cant_freeze_now++;
>>     }
>>     cgroup_iter_end(cgroup, &it);
>>
>>     return num_cant_freeze_now ? -EBUSY : 0;
>>
>>
>> So, for each task in the cgroup, freeze_task(...) will be invoked.
>>
>> 3) freeze_task(p) (in kernel/freezer.c)
>> Basically, this function sets a 'FREEZE' flag on process 'p'
>> (set_freeze_flag(p)) and sends a fake signal to 'p' by invoking
>> fake_signal_wake_up(p), which also tries to wake 'p' up (very important!).
>>
>> 4) fake_signal_wake_up(p) --> signal_wake_up(p, 0)
>>
>> 5) signal_wake_up(p, 0)  (kernel/signal.c)
>>
>>     set_tsk_thread_flag(p, TIF_SIGPENDING);
>>     ...
>>     if (!wake_up_state(p, TASK_INTERRUPTIBLE))
>>         kick_process(p);
>>
>>
>> First, the function sets the TIF_SIGPENDING flag on process p. Then it
>> wakes p up to make sure that p will handle the fake signal when it is
>> about to return to user mode (the Linux kernel checks TIF_SIGPENDING for
>> pending signals every time before it returns to user mode).
>>
>> 6) When p sees the fake pending signal, it will call do_signal(...)
>> (arch/x86/kernel/signal.c).
>> This function will call get_signal_to_deliver(...) (kernel/signal.c).
>>
>> 7) The first line of get_signal_to_deliver(...) calls try_to_freeze(...).
>> If the FREEZE flag is set, the process enters a function called
>> refrigerator(...) (in kernel/freezer.c), which marks the process as
>> FROZEN, sets its state to TASK_UNINTERRUPTIBLE, and calls schedule() to
>> release the CPU.
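A toy userspace model of steps 2 through 7, in case it helps to see the handshake in one place. This is not kernel code: struct toy_task, its fields, and the toy_* functions are all made up for illustration, and the EBUSY accounting is deliberately simplified compared to try_to_freeze_cgroup(...) (it just counts tasks that are not frozen after one pass). The only point it tries to make is that freezing is cooperative: a task freezes itself on its way back to user mode, so a task that never gets there keeps the cgroup busy. It does not claim to explain the behavior reported in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the kernel's per-task state. */
struct toy_task {
    bool freeze_requested;  /* models the 'FREEZE' flag from step 3 */
    bool signal_pending;    /* models TIF_SIGPENDING from step 5 */
    bool frozen;            /* set once the task has entered the refrigerator */
    bool returns_to_user;   /* false: the task never reaches the signal check */
};

/* Steps 3-5: flag the task and post the fake signal. */
void toy_freeze_task(struct toy_task *t)
{
    t->freeze_requested = true;
    t->signal_pending = true;
}

/* Steps 6-7: on the way back to user mode the task notices the pending
 * fake signal and freezes itself. Nothing happens if it never gets here. */
void toy_return_to_user(struct toy_task *t)
{
    if (!t->returns_to_user)
        return;
    if (t->signal_pending && t->freeze_requested) {
        t->frozen = true;
        t->signal_pending = false;
    }
}

/* Step 2 (simplified): one pass over the cgroup. Returns 0 if every task
 * ended up frozen and -EBUSY otherwise. */
int toy_try_to_freeze_cgroup(struct toy_task *tasks, int n)
{
    assert(tasks != NULL || n == 0);
    int not_frozen = 0;
    for (int i = 0; i < n; i++) {
        toy_freeze_task(&tasks[i]);
        toy_return_to_user(&tasks[i]);
        if (!tasks[i].frozen)
            not_frozen++;
    }
    return not_frozen ? -EBUSY : 0;
}
```

In this model, a second pass succeeds once the stuck task starts returning to user mode again, which is the situation a retry from userspace is hoping for.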
>>
>> - Jie
>>
>> On Fri, Sep 21, 2012 at 9:29 PM, Jie Yu <yu...@gmail.com> wrote:
>>
>>> Ben,
>>>
>>> Does the retry not work? Does the process remain in 'R' after you echo
>>> "FROZEN" to freezer.state?
>>>
>>>> So I expect that you're correct, and we'll also need to send explicit
>>>> SIGKILLs to those processes still in R (in fact, probably just to all
>>>> processes still in the cgroup).
>>>
>>>
>>> Will that cause potential problems if more than one process is in
>>> 'R', given that the kill is not atomic?
>>>
>>> - Jie
>>>
>>> On Fri, Sep 21, 2012 at 9:10 PM, Benjamin Hindman <be...@berkeley.edu> wrote:
>>>
>>>>
>>>>
>>>> > On Sept. 21, 2012, 7 p.m., Vinod Kone wrote:
>>>> > > lgtm. i've a feeling we need to also do a force kill. but we can do
>>>> > > this after we see how brian's test pans out.
>>>>
>>>> I tried just writing FREEZING to the cgroup's freezer.state manually and
>>>> that didn't seem to work. Meanwhile, I sent a SIGKILL to the process in the
>>>> cgroup still in R, and that got everything to clean up. So I expect that
>>>> you're correct, and we'll also need to send explicit SIGKILLs to those
>>>> processes still in R (in fact, probably just to all processes still in the
>>>> cgroup). Review incoming.
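A rough sketch of what "SIGKILL everything still in the cgroup" could look like from userspace. It is not the code from the review: read_cgroup_pids and kill_pids are hypothetical names, and it assumes the cgroup directory exposes a cgroup v1 "tasks" file with one ID per line. Reading the list and sending the signals are two separate steps, so the loop is not atomic; a process that forks in between can leave a child behind.

```c
#define _POSIX_C_SOURCE 200809L  /* for kill() under strict -std= modes */
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/* Reads up to `max` IDs (one per line) from a cgroup "tasks" file.
 * Returns the number of IDs stored in `pids`, or -1 if the file cannot
 * be opened. */
int read_cgroup_pids(const char *tasks_path, pid_t *pids, int max)
{
    assert(tasks_path != NULL && pids != NULL);
    FILE *f = fopen(tasks_path, "r");
    if (f == NULL)
        return -1;
    int n = 0;
    long pid;
    while (n < max && fscanf(f, "%ld", &pid) == 1) {
        /* Skip 0 and negative values: kill() treats them as process
         * groups or "everything", which is never what we want here. */
        if (pid > 0)
            pids[n++] = (pid_t)pid;
    }
    fclose(f);
    return n;
}

/* Sends SIGKILL to each ID in turn. Returns how many kill() calls
 * succeeded; a failure (e.g. the process already exited) is skipped. */
int kill_pids(const pid_t *pids, int n)
{
    int killed = 0;
    for (int i = 0; i < n; i++) {
        if (kill(pids[i], SIGKILL) == 0)
            killed++;
    }
    return killed;
}
```

One way to narrow the race would be to re-read the file and repeat until it comes back empty, but that is an assumption on my part and not something tested in this thread.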
>>>>
>>>>
>>>> - Benjamin
>>>>
>>>>
>>>> -----------------------------------------------------------
>>>> This is an automatically generated e-mail. To reply, visit:
>>>> https://reviews.apache.org/r/7203/#review11794
>>>> -----------------------------------------------------------
>>>>
>>>>
>>>> On Sept. 21, 2012, 2:02 a.m., Benjamin Hindman wrote:
>>>> >
>>>> > -----------------------------------------------------------
>>>> > This is an automatically generated e-mail. To reply, visit:
>>>> > https://reviews.apache.org/r/7203/
>>>> > -----------------------------------------------------------
>>>> >
>>>> > (Updated Sept. 21, 2012, 2:02 a.m.)
>>>> >
>>>> >
>>>> > Review request for mesos, Vinod Kone, Brian Wickman, and Jie Yu.
>>>> >
>>>> >
>>>> > Description
>>>> > -------
>>>> >
>>>> > See summary and
>>>> > http://www.kernel.org/doc/Documentation/cgroups/freezer-subsystem.txt:
>>>> >
>>>> > It's important to note that freezing can be incomplete. In that case we return
>>>> > EBUSY. This means that some tasks in the cgroup are busy doing something that
>>>> > prevents us from completely freezing the cgroup at this time. After EBUSY,
>>>> > the cgroup will remain partially frozen -- reflected by freezer.state reporting
>>>> > "FREEZING" when read. The state will remain "FREEZING" until one of these
>>>> > things happens:
>>>> >
>>>> >       1) Userspace cancels the freezing operation by writing "THAWED" to
>>>> >               the freezer.state file
>>>> >       2) Userspace retries the freezing operation by writing "FROZEN" to
>>>> >               the freezer.state file (writing "FREEZING" is not legal
>>>> >               and returns EINVAL)
>>>> >       3) The tasks that blocked the cgroup from entering the "FROZEN"
>>>> >               state disappear from the cgroup's set of tasks.
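For reference, a minimal sketch of option 2 above (retry by writing "FROZEN" again) as a bounded loop rather than an indefinite wait. This is not the diff under review: freeze_with_retry, write_state, read_state, and the attempt/delay parameters are hypothetical names chosen for illustration. It assumes the caller passes the full path of the cgroup's freezer.state file, and it treats EBUSY on the write as "not frozen yet" rather than as a failure.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Writes `value` to `path`. Returns 0 on success, -1 on error (errno set). */
static int write_state(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int rc = (fputs(value, f) == EOF) ? -1 : 0;
    /* stdio buffers the data, so a write error such as EBUSY typically
     * surfaces when fclose() flushes. */
    if (fclose(f) == EOF)
        rc = -1;
    return rc;
}

/* Reads the first line of `path` into `buf`, without the trailing newline. */
static int read_state(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, (int)len, f);
    fclose(f);
    if (line == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Returns 0 once the state reads "FROZEN", 1 if the attempts ran out while
 * the cgroup was still not frozen, and -1 on any other error. */
int freeze_with_retry(const char *state_path, int max_attempts,
                      unsigned delay_ms)
{
    assert(state_path != NULL);
    char state[32];
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (write_state(state_path, "FROZEN") != 0 && errno != EBUSY)
            return -1;
        if (read_state(state_path, state, sizeof state) != 0)
            return -1;
        if (strcmp(state, "FROZEN") == 0)
            return 0;
        struct timespec ts = { (time_t)(delay_ms / 1000),
                               (long)(delay_ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
    }
    return 1;
}
```

A return value of 1 is the case this thread is about: the caller still has to decide what to do next, for example thaw the cgroup or fall back to killing the remaining tasks.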
>>>> >
>>>> >
>>>> > Diffs
>>>> > -----
>>>> >
>>>> >   src/linux/cgroups.cpp 4efd06e
>>>> >
>>>> > Diff: https://reviews.apache.org/r/7203/diff/
>>>> >
>>>> >
>>>> > Testing
>>>> > -------
>>>> >
>>>> >
>>>> > Thanks,
>>>> >
>>>> > Benjamin Hindman
>>>> >
>>>> >
>>>>
>>>>
>>>
>>
>