Posted to dev@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2014/06/04 02:44:49 UTC

Re: Dealing with "run away" task processes after executor terminates

+Jie,Ian

Not sure if you've talked to Ian Downes and/or Jie Yu regarding this, but
they were discussing the same issue (offline) today.

Just to be sure: if you are using cgroups, the Mesos slave will clean up the
container (and all its processes) when an executor exits. There is
definitely a race here, though: Mesos might release the resources to the
framework before the container is destroyed. We'll try to fix that really
soon. I'll let Jie/Ian chime in regarding fixes/tickets.
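
For context, here is a rough sketch of what that cgroup cleanup looks
like, assuming a cgroup v1 freezer hierarchy mounted at
/sys/fs/cgroup/freezer; this is illustrative Python, not the slave's
actual code. Freezing first stops the container from forking while it
is being killed, and the final wait is what guarantees that no process
survives before the cgroup is removed.

    import os, signal, time

    def destroy_container(cgroup):
        # cgroup: e.g. "/sys/fs/cgroup/freezer/mesos/<container-id>"
        # (hypothetical path layout)
        with open(os.path.join(cgroup, "freezer.state"), "w") as f:
            f.write("FROZEN")  # stop every process so none can fork
        while open(os.path.join(cgroup, "freezer.state")).read().strip() != "FROZEN":
            time.sleep(0.1)  # kernel reports FREEZING until all are stopped
        for pid in open(os.path.join(cgroup, "cgroup.procs")).read().split():
            os.kill(int(pid), signal.SIGKILL)  # queue SIGKILL for every member
        with open(os.path.join(cgroup, "freezer.state"), "w") as f:
            f.write("THAWED")  # thaw so the queued kills are delivered
        while open(os.path.join(cgroup, "cgroup.procs")).read().split():
            time.sleep(0.1)  # wait until the cgroup is truly empty
        os.rmdir(cgroup)  # only an empty cgroup directory can be removed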


On Tue, Jun 3, 2014 at 4:25 PM, Sharma Podila <sp...@netflix.com> wrote:

> When a framework executor terminates, Mesos sends TASK_LOST status updates
> for tasks that were running. However, if a task has processes that do not
> terminate when the executor dies, then we have a problem, since Mesos
> considers the slave resources assigned to those tasks released, whereas
> the task processes keep running without releasing those resources.
>
> While it is a good practice for the task processes to exit when their
> executor dies, I am not sure that can be guaranteed. I am wondering how
> others are dealing with such "illegal" processes - that is, processes that
> once belonged to Mesos-run tasks but no longer do.
>
> Conceivably, a per-slave reaper/GC process can periodically scan the
> slave's process tree to ensure all processes are 'legal'. Assuming such a
> reaper exists on the slave (which could be tricky in a multi-framework
> environment) and can safely kill illegal processes, there is still the
> time window until the reaper completes its next cleanup pass. In the
> meantime, new tasks can land and fail trying to use a resource that Mesos
> assumed to be free. Especially problematic for ports. Not as much for CPU
> and memory.
>
> Would love to hear thoughts on how you are handling this scenario.
>
>
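
For concreteness, here is a minimal sketch of the kind of per-slave
reaper proposed above, deciding "legality" by POSIX session id. The
bookkeeping (the slave recording a session id per executor at launch)
is hypothetical, and a process that calls setsid() would still escape,
which is one reason the cgroup-based tracking discussed elsewhere in
this thread is the more robust answer.

    import os

    def runaway_pids(dead_sessions):
        """Return pids still running in a session whose executor has
        exited. `dead_sessions` is the set of session ids recorded for
        executors that have since terminated (hypothetical slave
        bookkeeping)."""
        leaked = []
        for entry in os.listdir("/proc"):
            if not entry.isdigit():
                continue
            try:
                with open("/proc/%s/stat" % entry) as f:
                    rest = f.read().rsplit(")", 1)[1].split()
                sid = int(rest[3])  # field 6 of /proc/<pid>/stat: session id
            except (IOError, IndexError, ValueError):
                continue  # raced with a process exiting
            if sid in dead_sessions:
                leaked.append(int(entry))
        return leaked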

Re: Dealing with "run away" task processes after executor terminates

Posted by Sharma Podila <sp...@netflix.com>.
Jie, that sounds good. I think between the cgroups guarantees and
MESOS-1417 we should be OK.

> I think with cgroups (or pid namespaces in the future), when the executor
> dies, all processes belonging to the cgroup (or pid namespace) will be
> killed by the slave.


Another point: strictly speaking, the top pid of the executor's process
tree isn't necessarily the executor itself; it could be a /bin/sh process
that launched it. What I noticed (when using process isolation for testing)
is that when that sh process is killed, the executor process continues to
live, but Mesos reports TASK_LOST for tasks associated with that executor. I
am taking your statement above to imply that the exit of the sh process in
this case also kills all processes in the cgroup before the task lost
status is reported.

Thank you.
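
To make the intermediate-sh point concrete, here is a small POSIX-only
demonstration, with an arbitrary command standing in for an executor;
the "&& true" forces the shell to stay resident as the parent rather
than exec'ing the child directly.

    import os, signal, subprocess, time

    # shell=True runs "/bin/sh -c ...", so sh sits at the top of the tree.
    p = subprocess.Popen("sleep 1000 && true", shell=True,
                         preexec_fn=os.setsid)  # own session, like a launcher
    time.sleep(0.5)  # give sh time to fork the sleep

    os.kill(p.pid, signal.SIGKILL)  # kills only the sh wrapper; the sleep
                                    # survives, reparented to init

    # Killing the whole process group (or everything in a freezer cgroup,
    # as sketched elsewhere in this thread) takes the child down too:
    #   os.killpg(os.getpgid(p.pid), signal.SIGKILL)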


Re: Dealing with "run away" task processes after executor terminates

Posted by Ian Downes <id...@twitter.com.INVALID>.
I'll expand a little on Jie's reply:

On Jun 3, 2014, at 7:17 PM, Jie Yu <yu...@gmail.com> wrote:

> Sharma,
> 
> While it is a good practice for the task processes to exit when their executor dies, I am not sure that can be guaranteed.
> 
> I think with cgroups (or pid namespaces in the future), when the executor dies, all processes belonging to the cgroup (or pid namespace) will be killed by the slave.

The MesosContainerizer uses a Launcher, which is responsible for managing the process lifecycle of a container. There are two implementations: (a) one for any POSIX system, which uses sessions to track the executor and its processes, and (b) a Linux implementation that uses a freezer cgroup to track the processes. Method (a) is not perfect: processes can 'escape' and miss being killed. Method (b) ensures that all processes have been killed before continuing with container destruction. Container destruction is either requested by the slave, triggered by the executor exiting, or triggered by a resource limitation (at the moment this is only out-of-memory from the CgroupsMemIsolator).
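
To contrast the two methods in miniature (again in illustrative Python rather than Mesos's actual C++): method (a) has to infer membership, much like the session-scanning reaper sketch earlier in this thread, and a setsid() call defeats it; method (b) just reads the kernel's own membership list, which children join at fork time, so the kill-and-wait loop during destruction can be exhaustive.

    import os

    def cgroup_members(cgroup):
        """Method (b): the freezer cgroup's process list is maintained by
        the kernel, so reading it is authoritative. `cgroup` is a freezer
        cgroup path, e.g. /sys/fs/cgroup/freezer/mesos/<container-id>
        (hypothetical layout)."""
        with open(os.path.join(cgroup, "cgroup.procs")) as f:
            return [int(pid) for pid in f.read().split()]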

> Especially problematic for ports. Not as much for CPU and memory. 
> 
> Yeah, we are addressing this issue right now. I think ticket MESOS-1417 tracks this work. More specifically, we should not tell the master a task has FINISHED/LOST/FAILED/KILLED until its resources have been freed.
> 

MESOS-1417 tracks one place in the slave code where the slave doesn't wait for container cleanup to complete before sending a status update, which can lead to the resource issues you describe. There are other places in the slave code that should also wait, and we'll get to those shortly as well.

Ian


Re: Dealing with "run away" task processes after executor terminates

Posted by Jie Yu <yu...@gmail.com>.
Sharma,

> While it is a good practice for the task processes to exit when their
> executor dies, I am not sure that can be guaranteed.


I think with cgroups (or pid namespaces in the future), when the executor
dies, all processes belonging to the cgroup (or pid namespace) will be
killed by the slave.

> Especially problematic for ports. Not as much for CPU and memory.


Yeah, we are addressing this issue right now. I think ticket MESOS-1417
tracks this work. More specifically, we should not tell the master a task
has FINISHED/LOST/FAILED/KILLED until its resources have been freed.
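
In sketch form, the ordering that fix asks for: the terminal status
update must happen after destruction finishes, rather than race with
it. The Future-returning destroy_container is a hypothetical API, not
the slave's real interface.

    def report_terminal_status(destroy_container, send_status_update, task):
        """destroy_container() -> Future that completes once every process
        is dead and the container's resources are released."""
        done = destroy_container()
        done.result()  # block until cleanup has really finished
        send_status_update(task, "TASK_LOST")  # only now tell the master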

- Jie


Re: Dealing with "run away" task processes after executor terminates

Posted by Sharma Podila <sp...@netflix.com.INVALID>.
No, I haven't talked to either of them. Would be great to hear their
thoughts on this. Thanks for including them.

Is container cleanup specific to cgroups? Or would other container
technologies, say Docker, also have similar cleanup behavior?

