Posted to issues@mesos.apache.org by "Joe Smith (JIRA)" <ji...@apache.org> on 2015/02/18 03:36:11 UTC

[jira] [Created] (MESOS-2367) Improve slave resiliency in the face of orphan containers

Joe Smith created MESOS-2367:
--------------------------------

             Summary: Improve slave resiliency in the face of orphan containers 
                 Key: MESOS-2367
                 URL: https://issues.apache.org/jira/browse/MESOS-2367
             Project: Mesos
          Issue Type: Bug
          Components: slave
            Reporter: Joe Smith


Currently, a misbehaving executor can cause the slave process to flap (restart repeatedly):

{panel:title=Quote From [~jieyu]}
{quote}
1) User tries to kill an instance
2) Slave sends {{KillTaskMessage}} to executor
3) Executor sends kill signals to task processes
4) Executor sends {{TASK_KILLED}} to slave
5) Slave updates container cpu limit to be 0.01 cpus
6) A user-process is still processing the kill signal
7) The task process cannot exit because it has too little CPU share and is throttled
8) Executor itself terminates
9) Slave tries to destroy the container, but cannot because the user-process is stuck in the exit path.
10) Slave restarts, and is constantly flapping because it cannot kill orphan containers
{quote}
{panel}
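
To illustrate steps 5 through 7, here is a rough back-of-the-envelope sketch of why a process can appear stuck in its exit path. It assumes the 0.01 cpus limit is enforced as a hard cap; the 0.5 CPU-seconds figure for the exit path is purely hypothetical and not a measurement from this issue.

```python
def min_wall_seconds(cpu_seconds_needed: float, cpu_limit: float) -> float:
    """Lower bound on wall-clock time needed to consume `cpu_seconds_needed`
    of CPU time when the container is capped at `cpu_limit` cpus."""
    if cpu_limit <= 0:
        raise ValueError("cpu_limit must be positive")
    return cpu_seconds_needed / cpu_limit


# Hypothetical workload: the exit path needs 0.5 CPU-seconds in total.
EXIT_PATH_CPU_SECONDS = 0.5

throttled = min_wall_seconds(EXIT_PATH_CPU_SECONDS, 0.01)   # limit from step 5
unthrottled = min_wall_seconds(EXIT_PATH_CPU_SECONDS, 1.0)  # one full cpu

print(f"at 0.01 cpus: at least {throttled:.1f}s; at 1 cpu: at least {unthrottled:.1f}s")
```

Under these assumptions the same exit work takes roughly 100 times longer after the limit is lowered, which is consistent with the slave's destroy attempt in step 9 running into a process that has not yet exited.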

The slave's orphan container handling should be improved so that the slave stays up in this case, even when users (framework writers) are ill-behaved.
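
One possible shape for more tolerant handling, as a minimal sketch only: retry the destroy a bounded number of times and report the containers that are still stuck instead of failing startup. This is not Mesos code; `cleanup_orphans`, `destroy`, and the parameters below are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Iterable, List


def cleanup_orphans(
    orphans: Iterable[str],
    destroy: Callable[[str], bool],
    attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> List[str]:
    """Try to destroy each orphan container (hypothetical helper).

    `destroy` returns True on success. Failures and exceptions are
    tolerated; the IDs that could not be destroyed are returned so the
    caller can keep running and retry them later instead of exiting.
    """
    stuck: List[str] = []
    for container_id in orphans:
        for attempt in range(attempts):
            try:
                if destroy(container_id):
                    break  # destroyed; move on to the next orphan
            except Exception:
                pass  # treat an exception as a failed attempt
            time.sleep(backoff_seconds * (attempt + 1))
        else:
            # All attempts failed: record the container, do not abort.
            stuck.append(container_id)
    return stuck


# Usage with a fake destroy function where one container never goes away.
def fake_destroy(container_id: str) -> bool:
    return container_id != "stuck-1"


remaining = cleanup_orphans(["ok-1", "stuck-1", "ok-2"], fake_destroy)
print("still stuck:", remaining)
```

The design choice here is that a stuck orphan is treated as a condition to report and retry rather than a fatal error, so one throttled process cannot keep the whole slave from recovering.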



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)