Posted to issues@mesos.apache.org by "Jie Yu (JIRA)" <ji...@apache.org> on 2016/05/13 19:37:13 UTC
[jira] [Created] (MESOS-5380) Killing a queued task can cause the
corresponding command executor to never terminate.
Jie Yu created MESOS-5380:
-----------------------------
Summary: Killing a queued task can cause the corresponding command executor to never terminate.
Key: MESOS-5380
URL: https://issues.apache.org/jira/browse/MESOS-5380
Project: Mesos
Issue Type: Bug
Affects Versions: 0.28.1, 0.28.0
Reporter: Jie Yu
Assignee: Vinod Kone
Priority: Blocker
Fix For: 0.29.0, 0.28.2
We observed this in our testing environment. The sequence of events is:
1) A command task is queued, but the executor has not registered yet
2) The framework issues a killTask
3) Since the executor is in the REGISTERING state, the agent calls `statusUpdate(TASK_KILLED, UPID())`
4) `statusUpdate` now calls `containerizer->status()` before calling `executor->terminateTask(status.task_id(), status);`, which is what removes the queued task (this ordering was introduced in this patch: https://reviews.apache.org/r/43258).
5) Since the above is async, the task may still be in the queued tasks when `killTask` checks whether it needs to kill the unregistered executor:
```
// TODO(jieyu): Here, we kill the executor if it no longer has
// any task to run and has not yet registered. This is a
// workaround for those single task executors that do not have a
// proper self terminating logic when they haven't received the
// task within a timeout.
if (executor->queuedTasks.empty()) {
  CHECK(executor->launchedTasks.empty())
      << " Unregistered executor '" << executor->id
      << "' has launched tasks";

  LOG(WARNING) << "Killing the unregistered executor " << *executor
               << " because it has no tasks";

  executor->state = Executor::TERMINATING;

  containerizer->destroy(executor->containerId);
}
```
6) Because the check above still sees a non-empty `queuedTasks`, the executor is not killed there, and Mesos never terminates it after that.
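
The ordering problem in the steps above can be reproduced in a small standalone model. This is only a sketch and not Mesos code: `Executor`, `EventQueue` and `killQueuedTask` are hypothetical stand-ins, and the deferred callback approximates the asynchronous `containerizer->status()` step that now precedes `terminateTask`.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for the agent's per-executor bookkeeping.
struct Executor {
  enum State { REGISTERING, TERMINATING };
  State state = REGISTERING;
  std::set<std::string> queuedTasks;
  std::set<std::string> launchedTasks;
};

// Hypothetical stand-in for an actor's event queue: deferred
// callbacks only run later, when drain() is called.
struct EventQueue {
  std::deque<std::function<void()>> pending;

  void defer(std::function<void()> f) { pending.push_back(std::move(f)); }

  void drain() {
    while (!pending.empty()) {
      std::function<void()> f = std::move(pending.front());
      pending.pop_front();
      f();
    }
  }
};

// Models killing a queued task on an unregistered executor.
// With asyncStatus == true, removal of the queued task is deferred
// (as it is once an asynchronous call precedes terminateTask);
// otherwise it happens before the emptiness check.
// Returns true if the executor ends up in TERMINATING.
bool killQueuedTask(bool asyncStatus) {
  Executor executor;
  EventQueue queue;
  executor.queuedTasks.insert("task-1");

  auto terminateTask = [&executor]() {
    executor.queuedTasks.erase("task-1");
  };

  if (asyncStatus) {
    queue.defer(terminateTask);
  } else {
    terminateTask();
  }

  // The check from killTask: only kill the unregistered executor
  // if it has nothing left to run.
  if (executor.queuedTasks.empty()) {
    assert(executor.launchedTasks.empty());
    executor.state = Executor::TERMINATING;
  }

  // The deferred removal runs now, but nothing re-evaluates the check.
  queue.drain();

  return executor.state == Executor::TERMINATING;
}
```

In this model `killQueuedTask(false)` returns true (the executor is marked TERMINATING), while `killQueuedTask(true)` returns false: the queue ends up empty, yet the executor stays in REGISTERING.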