Posted to issues@mesos.apache.org by "Meng Zhu (JIRA)" <ji...@apache.org> on 2018/01/18 20:25:00 UTC

[jira] [Created] (MESOS-8459) Executor could linger without ever receiving any tasks

Meng Zhu created MESOS-8459:
-------------------------------

             Summary: Executor could linger without ever receiving any tasks
                 Key: MESOS-8459
                 URL: https://issues.apache.org/jira/browse/MESOS-8459
             Project: Mesos
          Issue Type: Bug
          Components: executor
            Reporter: Meng Zhu


An executor's initial tasks may be killed even after the executor has registered. If that happens, the executor may never receive any task and could linger forever.

In MESOS-8411, we have a short-term fix that checks an executor's completed and terminated task queues to see whether it has ever received any tasks. If the check is false and there are no queued or launched tasks, the agent will shut down the executor.
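A minimal sketch of the short-term heuristic, written in Python purely for illustration (the agent itself is C++). All class and field names are hypothetical stand-ins, not Mesos APIs. In particular, the per-task `updated_by_executor` flag is an assumed way of telling whether a task ever reached the executor.

```python
from collections import deque
from dataclasses import dataclass, field

# The report gives the completed-task buffer a current size of 200.
COMPLETED_TASKS_CAPACITY = 200


@dataclass
class Task:
    task_id: str
    # Hypothetical flag: True if the executor ever sent a status update for
    # this task, i.e. the task actually reached the executor.
    updated_by_executor: bool = False


@dataclass
class Executor:
    queued_tasks: list = field(default_factory=list)
    launched_tasks: list = field(default_factory=list)
    terminated_tasks: list = field(default_factory=list)
    # Bounded like a circular buffer: the oldest entry is dropped when full.
    completed_tasks: deque = field(
        default_factory=lambda: deque(maxlen=COMPLETED_TASKS_CAPACITY))


def ever_received_tasks(executor: Executor) -> bool:
    """Infer from the terminated/completed queues whether any task was delivered."""
    history = list(executor.terminated_tasks) + list(executor.completed_tasks)
    return any(task.updated_by_executor for task in history)


def should_shut_down(executor: Executor) -> bool:
    """Shut down only if nothing is pending and no task ever reached the executor."""
    return (not executor.queued_tasks
            and not executor.launched_tasks
            and not ever_received_tasks(executor))


# Example: the executor's only task was killed while still queued.
lingering = Executor()
lingering.completed_tasks.append(Task("t1", updated_by_executor=False))
print(should_shut_down(lingering))  # True
```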

However, this check is not bullet-proof. The completedTasks queue is a circular_buffer (current size 200), so earlier completed tasks that may have been updated by the executor can be evicted and thus missed by this check. This would lead to false-positive shutdowns.
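The eviction problem can be reproduced with Python's bounded `collections.deque` standing in for the agent's `circular_buffer`. The scenario is an assumed one. A task that did reach the executor completes first. Then 200 more tasks are killed while still queued, so they never reach it. The `reached_executor` flag is a hypothetical stand-in for "the executor sent an update for this task".

```python
from collections import deque

CAPACITY = 200  # current size of the completedTasks buffer per the report

# Each entry is (task_id, reached_executor); the flag is hypothetical.
completed_tasks = deque(maxlen=CAPACITY)

# One task is delivered to the executor and completes.
completed_tasks.append(("task-0", True))

# Afterwards, CAPACITY tasks are killed while queued and never delivered.
for i in range(1, CAPACITY + 1):
    completed_tasks.append((f"task-{i}", False))

# The only evidence that the executor ever received a task has been evicted.
still_has_evidence = any(reached for _, reached in completed_tasks)
print(len(completed_tasks), still_has_evidence)  # 200 False
```

Suppose the queued and launched sets are also empty at this point. The check would then shut down an executor that did in fact receive a task. That is the false positive described above.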

Per discussion with [~vinodkone] and [~bmahler], there are two long-term solutions.

The first is to checkpoint additional executor state that indicates whether the executor has ever received any tasks (no more inference from task queue status).
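A minimal sketch of the first option, assuming a hypothetical per-executor checkpoint file that holds a single `ever_received_task` flag. The flag is written atomically (a temporary file followed by `os.replace`) when the first task is delivered. It is read back on agent recovery, so the shutdown decision no longer depends on the bounded task history. The file name and JSON layout are illustrative and do not reflect Mesos's actual checkpoint format.

```python
import json
import os
import tempfile

CHECKPOINT_NAME = "executor_state.json"  # hypothetical file name


def checkpoint_ever_received_task(executor_dir: str) -> None:
    """Persist the flag atomically so a crash never leaves a torn file."""
    path = os.path.join(executor_dir, CHECKPOINT_NAME)
    fd, tmp_path = tempfile.mkstemp(dir=executor_dir)
    with os.fdopen(fd, "w") as f:
        json.dump({"ever_received_task": True}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic rename on POSIX


def recover_ever_received_task(executor_dir: str) -> bool:
    """Read the flag back on recovery; a missing file means no task was ever sent."""
    path = os.path.join(executor_dir, CHECKPOINT_NAME)
    try:
        with open(path) as f:
            return bool(json.load(f).get("ever_received_task", False))
    except FileNotFoundError:
        return False


def should_shut_down(queued: int, launched: int, ever_received_task: bool) -> bool:
    """No inference from task queues: use the checkpointed flag directly."""
    return queued == 0 and launched == 0 and not ever_received_task


with tempfile.TemporaryDirectory() as d:
    print(should_shut_down(0, 0, recover_ever_received_task(d)))  # True
    checkpoint_ever_received_task(d)
    print(should_shut_down(0, 0, recover_ever_received_task(d)))  # False
```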

The alternative is to add timeouts in the executor driver to shut down lingering executors automatically.
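A minimal sketch of the second option is a watchdog in the executor driver. It starts counting when the executor registers and requests shutdown if no task arrives within a timeout. The class name, the timeout value, and the injectable clock are assumptions for illustration, and this is not an existing Mesos executor driver API. The clock is injectable so the logic can be exercised without sleeping.

```python
import time
from typing import Callable, Optional


class LingerWatchdog:
    """Hypothetical driver-side guard against executors that never get a task."""

    def __init__(self, timeout_secs: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_secs = timeout_secs
        self.clock = clock
        self.registered_at: Optional[float] = None
        self.received_task = False

    def on_registered(self) -> None:
        # Start the countdown once the executor has registered with the agent.
        self.registered_at = self.clock()

    def on_task_received(self) -> None:
        # Any delivered task disarms the watchdog for good.
        self.received_task = True

    def should_shut_down(self) -> bool:
        if self.received_task or self.registered_at is None:
            return False
        return self.clock() - self.registered_at >= self.timeout_secs


# Example with a fake clock instead of real time.
now = [0.0]
watchdog = LingerWatchdog(timeout_secs=60.0, clock=lambda: now[0])
watchdog.on_registered()
now[0] = 30.0
print(watchdog.should_shut_down())  # False: still within the timeout
now[0] = 61.0
print(watchdog.should_shut_down())  # True: registered but never received a task
```

Unlike checkpointing, this keeps no extra agent-side state. It does, however, depend on a timeout long enough not to kill executors whose first task is merely slow to arrive.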




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)