Posted to issues@mesos.apache.org by "Benjamin Bannier (Jira)" <ji...@apache.org> on 2019/10/23 13:14:00 UTC

[jira] [Assigned] (MESOS-10018) Duplicate tasks if agent partitioned during maintenance down period

     [ https://issues.apache.org/jira/browse/MESOS-10018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Bannier reassigned MESOS-10018:
----------------------------------------

    Shepherd: Benno Evers
      Sprint: Foundations: RI-19 57
    Assignee: Benjamin Bannier

> Duplicate tasks if agent partitioned during maintenance down period
> -------------------------------------------------------------------
>
>                 Key: MESOS-10018
>                 URL: https://issues.apache.org/jira/browse/MESOS-10018
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Benjamin Bannier
>            Assignee: Benjamin Bannier
>            Priority: Major
>
> When the master starts maintenance for a node, it
> (1) sends a {{ShutdownMessage}} to the agent, and
> (2) removes the agent, which transitions all of its tasks to {{TASK_LOST}} and moves them
> to the completed task set.
> If the {{ShutdownMessage}} isn't fully processed on the agent (e.g., the message is dropped between (1) and (2), or the agent process is killed before the executor has shut down), the agent could come back with the lost task still running. It would report the task when registering with the master, which would add it to the list of active tasks. As a result, the same task could be both completed and active.
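
The sequence described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyMaster, ToyAgent, and simulate are hypothetical names invented for illustration, they are not Mesos classes or APIs, and the real master bookkeeping is considerably more involved.

```python
# Toy model of the race described in the issue. All names here are
# hypothetical and do NOT correspond to actual Mesos code.


class ToyAgent:
    def __init__(self, agent_id):
        self.agent_id = agent_id
        self.running = set()  # task ids currently running on this agent

    def receive(self, message):
        # A fully processed shutdown stops every task on the agent.
        if message == "ShutdownMessage":
            self.running.clear()


class ToyMaster:
    def __init__(self):
        self.active = {}     # task id -> agent id
        self.completed = {}  # task id -> terminal state

    def launch(self, agent, task_id):
        agent.running.add(task_id)
        self.active[task_id] = agent.agent_id

    def start_maintenance(self, agent, drop_message):
        # (1) Send the shutdown message; it may never reach the agent.
        if not drop_message:
            agent.receive("ShutdownMessage")
        # (2) Remove the agent: its tasks become TASK_LOST and move to
        # the completed set, regardless of what happened in (1).
        for task_id, owner in list(self.active.items()):
            if owner == agent.agent_id:
                del self.active[task_id]
                self.completed[task_id] = "TASK_LOST"

    def register(self, agent):
        # The agent reports whatever it still runs; the toy master
        # adds those tasks to the active list without further checks.
        for task_id in agent.running:
            self.active[task_id] = agent.agent_id

    def duplicates(self):
        # Tasks that are both completed and active at the same time.
        return sorted(set(self.active) & set(self.completed))


def simulate(drop_message):
    master = ToyMaster()
    agent = ToyAgent("agent-1")
    master.launch(agent, "task-1")
    master.start_maintenance(agent, drop_message)
    master.register(agent)
    return master.duplicates()
```

In this toy model, simulate(True) reports "task-1" as both completed and active, while simulate(False) reports no duplicates because the agent stopped the task before registering again.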



--
This message was sent by Atlassian Jira
(v8.3.4#803005)