You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@mesos.apache.org by "Gilbert Song (JIRA)" <ji...@apache.org> on 2018/01/04 19:17:00 UTC

[jira] [Assigned] (MESOS-8391) Mesos agent doesn't notice that a pod task exits or crashes after the agent restart

     [ https://issues.apache.org/jira/browse/MESOS-8391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gilbert Song reassigned MESOS-8391:
-----------------------------------

    Assignee: Gilbert Song

> Mesos agent doesn't notice that a pod task exits or crashes after the agent restart
> -----------------------------------------------------------------------------------
>
>                 Key: MESOS-8391
>                 URL: https://issues.apache.org/jira/browse/MESOS-8391
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent, containerization, executor
>    Affects Versions: 1.5.0
>            Reporter: Ivan Chernetsky
>            Assignee: Gilbert Song
>            Priority: Blocker
>         Attachments: agent.log.gz
>
>
> h4. (1) Agent doesn't detect that a pod task exits/crashes
> # Create a Marathon pod with two containers which just do {{sleep 10000}}.
> # Restart the Mesos agent on the node the pod got launched.
> # Kill one of the pod tasks
> *Expected result*: The Mesos agent detects that one of the tasks got killed, and forwards {{TASK_FAILED}} status to Marathon.
> *Actual result*: The Mesos agent does nothing, and the Mesos master thinks that both tasks are running just fine. Marathon doesn't take any action because it doesn't receive any update from Mesos.
> h4. (2) After the agent restart, it detects that the task crashed, forwards the correct status update, but the other task stays in {{TASK_KILLING}} state forever
> # Perform steps in (1).
> # Restart the Mesos agent
> *Expected result*: The Mesos agent detects that one of the tasks got crashed, forwards the corresponding status update, and kills the other task too.
> *Actual result*: The Mesos agent detects that one of the tasks got crashed, forwards the corresponding status update, but the other task stays in `TASK_KILLING` state forever.
> Please note, that after another agent restart, the other tasks gets finally killed and the correct status updates get propagated all the way to Marathon.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)