Posted to issues@mesos.apache.org by "Ian Downes (JIRA)" <ji...@apache.org> on 2014/11/08 00:50:34 UTC

[jira] [Assigned] (MESOS-2052) RunState::recover should always recover 'completed'

     [ https://issues.apache.org/jira/browse/MESOS-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Downes reassigned MESOS-2052:
---------------------------------

    Assignee: Ian Downes

> RunState::recover should always recover 'completed'
> ---------------------------------------------------
>
>                 Key: MESOS-2052
>                 URL: https://issues.apache.org/jira/browse/MESOS-2052
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, slave
>    Affects Versions: 0.20.0
>            Reporter: Ian Downes
>            Assignee: Ian Downes
>
> RunState::recover() returns partial state if it cannot find or open the libprocess pid file. Specifically, it does not recover the 'completed' flag.
> However, if the slave has removed the executor (because the launch failed or the executor failed to register), the completed sentinel file will be present and this fact should be recovered. This ensures that container recovery is not attempted later.
> This was discovered when the LinuxLauncher failed to recover because it was asked to recover two containers with the same forkedPid. Investigation showed that both executors OOM'ed before registering, i.e., no libprocess pid file was present. However, the containerizer had detected the OOM, destroyed the container, and notified the slave, which cleaned everything up: it failed the task and called removeExecutor (which writes the completed sentinel file).
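
To make the proposed ordering concrete, here is a minimal, self-contained sketch of a recovery routine that reads the completed sentinel before touching any pid file, so a missing libprocess pid file can no longer cause the flag to be skipped. It is not the actual Mesos code: the simplified RunState struct, the recover() signature, the readFirstLine() helper, and the three file names are all assumptions made for illustration.

```cpp
#include <filesystem>
#include <fstream>
#include <optional>
#include <stdexcept>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical, simplified stand-in for the slave's per-run checkpoint state.
struct RunState {
  std::optional<int> forkedPid;
  std::optional<std::string> libprocessPid;
  bool completed = false;
};

// Assumed checkpoint file names; the real layout may differ.
const char* const FORKED_PID_FILE = "forked.pid";
const char* const LIBPROCESS_PID_FILE = "libprocess.pid";
const char* const COMPLETED_SENTINEL_FILE = "completed";

// Returns the first line of the file, or nullopt if it cannot be opened.
std::optional<std::string> readFirstLine(const fs::path& path) {
  std::ifstream in(path);
  if (!in) {
    return std::nullopt;
  }
  std::string line;
  std::getline(in, line);
  return line;
}

RunState recover(const fs::path& runDir) {
  RunState state;

  // Recover the sentinel first, so that no later missing or unreadable
  // file can cause it to be skipped.
  std::error_code ec;
  state.completed = fs::exists(runDir / COMPLETED_SENTINEL_FILE, ec);

  // The forked pid is optional; ignore it if absent or malformed.
  if (auto forked = readFirstLine(runDir / FORKED_PID_FILE)) {
    try {
      state.forkedPid = std::stoi(*forked);
    } catch (const std::exception&) {
      // Leave forkedPid unset.
    }
  }

  // The libprocess pid is absent when the executor never registered.
  // This yields partial state, but 'completed' is already recovered.
  state.libprocessPid = readFirstLine(runDir / LIBPROCESS_PID_FILE);

  return state;
}
```

With this ordering, a run directory that contains a forked pid and the sentinel but no libprocess pid file comes back with completed set to true, which is the signal a caller would need to skip container recovery for that run.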


