Posted to issues@mesos.apache.org by "Ian Downes (JIRA)" <ji...@apache.org> on 2015/05/06 01:37:01 UTC

[jira] [Commented] (MESOS-2656) Slave should send status update immediately when container launch fails.

    [ https://issues.apache.org/jira/browse/MESOS-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529555#comment-14529555 ] 

Ian Downes commented on MESOS-2656:
-----------------------------------

I'm seeing the container being destroyed by the containerizer:
{noformat}
E0505 23:23:13.262598 31892 containerizer.cpp:671] Failed to launch container '22468185-6473-4dae-9053-ce3b5143c085' for executor 'foobar' of framework '20150505-175526-33554559-5050-1218-0048': Fetched image does not match
I0505 23:23:13.262676 31892 containerizer.cpp:1119] Destroying container '22468185-6473-4dae-9053-ce3b5143c085'
E0505 23:23:13.262969 31880 slave.cpp:3108] Container '22468185-6473-4dae-9053-ce3b5143c085' for executor 'foobar' of framework '20150505-175526-33554559-5050-1218-0048' failed to start: Fetched image does not match
I0505 23:24:10.356695 31889 slave.cpp:3722] Current disk usage 58.02%. Max allowed age: 2.238647445277164days
I0505 23:24:13.045687 31889 slave.cpp:3678] Terminating executor foobar of framework 20150505-175526-33554559-5050-1218-0048 because it did not register within 1mins
I0505 23:25:10.357317 31890 slave.cpp:3722] Current disk usage 58.01%. Max allowed age: 2.238966137882743days
...

{noformat}

but the scheduler still receives no status update (using {{mesos execute}}):
{noformat}
I0505 23:23:13.025764 31935 sched.cpp:448] Framework registered with 20150505-175526-33554559-5050-1218-0048
Framework registered with 20150505-175526-33554559-5050-1218-0048
task test submitted to slave 20150505-175526-33554559-5050-1218-S9
{noformat}

> Slave should send status update immediately when container launch fails.
> ------------------------------------------------------------------------
>
>                 Key: MESOS-2656
>                 URL: https://issues.apache.org/jira/browse/MESOS-2656
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.22.1
>            Reporter: Jie Yu
>            Assignee: Jay Buffington
>
> Right now, when a containerizer launch fails, the slave doesn't send a status update to the scheduler until the executor registration timeout expires. Because users of the Docker containerizer might configure a very large timeout value, the slave should ideally send a status update to the scheduler immediately after the containerizer launch fails.
> The simplest solution is to call containerizer->destroy(..) in executorLaunched when containerizer->launch fails. This completes the pending containerizer->wait, which in turn causes the slave to send a status update to the scheduler.
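The proposal above can be sketched as a toy model of the control flow. It assumes a simplified containerizer whose `wait` registers a callback that fires once when the container is destroyed. `ToyContainerizer`, `ToySlave`, `executor_launched`, and `registration_timeout` are hypothetical stand-ins, not the Mesos C++ API or libprocess futures. The model contrasts the proposed path, where the container is destroyed immediately on launch failure, with the existing path, where nothing reaches the scheduler until the executor registration timeout fires.

```python
from typing import Callable, List

# Toy model only: these classes are hypothetical stand-ins, NOT the real
# Mesos C++ Containerizer/Slave API or libprocess Futures.


class ToyContainerizer:
    def __init__(self) -> None:
        self._waiters: List[Callable[[str], None]] = []
        self._destroyed = False

    def launch(self) -> bool:
        # Simulate the failure from the log ("Fetched image does not match").
        return False

    def wait(self, on_terminated: Callable[[str], None]) -> None:
        # Loosely mirrors containerizer->wait(): the callback fires when the
        # container terminates.
        self._waiters.append(on_terminated)

    def destroy(self, reason: str) -> None:
        # Destroying the container completes every pending wait exactly once.
        if self._destroyed:
            return
        self._destroyed = True
        waiters, self._waiters = self._waiters, []
        for cb in waiters:
            cb(reason)


class ToySlave:
    def __init__(self, destroy_on_launch_failure: bool) -> None:
        self.containerizer = ToyContainerizer()
        self.destroy_on_launch_failure = destroy_on_launch_failure
        self.status_updates: List[str] = []  # updates "sent" to the scheduler

    def launch_executor(self) -> None:
        # Register for termination first, then attempt the launch.
        self.containerizer.wait(self._container_terminated)
        launched = self.containerizer.launch()
        self.executor_launched(launched, "Fetched image does not match")

    def executor_launched(self, launched: bool, error: str) -> None:
        # Proposed fix: destroy on launch failure so the pending wait completes
        # now, instead of only when the registration timeout fires.
        if not launched and self.destroy_on_launch_failure:
            self.containerizer.destroy(error)

    def registration_timeout(self) -> None:
        # Existing path: the timeout is what finally destroys the container.
        self.containerizer.destroy("executor did not register in time")

    def _container_terminated(self, reason: str) -> None:
        self.status_updates.append("terminal update: " + reason)


fixed = ToySlave(destroy_on_launch_failure=True)
fixed.launch_executor()
print(fixed.status_updates)  # ['terminal update: Fetched image does not match']

old = ToySlave(destroy_on_launch_failure=False)
old.launch_executor()
print(old.status_updates)    # [] until the timeout fires
old.registration_timeout()
print(old.status_updates)    # ['terminal update: executor did not register in time']
```

The slave log in the comment already shows the containerizer destroying the container without any update reaching the scheduler. This sketch therefore only illustrates the intended flow and does not reproduce the behaviour of the build being tested.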


