Posted to issues@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2015/05/02 00:01:07 UTC

[jira] [Commented] (MESOS-2684) mesos-slave should not abort when a single task has e.g. a 'mkdir' failure

    [ https://issues.apache.org/jira/browse/MESOS-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524062#comment-14524062 ] 

Benjamin Mahler commented on MESOS-2684:
----------------------------------------

TASK_KILLED is coming from the executors, not the slave:

I0501 19:10:48.069172  1685 slave.cpp:2215] Handling status update TASK_KILLED (UUID: U) for task T of framework Singularity *from executor(1)@10.70.8.160:49741*

Can you check why the executors are sending this?

> mesos-slave should not abort when a single task has e.g. a 'mkdir' failure
> --------------------------------------------------------------------------
>
>                 Key: MESOS-2684
>                 URL: https://issues.apache.org/jira/browse/MESOS-2684
>             Project: Mesos
>          Issue Type: Bug
>          Components: slave
>    Affects Versions: 0.21.1
>            Reporter: Steven Schlansker
>         Attachments: mesos-slave-restart.txt
>
>
> mesos-slave can encounter a variety of problems while attempting to launch a task.  If the task fails, that is unfortunate, but not the end of the world.  Other tasks should not be affected.
> However, if the task failure happens to trigger an assertion, the entire slave comes crashing down:
> F0501 19:10:46.095464  1705 paths.hpp:342] CHECK_SOME(mkdir): No space left on device Failed to create executor directory '/mnt/mesos/slaves/20150327-194449-419644938-5050-1649-S71/frameworks/Singularity/executors/pp-gc-eventlog-teamcity.2015.03.31T23.55.14-1430507446029-2-10.70.8.160-us_west_2b/runs/95a54aeb-322c-48e9-9f6f-5b359bccbc01'
> Immediately afterwards, all tasks on this slave were declared TASK_KILLED when mesos-slave restarted.
> Something as simple as a 'mkdir' failing is not worthy of an assertion failure.
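
To illustrate the behaviour the description asks for, here is a minimal sketch, not the actual Mesos code: the directory-creation step returns its error to the caller instead of going through a fatal check like the CHECK_SOME(mkdir) in the log above, so only the affected task needs to fail. The names LaunchOutcome, MakeDirectory, realMakeDirectory and prepareExecutorDirectory are hypothetical, and the sketch assumes C++17 std::filesystem rather than the helpers Mesos itself uses.

```cpp
#include <filesystem>
#include <functional>
#include <string>
#include <system_error>

// Hypothetical outcome type for this sketch; not a Mesos type.
struct LaunchOutcome {
  bool launched;
  std::string error;  // Empty when launched is true.
};

// Directory-creation step, injectable so that failures such as
// "No space left on device" can be simulated without filling a disk.
using MakeDirectory = std::function<std::error_code(const std::string&)>;

// Default step backed by std::filesystem; it reports failure through
// std::error_code instead of throwing or aborting.
inline std::error_code realMakeDirectory(const std::string& path) {
  std::error_code ec;
  std::filesystem::create_directories(path, ec);
  return ec;
}

// Prepares the executor directory for a single task. A failure is handed
// back to the caller, which can fail that one task; nothing here
// terminates the process.
inline LaunchOutcome prepareExecutorDirectory(
    const std::string& path,
    const MakeDirectory& makeDirectory = realMakeDirectory) {
  const std::error_code ec = makeDirectory(path);
  if (ec) {
    return {false,
            "Failed to create executor directory '" + path + "': " +
                ec.message()};
  }
  return {true, ""};
}
```

A caller would check `launched` and, when it is false, report a failure for that task only (using `error` as the message) while continuing to serve the other tasks.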


