Posted to issues@mesos.apache.org by "Timothy Chen (JIRA)" <ji...@apache.org> on 2015/06/17 09:46:00 UTC

[jira] [Updated] (MESOS-2684) mesos-slave should not abort when a single task has e.g. a 'mkdir' failure

     [ https://issues.apache.org/jira/browse/MESOS-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Chen updated MESOS-2684:
--------------------------------
    Component/s:     (was: docker)

> mesos-slave should not abort when a single task has e.g. a 'mkdir' failure
> --------------------------------------------------------------------------
>
>                 Key: MESOS-2684
>                 URL: https://issues.apache.org/jira/browse/MESOS-2684
>             Project: Mesos
>          Issue Type: Bug
>          Components: slave
>    Affects Versions: 0.21.1
>            Reporter: Steven Schlansker
>         Attachments: mesos-slave-restart.txt
>
>
> mesos-slave can encounter a variety of problems while attempting to launch a task.  If the task fails, that is unfortunate, but not the end of the world.  Other tasks should not be affected.
> However, if the task failure happens to trigger an assertion, the entire slave comes crashing down:
> F0501 19:10:46.095464  1705 paths.hpp:342] CHECK_SOME(mkdir): No space left on device Failed to create executor directory '/mnt/mesos/slaves/20150327-194449-419644938-5050-1649-S71/frameworks/Singularity/executors/pp-gc-eventlog-teamcity.2015.03.31T23.55.14-1430507446029-2-10.70.8.160-us_west_2b/runs/95a54aeb-322c-48e9-9f6f-5b359bccbc01'
> Immediately afterwards, all tasks on this slave were declared TASK_KILLED when mesos-slave restarted.
> Something as simple as a 'mkdir' failing is not worthy of an assertion failure.
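The report's argument is that a failed directory creation (here ENOSPC, "No space left on device") should fail only the task being launched, not abort the whole agent the way CHECK_SOME(mkdir) in paths.hpp does. The following is a minimal sketch of that behavior. It is written in Python rather than Mesos's C++, and `create_executor_directory`, `launch_task` and `LaunchResult` are hypothetical names, not Mesos APIs. The mkdir error is caught and returned as a per-task failure, so the other launches proceed.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class LaunchResult:
    """Outcome of launching one task (hypothetical type, not a Mesos API)."""
    task_id: str
    ok: bool
    error: Optional[str] = None


def create_executor_directory(root: str, run_id: str) -> str:
    """Create the per-run executor directory under `root`.

    Raises OSError on failure, e.g. ENOSPC ("No space left on device").
    """
    path = os.path.join(root, "runs", run_id)
    os.makedirs(path)  # exist_ok=False: a reused run id is also an error
    return path


def launch_task(root: str, task_id: str, run_id: str) -> LaunchResult:
    """Launch one task; a directory failure fails only this task.

    Instead of asserting (which would kill the whole process and every
    task on it), the error is returned to the caller, which could then
    send a terminal status update for this single task.
    """
    try:
        create_executor_directory(root, run_id)
    except OSError as e:
        return LaunchResult(task_id, ok=False,
                            error=f"Failed to create executor directory: {e}")
    return LaunchResult(task_id, ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # A regular file used as a "root" makes mkdir fail, standing in
        # for a disk-full condition on one task's sandbox path.
        blocker = os.path.join(root, "blocker")
        open(blocker, "w").close()
        for r in (launch_task(root, "t1", "r1"),
                  launch_task(blocker, "t2", "r2"),
                  launch_task(root, "t3", "r3")):
            print(r.task_id, "ok" if r.ok else r.error)
```

Here task t2 fails with an OSError that is reported for that task alone, while t1 and t3 still launch. A real fix in the agent would also need to decide which status (for example TASK_FAILED) to report. The sketch leaves that open.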



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)