You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@mesos.apache.org by "Steven Schlansker (JIRA)" <ji...@apache.org> on 2015/05/01 23:35:09 UTC

[jira] [Updated] (MESOS-2684) mesos-slave should not abort when a single task has e.g. a 'mkdir' failure

     [ https://issues.apache.org/jira/browse/MESOS-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Schlansker updated MESOS-2684:
-------------------------------------
    Attachment: mesos-slave-restart.txt

I've attached the log from slave restart.  The FATAL error above was the last line written before the abort, this is the head of the new log file created on restart.

> mesos-slave should not abort when a single task has e.g. a 'mkdir' failure
> --------------------------------------------------------------------------
>
>                 Key: MESOS-2684
>                 URL: https://issues.apache.org/jira/browse/MESOS-2684
>             Project: Mesos
>          Issue Type: Bug
>          Components: slave
>    Affects Versions: 0.21.1
>            Reporter: Steven Schlansker
>         Attachments: mesos-slave-restart.txt
>
>
> mesos-slave can encounter a variety of problems while attempting to launch a task.  If the task fails, that is unfortunate, but not the end of the world.  Other tasks should not be affected.
> However, if the task failure happens to trigger an assertion, the entire slave comes crashing down:
> F0501 19:10:46.095464  1705 paths.hpp:342] CHECK_SOME(mkdir): No space left on device Failed to create executor directory '/mnt/mesos/slaves/20150327-194449-419644938-5050-1649-S71/frameworks/Singularity/executors/pp-gc-eventlog-teamcity.2015.03.31T23.55.14-1430507446029-2-10.70.8.160-us_west_2b/runs/95a54aeb-322c-48e9-9f6f-5b359bccbc01'
> Immediately afterwards, all tasks on this slave were declared TASK_LOST when mesos-slave restarted.
> Something as simple as a 'mkdir' failing is not worthy of an assertion failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)