Posted to issues@mesos.apache.org by "Andreas Chalupa (JIRA)" <ji...@apache.org> on 2015/10/14 16:30:05 UTC

[jira] [Updated] (MESOS-3730) Docker containers won't start on a set of Mesos slaves

     [ https://issues.apache.org/jira/browse/MESOS-3730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Chalupa updated MESOS-3730:
-----------------------------------
    Attachment: slaveLogs.zip

Logs from one of the slaves that exhibits the problem

> Docker containers won't start on a set of Mesos slaves
> ------------------------------------------------------
>
>                 Key: MESOS-3730
>                 URL: https://issues.apache.org/jira/browse/MESOS-3730
>             Project: Mesos
>          Issue Type: Bug
>          Components: slave
>    Affects Versions: 0.25.0
>         Environment: CentOS 7
>            Reporter: Andreas Chalupa
>         Attachments: slaveLogs.zip
>
>
> We have 3 nodes that we've designated to run 'data' containers. These are stateful containers that share a volume with the slave host machine. On two different test beds we have now seen these slaves get into a state where they cannot start any containers. The containers' stderr shows this error:
> mesos-docker-executor: /tmp/mesos-build/mesos-repo/3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:110: T& Option<T>::get() [with T = std::basic_string<char>]: Assertion `isSome()' failed.
> *** Aborted at 1444832114 (unix time) try "date -d @1444832114" if you are using GNU date ***
> PC: @     0x7fc02694a5d7 __GI_raise
> *** SIGABRT (@0x4a7b) received by PID 19067 (TID 0x7fc02913b8c0) from PID 19067; stack trace: ***
>     @     0x7fc027504130 (unknown)
>     @     0x7fc02694a5d7 __GI_raise
>     @     0x7fc02694bcc8 __GI_abort
>     @     0x7fc026943546 __assert_fail_base
>     @     0x7fc0269435f2 __GI___assert_fail
>     @           0x4166b2 Option<>::get()
>     @           0x417725 main
>     @     0x7fc026936af5 __libc_start_main
>     @           0x417875 (unknown)
> I don't know how to interpret this error or what it means.
> Once a slave is in this state, it stays broken and no containers can be started on it. I don't think it clears up until we reboot the host machine (perhaps restarting the slave or Docker would be enough?)
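
As a reading aid for the trace above: the abort comes from calling get() on an Option<std::string> that holds no value, and the frames show the call was made directly from the executor's main. The sketch below is a simplified, hypothetical stand-in for such a type (it is not the stout source, and valueOr is an illustrative helper that does not exist in Mesos). It only shows why an empty Option turns into SIGABRT; which value was missing on these slaves cannot be read from the trace alone.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for an Option<T> type.
// Assumption: T is default-constructible. This is NOT the stout implementation.
template <typename T>
class Option {
public:
  Option() : some_(false), value_() {}                        // "none" state
  Option(T value) : some_(true), value_(std::move(value)) {}  // "some" state

  bool isSome() const { return some_; }
  bool isNone() const { return !some_; }

  // Same kind of check as the one that failed in the trace: calling get()
  // on an empty Option trips the assertion, and assert() aborts the process.
  T& get() {
    assert(isSome());
    return value_;
  }

private:
  bool some_;
  T value_;
};

// Hypothetical helper: check isSome() first and fall back instead of aborting.
std::string valueOr(Option<std::string>& option, const std::string& fallback) {
  return option.isSome() ? option.get() : fallback;
}
```

Under this reading, the executor reached a get() call in main without a prior isSome() check succeeding, so a string it expected to be set (for example from a flag or the environment, which is an assumption) was absent.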
