Posted to issues@mesos.apache.org by "Gilbert Song (JIRA)" <ji...@apache.org> on 2018/01/19 19:57:01 UTC

[jira] [Commented] (MESOS-6986) abort in DRFSorter::add

    [ https://issues.apache.org/jira/browse/MESOS-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16332832#comment-16332832 ] 

Gilbert Song commented on MESOS-6986:
-------------------------------------

[~yroyon], is this still an issue in Mesos 1.3+ versions (Mesosphere DC/OS EE 1.4+)?

> abort in DRFSorter::add
> -----------------------
>
>                 Key: MESOS-6986
>                 URL: https://issues.apache.org/jira/browse/MESOS-6986
>             Project: Mesos
>          Issue Type: Bug
>          Components: allocation
>    Affects Versions: 1.0.1
>         Environment: Mesosphere Enterprise DC/OS, CoreOS
>            Reporter: Yvan Royon
>            Priority: Critical
>              Labels: mesosphere
>         Attachments: mesos-master.node-36-1.log
>
>
> My mesos-master process terminated on SIGABRT.
> The CHECK failed in function {{DRFSorter::add}}:
> https://github.com/apache/mesos/blob/master/src/master/allocator/sorter/drf/sorter.cpp#L74
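> It seems there may be a condition during framework registration in which a name is added to the sorter twice (the failed check, {{!contains(name)}}, asserts that the name is not already present).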
> We are using the mesos-go library ({{next}} branch), which uses the new HTTP API. The framework is custom Go code. The crash is hard to reproduce reliably.
> {code}
> mesos-master[90061]: F0119 01:07:57.426159 90086 sorter.cpp:73] Check failed: !contains(name)
> mesos-master[90061]: *** Check failure stack trace: ***
> mesos-master[90061]: @     0x7f960d9299fd  google::LogMessage::Fail()
> mesos-master[90061]: @     0x7f960d92b82d  google::LogMessage::SendToLog()
> mesos-master[90061]: @     0x7f960d9295ec  google::LogMessage::Flush()
> mesos-master[90061]: @     0x7f960d92c129  google::LogMessageFatal::~LogMessageFatal()
> mesos-master[90061]: @     0x7f960d03460d  mesos::internal::master::allocator::DRFSorter::add()
> mesos-master[90061]: @     0x7f960d021177  mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::addFramework()
> mesos-master[90061]: @     0x7f960d8b9381  process::ProcessManager::resume()
> mesos-master[90061]: @     0x7f960d8b9687  _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> mesos-master[90061]: @     0x7f960bf52d73  (unknown)
> mesos-master[90061]: @     0x7f960b74f52c  (unknown)
> mesos-master[90061]: @     0x7f960b49180d  (unknown)
> systemd[1]: dcos-mesos-master.service: Main process exited, code=killed, status=6/ABRT
> {code}
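To make the failure mode in the log concrete, here is a minimal sketch of the invariant behind "Check failed: !contains(name)". It is a toy model, not the real {{DRFSorter}}. {{ToySorter}}, {{TOY_CHECK}} and {{addFrameworkOnce}} are hypothetical names used for illustration, and the assumption that the sorter client name is a framework ID string is made only for the example. Like glog's CHECK, the toy check aborts the process when a name that is already present is added again, which matches the SIGABRT (status=6/ABRT) seen above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <set>
#include <string>

// Hypothetical stand-in for glog's CHECK(): print the failed condition,
// then abort the process (this raises SIGABRT).
#define TOY_CHECK(cond)                                  \
  do {                                                   \
    if (!(cond)) {                                       \
      std::fprintf(stderr, "Check failed: %s\n", #cond); \
      std::abort();                                      \
    }                                                    \
  } while (0)

// Toy model of a sorter's client bookkeeping (NOT the real DRFSorter).
class ToySorter {
 public:
  bool contains(const std::string& name) const {
    return clients_.count(name) > 0;
  }

  // Adding a name that is already present is treated as a programming
  // error and kills the process, as in the reported crash.
  void add(const std::string& name) {
    TOY_CHECK(!contains(name));
    clients_.insert(name);
  }

  std::size_t count() const { return clients_.size(); }

 private:
  std::set<std::string> clients_;
};

// Hypothetical caller-side guard: a repeated add of the same name becomes
// a no-op instead of tripping the check. Returns true only if it added.
bool addFrameworkOnce(ToySorter& sorter, const std::string& frameworkId) {
  if (sorter.contains(frameworkId)) {
    return false;
  }
  sorter.add(frameworkId);
  return true;
}
```

A guard like {{addFrameworkOnce}} only shows where a duplicate add would be caught. It is not a proposed fix. The crash suggests that the allocator received a second add for a name it already tracked, and silently skipping that add could hide the underlying bookkeeping bug in framework registration.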



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)