Posted to dev@mesos.apache.org by "Benjamin Hindman (JIRA)" <ji...@apache.org> on 2013/03/01 00:21:13 UTC

[jira] [Commented] (MESOS-365) Master check failure.

    [ https://issues.apache.org/jira/browse/MESOS-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590036#comment-13590036 ] 

Benjamin Hindman commented on MESOS-365:
----------------------------------------

This is so brilliant! If only we had a really nice testing abstraction that could capture this "bug" as a test case! Ho hum, I'll get to it this weekend. ;)
                
> Master check failure.
> ---------------------
>
>                 Key: MESOS-365
>                 URL: https://issues.apache.org/jira/browse/MESOS-365
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Benjamin Mahler
>            Assignee: Vinod Kone
>            Priority: Critical
>
> In a test cluster under scale testing, during a roll of the masters, one of the newly elected masters failed with this:
> I0227 23:50:48.406574  1584 master.cpp:822] Asked to kill task 1362008747374-wickman-seizure-4-933a8193-96b1-411f-9392-3e4bd2cda6f0 of framework 201103282247-0000000019-0000
> F0227 23:50:48.406697  1584 master.cpp:830] Check failed: slave != NULL 
> *** Check failure stack trace: ***
>     @     0x7fb439418e6d  google::LogMessage::Fail()
>     @     0x7fb43941ead7  google::LogMessage::SendToLog()
>     @     0x7fb43941a71c  google::LogMessage::Flush()
>     @     0x7fb43941a986  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7fb43908b176  mesos::internal::master::Master::killTask()
>     @     0x7fb4390c4645  ProtobufProcess<>::handler2<>()
>     @     0x7fb439090b27  std::tr1::_Function_handler<>::_M_invoke()
>     @     0x7fb4390c5b6b  ProtobufProcess<>::visit()
>     @     0x7fb4392e2624  process::MessageEvent::visit()
>     @     0x7fb4392d68cd  process::ProcessManager::resume()
>     @     0x7fb4392d7118  process::schedule()
>     @     0x7fb4389f573d  start_thread
>     @     0x7fb4373d9f6d  clone
> Looks like this CHECK is too aggressive, as it's possible for a newly rolled master to not have all of the slaves registered yet?
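
To illustrate the point in the description, here is a minimal sketch of handling the missing slave gracefully instead of aborting. It assumes simplified, hypothetical stand-ins (`Slave`, `Task`, `KillResult`, and this `Master` class are invented for illustration); it is not the real Mesos code, nor the fix that was actually applied.

```cpp
#include <iostream>
#include <map>
#include <string>

// Hypothetical, simplified stand-ins for the master's bookkeeping.
// These are NOT the real Mesos types.
struct Slave {
  std::string id;
};

struct Task {
  std::string id;
  std::string slaveId;
};

// Hypothetical result type so callers can tell the cases apart.
enum class KillResult {
  FORWARDED,            // Kill request would be sent on to the slave.
  UNKNOWN_TASK,         // The master has no record of the task.
  SLAVE_NOT_REGISTERED  // Task is known, but its slave is not (yet).
};

class Master {
public:
  void addTask(const Task& task) { tasks[task.id] = task; }
  void registerSlave(const Slave& slave) { slaves[slave.id] = slave; }

  KillResult killTask(const std::string& taskId) {
    std::map<std::string, Task>::const_iterator task = tasks.find(taskId);
    if (task == tasks.end()) {
      return KillResult::UNKNOWN_TASK;
    }

    std::map<std::string, Slave>::const_iterator slave =
      slaves.find(task->second.slaveId);
    if (slave == slaves.end()) {
      // A fatal CHECK here would take the whole master down. After a
      // failover the slave may simply not have re-registered yet, so
      // log a warning and let the caller decide what to do.
      std::cerr << "Cannot kill task " << taskId << ": slave "
                << task->second.slaveId << " is not registered"
                << std::endl;
      return KillResult::SLAVE_NOT_REGISTERED;
    }

    // A real master would forward the kill request to the slave here.
    return KillResult::FORWARDED;
  }

private:
  std::map<std::string, Task> tasks;
  std::map<std::string, Slave> slaves;
};
```

In this sketch, calling `killTask` for a task whose slave is absent returns `SLAVE_NOT_REGISTERED`, and the same call returns `FORWARDED` once `registerSlave` has been called for that slave. What the master should tell the framework in the unregistered case is a design decision the sketch leaves open.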
