Posted to issues@mesos.apache.org by "Till Toenshoff (JIRA)" <ji...@apache.org> on 2015/02/14 19:29:11 UTC

[jira] [Commented] (MESOS-2354) Under certain circumstances master assigns the same ID to different slaves.

    [ https://issues.apache.org/jira/browse/MESOS-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321617#comment-14321617 ] 

Till Toenshoff commented on MESOS-2354:
---------------------------------------

[~arojas] Have you found out yet why the master is creating duplicate slave IDs?

> Under certain circumstances master assigns the same ID to different slaves.
> ---------------------------------------------------------------------------
>
>                 Key: MESOS-2354
>                 URL: https://issues.apache.org/jira/browse/MESOS-2354
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.20.1
>            Reporter: Alexander Rojas
>
> If two slaves are created one after the other in quick succession, the master sometimes assigns both of them the same ID. The following test (for use in {{master_tests.cpp}}) demonstrates this:
> {code}
> TEST_F(MasterTest, SlavesWithTheSameID)
> {
>   // Start up the master.
>   Try<PID<Master>> master = StartMaster();
>   ASSERT_SOME(master);
>   // Start two slaves. Their only purpose is to register with the
>   // master.
>   Future<SlaveRegisteredMessage> slave1RegisteredMessage =
>     FUTURE_PROTOBUF(SlaveRegisteredMessage(), master.get(), _);
>   StartSlave();
>   AWAIT_READY(slave1RegisteredMessage);
>   Future<SlaveRegisteredMessage> slave2RegisteredMessage =
>     FUTURE_PROTOBUF(SlaveRegisteredMessage(), master.get(), _);
>   StartSlave();
>   AWAIT_READY(slave2RegisteredMessage);
>   ASSERT_FALSE(
>       slave1RegisteredMessage.get().slave_id() ==
>         slave2RegisteredMessage.get().slave_id());
>   Shutdown();
> }
> {code}
> The test has to be run many times before it eventually fails, e.g.:
> {noformat}./bin/mesos-tests.sh --gtest_filter="MasterTest.SlavesWithTheSameID" --gtest_repeat=1000 --gtest_break_on_failure{noformat}
> At some point, the output will be:
> {noformat}
> ../../src/tests/master_tests.cpp:1618: Failure
> Value of: slave1RegisteredMessage.get().slave_id() == slave2RegisteredMessage.get().slave_id()
>   Actual: true
> Expected: false
> {noformat}
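The report does not say how the master derives slave IDs, so what follows is only a hypothetical sketch of the invariant the test checks, not Mesos code. It assumes IDs are built from a per-master prefix plus a counter that is incremented on every registration; the SlaveIdAllocator class and the <masterId>-S<counter> format are illustrative assumptions. Under that assumption, two registrations handled back to back by the same master cannot receive the same ID.

{code:cpp}
#include <cassert>
#include <cstdint>
#include <iostream>
#include <set>
#include <string>
#include <utility>

// Hypothetical allocator for illustration only; it is NOT the Mesos
// master's implementation. IDs combine a per-master prefix with a
// counter that is advanced on every call.
class SlaveIdAllocator {
 public:
  explicit SlaveIdAllocator(std::string masterId)
    : masterId_(std::move(masterId)) {}

  // Each call consumes a fresh counter value, so back-to-back
  // registrations handled by the same allocator get distinct IDs.
  std::string next() {
    return masterId_ + "-S" + std::to_string(nextId_++);
  }

 private:
  const std::string masterId_;
  std::uint64_t nextId_ = 0;
};

int main() {
  SlaveIdAllocator allocator("example-master");  // hypothetical master ID

  // Mirror the test: two slaves registering one right after the other.
  const std::string slave1 = allocator.next();
  const std::string slave2 = allocator.next();
  std::cout << slave1 << "\n" << slave2 << "\n";
  assert(slave1 != slave2);

  // Check the same invariant over many registrations
  // (loosely analogous to running the test with --gtest_repeat=1000).
  std::set<std::string> seen = {slave1, slave2};
  for (int i = 0; i < 1000; ++i) {
    if (!seen.insert(allocator.next()).second) {
      std::cerr << "duplicate slave ID\n";
      return 1;
    }
  }
  return 0;
}
{code}

With this allocator the program prints example-master-S0 and example-master-S1 and exits with status 0.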



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)