Posted to issues@mesos.apache.org by "Klaus Ma (JIRA)" <ji...@apache.org> on 2015/09/01 08:27:45 UTC

[jira] [Updated] (MESOS-3351) nextSlaveId in master was not updated when recover

     [ https://issues.apache.org/jira/browse/MESOS-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Klaus Ma updated MESOS-3351:
----------------------------
    Attachment: test.log

Attached the log of the unit test (UT) cases. In the log, the two slaves are using the same slave ID, and the newly started slave was rejected by the master.

{code}
da-macbookair:build dma$ grep "Registering slave at" test.log 
I0901 13:44:40.462039 430882816 master.cpp:3670] Registering slave at slave(1)@9.181.90.57:49795 (da-macbookair.cn.ibm.com) with id 20150901-134440-962245897-49795-59127-S0
I0901 13:44:40.660033 433565696 master.cpp:3670] Registering slave at slave(2)@9.181.90.57:49795 (da-macbookair.cn.ibm.com) with id 20150901-134440-962245897-49795-59127-S0
{code}

> nextSlaveId in master was not updated when recover
> --------------------------------------------------
>
>                 Key: MESOS-3351
>                 URL: https://issues.apache.org/jira/browse/MESOS-3351
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>         Environment: Mac OS (Darwin da-macbookair.cn.ibm.com 14.5.0 Darwin Kernel Version 14.5.0: Wed Jul 29 02:26:53 PDT 2015; root:xnu-2782.40.9~1/RELEASE_X86_64 x86_64)
>            Reporter: Klaus Ma
>            Assignee: Klaus Ma
>              Labels: race-condition, uuid
>         Attachments: test.log
>
>
> When a slave registers with the master, the master generates a slave ID for it as slaveInfo.id + "-S" + nextSlaveId (in master.cpp) to avoid duplicate slaveInfo.id values. But if the master fails over, nextSlaveId is reset to 0, which may produce a duplicate slave ID shared by an old slave and a new slave.
> For now, it has only been reproduced on Mac OS, and only intermittently; it can NOT be reproduced on Ubuntu, and other OSes have not been checked.
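
To make the collision described above concrete, here is a minimal toy model in Python. It is not Mesos code: ToyMaster, new_slave_id and recover are hypothetical names, the prefix is copied from the attached log, and the recover step is only one possible remedy assumed for illustration, not the actual fix.

```python
class ToyMaster:
    """Toy model of a master handing out slave IDs (hypothetical, not Mesos code)."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.next_slave_id = 0  # starts at 0 every time a master comes up

    def new_slave_id(self):
        # Mirrors the scheme in the description: prefix + "-S" + counter.
        slave_id = f"{self.prefix}-S{self.next_slave_id}"
        self.next_slave_id += 1
        return slave_id

    def recover(self, known_slave_ids):
        # Assumed remedy: move the counter past every suffix already in use.
        for slave_id in known_slave_ids:
            head, sep, suffix = slave_id.rpartition("-S")
            if sep and head == self.prefix and suffix.isdigit():
                self.next_slave_id = max(self.next_slave_id, int(suffix) + 1)


PREFIX = "20150901-134440-962245897-49795-59127"  # prefix seen in test.log

# Before failover: the first slave registers.
before = ToyMaster(PREFIX)
old_id = before.new_slave_id()

# After failover without restoring the counter: the next slave gets the same ID.
after = ToyMaster(PREFIX)
colliding_id = after.new_slave_id()

# After failover with the assumed recovery step: the next slave gets a new ID.
fixed = ToyMaster(PREFIX)
fixed.recover([old_id])
fresh_id = fixed.new_slave_id()

print("collision:", old_id == colliding_id)
print("after recover:", fresh_id)
```

In this model the collision only happens when the prefix stays the same across the failover, which matches the two "Registering slave" lines in the log.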



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)