Posted to issues@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2016/10/05 20:59:20 UTC

[jira] [Comment Edited] (MESOS-6249) On Mesos master failover the reregistered callback is not triggered

    [ https://issues.apache.org/jira/browse/MESOS-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549924#comment-15549924 ] 

Benjamin Mahler edited comment on MESOS-6249 at 10/5/16 8:58 PM:
-----------------------------------------------------------------

Linking in MESOS-786, which describes the lifecycle of the registered and re-registered callbacks. Note that MESOS-786 was resolved, but AFAICT we did not update schedulers that use the old-style driver to the newer semantics described in that ticket.

However, it sounds like you care about this because you're trying to detect that the master has failed over. To do this, inspect the {{MasterInfo}} passed to your {{registered}}/{{reregistered}} callbacks and check whether {{MasterInfo.id}} has changed.
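Below is a minimal sketch of that check in Java. It assumes you call it from both {{Scheduler.registered}} and {{Scheduler.reregistered}} with {{masterInfo.getId()}}. {{MasterFailoverDetector}} is a hypothetical helper, not part of the Mesos Java API. It only remembers the last master id it saw and reports whether the new one differs.

{code:java}
import java.util.Objects;

// Hypothetical helper, not part of the Mesos Java API. It remembers the
// MasterInfo.id from the last registered/reregistered callback and reports
// whether a newly reported id means a different master is now leading.
//
// Intended wiring (commented out because it needs the Mesos jar):
//
//   public void registered(SchedulerDriver driver, FrameworkID id, MasterInfo masterInfo) {
//     if (detector.onMasterInfo(masterInfo.getId())) { /* handle failover */ }
//   }
//
//   public void reregistered(SchedulerDriver driver, MasterInfo masterInfo) {
//     if (detector.onMasterInfo(masterInfo.getId())) { /* handle failover */ }
//   }
final class MasterFailoverDetector {
  // Id of the master we last (re-)registered with; null until the first callback.
  private String lastMasterId;

  // Records masterId and returns true only if an earlier, different id was seen.
  synchronized boolean onMasterInfo(String masterId) {
    Objects.requireNonNull(masterId, "masterId");
    boolean failedOver = lastMasterId != null && !lastMasterId.equals(masterId);
    lastMasterId = masterId;
    return failedOver;
  }

  public static void main(String[] args) {
    MasterFailoverDetector detector = new MasterFailoverDetector();
    System.out.println(detector.onMasterInfo("master-a")); // false: first registration
    System.out.println(detector.onMasterInfo("master-a")); // false: same master
    System.out.println(detector.onMasterInfo("master-b")); // true: master id changed
  }
}
{code}

Calling the same check from both callbacks means it still works when the old-style driver invokes {{registered}} rather than {{reregistered}} after a failover, as reported in this issue.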


> On Mesos master failover the reregistered callback is not triggered
> -------------------------------------------------------------------
>
>                 Key: MESOS-6249
>                 URL: https://issues.apache.org/jira/browse/MESOS-6249
>             Project: Mesos
>          Issue Type: Bug
>          Components: java api
>    Affects Versions: 0.28.0, 0.28.1, 1.0.1
>         Environment: OS X 10.11.6
>            Reporter: Markus Jura
>
> On a Mesos master failover, the {{reregistered}} callback of the Java API is not triggered. Only the {{registered}} callback is triggered, which makes it hard for a framework to distinguish a re-registration after failover from a first registration.
> This behaviour has been tested with the ConductR framework, using Java API versions 0.28.0, 0.28.1, and 1.0.1. Below are the logs from the master that got re-elected and from the ConductR framework.
> *Log: Mesos master on a master re-election*
> {code:bash}
> I0926 11:44:20.008306 3747840 zookeeper.cpp:259] A new leading master (UPID=master@127.0.0.1:5050) is detected
> I0926 11:44:20.008458 3747840 master.cpp:1847] The newly elected leader is master@127.0.0.1:5050 with id ca5b9713-1eec-43e1-9d27-9ebc5c0f95b1
> I0926 11:44:20.008484 3747840 master.cpp:1860] Elected as the leading master!
> I0926 11:44:20.008498 3747840 master.cpp:1547] Recovering from registrar
> I0926 11:44:20.008607 3747840 registrar.cpp:332] Recovering registrar
> I0926 11:44:20.016340 4284416 registrar.cpp:365] Successfully fetched the registry (0B) in 7.702016ms
> I0926 11:44:20.016393 4284416 registrar.cpp:464] Applied 1 operations in 12us; attempting to update the 'registry'
> I0926 11:44:20.021428 4284416 registrar.cpp:509] Successfully updated the 'registry' in 5.019904ms
> I0926 11:44:20.021481 4284416 registrar.cpp:395] Successfully recovered registrar
> I0926 11:44:20.021611 528384 master.cpp:1655] Recovered 0 agents from the Registry (118B) ; allowing 10mins for agents to re-register
> I0926 11:44:20.536859 3747840 master.cpp:2424] Received SUBSCRIBE call for framework 'conductr' at scheduler-3f8b9645-7a17-4e9f-8ad5-077fe8c23b39@192.168.2.106:57164
> I0926 11:44:20.536969 3747840 master.cpp:2500] Subscribing framework conductr with checkpointing disabled and capabilities [  ]
> I0926 11:44:20.537401 3211264 hierarchical.cpp:271] Added framework conductr
> I0926 11:44:20.807895 528384 master.cpp:4787] Re-registering agent b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 at slave(1)@127.0.0.1:5051 (127.0.0.1)
> I0926 11:44:20.808145 1601536 registrar.cpp:464] Applied 1 operations in 38us; attempting to update the 'registry'
> I0926 11:44:20.815757 1601536 registrar.cpp:509] Successfully updated the 'registry' in 7.568896ms
> I0926 11:44:20.815992 3747840 master.cpp:7447] Adding task 6abce9bb-895f-4f6f-be5b-25f6bd09f548 with resources mem(*):0 on agent b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 (127.0.0.1)
> I0926 11:44:20.816339 3747840 master.cpp:4872] Re-registered agent b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 at slave(1)@127.0.0.1:5051 (127.0.0.1) with cpus(*):8; mem(*):15360; disk(*):470832; ports(*):[31000-32000]
> I0926 11:44:20.816385 1601536 hierarchical.cpp:478] Added agent b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 (127.0.0.1) with cpus(*):8; mem(*):15360; disk(*):470832; ports(*):[31000-32000] (allocated: cpus(*):0.9; mem(*):402.653; disk(*):1000; ports(*):[31000-31000, 31001-31500])
> I0926 11:44:20.816437 3747840 master.cpp:4940] Sending updated checkpointed resources  to agent b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 at slave(1)@127.0.0.1:5051 (127.0.0.1)
> I0926 11:44:20.816787 4284416 master.cpp:5725] Sending 1 offers to framework conductr (conductr) at scheduler-3f8b9645-7a17-4e9f-8ad5-077fe8c23b39@192.168.2.106:57164
> {code}
> *Log: ConductR framework*
> {code:bash}
> I0926 11:44:20.007189 66441216 detector.cpp:152] Detected a new leader: (id='87')
> I0926 11:44:20.007524 64294912 group.cpp:706] Trying to get '/mesos/json.info_0000000087' in ZooKeeper
> I0926 11:44:20.008625 63758336 zookeeper.cpp:259] A new leading master (UPID=master@127.0.0.1:5050) is detected
> I0926 11:44:20.008965 63758336 sched.cpp:330] New master detected at master@127.0.0.1:5050
> 2016-09-26T09:44:20Z MacBook-Pro-6.local INFO  MesosSchedulerClient [sourceThread=conductr-akka.actor.default-dispatcher-2, akkaTimestamp=09:44:20.009UTC, akkaSource=akka.tcp://conductr@127.0.0.1:9004/user/reaper/mesos-client-supervisor/singleton/mesos-client, sourceActorSystem=conductr] - Mesos master has been disconnected..
> I0926 11:44:20.012472 63758336 sched.cpp:341] No credentials provided. Attempting to register without authentication
> I0926 11:44:20.537613 65904640 sched.cpp:743] Framework registered with conductr
> 2016-09-26T09:44:20Z MacBook-Pro-6.local INFO  MesosSchedulerClient [sourceThread=conductr-akka.actor.default-dispatcher-18, akkaTimestamp=09:44:20.538UTC, akkaSource=akka.tcp://conductr@127.0.0.1:9004/user/reaper/mesos-client-supervisor/singleton/mesos-client, sourceActorSystem=conductr] - Mesos master on localhost:5050 has been registered with ConductR framework id: conductr
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)