Posted to dev@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2013/03/26 23:43:15 UTC

[jira] [Commented] (MESOS-305) Inform the framework about a master failover

    [ https://issues.apache.org/jira/browse/MESOS-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614637#comment-13614637 ] 

Benjamin Mahler commented on MESOS-305:
---------------------------------------

I'm working on a fix for this as we discussed offline.

For transparency, we need to adjust the master detector to allow the messages. As a result, there also need to be changes to the master to ensure that, after a network partition, we disallow disconnected slaves from re-registering. This is because we've already informed frameworks of LOST tasks upon disconnecting the slave.
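
To illustrate the master-side rule described above, here is a rough sketch in Python. It is a toy model, not Mesos code: the class MasterSketch and its methods register, disconnect and reregister are hypothetical names made up for this example. The only thing it is meant to show is that once a slave's tasks have been reported as LOST, a later re-registration from that slave is refused.

```python
class MasterSketch:
    """Toy model of the master-side bookkeeping (hypothetical, not Mesos code)."""

    def __init__(self):
        self.active = set()        # slaves currently registered
        self.disconnected = set()  # slaves removed, e.g. after a partition
        self.lost_tasks = []       # (slave_id, task_id) pairs reported as LOST

    def register(self, slave_id):
        self.active.add(slave_id)

    def disconnect(self, slave_id, task_ids):
        # At this point frameworks are told that the slave's tasks are LOST.
        self.active.discard(slave_id)
        self.disconnected.add(slave_id)
        self.lost_tasks.extend((slave_id, task_id) for task_id in task_ids)

    def reregister(self, slave_id):
        # Refuse the slave: its tasks were already reported as LOST, so
        # accepting it again would contradict what frameworks were told.
        if slave_id in self.disconnected:
            return False
        self.active.add(slave_id)
        return True


master = MasterSketch()
master.register("slave-1")
master.disconnect("slave-1", ["task-a", "task-b"])
accepted = master.reregister("slave-1")  # refused in this sketch
```

In this sketch a refused slave would have to come back under a fresh identity; how the real master handles that case is outside the scope of the example.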
                
> Inform the framework about a master failover
> --------------------------------------------
>
>                 Key: MESOS-305
>                 URL: https://issues.apache.org/jira/browse/MESOS-305
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Vinod Kone
>            Assignee: Benjamin Mahler
>            Priority: Critical
>
> With the recent changes in the master detector code, we no longer send 'NoMasterDetected' to the scheduler driver, which in turn means the 'disconnected' scheduler callback is never invoked.
> At Twitter this manifested as a spew of LOST tasks whenever a master failover happens. This is because the scheduler holds on to offers for a while and never knows about the invalidity of offers, until after tasks are launched. Though this is a race, it is ideal to minimize this window as much as possible by informing the scheduler of the master failover.
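
The scheduler-side effect described in the issue can be sketched as follows. This is an assumption-laden illustration in Python, not the Mesos scheduler API: the class OfferCachingScheduler is hypothetical, its method names are only loosely modeled on the scheduler callbacks, and the signatures are simplified (no driver argument, offers passed as plain id/resources pairs). It shows why a 'disconnected' notification matters to a scheduler that holds on to offers: without it, stale offers stay in the cache and are used for launches that end up LOST.

```python
class OfferCachingScheduler:
    """Toy scheduler that holds offers for a while (hypothetical, not Mesos code)."""

    def __init__(self):
        self.connected = False
        self.held_offers = {}  # offer_id -> resources description

    def registered(self):
        self.connected = True

    def resource_offers(self, offers):
        # Hold on to offers instead of using them immediately.
        for offer_id, resources in offers:
            self.held_offers[offer_id] = resources

    def disconnected(self):
        # Offers issued by the old master are no longer valid after a
        # failover, so drop them instead of launching tasks against them.
        self.connected = False
        self.held_offers.clear()

    def launchable_offers(self):
        # Only offers received while connected to the current master.
        if not self.connected:
            return []
        return sorted(self.held_offers)


scheduler = OfferCachingScheduler()
scheduler.registered()
scheduler.resource_offers([("offer-1", "cpus:2"), ("offer-2", "cpus:4")])
before_failover = scheduler.launchable_offers()
scheduler.disconnected()
after_failover = scheduler.launchable_offers()
```

If disconnected() is never called, as the issue reports, the cache in this sketch keeps returning offers from the old master, which is the window the issue asks to minimize.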

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira