Posted to issues@mesos.apache.org by "Alexander Rukletsov (JIRA)" <ji...@apache.org> on 2016/12/22 14:25:58 UTC

[jira] [Commented] (MESOS-6676) Always re-link with scheduler during re-registration.

    [ https://issues.apache.org/jira/browse/MESOS-6676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15770153#comment-15770153 ] 

Alexander Rukletsov commented on MESOS-6676:
--------------------------------------------

I've backported this to 1.1.1. [~vinodkone] I believe you still might want to backport it to 1.0.x.

> Always re-link with scheduler during re-registration.
> -----------------------------------------------------
>
>                 Key: MESOS-6676
>                 URL: https://issues.apache.org/jira/browse/MESOS-6676
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>            Reporter: Neil Conway
>            Assignee: Neil Conway
>              Labels: mesosphere
>             Fix For: 1.1.1, 1.2.0
>
>
> Scenario:
> # Framework registers with master using a non-zero {{failover_timeout}} and is assigned a FrameworkID.
> # The master sees an {{ExitedEvent}} for the master -> scheduler link. This could happen due to some transient network error, e.g., a one-way partition. The master sends a {{FrameworkErrorMessage}} to the framework. The master marks the framework as disconnected, but keeps the {{Framework*}} for it around in {{frameworks.registered}}.
> # The framework doesn't receive the {{FrameworkErrorMessage}} because it is dropped by the network.
> # The scheduler might receive an {{ExitedEvent}} for the scheduler -> master link, but it ignores this anyway (see MESOS-887).
> # The scheduler sees a new-master-detected event and re-registers with the master. It does _not_ set the {{force}} flag. This means we follow [this code path|https://github.com/apache/mesos/blob/a6bab9015cd63121081495b8291635f386b95a92/src/master/master.cpp#L2771] in the master, which does _not_ relink with the scheduler.
> The result is that scheduler re-registration succeeds, but the master -> scheduler link is never re-established.
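
The scenario above can be mimicked with a small toy model. This is an illustrative sketch only, not Mesos code: ToyMaster, its fields, and the relink_on_reregister switch are hypothetical names, and the assumption that a forced re-registration re-links is inferred from the description rather than taken from master.cpp.

```python
class ToyMaster:
    """Toy model of the master's view of frameworks; not Mesos code."""

    def __init__(self, relink_on_reregister):
        # Hypothetical switch: True models "always re-link on re-registration".
        self.relink_on_reregister = relink_on_reregister
        self.registered = set()  # stands in for frameworks.registered
        self.connected = set()   # frameworks currently marked as connected
        self.links = set()       # live master -> scheduler links

    def register(self, framework_id):
        # Step 1: first registration creates the entry and the link.
        self.registered.add(framework_id)
        self.connected.add(framework_id)
        self.links.add(framework_id)

    def exited(self, framework_id):
        # Step 2: the link breaks; the framework is marked disconnected
        # but its entry stays in the registered set.
        self.links.discard(framework_id)
        self.connected.discard(framework_id)

    def reregister(self, framework_id, force=False):
        # Step 5: re-registration of an already known framework.
        if framework_id not in self.registered:
            self.register(framework_id)
            return
        self.connected.add(framework_id)
        # Assumption: only a forced re-registration (or the modeled fix)
        # re-establishes the link.
        if force or self.relink_on_reregister:
            self.links.add(framework_id)


def run_scenario(relink_on_reregister):
    """Replay steps 1, 2 and 5; return (is_connected, has_link)."""
    master = ToyMaster(relink_on_reregister)
    master.register("fw-1")
    master.exited("fw-1")
    master.reregister("fw-1", force=False)
    return "fw-1" in master.connected, "fw-1" in master.links
```

In this model, run_scenario(False) leaves the framework connected but without a link, which corresponds to the reported symptom, while run_scenario(True) restores the link as well.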



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)