Posted to issues@mesos.apache.org by "Anand Mazumdar (JIRA)" <ji...@apache.org> on 2017/11/02 00:33:00 UTC

[jira] [Commented] (MESOS-7867) Master doesn't handle scheduler driver downgrade from HTTP based to PID based

    [ https://issues.apache.org/jira/browse/MESOS-7867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16235013#comment-16235013 ] 

Anand Mazumdar commented on MESOS-7867:
---------------------------------------

{noformat}
commit fa747ac3620a6695cfe29ac443f833fad6c991a2
Author: Ilya Pronin <ip...@twopensource.com>
Date:   Wed Nov 1 17:29:58 2017 -0700

    Added SchedulerHttpApiTest.UpdateHttpToPidSchedulerAndBack test.

    This test verifies that we are able to upgrade from a PID based
    framework to an HTTP framework and then downgrade back without
    restarting the master.

    Review: https://reviews.apache.org/r/62241/

commit 5b93d6ac60725679399fe15233267c02cc9918df
Author: Ilya Pronin <ip...@twopensource.com>
Date:   Wed Nov 1 17:29:49 2017 -0700

    Removed metrics removal from Master::failoverFramework().

    When a framework upgrades from a PID based driver to an HTTP based
    driver, the master removes its per-principal metrics. When the same
    framework downgrades back to a PID based driver, the master doesn't
    reinstate those metrics. This causes a crash when the master receives a
    message from the failed over framework and tries to increment its
    metrics.

    This patch fixes the issue by removing metrics removal from framework
    failover handling code. Note that it doesn't handle the case when the
    framework's principal changes. This situation is being dealt with
    separately in MESOS-2842.

    Review: https://reviews.apache.org/r/62240/
{noformat}

> Master doesn't handle scheduler driver downgrade from HTTP based to PID based
> -----------------------------------------------------------------------------
>
>                 Key: MESOS-7867
>                 URL: https://issues.apache.org/jira/browse/MESOS-7867
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.3.0
>            Reporter: Ilya Pronin
>            Assignee: Ilya Pronin
>            Priority: Major
>             Fix For: 1.5.0
>
>
> When a framework upgrades from a PID based driver to an HTTP based driver, the master removes its per-framework-principal metrics ({{messages_received}} and {{messages_processed}}) in {{Master::failoverFramework}}. When the same framework downgrades back to a PID based driver, the master doesn't reinstate those metrics. This causes a crash when the master receives a message from the failed over framework and increments the {{messages_received}} counter in {{Master::visit(const MessageEvent&)}}.
> {noformat}
> I0807 18:17:45.713220 19095 master.cpp:2916] Framework 70822e80-ca38-4470-916e-e6da073a4742-0000 (TwitterScheduler) failed over
> F0807 18:18:20.725908 19079 master.cpp:1451] Check failed: metrics->frameworks.contains(principal.get())
> {noformat}
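
The failure sequence described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the Mesos implementation: {{ToyMaster}}, {{PrincipalCounters}}, {{survivesRoundTrip}} and the {{removeMetricsOnFailover}} switch are hypothetical names made up for the example, and the toy returns {{false}} at the point where the real master would abort on the failed check.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy stand-in for the per-principal message counters (hypothetical type).
struct PrincipalCounters {
  std::uint64_t messagesReceived = 0;
  std::uint64_t messagesProcessed = 0;
};

// Hypothetical, heavily simplified model of the master. Not Mesos code.
class ToyMaster {
public:
  explicit ToyMaster(bool removeMetricsOnFailover)
    : removeMetricsOnFailover_(removeMetricsOnFailover) {}

  // First subscription of a PID based framework: create its counters.
  void subscribePid(const std::string& principal) {
    counters_.emplace(principal, PrincipalCounters{});
  }

  // Failover to another driver type. With the switch on, this mimics the
  // reported behaviour: counters are dropped on the move to HTTP and are
  // never recreated when the framework moves back to PID.
  void failover(const std::string& principal, bool toHttp) {
    if (toHttp && removeMetricsOnFailover_) {
      counters_.erase(principal);
    }
  }

  // A PID based framework message arrives. Returns false where the real
  // master would fail its "contains(principal)" check and abort.
  bool receiveMessage(const std::string& principal) {
    auto it = counters_.find(principal);
    if (it == counters_.end()) {
      return false;
    }
    ++it->second.messagesReceived;
    return true;
  }

private:
  bool removeMetricsOnFailover_;
  std::map<std::string, PrincipalCounters> counters_;
};

// Runs PID -> HTTP -> PID and reports whether a message sent after the
// downgrade can still be counted.
bool survivesRoundTrip(bool removeMetricsOnFailover) {
  const std::string principal = "example-principal";  // made-up value
  ToyMaster master(removeMetricsOnFailover);

  master.subscribePid(principal);
  assert(master.receiveMessage(principal));  // counters exist at this point

  master.failover(principal, /*toHttp=*/true);   // upgrade to HTTP
  master.failover(principal, /*toHttp=*/false);  // downgrade back to PID

  return master.receiveMessage(principal);
}
```

Calling {{survivesRoundTrip(true)}} models the behaviour before the fix and returns {{false}}; calling {{survivesRoundTrip(false)}} models keeping the metrics across failover and returns {{true}}. As in the patch, the sketch does not cover a change of principal.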



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)