
Posted to issues@mesos.apache.org by "Vinod Kone (JIRA)" <ji...@apache.org> on 2018/01/02 19:39:00 UTC

[jira] [Assigned] (MESOS-8341) Agent can become stuck in (re-)registering state during upgrades

     [ https://issues.apache.org/jira/browse/MESOS-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kone reassigned MESOS-8341:
---------------------------------

       Resolution: Fixed
         Assignee: Benno Evers
    Fix Version/s: 1.5.1

commit 3eb57cae3674fc835c784cac9eaa63e1aab7ba1c
Author: Benno Evers <bevers@mesosphere.com>
Date:   Tue Jan 2 10:58:23 2018 -0800

    Correctly reset slave status when aborting a registration.
    
    Previously, the slave was not erased from the `registering`
    and `reregistering` sets in the master for some code paths
    that would result in a failed (re-)registration attempt.
    
    This could lead to a state where the reason of the unsuccessful
    (re-)registration attempt is fixed on the agent, but the master
    ignores subsequent attempts because it assumes the previous
    operation is still in progress.
    
    Review: https://reviews.apache.org/r/64506/
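The failure mode described in the commit message can be illustrated with a toy model. The sketch below is a minimal C++17 illustration under stated assumptions, not the actual Mesos master code: `ToyMaster`, `registerAgent`, `Outcome`, and the string agent IDs are hypothetical, and the real master handles (re-)registration asynchronously and keeps separate `registering` and `reregistering` sets. The `eraseOnAbort` flag switches between the buggy behavior (the abort path leaves the agent in the in-flight set) and the fixed behavior (every abort path erases it).

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, heavily simplified model of the master's bookkeeping.
// It only shows why every abort path must erase the agent from the
// in-flight set; it is not how Mesos structures this code.
enum class Outcome { Registered, Rejected, IgnoredInProgress };

class ToyMaster {
 public:
  explicit ToyMaster(bool eraseOnAbort) : eraseOnAbort_(eraseOnAbort) {}

  Outcome registerAgent(const std::string& agentId, bool attemptIsValid) {
    // Any attempt is dropped while an earlier one is believed in flight.
    if (registering_.count(agentId) > 0) {
      return Outcome::IgnoredInProgress;
    }
    registering_.insert(agentId);

    if (!attemptIsValid) {
      // Abort path: skipping this erase is what leaves the agent stuck.
      if (eraseOnAbort_) {
        registering_.erase(agentId);
      }
      return Outcome::Rejected;
    }

    // Success path: registration completes and the agent leaves the set.
    registering_.erase(agentId);
    return Outcome::Registered;
  }

 private:
  bool eraseOnAbort_;
  std::set<std::string> registering_;
};
```

With `eraseOnAbort` set to false, a rejected attempt leaves the agent in the set, so a later valid attempt from the same agent is reported as `IgnoredInProgress`; with the flag set to true, the retry registers normally.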

> Agent can become stuck in (re-)registering state during upgrades
> ----------------------------------------------------------------
>
>                 Key: MESOS-8341
>                 URL: https://issues.apache.org/jira/browse/MESOS-8341
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Benno Evers
>            Assignee: Benno Evers
>             Fix For: 1.5.1
>
>
> Currently, an agent is not erased from the set of (re-)registering agents if
>  - it tries to (re-)register with a malformed version string
>  - it tries to (re-)register with a version smaller than the minimum supported version
>  - it tries to (re-)register with a domain when the master has no domain configured
>  - the operator marks the slave as gone while the (re-)registration is ongoing
> Afterwards, all further (re-)registration attempts with the same agent ID are discarded, because the master still thinks that the original (re-)registration is ongoing.
> Since the most realistic way to encounter this issue is during cluster upgrades, and restarting the master clears the stale state, it is unlikely to be reported externally.
> Review: https://reviews.apache.org/r/64506
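Each condition in the list above is an abort path that, per the fix, must also remove the agent from the in-flight set. The following C++17 sketch models those four checks as a single hypothetical `abortReason` function. The `AttemptInfo` struct, the plain `MAJOR.MINOR.PATCH` parser, and the minimum version used in the checks are illustrative assumptions; Mesos uses its own version utilities and accepts a richer version syntax. A non-empty result marks the point where the master would reject the attempt and erase the agent from `registering` or `reregistering`.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>
#include <tuple>

// Hypothetical version triple; assumes plain MAJOR.MINOR.PATCH strings only.
using Version = std::tuple<int, int, int>;

// Parses "1.5.1" into (1, 5, 1); returns std::nullopt for malformed input.
std::optional<Version> parseVersion(const std::string& text) {
  std::istringstream in(text);
  int v1 = 0, v2 = 0, v3 = 0;
  char dot1 = 0, dot2 = 0;
  if (!(in >> v1 >> dot1 >> v2 >> dot2 >> v3) || dot1 != '.' || dot2 != '.') {
    return std::nullopt;
  }
  // Reject negative components and trailing characters such as "1.5.1x".
  if (v1 < 0 || v2 < 0 || v3 < 0 ||
      in.peek() != std::char_traits<char>::eof()) {
    return std::nullopt;
  }
  return Version{v1, v2, v3};
}

// Hypothetical summary of one (re-)registration attempt.
struct AttemptInfo {
  std::string version;       // version string reported by the agent
  bool agentHasDomain;       // agent advertises a domain
  bool markedGoneMeanwhile;  // operator marked the agent gone mid-attempt
};

// Returns why the attempt must be aborted, or std::nullopt if it may proceed.
// Every non-empty result is an abort path that must also erase the agent
// from the master's in-flight set.
std::optional<std::string> abortReason(const AttemptInfo& attempt,
                                       const Version& minimum,
                                       bool masterHasDomain) {
  std::optional<Version> version = parseVersion(attempt.version);
  if (!version) {
    return "malformed version string";
  }
  if (*version < minimum) {
    return "version smaller than the minimum supported version";
  }
  if (attempt.agentHasDomain && !masterHasDomain) {
    return "agent has a domain but the master has none configured";
  }
  if (attempt.markedGoneMeanwhile) {
    return "agent was marked gone during (re-)registration";
  }
  return std::nullopt;
}
```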



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)