Posted to issues@mesos.apache.org by "Vinod Kone (JIRA)" <ji...@apache.org> on 2018/01/02 19:39:00 UTC
[jira] [Assigned] (MESOS-8341) Agent can become stuck in
(re-)registering state during upgrades
[ https://issues.apache.org/jira/browse/MESOS-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod Kone reassigned MESOS-8341:
---------------------------------
Resolution: Fixed
Assignee: Benno Evers
Fix Version/s: 1.5.1
commit 3eb57cae3674fc835c784cac9eaa63e1aab7ba1c
Author: Benno Evers <bevers@mesosphere.com>
Date: Tue Jan 2 10:58:23 2018 -0800
Correctly reset slave status when aborting a registration.
Previously, the slave was not erased from the `registering`
and `reregistering` sets in the master for some code paths
that would result in a failed (re-)registration attempt.
This could lead to a state where the cause of the unsuccessful
(re-)registration attempt is fixed on the agent, but the master
ignores subsequent attempts because it assumes the previous
operation is still in progress.
Review: https://reviews.apache.org/r/64506/
> Agent can become stuck in (re-)registering state during upgrades
> ----------------------------------------------------------------
>
> Key: MESOS-8341
> URL: https://issues.apache.org/jira/browse/MESOS-8341
> Project: Mesos
> Issue Type: Bug
> Reporter: Benno Evers
> Assignee: Benno Evers
> Fix For: 1.5.1
>
>
> Currently, an agent will not be erased from the set of (re-)registering agents if
> - it tries to (re-)register with a malformed version string
> - it tries to (re-)register with a version smaller than the minimum supported version
> - it tries to (re-)register with a domain when the master has no domain configured
> - the operator marks the slave as gone while the (re-)registration is ongoing
> Afterwards, all further (re-)registration attempts with the same agent id will be discarded, because the master still thinks that the original (re-)registration is ongoing.
> Since the most realistic way to encounter this issue is during a cluster upgrade, and it resolves itself with a master restart, it is unlikely to be reported externally.
> Review: https://reviews.apache.org/r/64506
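
The failure mode described above can be illustrated with a minimal C++ sketch. This is not the Mesos master code: `ToyMaster`, `reregister`, `inProgress`, the `eraseOnAbort` switch, and the use of a plain integer as the agent version are all hypothetical simplifications chosen for illustration. Real (re-)registration is asynchronous; the sketch only models the bookkeeping of the in-progress set and a single abort path (version below the minimum).

```cpp
#include <string>
#include <unordered_set>

// Hypothetical, simplified stand-in for the master's bookkeeping.
// None of these names are taken from the Mesos code base.
class ToyMaster {
public:
  ToyMaster(int minimumVersion, bool eraseOnAbort)
    : minimumVersion_(minimumVersion), eraseOnAbort_(eraseOnAbort) {}

  // Returns true if the agent was admitted, false if the attempt
  // was rejected or ignored.
  bool reregister(const std::string& agentId, int agentVersion) {
    // An attempt is ignored while an earlier one for the same
    // agent id is believed to be in progress.
    if (reregistering_.count(agentId) > 0) {
      return false;
    }
    reregistering_.insert(agentId);

    if (agentVersion < minimumVersion_) {
      // Abort path. Without the erase, the id stays in the set
      // and every later attempt hits the early return above.
      if (eraseOnAbort_) {
        reregistering_.erase(agentId);
      }
      return false;
    }

    // Success path: the attempt is finished, so clear the marker.
    reregistering_.erase(agentId);
    registered_.insert(agentId);
    return true;
  }

  // True if the master still considers an attempt to be ongoing.
  bool inProgress(const std::string& agentId) const {
    return reregistering_.count(agentId) > 0;
  }

private:
  int minimumVersion_;
  bool eraseOnAbort_;
  std::unordered_set<std::string> reregistering_;
  std::unordered_set<std::string> registered_;
};
```

With `eraseOnAbort` set to false, an agent rejected once for a too-old version is ignored on every later attempt, even after it has been upgraded. With it set to true, the upgraded agent is admitted on its next attempt.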
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)