Posted to issues@mesos.apache.org by "Michael Park (JIRA)" <ji...@apache.org> on 2017/05/09 23:28:04 UTC
[jira] [Updated] (MESOS-7487) A framework upgrading into
PARTITION_AWARE capability will continue to receive TASK_LOST on old
agents.
[ https://issues.apache.org/jira/browse/MESOS-7487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Park updated MESOS-7487:
--------------------------------
Summary: A framework upgrading into PARTITION_AWARE capability will continue to receive TASK_LOST on old agents. (was: A framework upgrading into PARTITION_AWARE capability will continue to receive {{TASK_LOST}} on old agents.)
> A framework upgrading into PARTITION_AWARE capability will continue to receive TASK_LOST on old agents.
> -------------------------------------------------------------------------------------------------------
>
> Key: MESOS-7487
> URL: https://issues.apache.org/jira/browse/MESOS-7487
> Project: Mesos
> Issue Type: Bug
> Components: agent
> Affects Versions: 1.1.0, 1.2.0
> Reporter: Michael Park
>
> Before 1.3.0, the master did not send a {{FrameworkInfo}} in the {{UpdateFrameworkMessage}}. In general, this means that a pre-1.3.0 agent does not update its cached {{FrameworkInfo}} when a framework changes its {{FrameworkInfo}}. Specifically, if a framework upgrades to having the {{PARTITION_AWARE}} capability, 1.1.x and 1.2.x agents will not be aware of the update and will incorrectly report {{TASK_LOST}} in some cases.
> Note that the run task path is okay, since the master sends the new {{FrameworkInfo}} with it. The incorrect code paths contain the following check:
> {code}
> if (!protobuf::frameworkHasCapability(
>         framework->info,  // This is the one in agent memory!
>         FrameworkInfo::Capability::PARTITION_AWARE))
> {code}
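To make the failure mode concrete, here is a minimal C++ sketch. It uses hypothetical, simplified stand-ins ({{FrameworkInfo}}, {{UpdateFrameworkMessage}}, {{Agent}}, {{stateAfterUpgrade}}) rather than the real Mesos protobufs and agent classes, and it assumes the agent chooses between {{TASK_LOST}} and a partition-aware state such as {{TASK_DROPPED}} purely from its cached copy, as the quoted check does.

```cpp
// Hypothetical, simplified model of MESOS-7487. These types are stand-ins,
// NOT the real Mesos protobufs or agent classes.
#include <cassert>  // used by the usage assertions
#include <optional>
#include <set>
#include <string>

enum class Capability { PARTITION_AWARE };

struct FrameworkInfo {
  std::set<Capability> capabilities;
};

// Stand-in for UpdateFrameworkMessage; `framework_info` is optional because
// only 1.3.0+ masters populate it.
struct UpdateFrameworkMessage {
  std::optional<FrameworkInfo> framework_info;
};

struct Agent {
  FrameworkInfo info;                 // The copy cached in agent memory.
  bool appliesFrameworkInfo = false;  // false models a 1.1.x/1.2.x agent.

  void updateFramework(const UpdateFrameworkMessage& message) {
    // Only a 1.3.0+ agent refreshes its cached copy from the message.
    if (appliesFrameworkInfo && message.framework_info.has_value()) {
      info = *message.framework_info;
    }
  }

  // Mirrors the quoted check: the decision reads the cached copy only.
  std::string stateForDroppedTask() const {
    if (info.capabilities.count(Capability::PARTITION_AWARE) == 0) {
      return "TASK_LOST";
    }
    return "TASK_DROPPED";
  }
};

// Replays the upgrade: the framework first registered without
// PARTITION_AWARE (empty cached info), then re-registers with it.
std::string stateAfterUpgrade(bool agentAppliesFrameworkInfo) {
  Agent agent;
  agent.appliesFrameworkInfo = agentAppliesFrameworkInfo;

  FrameworkInfo upgraded;
  upgraded.capabilities.insert(Capability::PARTITION_AWARE);

  UpdateFrameworkMessage message;
  message.framework_info = upgraded;
  agent.updateFramework(message);

  return agent.stateForDroppedTask();
}
```

With {{agentAppliesFrameworkInfo}} set to false (the 1.1.x/1.2.x case), {{stateAfterUpgrade}} returns "TASK_LOST" even though the framework is now partition-aware; with it set to true, it returns "TASK_DROPPED".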
> One solution is to backport the {{UpdateFrameworkMessage}} changes to 1.1.x and 1.2.x, but have those agents update only the capabilities portion of their cached {{FrameworkInfo}}.
> If we update the entire {{FrameworkInfo}}, a 1.1.x agent will run into an issue because it does not know how to handle changes to {{FrameworkInfo.roles}}; frameworks changing their roles is a 1.3.x feature. Note that a 1.2.x agent can handle role changes correctly because of {{Resource.allocation_info}}, which was introduced with multi-role support in 1.2.x.
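The proposed backport can be sketched as follows. This is a hypothetical illustration, not the actual patch: {{applyCapabilitiesOnly}} and the simplified {{FrameworkInfo}} (with a {{roles}} vector and a capability set) are assumed stand-ins for the real Mesos types. The point is that the handler copies only the capabilities from the incoming {{FrameworkInfo}} and deliberately leaves the cached roles untouched.

```cpp
// Hypothetical sketch of the proposed 1.1.x/1.2.x backport. Simplified
// stand-in types, NOT the real Mesos protobufs or agent code.
#include <cassert>  // used by the usage assertions
#include <optional>
#include <set>
#include <string>
#include <vector>

enum class Capability { PARTITION_AWARE };

struct FrameworkInfo {
  std::vector<std::string> roles;
  std::set<Capability> capabilities;
};

struct UpdateFrameworkMessage {
  std::optional<FrameworkInfo> framework_info;  // Set by 1.3.0+ masters.
};

// Backported handler: take only the capabilities from the new
// FrameworkInfo and leave the cached roles alone, since a 1.1.x agent
// cannot handle a framework changing its roles.
void applyCapabilitiesOnly(FrameworkInfo& cached,
                           const UpdateFrameworkMessage& message) {
  if (!message.framework_info.has_value()) {
    return;  // Pre-1.3.0 master: nothing to apply.
  }
  cached.capabilities = message.framework_info->capabilities;
}
```

For example, if the cached info has roles {"web"} and no capabilities, and the message carries roles {"web", "batch"} plus {{PARTITION_AWARE}}, the cached info afterwards has {{PARTITION_AWARE}} but still only the "web" role. Restricting the update this way keeps the 1.1.x agent away from the role-change path that MESOS-7460 concerns.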
> Refer to MESOS-7460 for the potential issue with backporting to 1.1.x.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)