Posted to issues@mesos.apache.org by "Michael Park (JIRA)" <ji...@apache.org> on 2017/05/09 23:28:04 UTC

[jira] [Updated] (MESOS-7487) A framework upgrading into PARTITION_AWARE capability will continue to receive TASK_LOST on old agents.

     [ https://issues.apache.org/jira/browse/MESOS-7487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Park updated MESOS-7487:
--------------------------------
    Summary: A framework upgrading into PARTITION_AWARE capability will continue to receive TASK_LOST on old agents.  (was: A framework upgrading into PARTITION_AWARE capability will continue to receive {{TASK_LOST}} on old agents.)

> A framework upgrading into PARTITION_AWARE capability will continue to receive TASK_LOST on old agents.
> -------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-7487
>                 URL: https://issues.apache.org/jira/browse/MESOS-7487
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>    Affects Versions: 1.1.0, 1.2.0
>            Reporter: Michael Park
>
> Before 1.3.0, the master did not send a {{FrameworkInfo}} in the {{UpdateFrameworkMessage}}. In general, this means that a pre-1.3.0 agent will not have its copy of the {{FrameworkInfo}} updated when a framework changes its {{FrameworkInfo}}. Specifically, if a framework upgrades to have the {{PARTITION_AWARE}} capability, 1.1.x and 1.2.x agents will not be aware of the update and will incorrectly report {{TASK_LOST}} in some cases.
> Note that the run task path is okay, since the master sends the new {{FrameworkInfo}} there. The incorrect code paths are the ones guarded by the following check:
> {code}
>       if (!protobuf::frameworkHasCapability(
>               framework->info,  // This is the one in agent memory!
>               FrameworkInfo::Capability::PARTITION_AWARE))
> {code}
> One solution is to backport the changes to {{UpdateFrameworkMessage}} to 1.1.x and 1.2.x, but only update the capabilities portion of the {{FrameworkInfo}}.
> If we update the entire {{FrameworkInfo}}, a 1.1.x agent will run into an issue where it doesn't know how to deal with changes to {{FrameworkInfo.roles}}. Frameworks changing their roles is a 1.3.x feature. Note that a 1.2.x agent can handle role changes correctly because of {{Resource.allocation_info}}, which was introduced with multi-role support in 1.2.x.
> Refer to MESOS-7460 for the potential issue with backporting to 1.1.x.
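
The "capabilities only" update proposed above could look roughly like the following. This is a minimal, self-contained sketch and not the actual Mesos patch: the {{FrameworkInfo}} struct, the {{Capability}} enum, {{hasCapability}} and {{updateCapabilitiesOnly}} are hypothetical stand-ins for the generated protobuf types and agent code, introduced only to illustrate the idea.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the protobuf types. The real Mesos
// FrameworkInfo is a generated protobuf message with many more fields.
enum class Capability { PARTITION_AWARE, MULTI_ROLE };

struct FrameworkInfo {
  std::string id;
  std::vector<std::string> roles;
  std::vector<Capability> capabilities;
};

// Stand-in for the capability check quoted in the description.
bool hasCapability(const FrameworkInfo& info, Capability capability) {
  return std::find(info.capabilities.begin(),
                   info.capabilities.end(),
                   capability) != info.capabilities.end();
}

// Copies only the capabilities from the master's FrameworkInfo into the
// agent's in-memory copy. All other fields, including roles, are left
// untouched so that an agent that cannot handle role changes never
// observes one.
void updateCapabilitiesOnly(FrameworkInfo& agentCopy,
                            const FrameworkInfo& fromMaster) {
  // Sanity check: both copies must describe the same framework.
  assert(agentCopy.id == fromMaster.id);
  agentCopy.capabilities = fromMaster.capabilities;
}
```

With something along these lines, the agent's in-memory copy would pass the {{PARTITION_AWARE}} check after the framework upgrades, while its view of {{FrameworkInfo.roles}} stays exactly as it was.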



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)