Posted to issues@mesos.apache.org by "Vinod Kone (JIRA)" <ji...@apache.org> on 2016/02/11 23:24:18 UTC

[jira] [Commented] (MESOS-4659) Consider how to handle orphaned tasks after master failover

    [ https://issues.apache.org/jira/browse/MESOS-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15143612#comment-15143612 ] 

Vinod Kone commented on MESOS-4659:
-----------------------------------

This is a consequence of the master not knowing a framework's FrameworkInfo (and hence its failover timeout) after a master failover until that framework re-registers. We need framework persistence for this to work correctly; see https://issues.apache.org/jira/browse/MESOS-1719

> Consider how to handle orphaned tasks after master failover
> -----------------------------------------------------------
>
>                 Key: MESOS-4659
>                 URL: https://issues.apache.org/jira/browse/MESOS-4659
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>            Reporter: Neil Conway
>              Labels: failover, mesosphere
>
> If a framework becomes disconnected from the master, its tasks are killed after waiting for {{failover_timeout}}.
> However, if a master failover occurs but a framework never reconnects to the new master, we never kill any of the tasks associated with that framework. These tasks remain orphaned and presumably would need to be manually removed by the operator.
> We should consider whether to kill such orphaned tasks automatically, likely after waiting for some (framework-configurable?) timeout.
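
To make the proposal in the description concrete, here is a minimal sketch of the timeout decision, not Mesos code. Every name in it (Framework, tasks_to_kill, orphan_timeout, disconnected_at) is hypothetical. It assumes that a framework which has not re-registered since the master failover has an unknown failover timeout (modeled as None), and that the master then falls back to a separate orphan timeout, counted from the moment the new master was elected.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Framework:
    """Hypothetical view of a framework as seen by the master."""
    framework_id: str
    # None models a framework whose FrameworkInfo the new master has not
    # seen yet, so its failover timeout is unknown.
    failover_timeout: Optional[float]
    # Time the framework disconnected; for an orphan, the time the new
    # master was elected (an assumption of this sketch).
    disconnected_at: float
    tasks: List[str] = field(default_factory=list)


def tasks_to_kill(frameworks: Iterable[Framework], now: float,
                  orphan_timeout: float) -> List[str]:
    """Return the tasks whose framework has been gone for too long."""
    doomed: List[str] = []
    for fw in frameworks:
        # Use the framework's own timeout when known, else the fallback.
        timeout = (fw.failover_timeout
                   if fw.failover_timeout is not None
                   else orphan_timeout)
        if now - fw.disconnected_at >= timeout:
            doomed.extend(fw.tasks)
    return doomed
```

With framework persistence (MESOS-1719) the None case would not arise, since the new master could recover each framework's failover timeout directly; the fallback only illustrates the "some timeout" option raised above.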


