Posted to issues@mesos.apache.org by "Joseph Wu (JIRA)" <ji...@apache.org> on 2016/09/07 23:40:20 UTC
[jira] [Comment Edited] (MESOS-6136) Duplicate framework id handling
[ https://issues.apache.org/jira/browse/MESOS-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15472146#comment-15472146 ]
Joseph Wu edited comment on MESOS-6136 at 9/7/16 11:40 PM:
-----------------------------------------------------------
Sounds like you're asking for:
a) A way to orphan tasks on purpose (e.g. [MESOS-4659]); or
b) The {{failover_timeout}} that the framework is supposed to set: https://github.com/apache/mesos/blob/3e52a107c4073778de9c14bf5fcdeb6e342821aa/include/mesos/mesos.proto#L229-L237
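For option b), the mesos.proto comment linked above describes failover_timeout as the number of seconds the master waits for a scheduler to fail over before it tears down the framework and kills its tasks and executors. The proto default is 0.0. The sketch below builds a v1 scheduler API SUBSCRIBE body that sets it, and includes a simplified model of the teardown decision. The helper names subscribe_body and torn_down, the user and framework names, and the one-week value are hypothetical and chosen only for illustration. torn_down is not the master's actual code.

```python
import json


def subscribe_body(user, name, failover_timeout_secs):
    """Build a JSON-serializable SUBSCRIBE call for the v1 scheduler API.

    Hypothetical helper: only 'user', 'name' and 'failover_timeout' are set.
    """
    if failover_timeout_secs < 0:
        raise ValueError("failover_timeout must be non-negative")
    return {
        "type": "SUBSCRIBE",
        "subscribe": {
            "framework_info": {
                "user": user,
                "name": name,
                # Seconds the master waits for the scheduler to fail over
                # before tearing down the framework and killing its tasks.
                "failover_timeout": float(failover_timeout_secs),
            }
        },
    }


def torn_down(disconnected_at, now, failover_timeout_secs):
    """Simplified, hypothetical model of the master's teardown decision.

    Returns True once the failover window has fully elapsed since the
    scheduler disconnected. With the default timeout of 0.0 this is True
    immediately.
    """
    return now - disconnected_at >= failover_timeout_secs


if __name__ == "__main__":
    one_week = 7 * 24 * 3600  # hypothetical "long" timeout
    print(json.dumps(subscribe_body("root", "my-framework", one_week)))
```

A long failover_timeout only widens the window in which a scheduler can re-register and keep its tasks. Once that window expires, the master still kills the tasks, which is the behaviour this issue asks to change.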
was (Author: kaysoky):
Sounds like you're asking for:
a) A way to orphan tasks on purpose; or
b) The {{failover_timeout}} that the framework is supposed to set: https://github.com/apache/mesos/blob/3e52a107c4073778de9c14bf5fcdeb6e342821aa/include/mesos/mesos.proto#L229-L237
> Duplicate framework id handling
> -------------------------------
>
> Key: MESOS-6136
> URL: https://issues.apache.org/jira/browse/MESOS-6136
> Project: Mesos
> Issue Type: Improvement
> Components: general
> Affects Versions: 0.28.1
> Environment: DCOS 1.7 Cloud Formation scripts
> Reporter: Christopher Hunt
> Priority: Critical
> Labels: framework, lifecyclemanagement, task
>
> We have observed a situation where Mesos kills a framework's tasks after that framework times out with the Mesos master for some reason, possibly even because of a network partition.
> While we can set a long timeout so that, for practical purposes, Mesos will not kill a framework's tasks, I'm wondering whether there could be an improvement where a framework is still not permitted to re-register with a given id (as now), but Mesos does not also kill its tasks. What I'm thinking is that an operator could "tell" Mesos that this condition should be cleared.
> IMHO, frameworks should be the only entities that request tasks be killed, unless an operator manually overrides this.
> I'm flagging this as a critical improvement because a) the focus should be on keeping tasks running in a system, and it isn't; and b) Mesos is working as designed.
> In summary, I feel that Mesos is taking on a responsibility for killing tasks that it shouldn't have.