Posted to issues@mesos.apache.org by "Neil Conway (JIRA)" <ji...@apache.org> on 2016/11/19 15:34:58 UTC

[jira] [Commented] (MESOS-6136) Duplicate framework id handling

    [ https://issues.apache.org/jira/browse/MESOS-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15679434#comment-15679434 ] 

Neil Conway commented on MESOS-6136:
------------------------------------

Why is reusing the same framework ID important?

Reusing framework IDs does not seem wise. Even after a framework has been torn down, reusing a framework ID is not necessarily safe. Consider the following:

* A framework with ID A runs a partition-aware task X on agent Y.
* Agent Y is partitioned from the master.
* Framework A is torn down.
* Another framework registers with ID A (right now the master would reject this registration).
* Agent Y re-registers with the master, still running task X.

Does task X belong to the "original" framework A or the new one?
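
To make the ambiguity concrete, here is a rough sketch of the sequence above as a toy model. This is not Mesos code: {{ToyMaster}}, its methods, and the {{allow_reuse}} flag are hypothetical names invented for illustration, and the model assumes tasks are attributed to frameworks by framework ID alone.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMaster:
    """Hypothetical stand-in for a master; not the real Mesos implementation."""

    # Framework ID -> label of the framework currently registered under it.
    frameworks: dict = field(default_factory=dict)
    # IDs of frameworks that have been torn down.
    completed: set = field(default_factory=set)

    def register(self, framework_id, label, allow_reuse=False):
        # Reject a torn-down ID unless reuse is explicitly allowed
        # (allow_reuse is an assumption made for this sketch).
        if framework_id in self.completed and not allow_reuse:
            raise ValueError(f"framework ID {framework_id} was torn down")
        self.frameworks[framework_id] = label

    def teardown(self, framework_id):
        del self.frameworks[framework_id]
        self.completed.add(framework_id)

    def agent_reregisters(self, tasks):
        # tasks: list of (task_name, framework_id) pairs reported by the agent.
        # The only link back to a framework is the ID, so the lookup cannot
        # distinguish the original framework from the one that reused its ID.
        return {task: self.frameworks.get(fid) for task, fid in tasks}


master = ToyMaster()
master.register("A", "original")
agent_y_tasks = [("X", "A")]  # agent Y is partitioned but still runs X
master.teardown("A")
master.register("A", "new", allow_reuse=True)
owners = master.agent_reregisters(agent_y_tasks)
print(owners)
```

In this model task X ends up attributed to the new framework, even though the original framework launched it. With {{allow_reuse}} left at its default, the second registration is rejected and the question never arises.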

Reusing framework IDs also means that the output of the HTTP endpoints will be hard to interpret -- e.g., the same framework ID will appear in both the {{frameworks}} and {{completed_frameworks}} keys of the {{state}} endpoint.
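
As an illustration of the endpoint problem, the sketch below looks up a framework by ID in a hand-written sample shaped like the {{state}} output. The sample data and the {{find_framework}} helper are assumptions made for this example; only the {{frameworks}} and {{completed_frameworks}} keys come from the real endpoint.

```python
# Hand-written sample, not real master output: the same ID appears under
# both keys once a torn-down framework ID has been reused.
state = {
    "frameworks": [{"id": "A", "name": "new-framework"}],
    "completed_frameworks": [{"id": "A", "name": "original-framework"}],
}


def find_framework(state, framework_id):
    """Return every (key, name) pair whose framework ID matches."""
    matches = []
    for key in ("frameworks", "completed_frameworks"):
        for fw in state.get(key, []):
            if fw["id"] == framework_id:
                matches.append((key, fw["name"]))
    return matches


matches = find_framework(state, "A")
print(matches)
```

A lookup by ID now returns two entries, so any tool that treats the framework ID as a unique key has to guess which one the caller meant.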

> Duplicate framework id handling
> -------------------------------
>
>                 Key: MESOS-6136
>                 URL: https://issues.apache.org/jira/browse/MESOS-6136
>             Project: Mesos
>          Issue Type: Improvement
>          Components: general
>    Affects Versions: 0.28.1
>         Environment: DCOS 1.7 Cloud Formation scripts
>            Reporter: Christopher Hunt
>            Priority: Critical
>              Labels: framework, lifecyclemanagement, task
>
> We have observed a situation where Mesos kills the tasks belonging to a framework when that framework times out with the Mesos master for some reason, perhaps even because of a network partition.
> While we can configure a long timeout so that, for practical purposes, Mesos will not kill a framework's tasks, I'm wondering whether there is an improvement where a framework is still not permitted to re-register with a given ID (as now), but Mesos also does not kill its tasks. What I'm thinking is that an operator could "tell" Mesos that this condition should be cleared.
> IMHO frameworks should be the only entity requesting that tasks be killed unless manually overridden by an operator.
> I'm flagging this as a critical improvement because a) the focus should be on keeping tasks running in a system, and it isn't; and b) Mesos is working as designed. 
> In summary I feel that Mesos is taking on a responsibility in killing tasks where it shouldn't be.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)