Posted to issues@mesos.apache.org by "Yong Qiao Wang (JIRA)" <ji...@apache.org> on 2015/08/31 07:43:45 UTC

[jira] [Commented] (MESOS-3324) Resource leak issue in Mesos

    [ https://issues.apache.org/jira/browse/MESOS-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14722951#comment-14722951 ] 

Yong Qiao Wang commented on MESOS-3324:
---------------------------------------

My proposal to address this resource leak issue:
1. Add a timeout for framework re-registration (for example, a new --framework_reregister_timeout flag);
2. Add a new libprocess object to manage orphaned tasks and executors. It will:
    - After a Mesos master restart, clean up orphaned tasks and executors whose frameworks have not re-registered within --framework_reregister_timeout;
    - While the Mesos master is running, periodically clean up tasks and executors that have remained orphaned for longer than --framework_reregister_timeout.
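A minimal sketch of the proposed cleanup logic, in plain C++ rather than as a real libprocess actor. Every name here (OrphanCleaner, addOrphan, frameworkReregistered, sweep) is hypothetical and is not an existing Mesos API; the timeout stands in for the proposed --framework_reregister_timeout. The current time is passed in explicitly so the logic can be exercised without sleeping, and sweep() only reports expired tasks: actually killing them would be up to the master.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for the proposed libprocess object (not a Mesos API).
class OrphanCleaner {
public:
  // `reregisterTimeout` plays the role of the proposed --framework_reregister_timeout.
  explicit OrphanCleaner(Clock::duration reregisterTimeout)
    : timeout_(reregisterTimeout) {
    assert(reregisterTimeout > Clock::duration::zero());
  }

  // Record a task whose framework is unknown to the master, e.g. one reported
  // by an agent after master failover. The first sighting time is kept.
  void addOrphan(const std::string& taskId, const std::string& frameworkId,
                 Clock::time_point now) {
    orphans_.emplace(taskId, Orphan{frameworkId, now});  // no-op if already tracked
  }

  // The framework came back in time, so its tasks are no longer orphans.
  void frameworkReregistered(const std::string& frameworkId) {
    for (auto it = orphans_.begin(); it != orphans_.end();) {
      if (it->second.frameworkId == frameworkId) {
        it = orphans_.erase(it);
      } else {
        ++it;
      }
    }
  }

  // Periodic sweep: return the tasks orphaned for at least the timeout and
  // stop tracking them. The caller would then kill these tasks.
  std::vector<std::string> sweep(Clock::time_point now) {
    std::vector<std::string> expired;
    for (auto it = orphans_.begin(); it != orphans_.end();) {
      if (now - it->second.since >= timeout_) {
        expired.push_back(it->first);
        it = orphans_.erase(it);
      } else {
        ++it;
      }
    }
    return expired;
  }

  std::size_t size() const { return orphans_.size(); }

private:
  struct Orphan {
    std::string frameworkId;
    Clock::time_point since;  // when the task was first seen as orphaned
  };

  Clock::duration timeout_;
  std::unordered_map<std::string, Orphan> orphans_;
};
```

The same sweep() covers both cases in the proposal: after a master restart, tasks recovered without a framework are added and swept once the timeout elapses; while the master is running, a timer would call sweep() periodically.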

> Resource leak issue in Mesos
> ----------------------------
>
>                 Key: MESOS-3324
>                 URL: https://issues.apache.org/jira/browse/MESOS-3324
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Yong Qiao Wang
>            Assignee: Yong Qiao Wang
>            Priority: Critical
>
> In the Mesos master recovery case, suppose a framework launched some long-running tasks before the Mesos master went down, and the framework then exited during the master's downtime. After the master recovers, those long-running tasks keep running as orphaned tasks in the Mesos cluster, and no other component can kill them later. This is a resource leak in Mesos; I propose adding a timeout in the Mesos master to kill those orphaned tasks or executors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)