Posted to issues@mesos.apache.org by "Zhitao Li (JIRA)" <ji...@apache.org> on 2018/08/10 17:48:00 UTC

[jira] [Created] (MESOS-9148) Make cgroups destroy timeout configurable for Mesos containerizer

Zhitao Li created MESOS-9148:
--------------------------------

             Summary: Make cgroups destroy timeout configurable for Mesos containerizer
                 Key: MESOS-9148
                 URL: https://issues.apache.org/jira/browse/MESOS-9148
             Project: Mesos
          Issue Type: Task
            Reporter: Zhitao Li
            Assignee: Zhitao Li


Currently, all containers launched by the Mesos containerizer use the same 1 minute timeout for destroying their cgroups. However, we have observed that for certain containers (possibly with deep system calls), the cgroup hierarchy was not destroyed within that timeout. This is quite problematic because the containerizer then short-circuits the destroy routine and skips _isolator::cleanup_. We have observed GPU resources being leaked indefinitely due to such a bug (see MESOS-8038).

The proposed workaround here is to add an optional agent flag that allows operators to override this timeout.
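
To illustrate the idea, here is a minimal sketch in Python rather than the actual Mesos C++ code. It assumes a hypothetical optional override value (named `override_secs` here, not a real agent flag) that falls back to the existing 1 minute default when unset, and a hypothetical `destroy` callable standing in for the cgroup destroy step.

```python
import threading
from typing import Callable, Optional

# The 1 minute default described in this issue.
DEFAULT_DESTROY_TIMEOUT_SECS = 60.0


def resolve_destroy_timeout(override_secs: Optional[float]) -> float:
    """Return the operator override if one is set, else the default.

    `override_secs` is a hypothetical stand-in for the proposed
    optional agent flag; it is not an actual Mesos flag name.
    """
    if override_secs is None:
        return DEFAULT_DESTROY_TIMEOUT_SECS
    if override_secs <= 0:
        raise ValueError("destroy timeout must be positive")
    return override_secs


def destroy_with_timeout(destroy: Callable[[], None], timeout_secs: float) -> bool:
    """Run `destroy` in a worker thread and wait up to `timeout_secs`.

    Returns True if `destroy` completed in time, False otherwise.
    `destroy` is a hypothetical placeholder for the cgroup destroy step.
    """
    done = threading.Event()

    def worker() -> None:
        destroy()
        done.set()

    threading.Thread(target=worker, daemon=True).start()
    # Event.wait returns False when the timeout expires first.
    return done.wait(timeout_secs)
```

With a sketch like this, an operator who sees slow cgroup teardown could pass a larger value, while leaving the value unset keeps today's behavior.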



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)