Posted to issues@mesos.apache.org by "Zhitao Li (JIRA)" <ji...@apache.org> on 2018/08/10 17:48:00 UTC
[jira] [Created] (MESOS-9148) Make cgroups destroy timeout configurable for Mesos containerizer
Zhitao Li created MESOS-9148:
--------------------------------
Summary: Make cgroups destroy timeout configurable for Mesos containerizer
Key: MESOS-9148
URL: https://issues.apache.org/jira/browse/MESOS-9148
Project: Mesos
Issue Type: Task
Reporter: Zhitao Li
Assignee: Zhitao Li
Currently, all containers launched by the Mesos containerizer use the same 1-minute timeout when destroying their cgroups. However, we have observed that for certain containers (possibly those with deep system calls), the cgroup hierarchy was not destroyed within that timeout. This is quite problematic because, when the timeout fires, the containerizer short-circuits the destroy routine and skips _isolator::cleanup_. We have observed GPU resources being leaked indefinitely as a result (see MESOS-8038).
The proposed workaround is to add an optional agent flag that allows operators to override this timeout.
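As a rough illustration of the proposal only, and not the actual Mesos code: the sketch below shows an optional override falling back to the existing 1-minute default, plus a bounded wait on a destroy operation. It uses the plain C++17 standard library rather than Mesos's own flag and future types. The names AgentFlags, cgroups_destroy_timeout, destroyTimeout and destroyedInTime are hypothetical; this issue does not specify the flag name.

```cpp
#include <chrono>
#include <future>
#include <optional>

// Default taken from the issue text: a 1-minute cgroups destroy timeout.
constexpr std::chrono::seconds kDefaultDestroyTimeout{60};

// Hypothetical stand-in for the agent flags; the field name is an assumption.
struct AgentFlags {
  // Unset means "keep the existing default".
  std::optional<std::chrono::milliseconds> cgroups_destroy_timeout;
};

// Resolve the timeout to use: the operator's override if present,
// otherwise the 1-minute default.
std::chrono::milliseconds destroyTimeout(const AgentFlags& flags) {
  return flags.cgroups_destroy_timeout.value_or(kDefaultDestroyTimeout);
}

// Returns true if the pending destroy completed within the timeout.
// A false result corresponds to the case described above, where the
// destroy routine gives up before the cgroup hierarchy is gone.
bool destroyedInTime(std::future<void>& destroyed,
                     std::chrono::milliseconds timeout) {
  return destroyed.wait_for(timeout) == std::future_status::ready;
}
```

With such a flag, an operator running containers that are slow to tear down could raise the timeout (say, to 5 minutes) so that the destroy completes and isolator cleanup still runs, while agents that leave the flag unset keep today's behavior.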
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)