Posted to reviews@mesos.apache.org by GitBox <gi...@apache.org> on 2021/05/24 13:39:40 UTC

[GitHub] [mesos] kamaradclimber commented on a change in pull request #388: Fixed a bug where the cgroup task killer leaves the cgroup frozen.

kamaradclimber commented on a change in pull request #388:
URL: https://github.com/apache/mesos/pull/388#discussion_r637951921



##########
File path: src/linux/cgroups.cpp
##########
@@ -1403,9 +1403,15 @@ class TasksKiller : public Process<TasksKiller>
 protected:
   void initialize() override
   {
-    // Stop when no one cares.
-    promise.future().onDiscard(lambda::bind(
-        static_cast<void (*)(const UPID&, bool)>(terminate), self(), true));
+    // We don't want to stop immediately upon discard, because
+    // it could leave the cgroup frozen which means that processes
+    // are stuck in uninterruptible sleep (D state), which is quite bad.
+    // So upon discard we still do our best and keep trying to
+    // kill the cgroup for up to FREEZE_RETRY_INTERVAL which should be
+    // a reasonable upper bound.
+    promise.future().onDiscard([this]() {
+      delay(FREEZE_RETRY_INTERVAL, self(), &Self::selfTerminate);

Review comment:
       Would there be a way to avoid this extra delay if we detect the cgroup is not frozen at that time?
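
       To make the question concrete, here is a minimal standalone sketch of the decision, not Mesos code. It assumes the cgroup v1 freezer, whose `freezer.state` file reads THAWED, FREEZING or FROZEN. The helper names (readFreezerState, mayBeFrozen, terminateDelay) are hypothetical, and the caller would pass FREEZE_RETRY_INTERVAL as retryInterval.

```cpp
#include <chrono>
#include <fstream>
#include <string>

// Hypothetical helper: reads the first line of a cgroup v1 `freezer.state`
// file. Returns an empty string when the file cannot be read.
inline std::string readFreezerState(const std::string& path)
{
  std::ifstream in(path);
  std::string state;
  if (!in || !std::getline(in, state)) {
    return "";
  }
  return state;
}

// Hypothetical helper: anything other than "THAWED" (including "FREEZING",
// "FROZEN" and an unreadable state) is treated as possibly frozen.
inline bool mayBeFrozen(const std::string& state)
{
  return state != "THAWED";
}

// Hypothetical helper: choose how long to wait before terminating after a
// discard. Skip the wait only when the cgroup is known to be thawed.
inline std::chrono::seconds terminateDelay(
    const std::string& state,
    std::chrono::seconds retryInterval)
{
  return mayBeFrozen(state) ? retryInterval : std::chrono::seconds(0);
}
```

       An unknown state keeps the full delay, so the shortcut only applies when the cgroup is observed thawed. The check is still racy: a freeze that is already in flight could complete right after the state is read, so the kill chain would also need to stop issuing new freezes once discarded.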




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org