You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@mesos.apache.org by qi...@apache.org on 2018/02/14 03:15:28 UTC

mesos git commit: Reaped the container process directly in Docker executor.

Repository: mesos
Updated Branches:
  refs/heads/master 709d7a138 -> a7714536f


Reaped the container process directly in Docker executor.

Due to a Docker issue (https://github.com/moby/moby/issues/33820),
Docker daemon can fail to catch a container exit, i.e., the container
process has already exited but the command `docker ps` shows the
container still running, this will lead to the "docker run" command
that we execute in Docker executor never returning, and it will also
cause the `docker stop` command takes no effect, i.e., it will return
without error but `docker ps` shows the container still running, so
the task will stuck in `TASK_KILLING` state.

To workaround this Docker issue, in this patch we made Docker executor
reaps the container process directly so Docker executor will be notified
once the container process exits.

Review: https://reviews.apache.org/r/65518


Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/a7714536
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/a7714536
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/a7714536

Branch: refs/heads/master
Commit: a7714536fad1140fd0c07c47e32b40e9ed00a3c3
Parents: 709d7a1
Author: Qian Zhang <zh...@gmail.com>
Authored: Mon Feb 5 20:42:07 2018 +0800
Committer: Qian Zhang <zh...@gmail.com>
Committed: Wed Feb 14 11:12:01 2018 +0800

----------------------------------------------------------------------
 src/docker/executor.cpp | 60 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos/blob/a7714536/src/docker/executor.cpp
----------------------------------------------------------------------
diff --git a/src/docker/executor.cpp b/src/docker/executor.cpp
index e4c53d5..80e2d81 100644
--- a/src/docker/executor.cpp
+++ b/src/docker/executor.cpp
@@ -270,6 +270,47 @@ public:
           driver->sendStatusUpdate(status);
         }
 
+        // This is a workaround for the Docker issue below:
+        // https://github.com/moby/moby/issues/33820
+        // Due to this issue, Docker daemon can fail to catch a container exit,
+        // which will lead to the `docker run` command that we execute in this
+        // executor never returning although the container has already exited.
+        // To workaround this issue, here we reap the container process directly
+        // so we will be notified when the container exits.
+        if (container.pid.isSome()) {
+          process::reap(container.pid.get())
+            .then(defer(self(), [=](const Option<int>& status) {
+              // We cannot get the actual exit status of the container
+              // process since it is not a child process of this executor,
+              // so here `status` must be `None`.
+              CHECK_NONE(status);
+
+              // There will be a race between the method `reaped` and this
+              // lambda; ideally when the Docker issue mentioned above does
+              // not occur, `reaped` will be invoked (i.e., the `docker run`
+              // command returns) to get the actual exit status of the
+              // container, so here we wait a few seconds for `reaped` to be
+              // invoked. If `reaped` is not invoked within the timeout, that
+              // means we hit that Docker issue.
+              delay(
+                  Seconds(3),
+                  self(),
+                  &Self::reapedContainer,
+                  container.pid.get());
+
+              return Nothing();
+            }));
+        } else {
+          // This is the case that the container process has already exited,
+          // Similar to the above case, let's wait a few seconds for `reaped`
+          // to be invoked.
+          delay(
+              Seconds(3),
+              self(),
+              &Self::reapedContainer,
+              None());
+        }
+
         return Nothing();
       }));
 
@@ -485,8 +526,27 @@ private:
     }
   }
 
+  void reapedContainer(Option<pid_t> pid)
+  {
+    // Do nothing if the method `reaped` has already been invoked.
+    if (terminated) {
+      return;
+    }
+
+    // This means the Docker issue mentioned in `launchTask` occurs.
+    LOG(WARNING) << "The container process"
+                 << (pid.isSome() ? " (pid: " + stringify(pid.get()) + ")" : "")
+                 << " has exited, but Docker daemon failed to catch it.";
+
+    reaped(None());
+  }
+
   void reaped(const Future<Option<int>>& run)
   {
+    if (terminated) {
+      return;
+    }
+
     terminated = true;
 
     // Stop health checking the task.