You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@mesos.apache.org by qi...@apache.org on 2018/02/14 03:15:28 UTC
mesos git commit: Reaped the container process directly in Docker
executor.
Repository: mesos
Updated Branches:
refs/heads/master 709d7a138 -> a7714536f
Reaped the container process directly in Docker executor.
Due to a Docker issue (https://github.com/moby/moby/issues/33820),
Docker daemon can fail to catch a container exit, i.e., the container
process has already exited but the command `docker ps` shows the
container still running, this will lead to the "docker run" command
that we execute in Docker executor never returning, and it will also
cause the `docker stop` command takes no effect, i.e., it will return
without error but `docker ps` shows the container still running, so
the task will stuck in `TASK_KILLING` state.
To workaround this Docker issue, in this patch we made Docker executor
reaps the container process directly so Docker executor will be notified
once the container process exits.
Review: https://reviews.apache.org/r/65518
Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/a7714536
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/a7714536
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/a7714536
Branch: refs/heads/master
Commit: a7714536fad1140fd0c07c47e32b40e9ed00a3c3
Parents: 709d7a1
Author: Qian Zhang <zh...@gmail.com>
Authored: Mon Feb 5 20:42:07 2018 +0800
Committer: Qian Zhang <zh...@gmail.com>
Committed: Wed Feb 14 11:12:01 2018 +0800
----------------------------------------------------------------------
src/docker/executor.cpp | 60 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 60 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/mesos/blob/a7714536/src/docker/executor.cpp
----------------------------------------------------------------------
diff --git a/src/docker/executor.cpp b/src/docker/executor.cpp
index e4c53d5..80e2d81 100644
--- a/src/docker/executor.cpp
+++ b/src/docker/executor.cpp
@@ -270,6 +270,47 @@ public:
driver->sendStatusUpdate(status);
}
+ // This is a workaround for the Docker issue below:
+ // https://github.com/moby/moby/issues/33820
+ // Due to this issue, Docker daemon can fail to catch a container exit,
+ // which will lead to the `docker run` command that we execute in this
+ // executor never returning although the container has already exited.
+ // To workaround this issue, here we reap the container process directly
+ // so we will be notified when the container exits.
+ if (container.pid.isSome()) {
+ process::reap(container.pid.get())
+ .then(defer(self(), [=](const Option<int>& status) {
+ // We cannot get the actual exit status of the container
+ // process since it is not a child process of this executor,
+ // so here `status` must be `None`.
+ CHECK_NONE(status);
+
+ // There will be a race between the method `reaped` and this
+ // lambda; ideally when the Docker issue mentioned above does
+ // not occur, `reaped` will be invoked (i.e., the `docker run`
+ // command returns) to get the actual exit status of the
+ // container, so here we wait a few seconds for `reaped` to be
+ // invoked. If `reaped` is not invoked within the timeout, that
+ // means we hit that Docker issue.
+ delay(
+ Seconds(3),
+ self(),
+ &Self::reapedContainer,
+ container.pid.get());
+
+ return Nothing();
+ }));
+ } else {
+ // This is the case that the container process has already exited,
+ // Similar to the above case, let's wait a few seconds for `reaped`
+ // to be invoked.
+ delay(
+ Seconds(3),
+ self(),
+ &Self::reapedContainer,
+ None());
+ }
+
return Nothing();
}));
@@ -485,8 +526,27 @@ private:
}
}
+ void reapedContainer(Option<pid_t> pid)
+ {
+ // Do nothing if the method `reaped` has already been invoked.
+ if (terminated) {
+ return;
+ }
+
+ // This means the Docker issue mentioned in `launchTask` occurs.
+ LOG(WARNING) << "The container process"
+ << (pid.isSome() ? " (pid: " + stringify(pid.get()) + ")" : "")
+ << " has exited, but Docker daemon failed to catch it.";
+
+ reaped(None());
+ }
+
void reaped(const Future<Option<int>>& run)
{
+ if (terminated) {
+ return;
+ }
+
terminated = true;
// Stop health checking the task.