Posted to issues@mesos.apache.org by "Qian Zhang (JIRA)" <ji...@apache.org> on 2018/01/31 14:26:00 UTC
[jira] [Commented] (MESOS-8488) Docker bug can cause unkillable tasks

    [ https://issues.apache.org/jira/browse/MESOS-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16346900#comment-16346900 ] 

Qian Zhang commented on MESOS-8488:
-----------------------------------

I think both of the solutions mentioned in the description have problems.
 # The timeout solution can handle the case where the framework issues a kill task request but the `docker stop` executed by the Docker executor cannot stop the Docker container. However, there is a case it cannot handle: if the Docker container exits on its own, then because of that Docker 1.13 issue the Docker executor will not be aware of it (i.e., `docker run` will not return), so the user will see the task as still running until they issue a kill task request.
 # For the wait pid solution, I do not think we can wait on the container's pid, because the container process is not a direct child of the Docker executor.
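
To illustrate the second point, here is a minimal Python sketch (not Mesos code) of the underlying POSIX behavior: a process can only `wait()` on its own direct children. The helper name `try_wait` is hypothetical, and PID 1 is used here only as a stand-in for a container process that was started by the Docker daemon rather than by the executor.

```python
import os
import subprocess
import sys


def try_wait(pid):
    """Attempt a non-blocking wait on `pid` and report whether the kernel allows it."""
    try:
        os.waitpid(pid, os.WNOHANG)
        return "waitable"
    except ChildProcessError:
        # ECHILD: `pid` is not a child of the calling process.
        return "not a child"


# A process we spawned ourselves is a direct child, so waiting on it is allowed.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(5)"])
direct = try_wait(child.pid)
child.kill()
child.wait()

# PID 1 is never our child; it stands in for a container process whose
# parent is the Docker daemon (assumption made for illustration only).
unrelated = try_wait(1)

print(direct, "/", unrelated)
```

On Linux the second call fails with ECHILD, which is why the executor would need some other mechanism than a plain `wait()` to notice the container's exit.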

> Docker bug can cause unkillable tasks
> -------------------------------------
>
>                 Key: MESOS-8488
>                 URL: https://issues.apache.org/jira/browse/MESOS-8488
>             Project: Mesos
>          Issue Type: Improvement
>          Components: containerization
>    Affects Versions: 1.5.0
>            Reporter: Greg Mann
>            Priority: Major
>              Labels: mesosphere
>
> Due to an [issue on the Moby project|https://github.com/moby/moby/issues/33820], it's possible for Docker versions 1.13 and later to fail to catch a container exit, so that the {{docker run}} command which was used to launch the container will never return. This can lead to the Docker executor becoming stuck in a state where it believes the container is still running and cannot be killed.
> We should update the Docker executor to ensure that containers stuck in such a state cannot cause unkillable Docker executors/tasks.
> One way to do this would be a timeout, after which the Docker executor will commit suicide if a kill task attempt has not succeeded. However, if we do this we should also ensure that in the case that the container was actually still running, either the Docker daemon or the DockerContainerizer would clean up the container when it does exit.
> Another option might be for the Docker executor to directly {{wait()}} on the container's Linux PID, in order to notice when the container exits.


