Posted to issues@mesos.apache.org by "Greg Mann (JIRA)" <ji...@apache.org> on 2018/02/28 19:19:00 UTC

[jira] [Commented] (MESOS-8605) Terminal task status update will not send if 'docker inspect' is hung

    [ https://issues.apache.org/jira/browse/MESOS-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380881#comment-16380881 ] 

Greg Mann commented on MESOS-8605:
----------------------------------

{{DockerContainerizer::update()}} uses {{inspect}} to grab the container PID, which it needs in order to update cgroups: https://github.com/apache/mesos/blob/a15eb712afd51048756148a5af9cd60e86c8e90b/src/slave/containerizer/docker.cpp#L1722-L1740

We only need to grab this once, rather than doing it every time we run {{update()}}. It looks like the location of the containerizer's first invocation of {{update()}} for a particular container depends on whether we go down the {{launchExecutorProcess}} code path or the {{launchExecutorContainer}} code path: https://github.com/apache/mesos/blob/a15eb712afd51048756148a5af9cd60e86c8e90b/src/slave/containerizer/docker.cpp#L1288-L1366

If we do {{launchExecutorContainer}}, then we call {{update()}} as part of the launch path: https://github.com/apache/mesos/blob/a15eb712afd51048756148a5af9cd60e86c8e90b/src/slave/containerizer/docker.cpp#L1352-L1353
If we do {{launchExecutorProcess}}, then the first invocation of {{update()}} occurs when the executor registers: https://github.com/apache/mesos/blob/a15eb712afd51048756148a5af9cd60e86c8e90b/src/slave/slave.cpp#L4875-L4879

I think the most important issue here is making sure that a task can be killed successfully, even if the {{Docker::inspect()}} call in {{Containerizer::update()}} has not returned. We could do this by storing the Future associated with the initial {{inspect()}} call for a container, and then discarding that Future if it’s pending when the container is destroyed.
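
To make the idea concrete, here is a rough, self-contained sketch of "remember the initial inspect per container, and discard it on destroy if it is still pending". It deliberately uses only the C++ standard library: {{PendingInspect}} and {{InspectTracker}} are hypothetical stand-ins invented for illustration, not libprocess's Future or any existing Mesos class, and the {{onDiscard}} hook is an assumption about where cleanup of a hung {{docker inspect}} would go.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a pending asynchronous 'docker inspect' result.
// This is NOT libprocess's Future; it only models the three states we need.
struct PendingInspect {
  enum class State { PENDING, READY, DISCARDED };

  State state = State::PENDING;
  int pid = -1;                     // Valid only when state == READY.
  std::function<void()> onDiscard;  // Assumed hook, e.g. clean up the hung call.

  // Completes the inspect; ignored if it was already completed or discarded.
  void set(int containerPid) {
    if (state == State::PENDING) {
      pid = containerPid;
      state = State::READY;
    }
  }

  // Abandons the inspect; a no-op unless it is still pending.
  void discard() {
    if (state != State::PENDING) return;
    state = State::DISCARDED;
    if (onDiscard) onDiscard();
  }
};

// Hypothetical per-container bookkeeping, keyed by container ID.
class InspectTracker {
public:
  // Records the initial inspect for a container and returns its handle.
  std::shared_ptr<PendingInspect> track(
      const std::string& containerId,
      std::function<void()> onDiscard) {
    auto inspect = std::make_shared<PendingInspect>();
    inspect->onDiscard = std::move(onDiscard);
    inspects[containerId] = inspect;
    return inspect;
  }

  // Called on destroy: discards the inspect if pending, then forgets it.
  // Returns true only if a still-pending inspect was discarded.
  bool destroy(const std::string& containerId) {
    auto it = inspects.find(containerId);
    if (it == inspects.end()) return false;
    bool wasPending = it->second->state == PendingInspect::State::PENDING;
    it->second->discard();
    inspects.erase(it);
    return wasPending;
  }

private:
  std::map<std::string, std::shared_ptr<PendingInspect>> inspects;
};
```

The point of the sketch is that {{destroy()}} never waits on the inspect: a pending one is discarded, and one that already completed is left untouched, so the kill path cannot be blocked by a hung {{docker inspect}}.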

We could also optimize by updating the {{docker->inspect()}} call in {{update()}} to retry after some duration.
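
As a rough illustration of the retry idea, the sketch below bounds each inspect attempt with a timeout and gives up after a fixed number of attempts. It is standard-library C++ only; {{InspectResult}}, {{InspectAttempt}}, and {{inspectWithRetry}} are hypothetical names made up for this example, and the timeout and attempt limit are assumed parameters rather than anything that exists in the containerizer today.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical result of one bounded 'docker inspect' attempt.
struct InspectResult {
  bool ok;  // False if the attempt timed out or failed.
  int pid;  // Container PID; meaningful only when ok is true.
};

// One attempt: the callee is expected to give up once 'timeout' has elapsed.
typedef std::function<InspectResult(std::chrono::milliseconds)> InspectAttempt;

// Runs 'attempt' up to 'maxAttempts' times, stopping at the first success.
// 'attemptsMade' reports how many attempts were actually issued.
InspectResult inspectWithRetry(
    const InspectAttempt& attempt,
    std::chrono::milliseconds timeout,
    int maxAttempts,
    int* attemptsMade) {
  InspectResult result = {false, -1};
  *attemptsMade = 0;
  while (*attemptsMade < maxAttempts) {
    ++*attemptsMade;
    result = attempt(timeout);
    if (result.ok) break;
  }
  return result;
}
```

Capping the number of attempts matters here: an unbounded retry loop would just move the hang from a single {{inspect}} call into the loop itself.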

> Terminal task status update will not send if 'docker inspect' is hung
> ---------------------------------------------------------------------
>
>                 Key: MESOS-8605
>                 URL: https://issues.apache.org/jira/browse/MESOS-8605
>             Project: Mesos
>          Issue Type: Bug
>          Components: docker
>    Affects Versions: 1.5.0
>            Reporter: Greg Mann
>            Assignee: Andrei Budnik
>            Priority: Major
>              Labels: mesosphere
>
> When the agent processes a terminal status update for a task, it calls {{containerizer->update()}} on the container before it forwards the update: https://github.com/apache/mesos/blob/9635d4a2d12fc77935c3d5d166469258634c6b7e/src/slave/slave.cpp#L5509-L5514
> In the Docker containerizer, {{update()}} calls {{Docker::inspect()}}, which means that if the inspect call hangs, the terminal update will not be sent: https://github.com/apache/mesos/blob/9635d4a2d12fc77935c3d5d166469258634c6b7e/src/slave/containerizer/docker.cpp#L1714



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)