Posted to issues@mesos.apache.org by "Greg Mann (JIRA)" <ji...@apache.org> on 2018/02/28 19:19:00 UTC
[jira] [Commented] (MESOS-8605) Terminal task status update will not send if 'docker inspect' is hung
[ https://issues.apache.org/jira/browse/MESOS-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380881#comment-16380881 ]
Greg Mann commented on MESOS-8605:
----------------------------------
{{DockerContainerizer::update()}} uses {{inspect}} in order to grab the container PID, which it needs in order to update cgroups: https://github.com/apache/mesos/blob/a15eb712afd51048756148a5af9cd60e86c8e90b/src/slave/containerizer/docker.cpp#L1722-L1740
We only need to grab this once, rather than doing it every time we run {{update()}}. It looks like the location of the containerizer's first invocation of {{update()}} for a particular container depends on whether we go down the {{launchExecutorProcess}} code path or the {{launchExecutorContainer}} code path: https://github.com/apache/mesos/blob/a15eb712afd51048756148a5af9cd60e86c8e90b/src/slave/containerizer/docker.cpp#L1288-L1366
If we do {{launchExecutorContainer}}, then we call {{update()}} as part of the launch path: https://github.com/apache/mesos/blob/a15eb712afd51048756148a5af9cd60e86c8e90b/src/slave/containerizer/docker.cpp#L1352-L1353
If we do {{launchExecutorProcess}}, then the first invocation of {{update()}} occurs when the executor registers: https://github.com/apache/mesos/blob/a15eb712afd51048756148a5af9cd60e86c8e90b/src/slave/slave.cpp#L4875-L4879
I think the most important issue here is making sure that a task can be killed successfully, even if the {{Docker::inspect()}} call in {{DockerContainerizer::update()}} has not returned. We could do this by storing the Future associated with the initial {{inspect()}} call for a container, and then discarding that Future if it’s pending when the container is destroyed.
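To make the "store the Future, discard it on destroy" idea concrete, here is a minimal sketch in plain standard C++. It does not use libprocess: {{InspectHandle}}, {{InspectTable}}, and their methods are hypothetical stand-ins that only model the pending/ready/discarded states, and the model is single-threaded for brevity.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for the Future returned by the initial inspect call.
// This is NOT libprocess's Future; it only models the three relevant states.
enum class InspectState { PENDING, READY, DISCARDED };

struct InspectHandle {
  InspectState state = InspectState::PENDING;
  int pid = -1;  // Valid only once state == READY.
};

// Hypothetical per-containerizer bookkeeping: one stored handle per container.
class InspectTable {
public:
  // Called on the first update(): remember the handle of the initial inspect.
  std::shared_ptr<InspectHandle> startInspect(const std::string& containerId) {
    assert(!containerId.empty());
    auto handle = std::make_shared<InspectHandle>();
    inspects_[containerId] = handle;
    return handle;
  }

  // Called when inspect eventually returns. A late result for a discarded
  // handle is ignored, so it cannot touch a destroyed container.
  static bool completeInspect(InspectHandle& handle, int pid) {
    if (handle.state != InspectState::PENDING) {
      return false;
    }
    handle.pid = pid;
    handle.state = InspectState::READY;
    return true;
  }

  // Called from the destroy path: discard the handle if it is still pending,
  // so that destroy never waits on a hung inspect. Returns true if a pending
  // inspect was discarded.
  bool destroy(const std::string& containerId) {
    auto it = inspects_.find(containerId);
    if (it == inspects_.end()) {
      return false;
    }
    bool discarded = false;
    if (it->second->state == InspectState::PENDING) {
      it->second->state = InspectState::DISCARDED;
      discarded = true;
    }
    inspects_.erase(it);
    return discarded;
  }

private:
  std::map<std::string, std::shared_ptr<InspectHandle>> inspects_;
};
```

In this sketch, destroying a container whose inspect is still pending flips the handle to DISCARDED and returns immediately, and a result that arrives afterwards is dropped by {{completeInspect()}}. The real change would presumably rely on the Future's own discard mechanism rather than a hand-rolled state enum.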
We could also optimize by updating the {{docker->inspect()}} call in {{update()}} to retry after some duration.
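The retry idea could look roughly like the following. This is only a sketch in standard C++, not Mesos code: {{inspectWithRetry}}, {{InspectResult}}, the {{attempt}} callable, and the {{maxAttempts}} cap are all assumptions made for illustration. The {{attempt}} callable is assumed to run one inspect bounded by the given timeout and to report {{ok == false}} if it did not finish in time.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical result of one bounded inspect attempt.
struct InspectResult {
  bool ok;   // false if the attempt timed out or failed.
  int pid;   // Container PID; meaningful only when ok is true.
};

// Retries a bounded inspect attempt until it succeeds or the attempt budget
// is exhausted. `attempt` is a hypothetical callable that gives up after
// `timeout`; how it enforces the timeout is outside this sketch.
InspectResult inspectWithRetry(
    const std::function<InspectResult(std::chrono::milliseconds)>& attempt,
    std::chrono::milliseconds timeout,
    int maxAttempts)
{
  assert(maxAttempts > 0);

  for (int i = 0; i < maxAttempts; ++i) {
    InspectResult result = attempt(timeout);
    if (result.ok) {
      return result;
    }
    // The attempt did not complete within `timeout`; fall through and retry.
  }

  // Every attempt timed out or failed.
  return InspectResult{false, -1};
}
```

Whether to cap the number of attempts at all is a design choice this sketch makes for the sake of termination; the comment above only proposes retrying after some duration.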
> Terminal task status update will not send if 'docker inspect' is hung
> ---------------------------------------------------------------------
>
> Key: MESOS-8605
> URL: https://issues.apache.org/jira/browse/MESOS-8605
> Project: Mesos
> Issue Type: Bug
> Components: docker
> Affects Versions: 1.5.0
> Reporter: Greg Mann
> Assignee: Andrei Budnik
> Priority: Major
> Labels: mesosphere
>
> When the agent processes a terminal status update for a task, it calls {{containerizer->update()}} on the container before it forwards the update: https://github.com/apache/mesos/blob/9635d4a2d12fc77935c3d5d166469258634c6b7e/src/slave/slave.cpp#L5509-L5514
> In the Docker containerizer, {{update()}} calls {{Docker::inspect()}}, which means that if the inspect call hangs, the terminal update will not be sent: https://github.com/apache/mesos/blob/9635d4a2d12fc77935c3d5d166469258634c6b7e/src/slave/containerizer/docker.cpp#L1714
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)