Posted to issues@mesos.apache.org by "Qian Zhang (Jira)" <ji...@apache.org> on 2020/05/18 02:50:00 UTC

[jira] [Comment Edited] (MESOS-10126) Docker volume isolator needs to clean up the `info` struct regardless the result of unmount operation

    [ https://issues.apache.org/jira/browse/MESOS-10126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108245#comment-17108245 ] 

Qian Zhang edited comment on MESOS-10126 at 5/18/20, 2:49 AM:
--------------------------------------------------------------

RR:

[https://reviews.apache.org/r/72516/]

[https://reviews.apache.org/r/72523/]


was (Author: qianzhang):
RR:

[https://reviews.apache.org/r/72516/]

> Docker volume isolator needs to clean up the `info` struct regardless the result of unmount operation
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-10126
>                 URL: https://issues.apache.org/jira/browse/MESOS-10126
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>            Reporter: Qian Zhang
>            Assignee: Qian Zhang
>            Priority: Critical
>
> Currently, when [DockerVolumeIsolatorProcess::cleanup()|https://github.com/apache/mesos/blob/1.9.0/src/slave/containerizer/mesos/isolators/docker/volume/isolator.cpp#L610] is called, we unmount the volume first, but if the unmount operation fails we neither remove the container's checkpoint directory nor erase the container's `info` struct from `infos`. This is problematic because the leftover `info` in `infos` keeps the volume's reference count above 0 even though no container is actually using the volume. The next time another container using this volume is destroyed, we will NOT unmount the volume, because its reference count will be larger than 1 (it will be 2; see [here|https://github.com/apache/mesos/blob/1.9.0/src/slave/containerizer/mesos/isolators/docker/volume/isolator.cpp#L631:L651] for details), so we will never have a chance to unmount this volume.
> This issue has existed since the Mesos 1.0.0 release, when the Docker volume isolator was introduced.
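
The reference-count problem described above can be illustrated with a small standalone model. This is only a sketch and not the Mesos code: the `Isolator` struct, its members, and the `eraseOnFailure` flag are hypothetical names chosen for illustration, and the reference count is assumed to be simply the number of entries in `infos` that list the volume.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the isolator's bookkeeping (not the Mesos class).
struct Isolator {
  // Container ID -> volumes used by that container (`infos` in the issue).
  std::map<std::string, std::set<std::string>> infos;
  // Volumes whose unmount is forced to fail, to simulate the error path.
  std::set<std::string> failingUnmounts;
  // Volumes that have been unmounted successfully.
  std::set<std::string> unmounted;

  // Assumed reference count: number of containers whose info lists the volume.
  std::size_t references(const std::string& volume) const {
    std::size_t count = 0;
    for (const auto& entry : infos) {
      count += entry.second.count(volume);
    }
    return count;
  }

  bool unmount(const std::string& volume) {
    if (failingUnmounts.count(volume) > 0) {
      return false;
    }
    unmounted.insert(volume);
    return true;
  }

  // eraseOnFailure == false models the behavior described in the issue;
  // eraseOnFailure == true erases the info regardless of the unmount result.
  bool cleanup(const std::string& containerId, bool eraseOnFailure) {
    auto it = infos.find(containerId);
    if (it == infos.end()) {
      return true;
    }

    bool ok = true;
    for (const std::string& volume : it->second) {
      // Skip the unmount while another container still references the volume.
      if (references(volume) > 1) {
        continue;
      }
      if (!unmount(volume)) {
        ok = false;
      }
    }

    if (ok || eraseOnFailure) {
      infos.erase(it);
    }
    return ok;
  }
};
```

In this model, if container `c1` fails to unmount `vol` and its info is kept, a later container `c2` using `vol` sees a count of 2 on cleanup and skips the unmount, so `vol` is never unmounted. With `eraseOnFailure` set to true, the count drops back to 0 after the failed cleanup and the cleanup of `c2` unmounts the volume as expected.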



--
This message was sent by Atlassian Jira
(v8.3.4#803005)