Posted to issues@mesos.apache.org by "Ilya Pronin (JIRA)" <ji...@apache.org> on 2016/11/09 15:58:58 UTC

[jira] [Commented] (MESOS-6563) Shared Filesystem Isolator does not clean up mounts

    [ https://issues.apache.org/jira/browse/MESOS-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651295#comment-15651295 ] 

Ilya Pronin commented on MESOS-6563:
------------------------------------

The mounts list looks strange, as if something is mounted on top of the directory that was supposed to be mounted as {{/tmp}}. Investigating.
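
For the record, a quick way to inspect what is stacked on one of those directories (rough sketch; the path and IDs are placeholders for one of the executor work directories from the listing below, and {{findmnt}} from util-linux may or may not be available on the box):

{noformat}
# List mount table entries whose target is the suspicious directory
# (placeholder path; substitute a real executor work dir + /tmp).
findmnt /var/lib/mesos/slaves/<agent-id>/frameworks/<framework-id>/executors/<executor-id>/runs/<container-id>/tmp

# Or grep the agent's own mount table for the same path.
grep '<container-id>/tmp' /proc/<agent-pid>/mountinfo
{noformat}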

> Shared Filesystem Isolator does not clean up mounts
> ---------------------------------------------------
>
>                 Key: MESOS-6563
>                 URL: https://issues.apache.org/jira/browse/MESOS-6563
>             Project: Mesos
>          Issue Type: Bug
>          Components: isolation
>            Reporter: David Robinson
>            Assignee: Ilya Pronin
>
> While testing the agent's 'filesystem/shared' isolator we discovered that mounts are not unmounted: agents ended up with thousands of mounts, one for each task that had run.
> To reproduce the problem, start a Mesos agent with --isolation="filesystem/shared" and --default_container_info="file:///tmp/the-container-info-below.json" (an invocation sketch follows the container info), then launch and kill several tasks. After the tasks are killed the mount points should be unmounted, but they are not.
> {noformat:title=container info}
> {
>     "type": "MESOS",
>     "volumes": [
>         {
>             "container_path": "/tmp",
>             "host_path": "tmp",
>             "mode": "RW"
>         }
>     ]
> }
> {noformat}
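> A rough sketch of the agent invocation (binary path taken from the process listing further down; --master, --work_dir, and any site-specific flags are omitted):
> {noformat}
> # Assumes the container info above was saved as /tmp/the-container-info-below.json.
> /usr/local/bin/mesos-slave \
>     --isolation="filesystem/shared" \
>     --default_container_info="file:///tmp/the-container-info-below.json"
> {noformat}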
> Mounts are supposed to be [cleaned automatically by the kernel when the process exits|https://github.com/apache/mesos/blob/3845ab8af83a6eebfbf32e98f9000ab695cf2661/src/slave/containerizer/mesos/isolators/filesystem/shared.cpp#L70].
> {noformat}
> // We only need to implement the `prepare()` function in this
> // isolator. There is nothing to recover because we do not keep any
> // state and do not monitor filesystem usage or perform any action on
> // cleanup. Cleanup of mounts is done automatically by the kernel
> // when the mount namespace is destroyed after the last process
> // terminates.
> Future<Option<ContainerLaunchInfo>> SharedFilesystemIsolatorProcess::prepare(
>     const ContainerID& containerId,
>     const ContainerConfig& containerConfig)
> {
> {noformat}
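> The kernel-side cleanup that the comment relies on can be seen with a throwaway mount namespace (rough sketch, run as root; assumes a util-linux {{unshare}} that gives the new namespace private mount propagation):
> {noformat}
> # Bind-mount something inside a short-lived mount namespace and show it exists there...
> unshare --mount sh -c 'mount --bind /var/tmp /mnt && grep " /mnt " /proc/self/mounts'
> # ...then confirm it is gone from the host table once that shell has exited.
> grep -c " /mnt " /proc/mounts
> {noformat}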
> We found during testing that an agent would have thousands of dangling mounts, all of them attributed to the Mesos agent:
> {noformat}
> root[7]server-001 ~ # tail /proc/mounts
> /dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-dda59747-848a-4b3b-8424-d0032f8a38f7/runs/e31bea31-22d7-4758-bc8b-6837919d7ed7/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-3a001926-a442-45c4-9cbc-dad182954fed/runs/bd0a8e36-d147-4511-9cc5-afff9f1c0fbe/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-04204a72-53d8-44a8-bac5-613835ff85a7/runs/967739ea-5284-41ed-af1a-1cb5a77dd690/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-95d1ac39-323a-4c15-b1dc-645ed79c4128/runs/6ff6d2b3-2867-4ad4-b2bb-20e27a0fa925/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-91f6a946-f560-43a3-95c2-424c5dd71684/runs/a4821acc-58f8-4457-bdc9-bd83bdeb8231/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-dd3b34f1-10c6-43d3-8741-a3164a642e93/runs/0ef8cf17-6c18-48a4-9943-66c448de5d44/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-fb704ef8-1cf9-4d35-854d-7b6247cf4bc2/runs/e65ec976-057f-4939-9053-1ddcddfc98f8/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-cdf7b06d-2265-41fe-b1e9-84366dc88b62/runs/1bed4289-7442-4a91-bf45-a7de10ab79bb/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-58582496-e551-4d80-8ae5-9eacac5e8a36/runs/6b5a7f56-af89-4eab-bbfa-883ca43744ad/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-5d6bc25a-6ba7-48f9-9655-85da6ff0a383/runs/d5cc4b31-7876-4bca-b1fa-b177c5d88bfc/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> root[7]server-001 ~ # grep -c 'drobinson-test-sleep2' /proc/mounts
> 4950
> root[7]server-001 ~ # pgrep -f /usr/local/bin/mesos-slave
> 27799
> root[7]server-001 ~ # wc -l /proc/27799/mounts
> 5079 /proc/27799/mounts
> root[7]server-001 ~ # grep -c 'drobinson-test-sleep2' /proc/27799/mounts
> 4950
> root[7]server-001 ~ # ps auxww | grep 'drobinson-test-sleep2' -c
> 5
> {noformat}
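> For reference, a possible manual cleanup of the dangling entries on the host (not a fix for the isolator; double-check the grep pattern and make sure the matching tasks are really gone before unmounting anything):
> {noformat}
> # List the leaked mount points for the affected executors...
> grep 'drobinson-test-sleep2' /proc/mounts | awk '{print $2}'
> # ...and lazily detach them one at a time.
> grep 'drobinson-test-sleep2' /proc/mounts | awk '{print $2}' | xargs -r -n1 umount -l
> {noformat}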



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)