Posted to issues@mesos.apache.org by "Qian Zhang (JIRA)" <ji...@apache.org> on 2019/05/23 03:11:00 UTC

[jira] [Comment Edited] (MESOS-9536) Nested container launched with non-root user may not be able to write to its sandbox via the environment variable `MESOS_SANDBOX`

    [ https://issues.apache.org/jira/browse/MESOS-9536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846398#comment-16846398 ] 

Qian Zhang edited comment on MESOS-9536 at 5/23/19 3:10 AM:
------------------------------------------------------------

The root cause of the secret issue is as follows. In [this patch|https://reviews.apache.org/r/70514/], for a nested container that does not have its own rootfs, we bind mount its sandbox directory to the directory `/mnt/mesos/sandbox` (i.e., the agent flag `--sandbox_directory`) in the container's mount namespace. These two directories see the same file operations (e.g., creating/deleting files) but NOT the same mount/unmount operations. The reason is that in `launch.cpp` we make `/` in the container's mount namespace a slave mount of `/` in the host mount namespace, and the above bind mount is also a slave mount of the host's `/`. Mount operations done under the host's `/` therefore propagate to both slave mounts, but mount propagation does not happen between the two slave mounts themselves.

As a result, the ramfs that the `volume/secret` isolator mounts in the container's sandbox directory is not propagated to `/mnt/mesos/sandbox`, and neither is the subsequent bind mount from the ramfs to the target secret file in the container's sandbox directory. That is why we see a correct secret file in the container's sandbox directory but an empty secret file in `/mnt/mesos/sandbox`.
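The propagation rules described above can be illustrated with a toy model. This is a minimal C++ sketch under stated assumptions, not Mesos code and not real mount(2) calls: each mount is a node with an optional peer group (if it is shared) and an optional master group (if it is a slave), and path containment is ignored. `Mount`, `propagate`, `kSlaveLayout` and `kSharedLayout` are hypothetical names. The model shows that an event on the host's `/` reaches both slaves, that an event on the container's `/` (where the sandbox, and therefore the ramfs, lives) reaches neither the host nor `/mnt/mesos/sandbox`, and that putting both sandbox copies into one peer group, as in the shared-mount fix, lets the event through.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Toy model of Linux shared-subtree propagation. Each node is one mount;
// only the propagation type is modelled, not where mounts sit in the tree.
struct Mount {
  std::string name;
  std::optional<int> peerGroup;    // set if the mount is shared
  std::optional<int> masterGroup;  // set if the mount is a slave of that group
};

const Mount* findMount(const std::vector<Mount>& mounts,
                       const std::string& name) {
  for (const Mount& m : mounts) {
    if (m.name == name) return &m;
  }
  return nullptr;
}

// Returns every mount that sees a mount/unmount event issued on `origin`.
// Events flow to the peers of a shared mount and down to its slaves, never
// from a slave back to its master or sideways to another slave.
std::set<std::string> propagate(const std::vector<Mount>& mounts,
                                const std::string& origin) {
  assert(findMount(mounts, origin) != nullptr);
  std::set<std::string> reached{origin};
  std::set<int> forwardedGroups;
  std::vector<std::string> work{origin};
  while (!work.empty()) {
    const Mount* m = findMount(mounts, work.back());
    work.pop_back();
    // A private or pure-slave mount forwards nothing; a group forwards once.
    if (!m->peerGroup || !forwardedGroups.insert(*m->peerGroup).second) {
      continue;
    }
    for (const Mount& other : mounts) {
      if (other.peerGroup == *m->peerGroup ||
          other.masterGroup == *m->peerGroup) {
        if (reached.insert(other.name).second) work.push_back(other.name);
      }
    }
  }
  return reached;
}

// Current layout: the container's `/` (which holds the sandbox) and the bind
// mount at /mnt/mesos/sandbox are both slaves of the host's `/` (group 1).
const std::vector<Mount> kSlaveLayout = {
    {"host:/", 1, std::nullopt},
    {"container:/", std::nullopt, 1},
    {"container:/mnt/mesos/sandbox", std::nullopt, 1},
};

// Shared-mount layout: the sandbox is made shared (group 2, still a slave of
// group 1) before being bind mounted, so both copies are peers.
const std::vector<Mount> kSharedLayout = {
    {"host:/", 1, std::nullopt},
    {"container:sandbox", 2, 1},
    {"container:/mnt/mesos/sandbox", 2, 1},
};
```

For example, `propagate(kSlaveLayout, "container:/")` returns only `{"container:/"}`, while `propagate(kSharedLayout, "container:sandbox")` also contains `"container:/mnt/mesos/sandbox"`. The real kernel logic additionally depends on where each mountpoint falls in each receiving tree, which this model deliberately ignores.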

A possible fix: before bind mounting the container's sandbox to `/mnt/mesos/sandbox`, make the container's sandbox a shared mount first. That way the container's sandbox and `/mnt/mesos/sandbox` will be in the same peer group, so mount propagation will happen between them. We may also need to allow shared mounts in `launch.cpp`, since it currently treats every `MS_SHARED` entry as a propagation-setup mount and skips it. Here is a draft fix based on [this patch|https://reviews.apache.org/r/70514/]:
{code:java}
diff --git a/src/slave/containerizer/mesos/isolators/filesystem/linux.cpp b/src/slave/containerizer/mesos/isolators/filesystem/linux.cpp
index 7b50258ef..70dc82b84 100644
--- a/src/slave/containerizer/mesos/isolators/filesystem/linux.cpp
+++ b/src/slave/containerizer/mesos/isolators/filesystem/linux.cpp
@@ -765,6 +765,12 @@ Future<Option<ContainerLaunchInfo>> LinuxFilesystemIsolatorProcess::prepare(
// sandbox, if we still set `MESOS_SANDBOX` to `containerConfig.directory()`
// for nested container, it will not have permission to access its sandbox
// via `MESOS_SANDBOX` if its user is different from its parent's user.
+ *launchInfo.add_mounts() = createContainerMount(
+ containerConfig.directory(), containerConfig.directory(), MS_BIND);
+
+ *launchInfo.add_mounts() = createContainerMount(
+ containerConfig.directory(), containerConfig.directory(), MS_SHARED);
+
*launchInfo.add_mounts() = createContainerMount(
containerConfig.directory(), flags.sandbox_directory, MS_BIND | MS_REC);
}
diff --git a/src/slave/containerizer/mesos/launch.cpp b/src/slave/containerizer/mesos/launch.cpp
index 88b97a572..e3e851b14 100644
--- a/src/slave/containerizer/mesos/launch.cpp
+++ b/src/slave/containerizer/mesos/launch.cpp
@@ -299,7 +299,7 @@ static Try<Nothing> prepareMounts(const ContainerLaunchInfo& launchInfo)
launchInfo.mounts().begin(),
launchInfo.mounts().end(),
[](const ContainerMountInfo& mount) {
- return (mount.flags() & MS_SHARED) != 0;
+ return !mount.has_source() && ((mount.flags() & MS_SHARED) != 0);
}) != launchInfo.mounts().end();

if (!hasSharedMount) {
@@ -348,7 +348,7 @@ static Try<Nothing> prepareMounts(const ContainerLaunchInfo& launchInfo)

foreach (const ContainerMountInfo& mount, launchInfo.mounts()) {
// Skip those mounts that are used for setting up propagation.
- if ((mount.flags() & MS_SHARED) != 0) {
+ if (!mount.has_source() && ((mount.flags() & MS_SHARED) != 0)) {
continue;
}
{code}
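The two hunks work together: the isolator now emits an `MS_SHARED` entry that has a source, and `launch.cpp` must stop treating that entry as a propagation-setup marker. The sketch below models this with a hypothetical `MountInfo` struct that keeps only the fields the diff reads (the real type is the `ContainerMountInfo` protobuf). `planSandboxMounts`, `isPropagationMarkerOld`, `isPropagationMarkerNew` and `applyMount` are illustrative names, not Mesos functions; `applyMount` shows the corresponding Linux mount(2) call and would need `CAP_SYS_ADMIN` inside the container's mount namespace.

```cpp
#include <sys/mount.h>  // ::mount, MS_BIND, MS_REC, MS_SHARED (Linux only)

#include <cassert>
#include <cerrno>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for ContainerMountInfo.
struct MountInfo {
  std::optional<std::string> source;
  std::string target;
  unsigned long flags;
};

// Mirrors the three createContainerMount() calls in the linux.cpp hunk.
std::vector<MountInfo> planSandboxMounts(const std::string& sandbox,
                                         const std::string& mappedSandbox) {
  assert(!mappedSandbox.empty() && mappedSandbox.front() == '/');
  return {
      {sandbox, sandbox, MS_BIND},                 // make the sandbox a mount point
      {sandbox, sandbox, MS_SHARED},               // give it its own peer group
      {sandbox, mappedSandbox, MS_BIND | MS_REC},  // the copy joins that group
  };
}

// Old launch.cpp predicate: any MS_SHARED entry was a propagation marker.
bool isPropagationMarkerOld(const MountInfo& m) {
  return (m.flags & MS_SHARED) != 0;
}

// New launch.cpp predicate: only source-less MS_SHARED entries are markers.
bool isPropagationMarkerNew(const MountInfo& m) {
  return !m.source.has_value() && (m.flags & MS_SHARED) != 0;
}

// Performs one planned mount; returns 0 on success or errno on failure.
int applyMount(const MountInfo& m) {
  const char* source = m.source ? m.source->c_str() : nullptr;
  if (::mount(source, m.target.c_str(), nullptr, m.flags, nullptr) != 0) {
    return errno;
  }
  return 0;
}
```

With the old predicate, the second planned entry would be skipped in the mount loop (and would also count toward `hasSharedMount`), so the sandbox would never become shared. With the new predicate, only source-less `MS_SHARED` entries are skipped.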
There may be another issue: the nested container may not have permission to enter `/mnt/mesos/sandbox` if it is launched as a non-root user. The root cause is that in [this patch|https://reviews.apache.org/r/70514/] we create the directory `/mnt/mesos/sandbox` as the root user (since the agent runs as root). We may need to set the permissions of `/mnt` and `/mnt/mesos` to `drwxr-xr-x` so that a non-root user can traverse them.
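One way to do this is to create each path component and set its mode to `drwxr-xr-x` (octal `0755`) explicitly, so the result does not depend on the agent's umask. This is a minimal sketch using C++17 `std::filesystem`; Mesos code would normally go through its own stout utilities instead. `makeTraversable` and `kTraversable` are hypothetical names, and `base` stands in for the root under which `mnt/mesos/sandbox` is created.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// drwxr-xr-x (octal 0755): owner rwx, group and others r-x.
constexpr fs::perms kTraversable =
    fs::perms::owner_all | fs::perms::group_read | fs::perms::group_exec |
    fs::perms::others_read | fs::perms::others_exec;

// Hypothetical helper: walks `relative` (e.g. "mnt/mesos/sandbox") below
// `base`, creating each missing directory and forcing it to 0755 so that a
// non-root user can traverse the whole path. Returns false on the first error.
bool makeTraversable(const fs::path& base, const fs::path& relative) {
  assert(relative.is_relative());
  std::error_code ec;
  fs::path current = base;
  for (const fs::path& component : relative) {
    current /= component;
    fs::create_directories(current, ec);  // no error if it already exists
    if (ec) return false;
    // Set the mode explicitly so it does not depend on the process umask.
    fs::permissions(current, kTraversable, fs::perm_options::replace, ec);
    if (ec) return false;
  }
  return true;
}
```

Calling `makeTraversable("/", "mnt/mesos/sandbox")` would apply the mode to `/mnt`, `/mnt/mesos` and the sandbox mount point; the last one is later covered by the bind mount, so the two parents are what matter for traversal.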

> Nested container launched with non-root user may not be able to write to its sandbox via the environment variable `MESOS_SANDBOX`
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-9536
>                 URL: https://issues.apache.org/jira/browse/MESOS-9536
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 1.6.0, 1.6.1, 1.7.0, 1.8.0
>            Reporter: Qian Zhang
>            Assignee: Qian Zhang
>            Priority: Critical
>
> Launch a nested container that writes to its sandbox via the env var `MESOS_SANDBOX`. The nested container is launched as a non-root user (e.g., `nobody`), while its parent container (i.e., the default executor) is launched as root, since `mesos-execute` is run with `sudo` in the example below.
> {code:java}
> $ sudo src/mesos-execute --master=<master-IP>:5050 --task_group=file:///tmp/task_group.json
> $ cat /tmp/task_group.json
> {
>   "tasks":[
>     {
>       "name" : "test",
>       "task_id" : {"value" : "test"},
>       "agent_id": {"value" : ""},
>       "resources": [
>         {"name": "cpus", "type": "SCALAR", "scalar": {"value": 0.1}},
>         {"name": "mem", "type": "SCALAR", "scalar": {"value": 32}}
>       ],
>       "command": {
>         "user": "nobody",
>         "value": "echo data > $MESOS_SANDBOX/file"
>       }
>     }
>   ]
> }
> {code}
> The nested container will fail.
> {code:java}
> I0125 16:04:03.610659 10064 scheduler.cpp:189] Version: 1.8.0
> I0125 16:04:03.641856 10066 scheduler.cpp:355] Using default 'basic' HTTP authenticatee
> I0125 16:04:03.643841 10063 scheduler.cpp:538] New master detected at master@192.168.56.5:5050
> Subscribed with ID 1ae64562-dbf9-4b24-af88-1cbcdc2ae71d-0002
> Submitted task group with tasks [ test ] to agent '12866186-dc2b-48a9-88ad-f9d951cf8c7f-S0'
> Received status update TASK_STARTING for task 'test'
>   source: SOURCE_EXECUTOR
> Received status update TASK_RUNNING for task 'test'
>   source: SOURCE_EXECUTOR
> Received status update TASK_FAILED for task 'test'
>   message: 'Command exited with status 2'
>   source: SOURCE_EXECUTOR
> {code}
> The stderr of the nested container shows that it has no permission to write the file.
> {code:java}
> $ sudo cat /opt/mesos/slaves/12866186-dc2b-48a9-88ad-f9d951cf8c7f-S0/frameworks/1ae64562-dbf9-4b24-af88-1cbcdc2ae71d-0002/executors/default-executor/runs/c7173fd8-9c01-49f5-a092-bdad78609260/containers/bf8f6ac8-2f8a-4300-9fe6-a830f602f654/stderr 
> Marked '/' as rslave
> sh: 1: cannot create /opt/mesos/slaves/12866186-dc2b-48a9-88ad-f9d951cf8c7f-S0/frameworks/1ae64562-dbf9-4b24-af88-1cbcdc2ae71d-0002/executors/default-executor/runs/c7173fd8-9c01-49f5-a092-bdad78609260/containers/bf8f6ac8-2f8a-4300-9fe6-a830f602f654/file: Permission denied
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)