Posted to issues@mesos.apache.org by "James Peach (JIRA)" <ji...@apache.org> on 2018/10/15 21:17:00 UTC

[jira] [Created] (MESOS-9319) Create all container devices at isolation time

James Peach created MESOS-9319:
----------------------------------

             Summary: Create all container devices at isolation time
                 Key: MESOS-9319
                 URL: https://issues.apache.org/jira/browse/MESOS-9319
             Project: Mesos
          Issue Type: Bug
          Components: containerization
         Environment: When using a custom user namespace isolator, the task fails at launch because opening devices fails with an {{EPERM}} error. This problem is described in [this systemd issue|https://github.com/systemd/systemd/pull/9483] and this [lxd issue|https://github.com/lxc/lxd/issues/4950].

The problem arises in the Mesos containerizer due to the order of operations:

# Clone the containerizer with CLONE_NEWNS
# Mount a tmpfs for the devices
# mknod for the various device nodes

Referring back to the lxd issue, because we do (1) before (2), the tmpfs on /dev is marked SB_I_NODEV. Due to the new 4.18 behavior, the mknod in (3) now succeeds (see commit [55956b59df33|https://github.com/torvalds/linux/commit/55956b59df336f6738da916dbb520b6e37df9fbd]). Previously it would fail and we would fall back to bind mounting the device. However, even though we created the device, we can't actually open it due to the SB_I_NODEV flag on the tmpfs mount. It appears that the purpose of allowing mknod is so that containers can create overlayfs whiteouts.
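
To make the symptom concrete: the node exists, yet opening it is refused. A minimal Python sketch of a probe that could be run inside the container to tell the two cases apart. The helper name {{probe_device}} and the list of paths are hypothetical and not part of Mesos. A missing node would be reported as ENOENT, while a node on the SB_I_NODEV tmpfs would be reported with whatever errno the open fails with (EPERM according to this report).

```python
import errno
import os


def probe_device(path):
    """Try to open `path` read-only.

    Returns None if the open succeeds, otherwise the symbolic errno
    name (for example "ENOENT" or "EPERM").
    """
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))
    os.close(fd)
    return None


if __name__ == "__main__":
    # Illustrative device paths only; not the list Mesos manages.
    for dev in ("/dev/null", "/dev/zero", "/dev/urandom"):
        result = probe_device(dev)
        print(dev, "ok" if result is None else "open failed: " + result)
```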

One approach to deal with this in the Mesos containerizer is to complete the device node cleanup that was begun with the linux/devices isolator. This approach involves moving all the responsibility for creating devices back to the isolators. Then, at containerization time, we simply bind-mount the whole of /dev from the per-container staging area. Since the isolators create the devices in the host namespace and on the Mesos work directory, none of the conditions that trigger the failure would be invoked.
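
A rough sketch of that flow in Python, under stated assumptions: the function names, the device list and the staging path layout are all hypothetical, and the real change would live in the C++ isolators. It only illustrates the split between creating nodes in a host-side staging directory and bind-mounting that directory later. The (major, minor) pairs are the standard Linux numbers for these character devices.

```python
import os
import stat

# Standard Linux character device numbers (major, minor).
# Illustrative subset; not necessarily the set Mesos creates.
DEVICES = {
    "null": (1, 3),
    "zero": (1, 5),
    "full": (1, 7),
    "random": (1, 8),
    "urandom": (1, 9),
    "tty": (5, 0),
}


def plan_device_nodes(staging_dev, devices=DEVICES):
    """Return (path, mode, device_number) tuples for each node to create."""
    mode = stat.S_IFCHR | 0o666
    return [
        (os.path.join(staging_dev, name), mode, os.makedev(major, minor))
        for name, (major, minor) in sorted(devices.items())
    ]


def create_device_nodes(plan):
    """Create the planned nodes on the host (isolation time).

    Needs CAP_MKNOD in the host user namespace, so this is expected
    to run before the container's namespaces are entered.
    """
    for path, mode, dev in plan:
        os.makedirs(os.path.dirname(path), exist_ok=True)
        os.mknod(path, mode, dev)
        os.chmod(path, stat.S_IMODE(mode))  # undo the effect of the umask


def bind_mount_argv(staging_dev, container_dev):
    """Command a launcher could run to expose the staged directory as /dev."""
    return ["mount", "--bind", staging_dev, container_dev]
```

Because the nodes are created on a filesystem mounted in the host namespace, the bind mount carries no SB_I_NODEV flag, which is the property the proposal relies on.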

            Reporter: James Peach






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)