You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@mesos.apache.org by ji...@apache.org on 2018/12/10 03:49:33 UTC

[mesos] branch master updated (f24be0c -> f050bf0)

This is an automated email from the ASF dual-hosted git repository.

jieyu pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git.


    from f24be0c  Fixed a regression in binding GPU container devices.
     new 35cdfa2  Used strings::format in os::shell.
     new f050bf0  Made sure containers runtime dir has device file access.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 3rdparty/stout/include/stout/os/posix/shell.hpp    |   2 +-
 3rdparty/stout/include/stout/os/windows/shell.hpp  |   2 +-
 src/linux/fs.cpp                                   |   7 +-
 src/linux/fs.hpp                                   |   2 +-
 .../mesos/isolators/filesystem/linux.cpp           | 260 +++++++++++++++------
 .../mesos/isolators/volume/host_path.cpp           |  13 +-
 src/slave/containerizer/mesos/launch.cpp           |  11 +-
 7 files changed, 204 insertions(+), 93 deletions(-)


[mesos] 02/02: Made sure containers runtime dir has device file access.

Posted by ji...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jieyu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit f050bf01af8f9f92bbada2c0a2025a459290ed98
Author: Jie Yu <yu...@gmail.com>
AuthorDate: Fri Dec 7 16:51:19 2018 -0800

    Made sure containers runtime dir has device file access.
    
    Make sure that container's runtime dir has device file access.  Some
    Linux distributions will mount `/run` with `nodev`, restricting
    accessing to device files under `/run`. However, Mesos prepares device
    files for containers under container's runtime dir (which is typically
    under `/run`) and bind mount into container root filesystems. Therefore,
    we need to make sure those device files can be accessed by the
    container. We need to do a self bind mount and remount with proper
    options if necessary. See MESOS-9462 for more details.
    
    Review: https://reviews.apache.org/r/69532
---
 src/linux/fs.cpp                                   |   7 +-
 src/linux/fs.hpp                                   |   2 +-
 .../mesos/isolators/filesystem/linux.cpp           | 260 +++++++++++++++------
 .../mesos/isolators/volume/host_path.cpp           |  13 +-
 src/slave/containerizer/mesos/launch.cpp           |  11 +-
 5 files changed, 202 insertions(+), 91 deletions(-)

diff --git a/src/linux/fs.cpp b/src/linux/fs.cpp
index 6ff8b4d..7d05716 100644
--- a/src/linux/fs.cpp
+++ b/src/linux/fs.cpp
@@ -275,16 +275,11 @@ Try<MountInfoTable::Entry> MountInfoTable::findByTarget(
         ": " + (realTarget.isError() ? realTarget.error() : "Not found"));
   }
 
-  Try<MountInfoTable> table = read(None(), true);
-  if (table.isError()) {
-    return Error("Failed to get mount table: " + table.error());
-  }
-
   // Trying to find the mount entry that contains the 'realTarget'. We
   // achieve that by doing a reverse traverse of the mount table to
   // find the first entry whose target is a prefix of the specified
   // 'realTarget'.
-  foreach (const Entry& entry, adaptor::reverse(table->entries)) {
+  foreach (const Entry& entry, adaptor::reverse(entries)) {
     if (entry.target == realTarget.get()) {
       return entry;
     }
diff --git a/src/linux/fs.hpp b/src/linux/fs.hpp
index 04bd706..e4badb7 100644
--- a/src/linux/fs.hpp
+++ b/src/linux/fs.hpp
@@ -281,7 +281,7 @@ struct MountInfoTable {
   // no mount table entry that matches the exact target path, return
   // the mount table entry that is the immediate parent of the given
   // target path (similar to `findmnt --target [TARGET]`).
-  static Try<Entry> findByTarget(const std::string& target);
+  Try<Entry> findByTarget(const std::string& target);
 
   std::vector<Entry> entries;
 };
diff --git a/src/slave/containerizer/mesos/isolators/filesystem/linux.cpp b/src/slave/containerizer/mesos/isolators/filesystem/linux.cpp
index c318fb9..2a9ea44 100644
--- a/src/slave/containerizer/mesos/isolators/filesystem/linux.cpp
+++ b/src/slave/containerizer/mesos/isolators/filesystem/linux.cpp
@@ -246,38 +246,25 @@ static Try<Nothing> makeDevicesDir(
 }
 
 
-Try<Isolator*> LinuxFilesystemIsolatorProcess::create(const Flags& flags)
+// Make sure that the specified target directory is in a shared mount
+// so that when forking a child process (with a new mount namespace),
+// the child process does not hold extra references to the mounts
+// underneath the target directory. For instance, container's
+// persistent volume mounts and provisioner mounts (e.g., when using
+// the bind/overlayfs backend) under agent's `work_dir`. This ensures
+// that cleanup operations (i.e., unmount) on the host mount namespace
+// can be propagated to child's mount namespaces. See MESOS-3483 for
+// more details.
+// TODO(jieyu): Consider moving this helper to 'src/linux/fs.hpp|cpp'.
+static Try<Nothing> ensureSharedMount(const string& _targetDir)
 {
-  if (geteuid() != 0) {
-    return Error("'filesystem/linux' isolator requires root privileges");
-  }
-
-  if (flags.launcher != "linux") {
-    return Error("'filesystem/linux' isolator requires 'linux' launcher");
-  }
-
-
-  Try<bool> supported = ns::supported(CLONE_NEWNS);
-  if (supported.isError() || !supported.get()) {
-    return Error(
-        "The 'filesystem/linux' isolator requires mount namespace support");
-  }
-
-  // Make sure that slave's working directory is in a shared mount so
-  // that when forking a child process (with a new mount namespace),
-  // the child process does not hold extra references to container's
-  // persistent volume mounts and provisioner mounts (e.g., when using
-  // the bind/overlayfs backend). This ensures that cleanup operations
-  // within slave's working directory can be propagated to all
-  // containers. See MESOS-3483 for more details.
-
   // Mount table entries use realpaths. Therefore, we first get the
-  // realpath of the slave's working directory.
-  Result<string> workDir = os::realpath(flags.work_dir);
-  if (!workDir.isSome()) {
+  // realpath of the target directory.
+  Result<string> targetDir = os::realpath(_targetDir);
+  if (!targetDir.isSome()) {
     return Error(
-        "Failed to get the realpath of slave's working directory: " +
-        (workDir.isError() ? workDir.error() : "Not found"));
+        "Failed to get the realpath of '" + _targetDir + "': " +
+        (targetDir.isError() ? targetDir.error() : "Not found"));
   }
 
   Try<fs::MountInfoTable> table = fs::MountInfoTable::read();
@@ -285,44 +272,37 @@ Try<Isolator*> LinuxFilesystemIsolatorProcess::create(const Flags& flags)
     return Error("Failed to get mount table: " + table.error());
   }
 
-  // Trying to find the mount entry that contains the slave's working
+  // Trying to find the mount entry that contains the target
   // directory. We achieve that by doing a reverse traverse of the
   // mount table to find the first entry whose target is a prefix of
-  // slave's working directory.
-  Option<fs::MountInfoTable::Entry> workDirMount;
-  foreach (const fs::MountInfoTable::Entry& entry,
-           adaptor::reverse(table->entries)) {
-    if (strings::startsWith(workDir.get(), entry.target)) {
-      workDirMount = entry;
-      break;
-    }
-  }
+  // the target directory.
+  Try<fs::MountInfoTable::Entry> targetDirMount =
+    table->findByTarget(_targetDir);
 
-  // It's unlikely that we cannot find 'workDirMount' because '/' is
-  // always mounted and will be the 'workDirMount' if no other mounts
-  // found in between.
-  if (workDirMount.isNone()) {
-    return Error("Cannot find the mount containing slave's working directory");
+  if (targetDirMount.isError()) {
+    return Error(
+        "Failed to find the mount containing '" + _targetDir +
+        "': " + targetDirMount.error());
   }
 
-  // If 'workDirMount' is a shared mount in its own peer group, then
+  // If 'targetDirMount' is a shared mount in its own peer group, then
   // we don't need to do anything. Otherwise, we need to do a self
-  // bind mount of slave's working directory to make sure it's a
-  // shared mount in its own peer group.
+  // bind mount of the target directory to make sure it's a shared
+  // mount in its own peer group.
   bool bindMountNeeded = false;
 
-  if (workDirMount->shared().isNone()) {
+  if (targetDirMount->shared().isNone()) {
     bindMountNeeded = true;
   } else {
     foreach (const fs::MountInfoTable::Entry& entry, table->entries) {
-      // Skip 'workDirMount' and any mount underneath it. Also, we
+      // Skip 'targetDirMount' and any mount underneath it. Also, we
       // skip those mounts whose targets are not the parent of the
-      // working directory because even if they are in the same peer
+      // target directory because even if they are in the same peer
       // group as the working directory mount, it won't affect it.
-      if (entry.id != workDirMount->id &&
-          !strings::startsWith(entry.target, workDir.get()) &&
-          entry.shared() == workDirMount->shared() &&
-          strings::startsWith(workDir.get(), entry.target)) {
+      if (entry.id != targetDirMount->id &&
+          !strings::startsWith(entry.target, path::join(targetDir.get(), "")) &&
+          entry.shared() == targetDirMount->shared() &&
+          strings::startsWith(targetDir.get(), path::join(entry.target, ""))) {
         bindMountNeeded = true;
         break;
       }
@@ -330,11 +310,11 @@ Try<Isolator*> LinuxFilesystemIsolatorProcess::create(const Flags& flags)
   }
 
   if (bindMountNeeded) {
-    if (workDirMount->target != workDir.get()) {
-      // This is the case where the working directory mount does not
+    if (targetDirMount->target != targetDir.get()) {
+      // This is the case where the target directory mount does not
       // exist in the mount table (e.g., a new host running Mesos
       // slave for the first time).
-      LOG(INFO) << "Bind mounting '" << workDir.get()
+      LOG(INFO) << "Bind mounting '" << targetDir.get()
                 << "' and making it a shared mount";
 
       // NOTE: Instead of using fs::mount to perform the bind mount,
@@ -350,37 +330,181 @@ Try<Isolator*> LinuxFilesystemIsolatorProcess::create(const Flags& flags)
           "mount --bind %s %s && "
           "mount --make-private %s && "
           "mount --make-shared %s",
-          workDir->c_str(),
-          workDir->c_str(),
-          workDir->c_str(),
-          workDir->c_str());
+          targetDir.get(),
+          targetDir.get(),
+          targetDir.get(),
+          targetDir.get());
 
       if (mount.isError()) {
         return Error(
-            "Failed to bind mount '" + workDir.get() +
+            "Failed to bind mount '" + targetDir.get() +
             "' and make it a shared mount: " + mount.error());
       }
     } else {
-      // This is the case where the working directory mount is in the
+      // This is the case where the target directory mount is in the
       // mount table, but it's not a shared mount in its own peer
       // group (possibly due to slave crash while preparing the
-      // working directory mount). It's safe to re-do the following.
-      LOG(INFO) << "Making '" << workDir.get() << "' a shared mount";
+      // target directory mount). It's safe to re-do the following.
+      LOG(INFO) << "Making '" << targetDir.get() << "' a shared mount";
 
       Try<string> mount = os::shell(
           "mount --make-private %s && "
           "mount --make-shared %s",
-          workDir->c_str(),
-          workDir->c_str());
+          targetDir.get(),
+          targetDir.get());
 
       if (mount.isError()) {
         return Error(
-            "Failed to make '" + workDir.get() +
+            "Failed to make '" + targetDir.get() +
             "' a shared mount: " + mount.error());
       }
     }
   }
 
+  return Nothing();
+}
+
+
+// Make sure the target directory allow device files (i.e., there no
+// `nodev` on the mounted filesystem that contains the target path).
+static Try<Nothing> ensureAllowDevices(const string& _targetDir)
+{
+  // Mount table entries use realpaths. Therefore, we first get the
+  // realpath of the target directory.
+  Result<string> targetDir = os::realpath(_targetDir);
+  if (!targetDir.isSome()) {
+    return Error(
+        "Failed to get the realpath of '" + _targetDir + "': " +
+        (targetDir.isError() ? targetDir.error() : "Not found"));
+  }
+
+  Try<fs::MountInfoTable> table = fs::MountInfoTable::read();
+  if (table.isError()) {
+    return Error("Failed to get mount table: " + table.error());
+  }
+
+  // Trying to find the mount entry that contains the target
+  // directory. We achieve that by doing a reverse traverse of the
+  // mount table to find the first entry whose target is a prefix of
+  // the target directory.
+  Try<fs::MountInfoTable::Entry> targetDirMount =
+    table->findByTarget(_targetDir);
+
+  if (targetDirMount.isError()) {
+    return Error(
+        "Failed to find the mount containing '" + _targetDir +
+        "': " + targetDirMount.error());
+  }
+
+  // No need to do anything if the mount has no `nodev`.
+  if (!strings::contains(targetDirMount->vfsOptions, "nodev")) {
+    return Nothing();
+  }
+
+  if (targetDirMount->target != targetDir.get()) {
+    // This is the case where the target directory mount does not
+    // exist in the mount table (e.g., a new host running Mesos
+    // slave for the first time).
+    LOG(INFO) << "Self bind mounting '" << targetDir.get()
+              << "' and remounting with '-o remount,dev'";
+
+    // NOTE: Instead of using fs::mount to perform the bind mount,
+    // we use the shell command here because the syscall 'mount'
+    // does not update the mount table (i.e., /etc/mtab). In other
+    // words, the mount will not be visible if the operator types
+    // command 'mount'. Since this mount will still be presented
+    // after all containers and the slave are stopped, it's better
+    // to make it visible. It's OK to use the blocking os::shell
+    // here because 'create' will only be invoked during
+    // initialization.
+    Try<string> mount = os::shell(
+        "mount --bind %s %s && "
+        "mount -o remount,dev %s",
+        targetDir.get(),
+        targetDir.get(),
+        targetDir.get());
+
+    if (mount.isError()) {
+      return Error(
+          "Failed to self bind mount '" + targetDir.get() +
+          "' and remount with '-o remount,dev': " + mount.error());
+    }
+  } else {
+    // This is the case where the target directory mount is in the
+    // mount table, but it's not remounted yet to remove 'nodev'
+    // (possibly due to slave crash while preparing the target
+    // directory mount). It's safe to re-do the following.
+    LOG(INFO) << "Remounting '" << targetDir.get() << "' with '-o remount,dev'";
+
+    Try<string> mount = os::shell(
+        "mount -o remount,dev %s",
+        targetDir.get());
+
+    if (mount.isError()) {
+      return Error(
+          "Failed to remount '" + targetDir.get() +
+          "' with '-o remount,dev': " + mount.error());
+    }
+  }
+
+  return Nothing();
+}
+
+
+Try<Isolator*> LinuxFilesystemIsolatorProcess::create(const Flags& flags)
+{
+  if (geteuid() != 0) {
+    return Error("'filesystem/linux' isolator requires root privileges");
+  }
+
+  if (flags.launcher != "linux") {
+    return Error("'filesystem/linux' isolator requires 'linux' launcher");
+  }
+
+
+  Try<bool> supported = ns::supported(CLONE_NEWNS);
+  if (supported.isError() || !supported.get()) {
+    return Error(
+        "The 'filesystem/linux' isolator requires mount namespace support");
+  }
+
+  // Make sure that slave's working directory is in a shared mount so
+  // that when forking a child process (with a new mount namespace),
+  // the child process does not hold extra references to container's
+  // persistent volume mounts and provisioner mounts (e.g., when using
+  // the bind/overlayfs backend). This ensures that cleanup operations
+  // within slave's working directory can be propagated to all
+  // containers. See MESOS-3483 for more details.
+  Try<Nothing> workDirSharedMount = ensureSharedMount(flags.work_dir);
+  if (workDirSharedMount.isError()) {
+    return Error(workDirSharedMount.error());
+  }
+
+  // Make sure that container's runtime dir has device file access.
+  // Some Linux distributions will mount `/run` with `nodev`,
+  // restricting accessing to device files under `/run`. However,
+  // Mesos prepares device files for containers under container's
+  // runtime dir (which is typically under `/run`) and bind mount into
+  // container root filesystems. Therefore, we need to make sure those
+  // device files can be accessed by the container. We need to do a
+  // self bind mount and remount with proper options if necessary. See
+  // MESOS-9462 for more details.
+  const string containersRuntimeDir = path::join(
+      flags.runtime_dir,
+      containerizer::paths::CONTAINER_DIRECTORY);
+
+  Try<Nothing> mkdir = os::mkdir(containersRuntimeDir);
+  if (mkdir.isError()) {
+    return Error(
+        "Failed to create container's runtime dir at '" +
+        containersRuntimeDir + "': " + mkdir.error());
+  }
+
+  Try<Nothing> containersDirMount = ensureAllowDevices(containersRuntimeDir);
+  if (containersDirMount.isError()) {
+    return Error(containersDirMount.error());
+  }
+
   Owned<MesosIsolatorProcess> process(
       new LinuxFilesystemIsolatorProcess(flags));
 
diff --git a/src/slave/containerizer/mesos/isolators/volume/host_path.cpp b/src/slave/containerizer/mesos/isolators/volume/host_path.cpp
index 2e4cd2b..64abfc1 100644
--- a/src/slave/containerizer/mesos/isolators/volume/host_path.cpp
+++ b/src/slave/containerizer/mesos/isolators/volume/host_path.cpp
@@ -268,18 +268,13 @@ Future<Option<ContainerLaunchInfo>> VolumeHostPathIsolatorProcess::prepare(
     if (mountPropagationBidirectional) {
       // First, find the mount entry that is the parent of the host
       // volume source. If it is not a shared mount, return a failure.
-
-      // Get realpath here because the mount table uses realpaths.
-      Result<string> realHostPath = os::realpath(hostPath.get());
-      if (!realHostPath.isSome()) {
-        return Failure(
-            "Failed to get the realpath of the host path '" +
-            hostPath.get() + "': " +
-            (realHostPath.isError() ? realHostPath.error() : "Not found"));
+      Try<fs::MountInfoTable> table = fs::MountInfoTable::read();
+      if (table.isError()) {
+        return Failure("Failed to read mount table: " + table.error());
       }
 
       Try<fs::MountInfoTable::Entry> sourceMountEntry =
-        fs::MountInfoTable::findByTarget(realHostPath.get());
+        table->findByTarget(hostPath.get());
 
       if (sourceMountEntry.isError()) {
         return Failure(
diff --git a/src/slave/containerizer/mesos/launch.cpp b/src/slave/containerizer/mesos/launch.cpp
index 6aa4397..2f1c9e7 100644
--- a/src/slave/containerizer/mesos/launch.cpp
+++ b/src/slave/containerizer/mesos/launch.cpp
@@ -370,16 +370,13 @@ static Try<Nothing> prepareMounts(const ContainerLaunchInfo& launchInfo)
     // should consider forcing all isolators to use
     // `ContainerMountInfo` for volume mounts.
     if (hasSharedMount) {
-      Result<string> realTargetPath = os::realpath(mount.target());
-      if (!realTargetPath.isSome()) {
-        return Error(
-            "Failed to get the realpath of the mount target '" +
-            mount.target() + "': " +
-            (realTargetPath.isError() ? realTargetPath.error() : "Not found"));
+      Try<fs::MountInfoTable> table = fs::MountInfoTable::read();
+      if (table.isError()) {
+        return Error("Failed to read mount table: " + table.error());
       }
 
       Try<fs::MountInfoTable::Entry> entry =
-        fs::MountInfoTable::findByTarget(realTargetPath.get());
+        table->findByTarget(mount.target());
 
       if (entry.isError()) {
         return Error(


[mesos] 01/02: Used strings::format in os::shell.

Posted by ji...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jieyu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 35cdfa2c0bcb75f5801ec60671a3f978b4aa645f
Author: Jie Yu <yu...@gmail.com>
AuthorDate: Sun Dec 9 19:14:09 2018 -0800

    Used strings::format in os::shell.
    
    Previously, `strings::internal::format` was used. It causes issues when
    std::string is passed in as parameters. Switched to use
    `strings::format` instead in `os::shell` implementation.
    
    Review: https://reviews.apache.org/r/69537
---
 3rdparty/stout/include/stout/os/posix/shell.hpp   | 2 +-
 3rdparty/stout/include/stout/os/windows/shell.hpp | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/3rdparty/stout/include/stout/os/posix/shell.hpp b/3rdparty/stout/include/stout/os/posix/shell.hpp
index de6ef23..2cc4e2c 100644
--- a/3rdparty/stout/include/stout/os/posix/shell.hpp
+++ b/3rdparty/stout/include/stout/os/posix/shell.hpp
@@ -72,7 +72,7 @@ constexpr const char* arg1 = "-c";
 template <typename... T>
 Try<std::string> shell(const std::string& fmt, const T&... t)
 {
-  const Try<std::string> command = strings::internal::format(fmt, t...);
+  const Try<std::string> command = strings::format(fmt, t...);
   if (command.isError()) {
     return Error(command.error());
   }
diff --git a/3rdparty/stout/include/stout/os/windows/shell.hpp b/3rdparty/stout/include/stout/os/windows/shell.hpp
index b9e06d6..2f76bba 100644
--- a/3rdparty/stout/include/stout/os/windows/shell.hpp
+++ b/3rdparty/stout/include/stout/os/windows/shell.hpp
@@ -419,7 +419,7 @@ Try<std::string> shell(const std::string& fmt, const T&... t)
   using std::string;
   using std::vector;
 
-  const Try<string> command = strings::internal::format(fmt, t...);
+  const Try<string> command = strings::format(fmt, t...);
   if (command.isError()) {
     return Error(command.error());
   }