You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by James Peach <jo...@gmail.com> on 2018/06/27 06:42:41 UTC

Re: mesos git commit: Made `gpu/nvidia` isolator works with `cgroups/all` isolation option.

Do we still need this check? The order of built-in isolators is now fixed, so do we still need to verify this ordering?

> On Jun 27, 2018, at 4:17 PM, gilbert@apache.org wrote:
> 
> Repository: mesos
> Updated Branches:
>  refs/heads/master 2e913d545 -> b581136bd
> 
> 
> Made `gpu/nvidia` isolator works with `cgroups/all` isolation option.
> 
> Review: https://reviews.apache.org/r/67743/
> 
> 
> Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
> Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/b581136b
> Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/b581136b
> Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/b581136b
> 
> Branch: refs/heads/master
> Commit: b581136bd5d28ea74e410c7a57ac10d05c334b5b
> Parents: 2e913d5
> Author: Qian Zhang <zh...@gmail.com>
> Authored: Tue Jun 26 23:16:36 2018 -0700
> Committer: Gilbert Song <so...@gmail.com>
> Committed: Tue Jun 26 23:16:36 2018 -0700
> 
> ----------------------------------------------------------------------
> .../mesos/isolators/gpu/isolator.cpp            | 33 +++++++++++++++++---
> 1 file changed, 29 insertions(+), 4 deletions(-)
> ----------------------------------------------------------------------
> 
> 
> http://git-wip-us.apache.org/repos/asf/mesos/blob/b581136b/src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
> ----------------------------------------------------------------------
> diff --git a/src/slave/containerizer/mesos/isolators/gpu/isolator.cpp b/src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
> index d79c940..5066882 100644
> --- a/src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
> +++ b/src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
> @@ -102,20 +102,45 @@ Try<Isolator*> NvidiaGpuIsolatorProcess::create(
>     const Flags& flags,
>     const NvidiaComponents& components)
> {
> -  // Make sure both the 'cgroups/devices' isolator and the
> -  // 'filesystem/linux' isolators are present and precede the GPU
> -  // isolator.
> +  // Make sure both the 'cgroups/devices' (or 'cgroups/all') isolator and the
> +  // 'filesystem/linux' isolators are present and precede the GPU isolator.
>   vector<string> tokens = strings::tokenize(flags.isolation, ",");
> 
>   auto gpuIsolator =
>     std::find(tokens.begin(), tokens.end(), "gpu/nvidia");
> +
>   auto devicesIsolator =
>     std::find(tokens.begin(), tokens.end(), "cgroups/devices");
> +
> +  auto cgroupsAllIsolator =
> +    std::find(tokens.begin(), tokens.end(), "cgroups/all");
> +
>   auto filesystemIsolator =
>     std::find(tokens.begin(), tokens.end(), "filesystem/linux");
> 
>   CHECK(gpuIsolator != tokens.end());
> 
> +  if (cgroupsAllIsolator != tokens.end()) {
> +    // The reason that we need to check if `devices` cgroups subsystem is
> +    // enabled is, when `cgroups/all` is specified in the `--isolation` agent
> +    // flag, cgroups isolator will only load the enabled subsystems. So if
> +    // `cgroups/all` is specified but `devices` is not enabled, cgroups isolator
> +    // will not load `devices` subsystem in which case we should error out.
> +    Try<bool> result = cgroups::enabled("devices");
> +    if (result.isError()) {
> +      return Error(
> +          "Failed to check if the `devices` cgroups subsystem"
> +          " is enabled by kernel: " + result.error());
> +    } else if (!result.get()) {
> +      return Error(
> +          "The `devices` cgroups subsystem is not enabled by the kernel");
> +    }
> +
> +    if (devicesIsolator > cgroupsAllIsolator) {
> +      devicesIsolator = cgroupsAllIsolator;
> +    }
> +  }
> +
>   if (devicesIsolator == tokens.end()) {
>     return Error("The 'cgroups/devices' isolator must be enabled in"
>                  " order to use the 'gpu/nvidia' isolator");
> @@ -127,7 +152,7 @@ Try<Isolator*> NvidiaGpuIsolatorProcess::create(
>   }
> 
>   if (devicesIsolator > gpuIsolator) {
> -    return Error("'cgroups/devices' must precede 'gpu/nvidia'"
> +    return Error("'cgroups/devices' or 'cgroups/all' must precede 'gpu/nvidia'"
>                  " in the --isolation flag");
>   }
> 
> 


Re: mesos git commit: Made `gpu/nvidia` isolator works with `cgroups/all` isolation option.

Posted by Qian Zhang <zh...@gmail.com>.
Thanks James for reminding this!

You are right, we do not need this check, I have posted a patch (
https://reviews.apache.org/r/67767/) to remove it, can you help review it?


Regards,
Qian Zhang

On Wed, Jun 27, 2018 at 2:42 PM, James Peach <jo...@gmail.com> wrote:

> Do we still need this check? The order of built-in isolators is now fixed,
> so do we still need to verify this ordering?
>
> > On Jun 27, 2018, at 4:17 PM, gilbert@apache.org wrote:
> >
> > Repository: mesos
> > Updated Branches:
> >  refs/heads/master 2e913d545 -> b581136bd
> >
> >
> > Made `gpu/nvidia` isolator works with `cgroups/all` isolation option.
> >
> > Review: https://reviews.apache.org/r/67743/
> >
> >
> > Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
> > Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/b581136b
> > Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/b581136b
> > Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/b581136b
> >
> > Branch: refs/heads/master
> > Commit: b581136bd5d28ea74e410c7a57ac10d05c334b5b
> > Parents: 2e913d5
> > Author: Qian Zhang <zh...@gmail.com>
> > Authored: Tue Jun 26 23:16:36 2018 -0700
> > Committer: Gilbert Song <so...@gmail.com>
> > Committed: Tue Jun 26 23:16:36 2018 -0700
> >
> > ----------------------------------------------------------------------
> > .../mesos/isolators/gpu/isolator.cpp            | 33
> +++++++++++++++++---
> > 1 file changed, 29 insertions(+), 4 deletions(-)
> > ----------------------------------------------------------------------
> >
> >
> > http://git-wip-us.apache.org/repos/asf/mesos/blob/b581136b/
> src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
> > ----------------------------------------------------------------------
> > diff --git a/src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
> b/src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
> > index d79c940..5066882 100644
> > --- a/src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
> > +++ b/src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
> > @@ -102,20 +102,45 @@ Try<Isolator*> NvidiaGpuIsolatorProcess::create(
> >     const Flags& flags,
> >     const NvidiaComponents& components)
> > {
> > -  // Make sure both the 'cgroups/devices' isolator and the
> > -  // 'filesystem/linux' isolators are present and precede the GPU
> > -  // isolator.
> > +  // Make sure both the 'cgroups/devices' (or 'cgroups/all') isolator
> and the
> > +  // 'filesystem/linux' isolators are present and precede the GPU
> isolator.
> >   vector<string> tokens = strings::tokenize(flags.isolation, ",");
> >
> >   auto gpuIsolator =
> >     std::find(tokens.begin(), tokens.end(), "gpu/nvidia");
> > +
> >   auto devicesIsolator =
> >     std::find(tokens.begin(), tokens.end(), "cgroups/devices");
> > +
> > +  auto cgroupsAllIsolator =
> > +    std::find(tokens.begin(), tokens.end(), "cgroups/all");
> > +
> >   auto filesystemIsolator =
> >     std::find(tokens.begin(), tokens.end(), "filesystem/linux");
> >
> >   CHECK(gpuIsolator != tokens.end());
> >
> > +  if (cgroupsAllIsolator != tokens.end()) {
> > +    // The reason that we need to check if `devices` cgroups subsystem
> is
> > +    // enabled is, when `cgroups/all` is specified in the `--isolation`
> agent
> > +    // flag, cgroups isolator will only load the enabled subsystems. So
> if
> > +    // `cgroups/all` is specified but `devices` is not enabled, cgroups
> isolator
> > +    // will not load `devices` subsystem in which case we should error
> out.
> > +    Try<bool> result = cgroups::enabled("devices");
> > +    if (result.isError()) {
> > +      return Error(
> > +          "Failed to check if the `devices` cgroups subsystem"
> > +          " is enabled by kernel: " + result.error());
> > +    } else if (!result.get()) {
> > +      return Error(
> > +          "The `devices` cgroups subsystem is not enabled by the
> kernel");
> > +    }
> > +
> > +    if (devicesIsolator > cgroupsAllIsolator) {
> > +      devicesIsolator = cgroupsAllIsolator;
> > +    }
> > +  }
> > +
> >   if (devicesIsolator == tokens.end()) {
> >     return Error("The 'cgroups/devices' isolator must be enabled in"
> >                  " order to use the 'gpu/nvidia' isolator");
> > @@ -127,7 +152,7 @@ Try<Isolator*> NvidiaGpuIsolatorProcess::create(
> >   }
> >
> >   if (devicesIsolator > gpuIsolator) {
> > -    return Error("'cgroups/devices' must precede 'gpu/nvidia'"
> > +    return Error("'cgroups/devices' or 'cgroups/all' must precede
> 'gpu/nvidia'"
> >                  " in the --isolation flag");
> >   }
> >
> >
>
>