Posted to reviews@mesos.apache.org by Kevin Klues <kl...@gmail.com> on 2016/04/05 01:41:39 UTC

Review Request 45715: Added support to grant access to /dev/nvidiactl in Nvidia GPU isolator.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45715/
-----------------------------------------------------------

Review request for mesos, Ben Mahler, Rob Todd, and Vikrama Ditya.


Bugs: MESOS-5115
    https://issues.apache.org/jira/browse/MESOS-5115


Repository: mesos


Description
-------

Previously, calls to 'nvidia-smi' would fail inside a container even
if access to a GPU had been granted. Moreover, a container needs
access to /dev/nvidiactl to do anything useful with a GPU, even when
it has been granted access to the GPU device itself.

This patch explicitly grants/revokes access to /dev/nvidiactl as GPUs
are added and removed from a container in the Nvidia GPU isolator.


Diffs
-----

  src/slave/containerizer/mesos/isolators/cgroups/devices/gpus/nvidia.cpp b0f58035c7c819b42e5f249fadd97312f9e3ac7b 

Diff: https://reviews.apache.org/r/45715/diff/


Testing
-------


Thanks,

Kevin Klues


Re: Review Request 45715: Fixed access to /dev/nvidia{ctl,-uvm} in Nvidia GPU isolator.

Posted by Kevin Klues <kl...@gmail.com>.

(Updated April 6, 2016, 12:38 a.m.)


Review request for mesos, Ben Mahler, Rob Todd, and Vikrama Ditya.


Changes
-------

Addressed all of bmahler's comments. Also determined we needed to add access to /dev/nvidia-uvm in addition to /dev/nvidiactl.


Summary (updated)
-----------------

Fixed access to /dev/nvidia{ctl,-uvm} in Nvidia GPU isolator.


Bugs: MESOS-5115
    https://issues.apache.org/jira/browse/MESOS-5115


Repository: mesos


Description (updated)
-------

Previously, calls to 'nvidia-smi' would fail inside a container even
if access to a GPU had been granted. Moreover, a container needs
access to /dev/nvidiactl to do anything useful with a GPU, even when
it has been granted access to the GPU device itself.

This patch explicitly grants/revokes access to /dev/nvidiactl and
/dev/nvidia-uvm as GPUs are added and removed from a container in the
Nvidia GPU isolator.
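
The grant/revoke behaviour described above can be sketched roughly as
follows. This is a minimal illustration, not the code in nvidia.cpp:
the names DeviceEntry, formatEntry and ContainerDevices are
hypothetical, and the rule "grant the control devices with the first
GPU, revoke them with the last" is an assumption about how the
bookkeeping might work. The strings use the cgroups v1 devices
whitelist format ("<type> <major>:<minor> <access>"), which is what
gets written to devices.allow or devices.deny.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// One entry in the cgroups v1 devices whitelist.
struct DeviceEntry {
  char type;                  // 'c' for character devices.
  unsigned int majorNumber;   // Device major number.
  unsigned int minorNumber;   // Device minor number.
  std::string access;         // Any combination of 'r', 'w' and 'm'.
};

// Renders an entry as "<type> <major>:<minor> <access>".
std::string formatEntry(const DeviceEntry& entry) {
  return std::string(1, entry.type) + " " +
         std::to_string(entry.majorNumber) + ":" +
         std::to_string(entry.minorNumber) + " " + entry.access;
}

// Hypothetical per-container bookkeeping. The control devices stand in
// for /dev/nvidiactl and /dev/nvidia-uvm.
class ContainerDevices {
public:
  explicit ContainerDevices(std::vector<DeviceEntry> control)
    : control_(std::move(control)) {}

  // Returns the entries to write to devices.allow when a GPU is added.
  // Assumption: control devices are granted along with the first GPU.
  std::vector<std::string> addGpu(const DeviceEntry& gpu) {
    std::vector<std::string> allow;
    const std::string entry = formatEntry(gpu);
    if (gpus_.count(entry) > 0) {
      return allow;  // Already granted; nothing to do.
    }
    if (gpus_.empty()) {
      for (const DeviceEntry& device : control_) {
        allow.push_back(formatEntry(device));
      }
    }
    gpus_.insert(entry);
    allow.push_back(entry);
    return allow;
  }

  // Returns the entries to write to devices.deny when a GPU is removed.
  // Assumption: control devices are revoked along with the last GPU.
  std::vector<std::string> removeGpu(const DeviceEntry& gpu) {
    std::vector<std::string> deny;
    const std::string entry = formatEntry(gpu);
    if (gpus_.erase(entry) == 0) {
      return deny;  // This GPU was never granted.
    }
    deny.push_back(entry);
    if (gpus_.empty()) {
      for (const DeviceEntry& device : control_) {
        deny.push_back(formatEntry(device));
      }
    }
    return deny;
  }

private:
  std::vector<DeviceEntry> control_;
  std::set<std::string> gpus_;
};
```

In a real isolator the major and minor numbers would be looked up from
the device nodes (for example with stat(2)) rather than hard-coded,
since they are not guaranteed to be the same on every host.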


Diffs (updated)
-----

  src/slave/containerizer/mesos/isolators/cgroups/devices/gpus/nvidia.cpp b0f58035c7c819b42e5f249fadd97312f9e3ac7b 

Diff: https://reviews.apache.org/r/45715/diff/


Testing (updated)
-------

Tests are added in a subsequent commit.


Thanks,

Kevin Klues


Re: Review Request 45715: Added support to grant access to /dev/nvidiactl in Nvidia GPU isolator.

Posted by Ben Mahler <be...@gmail.com>.

> On April 5, 2016, 11:53 p.m., Ben Mahler wrote:
> >

Do we also need the uvm device?


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45715/#review127255
-----------------------------------------------------------




Re: Review Request 45715: Added support to grant access to /dev/nvidiactl in Nvidia GPU isolator.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45715/#review127255
-----------------------------------------------------------


Fix it, then Ship it!





src/slave/containerizer/mesos/isolators/cgroups/devices/gpus/nvidia.cpp (lines 65 - 78)
<https://reviews.apache.org/r/45715/#comment190468>

    Could we avoid the static non-POD?
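
    As a rough illustration of the concern: the code at lines 65 - 78
    is not reproduced in this thread, so the names below
    (CONTROL_DEVICES, CONTROL_DEVICE_COUNT, controlDevices) are
    hypothetical. A namespace-scope object with a non-trivial
    constructor, such as a std::vector or std::string, is constructed
    at static-initialization time, and the order of that construction
    across translation units is unspecified. One common way around it
    is to keep only plain data at static scope and build the richer
    object on demand:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Pattern being questioned (kept as a comment, not compiled):
//
//   static const std::vector<std::string> CONTROL_DEVICES = {
//     "/dev/nvidiactl", "/dev/nvidia-uvm"};
//
// The vector has a non-trivial constructor and destructor, so it is a
// static non-POD.

// Alternative: an array of pointers to string literals is plain data
// and needs no constructor to run at program start-up.
static const char* const CONTROL_DEVICES[] = {
  "/dev/nvidiactl",
  "/dev/nvidia-uvm",
};

static const std::size_t CONTROL_DEVICE_COUNT =
  sizeof(CONTROL_DEVICES) / sizeof(CONTROL_DEVICES[0]);

// Builds the std::vector only when a caller actually needs one.
std::vector<std::string> controlDevices() {
  return std::vector<std::string>(
      CONTROL_DEVICES, CONTROL_DEVICES + CONTROL_DEVICE_COUNT);
}
```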


- Ben Mahler

