Posted to issues@mesos.apache.org by "Jie Yu (JIRA)" <ji...@apache.org> on 2019/02/02 01:20:00 UTC

[jira] [Commented] (MESOS-9549) nvidia/cuda 10 does not work on GPU isolator

    [ https://issues.apache.org/jira/browse/MESOS-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758811#comment-16758811 ] 

Jie Yu commented on MESOS-9549:
-------------------------------

Spent some time on this today. We need to do the following to make cuda:10 work:

1. Inject "/usr/local/nvidia/bin" to PATH
2. Inject "/usr/local/nvidia/lib64:/usr/local/nvidia/lib" to LD_LIBRARY_PATH
3. Add one more condition for injecting the volume:
{code}
+  if (manifest.config().labels().count("maintainer") &&
+      strings::contains(
+          manifest.config().labels().at("maintainer"),
+          "NVIDIA CORPORATION")) {
+    return true;
+  }
{code}

1 and 2 are needed because the cuda:10 image removed those env vars (in favor of the nvidia docker runtime).
3 is needed because the cuda:10 image removed the original label "com.nvidia.volumes.needed".
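
A rough, standalone sketch of the three steps above, in case it helps to see them together. This is not the Mesos isolator code: the Environment alias and the helper names (prependToEnv, injectNvidiaPaths, needsNvidiaVolume) are hypothetical, plain std::map stands in for the container environment and the image labels, and prepending (rather than appending) the new entries is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a container environment: variable name -> value.
using Environment = std::map<std::string, std::string>;

// Prepends `entry` to the colon-separated variable `name`, creating the
// variable if it is unset or empty. Prepending is an assumption; the steps
// above only say "inject".
void prependToEnv(
    Environment& env,
    const std::string& name,
    const std::string& entry)
{
  assert(!entry.empty());

  auto it = env.find(name);
  if (it == env.end() || it->second.empty()) {
    env[name] = entry;
  } else {
    it->second = entry + ":" + it->second;
  }
}

// Steps 1 and 2: restore the search paths that the cuda:10 image no
// longer sets itself.
void injectNvidiaPaths(Environment& env)
{
  prependToEnv(env, "PATH", "/usr/local/nvidia/bin");
  prependToEnv(
      env,
      "LD_LIBRARY_PATH",
      "/usr/local/nvidia/lib64:/usr/local/nvidia/lib");
}

// Step 3: decide whether the volume should be injected, based on the
// image labels. Accepts either the original label or the "maintainer"
// label containing "NVIDIA CORPORATION".
bool needsNvidiaVolume(const std::map<std::string, std::string>& labels)
{
  if (labels.count("com.nvidia.volumes.needed") > 0) {
    return true;
  }

  auto it = labels.find("maintainer");
  return it != labels.end() &&
         it->second.find("NVIDIA CORPORATION") != std::string::npos;
}
```

With this sketch, an environment that starts with PATH=/usr/bin ends up with PATH=/usr/local/nvidia/bin:/usr/bin after injectNvidiaPaths, and LD_LIBRARY_PATH is created if it was unset. The sketch does not check whether the entries are already present, so calling it twice would add them twice.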

> nvidia/cuda 10 does not work on GPU isolator
> --------------------------------------------
>
>                 Key: MESOS-9549
>                 URL: https://issues.apache.org/jira/browse/MESOS-9549
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Jie Yu
>            Priority: Major
>
> I verified that nvidia/cuda 9 (i.e., 9.2-devel-ubuntu18.04) works with GPU isolator.
> The unit test NvidiaGpuTest.ROOT_INTERNET_CURL_CGROUPS_NVIDIA_GPU_NvidiaDockerImage captures this, and is currently failing on GPU hosts since it uses the latest nvidia/cuda image.
> It fails with
> {format}
> sh: 1: nvidia-smi: not found
> {format}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)