Posted to issues@mesos.apache.org by "Jie Yu (JIRA)" <ji...@apache.org> on 2019/02/02 01:20:00 UTC
[jira] [Commented] (MESOS-9549) nvidia/cuda 10 does not work on GPU isolator
[ https://issues.apache.org/jira/browse/MESOS-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758811#comment-16758811 ]
Jie Yu commented on MESOS-9549:
-------------------------------
Spent some time on this today. We need to do the following to make cuda:10 work:
1. Inject "/usr/local/nvidia/bin" to PATH
2. Inject "/usr/local/nvidia/lib64:/usr/local/nvidia/lib" to LD_LIBRARY_PATH
3. Add one more condition for injecting the volume
{code}
+  if (manifest.config().labels().count("maintainer") &&
+      strings::contains(
+          manifest.config().labels().at("maintainer"),
+          "NVIDIA CORPORATION")) {
+    return true;
+  }
{code}
1 and 2 are needed because the cuda:10 image removed those env vars (in favor of the nvidia docker runtime).
3 is needed because the cuda:10 image removed the original label "com.nvidia.volumes.needed".
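
For illustration only, the three steps could be sketched as standalone C++ along the following lines. This is not the actual Mesos patch: prependPath, injectNvidiaEnvironment and needsNvidiaVolume are hypothetical names, plain std::map stands in for the container environment and the image labels, and prepending (rather than appending) the NVIDIA directories is an assumption.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not a Mesos API): prepend `entry` to the
// colon-separated search-path variable `name`, creating it if absent.
void prependPath(
    std::map<std::string, std::string>& env,
    const std::string& name,
    const std::string& entry)
{
  auto it = env.find(name);
  if (it == env.end() || it->second.empty()) {
    env[name] = entry;
  } else {
    it->second = entry + ":" + it->second;
  }
}

// Steps 1 and 2: restore the search paths that the cuda:10 image
// no longer sets. Prepending is an assumption of this sketch.
void injectNvidiaEnvironment(std::map<std::string, std::string>& env)
{
  prependPath(env, "PATH", "/usr/local/nvidia/bin");
  prependPath(
      env,
      "LD_LIBRARY_PATH",
      "/usr/local/nvidia/lib64:/usr/local/nvidia/lib");
}

// Step 3: decide whether the NVIDIA volume should be injected.
// The first check models the original label; the second is the
// extra "maintainer" condition from the snippet above.
bool needsNvidiaVolume(const std::map<std::string, std::string>& labels)
{
  if (labels.count("com.nvidia.volumes.needed") > 0) {
    return true;
  }

  auto it = labels.find("maintainer");
  return it != labels.end() &&
         it->second.find("NVIDIA CORPORATION") != std::string::npos;
}
```

With these helpers, an image that carries only a "maintainer" label containing "NVIDIA CORPORATION" would still get the volume, and a container whose PATH is "/usr/bin" would end up with "/usr/local/nvidia/bin:/usr/bin".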
> nvidia/cuda 10 does not work on GPU isolator
> --------------------------------------------
>
> Key: MESOS-9549
> URL: https://issues.apache.org/jira/browse/MESOS-9549
> Project: Mesos
> Issue Type: Bug
> Reporter: Jie Yu
> Priority: Major
>
> I verified that nvidia/cuda 9 (i.e., 9.2-devel-ubuntu18.04) works with GPU isolator.
> The unit test NvidiaGpuTest.ROOT_INTERNET_CURL_CGROUPS_NVIDIA_GPU_NvidiaDockerImage captures this, and is currently failing on GPU hosts since it uses the latest nvidia/cuda image.
> It fails with
> {format}
> sh: 1: nvidia-smi: not found
> {format}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)