Posted to yarn-issues@hadoop.apache.org by "zhao yufei (Jira)" <ji...@apache.org> on 2020/08/08 03:42:00 UTC
[jira] [Commented] (YARN-10248) when config allowed-gpu-devices ,
excluded GPUs still be visible to containers
[ https://issues.apache.org/jira/browse/YARN-10248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173567#comment-17173567 ]
zhao yufei commented on YARN-10248:
-----------------------------------
[~tangzhankun], yes, the TestGpuResourceHandler test class seems to have issues, and I need to confirm them with you.
I have one question about this test class:
should all of its tests pass when the test server has no GPUs?
If not, most of the test methods in the class will never succeed on such a server.
If yes, the class still has a problem: the setupFakeGpuDiscoveryBinary method creates a fake file to stand in for the binary, but
the GpuDiscoverer.lookUpAutoDiscoveryBinary method checks whether the file is a binary and throws an exception if it is not, so most of the tests will fail.
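To make the question concrete, here is a minimal standalone sketch of how a test could create a stand-in file that passes an "exists and is executable" style check. It does not use Hadoop's code: the class FakeDiscoveryBinarySketch, both helper methods, and the assumption that the check amounts to "regular file that can be executed" are hypothetical and only for illustration. What GpuDiscoverer.lookUpAutoDiscoveryBinary actually verifies should be confirmed in the source.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class FakeDiscoveryBinarySketch {

    // Hypothetical helper: creates an empty file with the given name in a
    // fresh temporary directory and marks it executable for the owner.
    static File createFakeBinaryInTempDir(String name) {
        try {
            Path dir = Files.createTempDirectory("fake-gpu-discovery");
            File fake = Files.createFile(dir.resolve(name)).toFile();
            if (!fake.setExecutable(true)) {
                throw new IOException("could not mark " + fake + " executable");
            }
            return fake;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assumed stand-in for the kind of check a discoverer might perform.
    // This is NOT the real GpuDiscoverer logic.
    static boolean looksLikeUsableBinary(File f) {
        return f.isFile() && f.canExecute();
    }

    public static void main(String[] args) {
        File fake = createFakeBinaryInTempDir("nvidia-smi");
        System.out.println(fake.getName() + " usable: " + looksLikeUsableBinary(fake));
        // Clean up the temporary file and its directory.
        File dir = fake.getParentFile();
        fake.delete();
        dir.delete();
    }
}
```

If the real check is stricter than this (for example, if it inspects the file name or content), a fake file created this way would still be rejected, which is the situation described above.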
> when config allowed-gpu-devices , excluded GPUs still be visible to containers
> ------------------------------------------------------------------------------
>
> Key: YARN-10248
> URL: https://issues.apache.org/jira/browse/YARN-10248
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 3.2.1
> Reporter: zhao yufei
> Assignee: zhao yufei
> Priority: Minor
> Labels: pull-request-available
> Attachments: YARN-10248-branch-3.2.001.path, YARN-10248-branch-3.2.001.path
>
>
> I have a server with two GPUs, and I want to use only one of them in the YARN cluster.
> Following the Hadoop documentation, I set these configs:
> {code:java}
> <property>
>   <name>yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices</name>
>   <value>0:1</value>
> </property>
> <property>
>   <name>yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables</name>
>   <value>/etc/alternatives/x86_64-linux-gnu_nvidia_smi</value>
> </property>
> {code}
> Then I ran the following command to test:
> {code:java}
> yarn jar ./share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.2.1.jar \
> -jar ./share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.2.1.jar -shell_command ' nvidia-smi & sleep 3 ' \
> -container_resources memory-mb=3072,vcores=1,yarn.io/gpu=1 \
> -num_containers 1 -queue yufei -node_label_expression slaves
> {code}
> I expected that the GPU with minor number 0 would not be visible to the container, but in the launched container nvidia-smi printed information for both GPUs.
> I checked the related source code and found that this is a bug.
> The problem is:
> when you specify allowed-gpu-devices, GpuDiscoverer populates the usable GPUs from it;
> then, when some of those GPUs are assigned to a container, it sets the denied GPUs for that container,
> but it never takes the GPUs excluded on the host into account.
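
The problem described above can be illustrated with a small standalone sketch. It is not the NodeManager code: the class DeniedGpuSketch and both methods are hypothetical, GPUs are represented only by their minor numbers, and the assumption is that the denied set is currently derived from the usable (allowed) GPUs rather than from every GPU present on the host.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DeniedGpuSketch {

    // Assumed current behaviour: deny only the usable GPUs that were not
    // assigned to this container. Excluded GPUs never enter the set.
    static Set<Integer> deniedFromUsable(List<Integer> usable, Set<Integer> assigned) {
        Set<Integer> denied = new LinkedHashSet<>(usable);
        denied.removeAll(assigned);
        return denied;
    }

    // Expected behaviour: deny every GPU on the host that was not assigned,
    // which also covers GPUs left out of allowed-gpu-devices.
    static Set<Integer> deniedFromHost(List<Integer> allOnHost, Set<Integer> assigned) {
        Set<Integer> denied = new LinkedHashSet<>(allOnHost);
        denied.removeAll(assigned);
        return denied;
    }

    public static void main(String[] args) {
        List<Integer> allOnHost = Arrays.asList(0, 1);  // minor numbers on the host
        List<Integer> usable = Arrays.asList(1);        // only minor number 1 is allowed
        Set<Integer> assigned = new LinkedHashSet<>(Arrays.asList(1));

        System.out.println(deniedFromUsable(usable, assigned)); // prints []
        System.out.println(deniedFromHost(allOnHost, assigned)); // prints [0]
    }
}
```

With the reported configuration, the first variant denies nothing, so the GPU with minor number 0 stays visible in the container. The second variant denies it, which matches the expected result.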
--
This message was sent by Atlassian Jira
(v8.3.4#803005)