Posted to yarn-issues@hadoop.apache.org by "Zhankun Tang (JIRA)" <ji...@apache.org> on 2018/11/30 03:15:00 UTC
[jira] [Updated] (YARN-9073) GPU/FPGA whitelist configuration in
container-executor.cfg won't work when yarn-site.xml's allowed devices
doesn't align with it
[ https://issues.apache.org/jira/browse/YARN-9073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhankun Tang updated YARN-9073:
-------------------------------
Description:
The current GPU/FPGA behavior may have an issue when c-e.cfg doesn't align with yarn-site.xml. Take GPU for instance:
Suppose a host has GPUs 1,2,3,4,5, and "GPU.allowed = 1,2,3" is configured in c-e.cfg, but yarn-site.xml is configured with "auto", which means 1,2,3,4,5 are all allowed.
If an application requests 4 GPUs and the scheduler allocates 1,2,4,5, then --excluded-gpus is "3". c-e checks that 3 is in the allowed list (1,2,3) and then denies only 3 in cgroups.
In this case, c-e's allowed list (1,2,3) has no effect, because the application can still access 4 and 5.
was:
The current GPU/FPGA behavior may have an issue when c-e.cfg doesn't align with yarn-site.xml. Take GPU for instance:
Suppose a host has GPUs 1,2,3,4,5, and "GPU.allowed = 1,2,3" is configured in c-e.cfg, but yarn-site.xml is configured with auto, which means 1,2,3,4,5.
If an application requests 4 GPUs and the scheduler allocates 1,2,4,5, then --excluded-gpus is "3". c-e checks that 3 is in the allowed list (1,2,3) and then denies only 3 in cgroups.
In this case, c-e's allowed list (1,2,3) has no effect, because the application can still access 4 and 5.
> GPU/FPGA whitelist configuration in container-executor.cfg won't work when yarn-site.xml's allowed devices doesn't align with it
> --------------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-9073
> URL: https://issues.apache.org/jira/browse/YARN-9073
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Zhankun Tang
> Assignee: Zhankun Tang
> Priority: Major
>
> The current GPU/FPGA behavior may have an issue when c-e.cfg doesn't align with yarn-site.xml. Take GPU for instance:
> Suppose a host has GPUs 1,2,3,4,5, and "GPU.allowed = 1,2,3" is configured in c-e.cfg, but yarn-site.xml is configured with "auto", which means 1,2,3,4,5 are all allowed.
> If an application requests 4 GPUs and the scheduler allocates 1,2,4,5, then --excluded-gpus is "3". c-e checks that 3 is in the allowed list (1,2,3) and then denies only 3 in cgroups.
> In this case, c-e's allowed list (1,2,3) has no effect, because the application can still access 4 and 5.
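
The scenario in the description can be modeled with a few lines of set arithmetic. The sketch below is illustrative Python only, not the container-executor code: the function names are hypothetical, and it assumes that an excluded device outside the allowed list is simply rejected, which the description does not state.

```python
# Illustrative model of the behavior described in YARN-9073.
# All names here are hypothetical; this is not the container-executor source.

HOST_GPUS = {1, 2, 3, 4, 5}   # devices present on the host
CE_ALLOWED = {1, 2, 3}        # whitelist from c-e.cfg in the example


def denied_in_cgroups(excluded, ce_allowed):
    """Return the devices that end up denied in cgroups.

    Per the description, c-e only checks that each excluded device is in
    its allowed list and then denies exactly those devices.
    """
    for gpu in excluded:
        if gpu not in ce_allowed:
            # Assumption: the description does not say what happens here.
            raise ValueError(f"GPU {gpu} is not in the allowed list")
    return set(excluded)


def accessible_gpus(host_gpus, excluded, ce_allowed):
    """Devices the container can still reach after the cgroups deny step."""
    return set(host_gpus) - denied_in_cgroups(excluded, ce_allowed)


def outside_whitelist(accessible, ce_allowed):
    """Accessible devices that the c-e.cfg whitelist was meant to block."""
    return set(accessible) - set(ce_allowed)


if __name__ == "__main__":
    # yarn-site.xml "auto" lets the scheduler allocate 1,2,4,5,
    # so the only excluded device is 3.
    allocated = {1, 2, 4, 5}
    excluded = HOST_GPUS - allocated
    accessible = accessible_gpus(HOST_GPUS, excluded, CE_ALLOWED)
    print(sorted(accessible))                                 # [1, 2, 4, 5]
    print(sorted(outside_whitelist(accessible, CE_ALLOWED)))  # [4, 5]
```

Under these assumptions, devices 4 and 5 are never denied because they never appear in --excluded-gpus, so the whitelist check is not applied to them.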