Posted to yarn-issues@hadoop.apache.org by "Zhankun Tang (JIRA)" <ji...@apache.org> on 2018/12/03 15:27:00 UTC

[jira] [Comment Edited] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation in native container-executor

    [ https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707359#comment-16707359 ] 

Zhankun Tang edited comment on YARN-9060 at 12/3/18 3:26 PM:
-------------------------------------------------------------

[~leftnoteasy],

Let's first look at the bug (YARN-9073) in the current GPU/FPGA implementation.

{noformat}
Scenario:
One host has GPUs 1,2,3,4,5,6. "GPU.allowed = 1,2,3" is configured in c-e.cfg, but yarn-site.xml is configured with "auto", which means 1,2,3,4,5,6 are all allowed.
An application requests 4 GPUs and the scheduler allocates 1,2,4,5, so --excluded-gpus is "3". c-e checks that 3 is in its allowed list (1,2,3) and then denies only 3 in cgroups.
In this case, c-e's allowed list (1,2,3) has no effect, because the application can now access 4,5,6.
{noformat}
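
To make the scenario concrete, here is a small C model of the current check as described above. It is only a sketch: `contains` and `current_effective_access` are hypothetical names, not container-executor functions. It validates each excluded device against the c-e allowed list, then denies only the excluded devices, so every other host device stays accessible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if value appears in list[0..n). */
bool contains(const int *list, size_t n, int value) {
  for (size_t i = 0; i < n; i++) {
    if (list[i] == value) {
      return true;
    }
  }
  return false;
}

/* Hypothetical model of the current behavior: c-e rejects the request only
 * if an excluded device is outside its allowed list, then denies exactly the
 * excluded devices. On success, out[] receives the devices the container can
 * still access and *n_out their count. Returns false where c-e would error. */
bool current_effective_access(const int *host, size_t n_host,
                              const int *allowed, size_t n_allowed,
                              const int *excluded, size_t n_excluded,
                              int *out, size_t *n_out) {
  for (size_t i = 0; i < n_excluded; i++) {
    if (!contains(allowed, n_allowed, excluded[i])) {
      return false;
    }
  }
  *n_out = 0;
  for (size_t i = 0; i < n_host; i++) {
    /* Only excluded devices are written to devices.deny. */
    if (!contains(excluded, n_excluded, host[i])) {
      out[(*n_out)++] = host[i];
    }
  }
  return true;
}
```

With host devices 1..6, allowed (1,2,3) and excluded (3), the model reports 1,2,4,5,6 as accessible, including 4,5,6, which are outside the c-e allowed list.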
 

It might seem that passing the allowed devices (1,2,4,5) from the Java layer and checking them against "GPU.allowed" (1,2,3) would solve this issue. In this case it does: 4 and 5 are not in (1,2,3), so c-e throws an error.
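
A sketch of that extra check, assuming the Java layer passes the allocated devices as a list of minor numbers (`all_allocated_permitted` is a hypothetical helper, not an existing c-e function):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true only if every device allocated by the Java layer is in the
 * "GPU.allowed" list from c-e.cfg; false is where c-e would report an error. */
bool all_allocated_permitted(const int *allocated, size_t n_allocated,
                             const int *cfg_allowed, size_t n_cfg_allowed) {
  for (size_t i = 0; i < n_allocated; i++) {
    bool found = false;
    for (size_t j = 0; j < n_cfg_allowed; j++) {
      if (allocated[i] == cfg_allowed[j]) {
        found = true;
        break;
      }
    }
    if (!found) {
      return false;
    }
  }
  return true;
}
```

For the scenario above, allocated (1,2,4,5) checked against (1,2,3) returns false, so c-e would refuse the request.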

But another bug still exists. Again using an example, assume one host has (1,2,3,4,5,6), "GPU.allowed=1,2,3,4" is configured in c-e.cfg, and yarn-site.xml indicates that devices (1,2,3) can be scheduled. An application requests 2 devices, the Java layer's allowed devices are (1), and the denied devices are therefore (2,3). Both (1) and (2,3) are within the configured allowed devices, yet the application can actually use (4,5,6).
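
The second scenario can be modeled the same way. Again, this is a hypothetical sketch rather than c-e code: both subset checks pass, but devices the scheduler does not know about are neither allocated nor denied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if value appears in list[0..n). */
bool in_list(const int *list, size_t n, int value) {
  for (size_t i = 0; i < n; i++) {
    if (list[i] == value) {
      return true;
    }
  }
  return false;
}

/* True if every element of a is also in b. */
bool subset_of(const int *a, size_t n_a, const int *b, size_t n_b) {
  for (size_t i = 0; i < n_a; i++) {
    if (!in_list(b, n_b, a[i])) {
      return false;
    }
  }
  return true;
}

/* Counts host devices that were neither allocated nor denied, i.e. devices
 * the container can still open although the scheduler never assigned them. */
size_t count_leaked(const int *host, size_t n_host,
                    const int *allocated, size_t n_allocated,
                    const int *denied, size_t n_denied) {
  size_t leaked = 0;
  for (size_t i = 0; i < n_host; i++) {
    if (!in_list(allocated, n_allocated, host[i]) &&
        !in_list(denied, n_denied, host[i])) {
      leaked++;
    }
  }
  return leaked;
}
```

With host (1..6), c-e allowed (1,2,3,4), allocated (1) and denied (2,3), both `subset_of` checks pass while `count_leaked` finds 3 leaked devices: 4, 5 and 6.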

 

 

*The root cause of these bugs* is that c-e cannot determine the exact set of devices to deny from "GPU.allowed" and the excluded GPUs passed by the Java layer. To avoid these bugs, we can use the following solution.

The configuration in c-e.cfg would be as follows. We use "devices.denied-numbers" to let the administrator define exactly which devices are not permitted. The original "devices.allowed-numbers" can still exist, but it becomes unnecessary once denied-numbers is used, so it is better to remove it.

{noformat}
[devices]
module.enabled=true
devices.allowed-numbers=8:32 # this will be unnecessary
devices.denied-numbers=8:48,8:16 # comma-separated major:minor; empty means allow the default devices reported by the device plugin
{noformat}
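
A minimal C sketch of how c-e might parse the proposed comma-separated major:minor value, assuming the value has already been read from c-e.cfg with the trailing comment stripped. `device_number` and `parse_device_numbers` are hypothetical names, not existing c-e code.

```c
#include <assert.h>
#include <stdio.h>

typedef struct {
  int major;
  int minor;
} device_number;

/* Parses a value such as "8:48,8:16" into out[]. Returns the number of
 * entries (0 for an empty string), or -1 on malformed input or when more
 * than cap entries are present. */
int parse_device_numbers(const char *value, device_number *out, int cap) {
  int count = 0;
  const char *p = value;
  while (*p != '\0') {
    int major = 0, minor = 0, consumed = 0;
    /* %n records how many characters the "major:minor" pair used. */
    if (sscanf(p, "%d:%d%n", &major, &minor, &consumed) != 2 ||
        major < 0 || minor < 0 || count >= cap) {
      return -1;
    }
    out[count].major = major;
    out[count].minor = minor;
    count++;
    p += consumed;
    if (*p == ',') {
      p++;
      if (*p == '\0') {
        return -1; /* trailing comma */
      }
    } else if (*p != '\0') {
      return -1; /* unexpected character after an entry */
    }
  }
  return count;
}
```

An empty value parses to zero entries, which matches the "empty means allow the default devices" rule in the config comment.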
 

The CLI options would be as follows:
{code:java}
c-e --module-devices \
  --excluded_devices b-8:32-rwm \
  --allowed_devices 8:16,8:48 \
  --container_id container_x_y
{code}
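
The `b-8:32-rwm` token appears to encode the device type, the major:minor pair and the access mode. Assuming exactly that format, here is a sketch that converts it into the `type major:minor access` line that the cgroups v1 `devices.deny` file accepts. `to_devices_deny_entry` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Converts one --excluded_devices token, assumed to look like
 * "<type>-<major>:<minor>-<access>" (e.g. "b-8:32-rwm"), into the
 * "<type> <major>:<minor> <access>" syntax of cgroups v1 devices.deny.
 * Returns 0 on success, -1 on malformed input or if out is too small. */
int to_devices_deny_entry(const char *token, char *out, size_t out_len) {
  char type = '\0';
  int major = -1, minor = -1, consumed = 0;
  char access[4] = {0};
  if (sscanf(token, "%c-%d:%d-%3[rwm]%n", &type, &major, &minor, access,
             &consumed) != 4 ||
      token[consumed] != '\0') {
    return -1;
  }
  /* Only block ('b') and character ('c') devices are expected here. */
  if ((type != 'b' && type != 'c') || major < 0 || minor < 0) {
    return -1;
  }
  int written = snprintf(out, out_len, "%c %d:%d %s", type, major, minor, access);
  if (written < 0 || (size_t)written >= out_len) {
    return -1;
  }
  return 0;
}
```

For the CLI example, `b-8:32-rwm` becomes `b 8:32 rwm`.
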
The "devices.denied-numbers" value in c-e.cfg is a blacklist: its entries are added (without duplicates) to the cgroup "devices.deny" file, just like the "--excluded_devices" values.
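
A sketch of that no-duplicate merge. The helper names are hypothetical, and the order (CLI entries first, then configured entries) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
  int major;
  int minor;
} device_number;

/* Appends dev to out[] unless the same major:minor is already present.
 * Returns false only when out[] is full and dev is new. */
bool append_unique(device_number *out, int *n, int cap, device_number dev) {
  for (int i = 0; i < *n; i++) {
    if (out[i].major == dev.major && out[i].minor == dev.minor) {
      return true; /* already scheduled for devices.deny */
    }
  }
  if (*n >= cap) {
    return false;
  }
  out[(*n)++] = dev;
  return true;
}

/* Builds the devices.deny set from --excluded_devices plus the configured
 * denied-numbers, writing each major:minor once. Returns the number of
 * entries, or -1 if cap is too small. */
int merge_deny_list(const device_number *excluded, int n_excluded,
                    const device_number *cfg_denied, int n_cfg,
                    device_number *out, int cap) {
  int n = 0;
  for (int i = 0; i < n_excluded; i++) {
    if (!append_unique(out, &n, cap, excluded[i])) {
      return -1;
    }
  }
  for (int i = 0; i < n_cfg; i++) {
    if (!append_unique(out, &n, cap, cfg_denied[i])) {
      return -1;
    }
  }
  return n;
}
```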

In the above example, the "--allowed_devices" value passed from the Java layer is checked against "devices.denied-numbers" to see whether any device requested by the Java layer is invalid; if one is found, c-e reports an error. Without this "--allowed_devices" check, another bug would exist: suppose all devices are (1,2,3), "devices.denied-numbers" is 3, an application requests 2 devices, and the scheduler allocates (1,3). The value of "--excluded_devices" is then 2, so (2,3) are written to cgroups and the application can use only 1 device, fewer than it requested. With "--allowed_devices", c-e sees that (1,3) contains the denied value 3 configured in c-e.cfg and reports an error, which avoids the bug.
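
The check itself is a simple intersection test. A sketch with hypothetical names, where the (1,3) example is mapped onto a made-up major number 8 purely for illustration:

```c
#include <assert.h>

typedef struct {
  int major;
  int minor;
} device_number;

/* Returns the index of the first --allowed_devices entry that also appears
 * in devices.denied-numbers, or -1 if none does. A non-negative result is
 * the case where c-e would report an error instead of launching. */
int find_denied_request(const device_number *allowed, int n_allowed,
                        const device_number *denied, int n_denied) {
  for (int i = 0; i < n_allowed; i++) {
    for (int j = 0; j < n_denied; j++) {
      if (allowed[i].major == denied[j].major &&
          allowed[i].minor == denied[j].minor) {
        return i;
      }
    }
  }
  return -1;
}
```

Note that the CLI example above (allowed 8:16,8:48 against denied 8:48,8:16) would also be rejected by this check.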



> [YARN-8851] Phase 1 - Support device isolation in native container-executor
> ---------------------------------------------------------------------------
>
>                 Key: YARN-9060
>                 URL: https://issues.apache.org/jira/browse/YARN-9060
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zhankun Tang
>            Assignee: Zhankun Tang
>            Priority: Major
>         Attachments: YARN-9060-trunk.001.patch, YARN-9060-trunk.002.patch
>
>
> Due to the cgroups v1 implementation in the Linux kernel, we cannot update the device cgroups controller values without root permission ([here|https://github.com/torvalds/linux/blob/6f0d349d922ba44e4348a17a78ea51b7135965b1/security/device_cgroup.c#L604]). So we need to support this in container-executor so that the Java layer can invoke it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
