Posted to yarn-issues@hadoop.apache.org by "Miklos Szegedi (JIRA)" <ji...@apache.org> on 2018/05/24 17:03:00 UTC

[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service

    [ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489398#comment-16489398 ] 

Miklos Szegedi commented on YARN-8320:
--------------------------------------

[~cheersyang] / [~yangjiandan], thank you for raising this; it would be a very useful feature.

Thank you [~leftnoteasy] for the comments.

1) I agree with [~leftnoteasy] about the special considerations regarding rounding. Because of this, it might make sense to use a separate resource type for this feature. See my other comments on this below.

2) Like [~leftnoteasy], I also think that users might not need the RESERVED/SHARED modes. They add complexity, which reduces the number of users who would adopt the feature. On the other hand, I admit that they map nicely to {{cpuset.cpu_exclusive=0/1}}.

3) I definitely agree with [~leftnoteasy] on the use of resource types. It might be straightforward to have a cpuset resource type that the AMs can request, and to share the cgroups accordingly. This would also make the configuration more standard. The levels might not even be needed in this case: if an application does not request cpuset, it is shared; otherwise it is exclusive. The current suggestion would work, but please consider using resource types.

4) The design lets the AM send a delayed exclusive request directly to the NM, bypassing the RM. I think it would be more robust to make the request to the RM in the container launch context and just forward it to the NM. That way the RM would have the chance to decline or delay the request in the future.

5) [~yangjiandan], how can you make sure a parent cgroup does not interfere with a cgroup marked as {{cpuset.cpu_exclusive=1}}? What if a system service wakes up?

6) Let me mention that this feature negatively affects YARN-1011 and oversubscription. An exclusive CPU with leftover capacity cannot be used by any other container and remains idle. This reduces overall cluster utilization.

7) Also, latency-sensitive applications get exclusive protection, but they are confined to their cpuset, which prevents them from bursting onto other CPUs when needed. I do not know how to solve this, though.

8) If a cpuset is not exclusive, cgroups treats it as a limit, not a reservation. The feature uses it as a reservation, which in practice would mean that the cgroups of the other containers need to be changed and reduced every time a reserved container starts. Am I correct?
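
To make the concern in 8) concrete, here is a minimal bookkeeping sketch in Python. It is not the design from the attached document or the patch; the class {{CpusetPlanner}}, its methods, and the policy of taking the highest-numbered free CPUs are all hypothetical. It only models which {{cpuset.cpus}} values would have to be written, assuming that shared containers run on whatever CPUs are not reserved. It also follows the simplification from 3): a container that does not ask for a cpuset is shared, otherwise it is exclusive.

```python
def format_cpuset(cpus):
    """Render CPU ids in the list format accepted by cpuset.cpus, e.g. '0-2,5'."""
    ids = sorted(cpus)
    ranges = []
    i = 0
    while i < len(ids):
        start = ids[i]
        # Extend the current range while the next id is consecutive.
        while i + 1 < len(ids) and ids[i + 1] == ids[i] + 1:
            i += 1
        end = ids[i]
        ranges.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(ranges)


class CpusetPlanner:
    """Hypothetical model of per-node cpuset assignments (no real cgroup I/O)."""

    def __init__(self, num_cpus):
        self.all_cpus = set(range(num_cpus))
        self.exclusive = {}  # container id -> set of reserved CPU ids
        self.shared = set()  # ids of containers running on the shared pool

    def shared_pool(self):
        """CPUs that are not reserved by any exclusive container."""
        reserved = set().union(*self.exclusive.values())
        return self.all_cpus - reserved

    def start_shared(self, cid):
        """A container without a cpuset request runs on the whole shared pool."""
        self.shared.add(cid)
        return {cid: format_cpuset(self.shared_pool())}

    def start_exclusive(self, cid, count):
        """Reserve `count` CPUs and return every cpuset.cpus value to rewrite."""
        pool = sorted(self.shared_pool())
        if count < 1 or count > len(pool):
            raise ValueError("cannot reserve %d CPUs from a pool of %d" % (count, len(pool)))
        self.exclusive[cid] = set(pool[-count:])  # assumption: take the highest ids
        writes = {cid: format_cpuset(self.exclusive[cid])}
        # Every shared container must shrink, which is the cost raised in 8).
        for other in sorted(self.shared):
            writes[other] = format_cpuset(self.shared_pool())
        return writes
```

On an 8-CPU node with one shared container {{c1}}, starting a reserved container with 2 CPUs yields one write for the new container and one rewrite per shared container. The number of rewrites grows with the number of shared containers on the node.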

> [Umbrella] Support CPU isolation for latency-sensitive (LS) service
> -------------------------------------------------------------------
>
>                 Key: YARN-8320
>                 URL: https://issues.apache.org/jira/browse/YARN-8320
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>            Reporter: Jiandan Yang 
>            Priority: Major
>         Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, CPU-isolation-for-latency-sensitive-services-v2.pdf, YARN-8320.001.patch
>
>
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and “cpu.shares” to isolate CPU resources. However,
>  * The Linux Completely Fair Scheduler (CFS) is a throughput-oriented scheduler; it has no support for differentiated latency
>  * Request latency of services running in containers may fluctuate frequently when all containers share CPUs, and latency-sensitive services cannot afford this in our production environment.
> So we need more fine-grained CPU isolation.
> Here we propose a solution that uses cgroup cpuset to bind containers to different processors; this is inspired by the isolation technique in [Borg system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].
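
As background for the description above, the following Python sketch shows how the CFS bandwidth settings it names relate to each other. The helper {{cfs_quota_us}} is hypothetical and is not how NodeManager computes its values; it only illustrates the kernel semantics, where a cgroup may consume up to quota microseconds of CPU time per period, summed across all CPUs. The default period of 100000 microseconds and the minimum quota of 1000 microseconds are the kernel's documented values.

```python
DEFAULT_CFS_PERIOD_US = 100_000  # kernel default for cpu.cfs_period_us (100 ms)
MIN_CFS_QUOTA_US = 1_000         # smallest quota the kernel accepts (1 ms)


def cfs_quota_us(cpus, period_us=DEFAULT_CFS_PERIOD_US):
    """Return the cpu.cfs_quota_us value that caps a cgroup at `cpus` CPUs' worth of time."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return max(MIN_CFS_QUOTA_US, round(cpus * period_us))
```

A quota caps how much CPU time a container gets but says nothing about which CPUs it runs on, so containers still share cores and caches. That is the gap the cpuset proposal targets.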



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
