Posted to yarn-issues@hadoop.apache.org by "Bibin A Chundatt (JIRA)" <ji...@apache.org> on 2018/06/04 09:21:00 UTC
[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service

    [ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499938#comment-16499938 ] 

Bibin A Chundatt commented on YARN-8320:
----------------------------------------

Thank you [~cheersyang]/[~yangjiandan] for the design doc.

{quote}
When a container’s cpu_share_mode is EXCLUSIVE/RESERVED, the number of allocated
processor  allocateProcessorNum =  container_vcore / Vcore_Ratio,  request will be
rejected if allocateProcessorNum <= 0;
{quote}
# IIUC, if there are no slots left to bind the container to, the container start request will be rejected, and that will be treated as a failed container. Couldn't the scheduler then allocate a container to the same NodeManager again?
# When NM processors / NM vcores < 1 and share mode is used, have you considered *per-container strictness*, i.e. using the CFS period and quota along with the cpuset assignment? If no other process is using the CPU, the process will consume more than it is supposed to, right?
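
To make the two points above concrete, here is a minimal sketch of the arithmetic. It is not taken from the patch: the class and method names are hypothetical, and it assumes Vcore_Ratio = NM vcores / NM processors and a CFS period of 100000 us. The only thing it relies on from the kernel is that cpu.cfs_quota_us / cpu.cfs_period_us is the amount of CPU time a cgroup may use per period.

```java
import java.util.OptionalInt;

/** Hypothetical sketch; names and the vcore-to-CPU mapping are assumptions. */
final class CpuBindSketch {
  /** Assumed CFS period in microseconds. */
  static final long CFS_PERIOD_US = 100_000L;

  /**
   * Quoted rule: allocateProcessorNum = container_vcore / Vcore_Ratio.
   * Integer division is used, so an empty result means "request rejected".
   */
  static OptionalInt allocateProcessorNum(int containerVcores, int vcoreRatio) {
    if (vcoreRatio <= 0) {
      throw new IllegalArgumentException("vcoreRatio must be positive");
    }
    int processors = containerVcores / vcoreRatio;
    return processors <= 0 ? OptionalInt.empty() : OptionalInt.of(processors);
  }

  /**
   * Strictness via CFS: cap the container at containerVcores / vcoreRatio
   * CPUs per period, even when the rest of the node is idle.
   */
  static long cfsQuotaUs(int containerVcores, int vcoreRatio) {
    if (vcoreRatio <= 0) {
      throw new IllegalArgumentException("vcoreRatio must be positive");
    }
    return CFS_PERIOD_US * containerVcores / vcoreRatio;
  }

  public static void main(String[] args) {
    // 4 vcores at 2 vcores per processor: whole processors can be bound.
    System.out.println(allocateProcessorNum(4, 2));
    // 1 vcore at 2 vcores per processor: integer division gives 0, so rejected.
    System.out.println(allocateProcessorNum(1, 2));
    // The same container could still be capped at half a CPU through the quota.
    System.out.println(cfsQuotaUs(1, 2));
  }
}
```

Under these assumptions, a 1-vcore container on a node with a ratio of 2 cannot get a dedicated processor, but a quota of half the period would still bound what it consumes when sharing.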

Any thoughts on having CpuBindHandlerImpl include two Allocators for the cgroups subgroups, one for cpu and another for cpuset?

Could you also consider the following in the design:

# Using a fixed set of folders for assignment in the Allocator (reduces the overhead of creating and deleting folders for each container).
# Resource calculation could go wrong in case of container preemption, right? A kill or reject could get processed after the container start.
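
A rough sketch of what I mean by a fixed set of folders, kept in memory only. The class name, the "slot-N" folder names and the methods are hypothetical and not from the patch; a real Allocator would also have to write the cpuset files for each slot.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: a fixed pool of pre-created cgroup folder names. */
final class FixedCgroupPool {
  private final Deque<String> free = new ArrayDeque<>();
  private final Map<String, String> slotByContainer = new HashMap<>();

  /** Folders are named once up front instead of per container start. */
  FixedCgroupPool(int size) {
    for (int i = 0; i < size; i++) {
      free.addLast("slot-" + i);
    }
  }

  /** Returns the folder for the container, or empty when the pool is exhausted. */
  synchronized Optional<String> acquire(String containerId) {
    String existing = slotByContainer.get(containerId);
    if (existing != null) {
      return Optional.of(existing); // repeated start request: same slot
    }
    String slot = free.pollFirst();
    if (slot == null) {
      return Optional.empty();
    }
    slotByContainer.put(containerId, slot);
    return Optional.of(slot);
  }

  /** Idempotent: a late or duplicate kill does not return a slot twice. */
  synchronized boolean release(String containerId) {
    String slot = slotByContainer.remove(containerId);
    if (slot == null) {
      return false;
    }
    free.addLast(slot);
    return true;
  }

  synchronized int freeSlots() {
    return free.size();
  }

  public static void main(String[] args) {
    FixedCgroupPool pool = new FixedCgroupPool(2);
    System.out.println(pool.acquire("container_1"));
    System.out.println(pool.release("container_1"));
    System.out.println(pool.freeSlots());
  }
}
```

Keeping release idempotent and keyed by container id is one way to keep the accounting consistent if a kill or reject arrives out of order; whether that fits the NodeManager event flow would need checking against the actual handler.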



> [Umbrella] Support CPU isolation for latency-sensitive (LS) service
> -------------------------------------------------------------------
>
>                 Key: YARN-8320
>                 URL: https://issues.apache.org/jira/browse/YARN-8320
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>            Reporter: Jiandan Yang 
>            Priority: Major
>         Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, CPU-isolation-for-latency-sensitive-services-v2.pdf, YARN-8320.001.patch
>
>
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and “cpu.shares” to isolate CPU resources. However,
>  * The Linux Completely Fair Scheduler (CFS) is a throughput-oriented scheduler with no support for differentiated latency.
>  * Request latency of services running in containers may fluctuate frequently when all containers share CPUs, which latency-sensitive services cannot afford in our production environment.
> So we need more fine-grained CPU isolation.
> Here we propose a solution that uses cgroup cpuset to bind containers to different processors; this is inspired by the isolation technique in [Borg system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
