Posted to yarn-issues@hadoop.apache.org by "Qi Zhu (Jira)" <ji...@apache.org> on 2021/04/29 02:46:00 UTC

[jira] [Comment Edited] (YARN-10738) When multi thread scheduling with multi node, we should shuffle with a gap to prevent hot accessing nodes.

    [ https://issues.apache.org/jira/browse/YARN-10738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335104#comment-17335104 ] 

Qi Zhu edited comment on YARN-10738 at 4/29/21, 2:45 AM:
---------------------------------------------------------

Thanks [~Jim_Brennan] for the review and the very patient investigation.

The original ResourceUsageMultiNodeLookupPolicy sometimes produces hot nodes in our test cluster. With the gap shuffle, the number of hot-node cases dropped by more than 50%. The gap value of 10 is still open for discussion: the right value depends on the size of the cluster, and a well-chosen gap should give better results.

I agree that another option to consider would be a policy that uses node utilization, which should more accurately reflect how busy a node is. We should also shuffle based on node utilization, because multi-threaded scheduling without node-heartbeat scheduling always commits to the same first node. That creates a hot node, and hot nodes are a major bottleneck for real-time clusters.

In fact, hot nodes mainly affect real-time clusters, because those clusters have stricter requirements on job latency.
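
To make the utilization idea concrete, here is a minimal sketch, not a proposed patch. It assumes each node exposes a utilization fraction in [0, 1]; the class UtilizationOrdering, its Node type, and the bandWidth parameter are hypothetical names used only for illustration and are not YARN APIs. Nodes are sorted by utilization, and nodes whose utilization falls in the same band are shuffled, so that concurrent scheduler threads are less likely to all commit to the same first node.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; none of these types exist in YARN.
final class UtilizationOrdering {
    private UtilizationOrdering() {}

    // Minimal stand-in for a scheduler node: an id plus a utilization fraction.
    static final class Node {
        final String id;
        final double utilization; // assumed to be in [0, 1]

        Node(String id, double utilization) {
            this.id = id;
            this.utilization = utilization;
        }
    }

    // Returns a new list sorted by ascending utilization, where nodes in the
    // same utilization band (floor(utilization / bandWidth)) are shuffled.
    static List<Node> order(List<Node> nodes, double bandWidth, Random rnd) {
        if (bandWidth <= 0) {
            throw new IllegalArgumentException("bandWidth must be positive");
        }
        List<Node> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator.comparingDouble((Node n) -> n.utilization));

        int start = 0;
        while (start < sorted.size()) {
            long band = bandOf(sorted.get(start), bandWidth);
            int end = start + 1;
            while (end < sorted.size() && bandOf(sorted.get(end), bandWidth) == band) {
                end++;
            }
            // subList is a view, so shuffling it reorders `sorted` in place.
            Collections.shuffle(sorted.subList(start, end), rnd);
            start = end;
        }
        return sorted;
    }

    private static long bandOf(Node n, double bandWidth) {
        return (long) Math.floor(n.utilization / bandWidth);
    }
}
```

With a bandWidth of 0.25, for example, nodes at 10% and 20% utilization are treated as equally good candidates and may appear in either order, while a node at 90% always sorts after them. The band width plays the same role as the gap and would need the same kind of tuning.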

Thanks.


was (Author: zhuqi):
Thanks [~Jim_Brennan] for the review and the very patient investigation.

The original ResourceUsageMultiNodeLookupPolicy sometimes produces hot nodes in our test cluster. With the gap shuffle, the number of hot-node cases dropped by more than 50%. The gap value of 10 is still open for discussion: the right value depends on the size of the cluster, and a well-chosen gap should give better results.

I agree that another option to consider would be a policy that uses node utilization, which should more accurately reflect how busy a node is. We should also shuffle based on node utilization, because multi-threaded scheduling always commits to the same first node. That creates a hot node, and hot nodes are a major bottleneck for real-time clusters.

In fact, hot nodes mainly affect real-time clusters, because those clusters have stricter requirements on job latency.

Thanks.

> When multi thread scheduling with multi node, we should shuffle with a gap to prevent hot accessing nodes.
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10738
>                 URL: https://issues.apache.org/jira/browse/YARN-10738
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Qi Zhu
>            Assignee: Qi Zhu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, multi-threaded scheduling with multi-node placement does not behave reasonably.
> In large clusters, it causes hot-accessed nodes, which in turn leads to abnormal "boom" nodes.
> Solution:
> I think we should shuffle the sorted nodes (for example, as sorted by the available-resource policy) with an interval.
> This should solve the above problem and avoid hot-accessed nodes.
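
One possible reading of "shuffle with an interval" can be sketched as follows. This is an assumption about the idea, not the code from the pull request: the sorted node list is cut into consecutive windows of `gap` entries, and each window is shuffled independently, so the overall sort order is kept between windows while the head of the list is no longer always the same node. The class GapShuffle and the method shuffleWithGap are hypothetical names, not YARN APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; this is not the YARN-10738 patch.
final class GapShuffle {
    private GapShuffle() {}

    // Returns a copy of `sorted` in which every consecutive window of `gap`
    // elements has been shuffled. Elements never leave their window, so the
    // coarse ordering produced by the sorting policy is preserved.
    static <T> List<T> shuffleWithGap(List<T> sorted, int gap, Random rnd) {
        if (gap <= 0) {
            throw new IllegalArgumentException("gap must be positive");
        }
        List<T> result = new ArrayList<>(sorted);
        for (int start = 0; start < result.size(); start += gap) {
            int end = Math.min(start + gap, result.size());
            // subList is a view, so shuffling it reorders `result` in place.
            Collections.shuffle(result.subList(start, end), rnd);
        }
        return result;
    }

    public static void main(String[] args) {
        // 25 placeholder node names, already in "sorted" order.
        List<String> nodes = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            nodes.add("node-" + i);
        }
        // A gap of 10 is the value under discussion in this issue.
        System.out.println(shuffleWithGap(nodes, 10, new Random()));
    }
}
```

Under this reading, a gap of 1 leaves the sorted order unchanged and a gap at least as large as the cluster is a full shuffle, which is why the best value would depend on the cluster size.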



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org