
Posted to yarn-issues@hadoop.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/01/24 09:02:00 UTC

[jira] [Updated] (YARN-11067) Resource overcommitment due to incorrect resource normalisation logical order

     [ https://issues.apache.org/jira/browse/YARN-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YARN-11067:
----------------------------------
    Labels: pull-request-available  (was: )

> Resource overcommitment due to incorrect resource normalisation logical order
> -----------------------------------------------------------------------------
>
>                 Key: YARN-11067
>                 URL: https://issues.apache.org/jira/browse/YARN-11067
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Andras Gyori
>            Assignee: Andras Gyori
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> A rather serious overcommitment issue was discovered when using ABSOLUTE resources as capacities. A minimal way to reproduce the issue is the following:
>  # We have a cluster with 32 GB memory and 16 VCores. Create the following hierarchy with the corresponding capacities:
>  ## root.capacity = [memory=54GiB, vcores=28]
>  ## root.a.capacity = [memory=50GiB, vcores=20]
>  ## root.a1.capacity = [memory=30GiB, vcores=15]
>  ## root.a2.capacity = [memory=20GiB, vcores=5]
>  # Remove a Node from the cluster (this is not even an unusual event), e.g. a Node with resource [memory=8GiB, vcores=4]
>  # Because the normalised resource ratio is calculated BEFORE the effective resource of the queue is recalculated, it creates a cascade that results in an overcommitment in the queue hierarchy (see [https://github.com/apache/hadoop/blob/5ef335da1ed49e06cc8973412952e09ed08bb9c0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java#L1294])
>  
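The ordering problem described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the ParentQueue code: ToyQueue and all of its methods are hypothetical names, only memory (in GiB) is modelled, root is simply granted the whole cluster, a1 and a2 are assumed to be children of a, and the scaling rule (children are scaled down proportionally when their configured sum exceeds the parent's effective resource) is a simplification of the real normalisation logic.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: a queue with a single absolute capacity (memory in GiB).
// None of these names come from YARN; they are hypothetical.
class ToyQueue {
    final String name;
    final double configured;   // capacity written in the configuration
    double effective;          // capacity actually granted after normalisation
    final List<ToyQueue> children = new ArrayList<>();

    ToyQueue(String name, double configured) {
        this.name = name;
        this.configured = configured;
    }

    ToyQueue add(ToyQueue child) {
        children.add(child);
        return this;
    }

    // Assumed scaling rule: if the children are configured for more than this
    // queue effectively has, scale them down proportionally.
    double childRatio() {
        double sum = 0;
        for (ToyQueue c : children) {
            sum += c.configured;
        }
        return sum > effective ? effective / sum : 1.0;
    }

    // Intended order: refresh this queue's effective resource, then derive the ratio.
    void updateEffectiveFirst(double granted) {
        effective = granted;
        double ratio = childRatio();
        for (ToyQueue c : children) {
            c.updateEffectiveFirst(c.configured * ratio);
        }
    }

    // Problematic order: the ratio is derived from the stale effective resource.
    void updateRatioFirst(double granted) {
        double ratio = childRatio();
        effective = granted;
        for (ToyQueue c : children) {
            c.updateRatioFirst(c.configured * ratio);
        }
    }

    double childrenEffectiveSum() {
        double sum = 0;
        for (ToyQueue c : children) {
            sum += c.effective;
        }
        return sum;
    }

    // Hierarchy from the reproduction steps, initialised for a 32 GiB cluster.
    static ToyQueue sampleHierarchy() {
        ToyQueue a = new ToyQueue("root.a", 50)
            .add(new ToyQueue("root.a.a1", 30))
            .add(new ToyQueue("root.a.a2", 20));
        ToyQueue root = new ToyQueue("root", 54).add(a);
        root.updateEffectiveFirst(32);
        return root;
    }

    public static void main(String[] args) {
        // An 8 GiB node is removed, so the cluster shrinks from 32 GiB to 24 GiB.
        ToyQueue stale = sampleHierarchy();
        stale.updateRatioFirst(24);
        ToyQueue fresh = sampleHierarchy();
        fresh.updateEffectiveFirst(24);
        System.out.printf("ratio first:     root.a = %.1f GiB%n", stale.children.get(0).effective);
        System.out.printf("effective first: root.a = %.1f GiB%n", fresh.children.get(0).effective);
    }
}
```

In this toy model, computing the ratio first leaves root.a and the sum of its children at roughly 32 GiB although the cluster only has 24 GiB left, while recalculating the effective resource first keeps both at roughly 24 GiB.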



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org