Posted to yarn-issues@hadoop.apache.org by "Benjamin Teke (Jira)" <ji...@apache.org> on 2022/02/11 14:08:00 UTC

[jira] [Updated] (YARN-11074) User limit factor allows an extra container over the limit

     [ https://issues.apache.org/jira/browse/YARN-11074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Teke updated YARN-11074:
---------------------------------
    Attachment:     (was: Screenshot 2022-02-11 at 14.52.30 (2).png)

> User limit factor allows an extra container over the limit
> ----------------------------------------------------------
>
>                 Key: YARN-11074
>                 URL: https://issues.apache.org/jira/browse/YARN-11074
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Benjamin Teke
>            Assignee: Benjamin Teke
>            Priority: Major
>         Attachments: Screenshot 2022-02-11 at 15.00.15.png
>
>
> The documentation on the user-limit-factor states:
> yarn.scheduler.capacity.<queue-path>.user-limit-factor: _The multiple of the queue capacity which can be configured to allow a single user to acquire more resources. By default this is set to 1 which *ensures* that a single user _can never take more_ than the queue’s configured capacity irrespective of how idle the cluster is. Value is specified as a float._
> This is not true in this form. Based on the [following unit test|https://github.com/apache/hadoop/blob/8d214cb785724cb930c4938df1bb247a61d33710/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java#L731], if the AM is allowed to launch, one container will be accepted and allocated regardless of the user-limit-factor. This is most likely to avoid "busy waiting" scenarios where the AM is launched and uses resources but isn't progressing because its containers aren't allocated.
> I checked this behaviour by launching an app on a queue with 2.5% (absolute) capacity (Effective Capacity: memory:1228, vCores:0), with the user-limit-factor set to 1. The app's AM launched with 2 GB of memory and asked for three 8 GB containers. The AM launched, and one of the 8 GB containers launched as well, for a total of 10 GB memory usage; see the attached RM UI screenshot.
> This is not what the documentation says about the user-limit-factor, and in extreme cases (maximum capacity set to 100%) the "extra" container from one user can take all of the resources of the NM it's launched on, even with a low user-limit-factor. Because user-limit-factor has worked this way since MAPREDUCE-279, the behaviour shouldn't be changed by default. Instead, a new configuration flag could be introduced that makes the user-limit-factor a real hard limit. Its default value should be false.
> Additionally, the documentation should be updated accordingly.
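
To make the difference between the current behaviour and a "real hard limit" concrete, here is a minimal toy model in Java. It is only a sketch under stated assumptions: the class UserLimitSketch and its methods are hypothetical, the "soft" check (only the usage before the request is compared to the limit) is one plausible way such an overshoot can arise, and none of this is the actual CapacityScheduler code. The 4096 MB limit and 3072 MB requests are made-up illustration values; the 1228 MB, 2 GB and 8 GB figures come from the report above.

```java
public class UserLimitSketch {

    /**
     * Hypothetical toy allocator: grants requests in order until the limit
     * check fails, and returns the total memory (MB) in use afterwards.
     *
     * hardLimit == false: only the usage BEFORE the request is compared to
     *                     the limit, so the last granted container may
     *                     push usage over the limit (assumed soft model).
     * hardLimit == true:  usage AFTER the request must still fit.
     */
    static long allocate(long limitMb, long usedMb, long[] requestsMb, boolean hardLimit) {
        long used = usedMb;
        for (long request : requestsMb) {
            boolean allowed = hardLimit
                    ? used + request <= limitMb
                    : used <= limitMb;
            if (!allowed) {
                break;
            }
            used += request;
        }
        return used;
    }

    /** How far the observed usage exceeds the limit, in MB (0 if within the limit). */
    static long overshootMb(long limitMb, long usedMb) {
        return Math.max(0, usedMb - limitMb);
    }

    public static void main(String[] args) {
        long[] requests = {3072, 3072, 3072}; // made-up request sizes

        // Soft check: the second container is granted because usage (3072)
        // was still under the limit (4096) before it was added.
        System.out.println(allocate(4096, 0, requests, false)); // 6144

        // Hard check: the second container would exceed the limit, so it is refused.
        System.out.println(allocate(4096, 0, requests, true));  // 3072

        // Figures from the report: 1228 MB effective capacity with
        // user-limit-factor 1, versus a 2 GB AM plus one 8 GB container.
        System.out.println(overshootMb(1228, 2048 + 8192));     // 9012
    }
}
```

With the reported figures, the single user ends up roughly 9 GB above the queue's 1228 MB effective capacity, which is the gap a hard-limit flag would be meant to close.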



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
