Posted to yarn-issues@hadoop.apache.org by "Benjamin Teke (Jira)" <ji...@apache.org> on 2022/02/11 14:08:00 UTC

[jira] [Created] (YARN-11074) User limit factor allows an extra container over the limit

Benjamin Teke created YARN-11074:
------------------------------------

             Summary: User limit factor allows an extra container over the limit
                 Key: YARN-11074
                 URL: https://issues.apache.org/jira/browse/YARN-11074
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Benjamin Teke
            Assignee: Benjamin Teke
         Attachments: Screenshot 2022-02-11 at 15.00.15.png

The documentation on the user-limit-factor states:
yarn.scheduler.capacity.<queue-path>.user-limit-factor: _The multiple of the queue capacity which can be configured to allow a single user to acquire more resources. By default this is set to 1 which *ensures* that a single user *can never take more* than the queue’s configured capacity irrespective of how idle the cluster is. Value is specified as a float._
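For reference, a minimal sketch of setting the factor, assuming a queue at root.default (the queue path and the programmatic style are only for illustration; the property is normally set in capacity-scheduler.xml):

{code:java}
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration;

// Minimal sketch: the factor is a plain float property, set here
// programmatically for the example queue root.default.
CapacitySchedulerConfiguration csConf = new CapacitySchedulerConfiguration();
csConf.setFloat("yarn.scheduler.capacity.root.default.user-limit-factor", 1.0f);
{code}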

This is not true in this form. Based on the [following unit test|https://github.com/apache/hadoop/blob/8d214cb785724cb930c4938df1bb247a61d33710/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java#L731], if the AM is allowed to launch, one container will be accepted and allocated regardless of the user-limit-factor. This is most likely to avoid "busy waiting" scenarios where the AM is launched and uses resources but isn't progressing, because its containers aren't allocated.
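In other words, the user-limit check appears to be applied to the usage before the new container is counted, so the last allocation can push a user past the limit by up to one container. A simplified sketch of that shape of check (not the actual LeafQueue code):

{code:java}
// Simplified sketch of the shape of the user-limit check, not the real
// LeafQueue implementation: the comparison only looks at the user's current
// usage, so the container being allocated right now is not counted yet.
static boolean canAssignToUser(long userUsedMb, long userLimitMb) {
  // usage is still at or below the limit -> the allocation is accepted, even
  // though usage + the new container may end up above the limit
  return userUsedMb <= userLimitMb;
}
{code}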

Checked this behaviour by launching an app on a queue with 2.5% (absolute) capacity (Effective Capacity: memory:1228, vCores:0) and the user-limit-factor set to 1. The app's AM launched with 2 GB memory and asked for three 8 GB containers. The AM launched, and one of the 8 GB containers was allocated as well, totalling 10 GB of memory usage; see the attached RM UI screenshot.
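With a user-limit-factor of 1 the per-user cap should be roughly the queue's effective capacity (~1228 MB), yet the observed usage was 10 GB. The arithmetic, using the numbers above:

{code:java}
// Values taken from the repro above; the arithmetic just shows how far past
// the nominal per-user cap the usage ends up.
long queueCapacityMb = 1228;                      // effective capacity (2.5%)
float userLimitFactor = 1.0f;
long userLimitCapMb = (long) (queueCapacityMb * userLimitFactor); // 1228 MB

long amMb = 2 * 1024;                             // AM container
long taskContainerMb = 8 * 1024;                  // one of the 8 GB containers
long usedMb = amMb + taskContainerMb;             // 10240 MB = 10 GB

System.out.println(usedMb + " MB used vs a " + userLimitCapMb + " MB cap");
{code}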

This is not what the documentation says about the user-limit-factor, and in extreme cases (maximum capacity set to 100%) the "extra" container from one user can take all of the resources of the NM it's launched on, even with a low user-limit-factor. Because user-limit-factor has worked this way since MAPREDUCE-279, the behaviour shouldn't be changed by default; instead a new configuration flag could be introduced that makes the user-limit-factor a real hard limit, as sketched below. Its default value should be false.
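One possible shape for such a flag and check (the property name and the method below are placeholders for the proposal, not existing code):

{code:java}
// Hypothetical sketch only: the flag and the method are placeholders for the
// proposed behaviour, they do not exist in the current code base.
// e.g. yarn.scheduler.capacity.<queue-path>.user-limit-factor.hard-limit,
// defaulting to false to keep today's behaviour.
static boolean canAssignToUser(long userUsedMb, long userLimitMb,
    long containerMb, boolean hardUserLimit) {
  if (hardUserLimit) {
    // hard limit: the container being allocated is counted as well, so the
    // user can never end up above the limit
    return userUsedMb + containerMb <= userLimitMb;
  }
  // current behaviour: only existing usage is checked, which allows the one
  // "extra" container described above
  return userUsedMb <= userLimitMb;
}
{code}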

Additionally, the documentation should be updated accordingly.





--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org