Posted to yarn-dev@hadoop.apache.org by "Zhengbo Li (Jira)" <ji...@apache.org> on 2021/06/23 21:04:00 UTC

[jira] [Created] (YARN-10831) Allow checking over-commit after reconnect event

Zhengbo Li created YARN-10831:
---------------------------------

             Summary: Allow checking over-commit after reconnect event
                 Key: YARN-10831
                 URL: https://issues.apache.org/jira/browse/YARN-10831
             Project: Hadoop YARN
          Issue Type: Improvement
            Reporter: Zhengbo Li


Currently, the container over-commit check is skipped after a node reconnect event because the "timeout" period always defaults to -1, which makes the `signalContainersIfOvercommitted` method skip the check:

Relevant line: https://github.com/apache/hadoop/blob/03cfc852791c14fad39db4e5b14104a276c08e59/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java#L1260

However, in our case, because of the type of VM we use, a node's resources can change after a reconnect event. Its CPU cores or memory may shrink, leaving the containers already running on it over-committed. We should therefore allow the timeout period for the reconnect event to be configured to a value other than -1, so that such an over-commit check is performed.
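To make the problem concrete, here is a minimal Java sketch of the behavior. It is a simplified model and does not reproduce the actual `AbstractYarnScheduler` logic. The class `OvercommitSketch`, its `Container` type, `NO_TIMEOUT`, and `containersToSignal` are all hypothetical names, not Hadoop APIs.

The model makes three simplifying choices:

- A timeout of -1 means "skip the check", mirroring the reconnect behavior described above.
- Any other timeout lets the check run once the grace period has elapsed. At that point, containers are signaled until the remaining allocations fit the node's new capacity.
- Signaling the most recently launched containers first is an illustrative assumption.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of the over-commit check discussed in
 * YARN-10831. None of these names exist in Hadoop.
 */
public class OvercommitSketch {

    /** Timeout value meaning "never enforce" (the -1 default described in the issue). */
    static final long NO_TIMEOUT = -1L;

    /** Minimal container description: an id plus its allocated resources. */
    static final class Container {
        final String id;
        final long memoryMb;
        final int vcores;

        Container(String id, long memoryMb, int vcores) {
            this.id = id;
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /**
     * Returns the ids of containers to signal so that the node fits its
     * (possibly reduced) capacity. With a timeout of -1 the check is skipped.
     * Assumption: containers are listed in launch order, and the newest are
     * signaled first.
     */
    static List<String> containersToSignal(long timeoutMs, long elapsedMs,
                                           long totalMemoryMb, int totalVcores,
                                           List<Container> running) {
        List<String> toSignal = new ArrayList<>();
        if (timeoutMs == NO_TIMEOUT || elapsedMs < timeoutMs) {
            return toSignal; // check skipped, or grace period not yet over
        }
        long usedMemory = 0;
        int usedVcores = 0;
        for (Container c : running) {
            usedMemory += c.memoryMb;
            usedVcores += c.vcores;
        }
        // Walk backwards from the most recently launched container until the node fits.
        for (int i = running.size() - 1; i >= 0; i--) {
            if (usedMemory <= totalMemoryMb && usedVcores <= totalVcores) {
                break;
            }
            Container c = running.get(i);
            toSignal.add(c.id);
            usedMemory -= c.memoryMb;
            usedVcores -= c.vcores;
        }
        return toSignal;
    }

    public static void main(String[] args) {
        List<Container> running = new ArrayList<>();
        running.add(new Container("c1", 4096, 2));
        running.add(new Container("c2", 4096, 2));
        running.add(new Container("c3", 4096, 2));
        // After reconnect the node reports 8192 MB / 4 vcores instead of 12288 MB / 6 vcores.
        System.out.println(containersToSignal(NO_TIMEOUT, 0, 8192, 4, running));
        System.out.println(containersToSignal(0, 0, 8192, 4, running));
    }
}
```

Running `main` prints `[]` for the -1 case, where the shrink goes unnoticed. It prints `[c3]` once a timeout of 0 is used, because removing one 4096 MB / 2 vcore container brings the node back within its new capacity.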

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
