Posted to yarn-dev@hadoop.apache.org by "Prashant Golash (JIRA)" <ji...@apache.org> on 2019/06/28 17:42:00 UTC

[jira] [Created] (YARN-9656) Plugin to avoid scheduling jobs on node which are not in "schedulable" state, but are healthy otherwise.

Prashant Golash created YARN-9656:
-------------------------------------

             Summary: Plugin to avoid scheduling jobs on node which are not in "schedulable" state, but are healthy otherwise.
                 Key: YARN-9656
                 URL: https://issues.apache.org/jira/browse/YARN-9656
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: nodemanager, resourcemanager
    Affects Versions: 3.1.2
            Reporter: Prashant Golash


I am creating this Jira to find out from the community whether this would be a useful addition to YARN. Sometimes nodes go into a bad state, e.g. because of a hardware problem (bad I/O, a failing fan). In other scenarios, if CGroups are not enabled, nodes may be running at very high CPU utilization, and the jobs scheduled on them will suffer.

 

The idea is three-fold:
 # Gather relevant metrics from the node managers and publish them in some form (e.g. an exclude file).
 # The RM loads the file and adds the listed nodes to the blacklist.
 # Once a node becomes healthy again, it is put back on the whitelist.
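
As a rough illustration only, the three steps above could be sketched as follows. This is a minimal sketch and not YARN code: the NodeMetrics fields, the thresholds, and the function names are all hypothetical, and the exclude file is assumed to hold one hostname per line.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical thresholds; real values would have to be tuned per cluster.
MAX_CPU_UTILIZATION = 0.95
MAX_IO_WAIT = 0.50


@dataclass(frozen=True)
class NodeMetrics:
    """Hypothetical per-node sample; not an existing YARN data structure."""
    hostname: str
    cpu_utilization: float  # fraction of CPU in use, 0.0 to 1.0
    io_wait: float          # fraction of time spent waiting on I/O, 0.0 to 1.0
    fan_ok: bool            # False if a fan fault has been reported


def is_schedulable(m: NodeMetrics) -> bool:
    """A node is schedulable only if every check passes."""
    return (
        m.fan_ok
        and m.cpu_utilization <= MAX_CPU_UTILIZATION
        and m.io_wait <= MAX_IO_WAIT
    )


def build_exclude_list(samples: Iterable[NodeMetrics]) -> List[str]:
    """Step 1: turn the latest metrics into a sorted list of hosts to avoid."""
    return sorted({m.hostname for m in samples if not is_schedulable(m)})


def render_exclude_file(hostnames: List[str]) -> str:
    """Step 2 input: one hostname per line (assumed file format)."""
    return "".join(h + "\n" for h in hostnames)
```

Because the list is rebuilt from the latest samples on every pass, step 3 needs no separate logic in this sketch: a node that recovers simply no longer appears in the next exclude file. How the RM would load such a file and apply it to scheduling is left open here.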

Various optimizations are possible here, but first I would like to understand whether this would be helpful as an upstream feature in YARN.


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-dev-help@hadoop.apache.org