Posted to yarn-issues@hadoop.apache.org by "chaosju (Jira)" <ji...@apache.org> on 2021/04/07 13:32:00 UTC

[jira] [Issue Comment Deleted] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics

     [ https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chaosju updated YARN-10450:
---------------------------
    Comment: was deleted

(was: Why adaptive heartbeats?
 * {color:#ff0000}Regular heartbeats can overload the RM.{color}
 * {color:#ff0000}If the RM is overloaded, things get worse over time as events queue up.{color}
 * Efficiency drops, because important events at the NM/AM must wait for the next heartbeat before the RM learns their status.
 * Not every heartbeat from a node or AM is important. If a node is running at full capacity, its heartbeats are not useful for application scheduling.
 * The RM should be able to control the heartbeats sent to it.

How to make heartbeats adaptive?

1. Throttled heartbeat:
 * {color:#ff0000}The HB interval is set according to scheduler load (LIGHT, NORMAL, BUSY, HEAVY).{color}
 * Statistics on various scheduler events (processing time vs. wait time in the queue) are collected.
 * The RM tells the NM and AM the next HB interval, throttling their heartbeats.
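The throttling idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not the mechanism from the referenced talk and not a YARN API: the load level is classified from the ratio of mean queue wait time to mean processing time, and the thresholds, the millisecond intervals, and all names (`SchedulerLoad`, `EventStats`, `classify_load`, `next_heartbeat_interval_ms`) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import mean


class SchedulerLoad(Enum):
    LIGHT = "LIGHT"
    NORMAL = "NORMAL"
    BUSY = "BUSY"
    HEAVY = "HEAVY"


# Hypothetical heartbeat intervals (ms) per load level; not YARN defaults.
HB_INTERVAL_MS = {
    SchedulerLoad.LIGHT: 1000,
    SchedulerLoad.NORMAL: 2000,
    SchedulerLoad.BUSY: 4000,
    SchedulerLoad.HEAVY: 8000,
}


@dataclass
class EventStats:
    processing_ms: float  # time the scheduler spent handling the event
    wait_ms: float        # time the event sat in the scheduler queue


def classify_load(samples):
    """Classify load by the ratio of mean queue wait to mean processing time."""
    if not samples:
        return SchedulerLoad.LIGHT
    wait = mean(s.wait_ms for s in samples)
    proc = mean(s.processing_ms for s in samples)
    if proc <= 0:
        ratio = 0.0 if wait <= 0 else float("inf")
    else:
        ratio = wait / proc
    # Hypothetical thresholds: events waiting far longer than they take
    # to process indicate a growing backlog.
    if ratio < 0.5:
        return SchedulerLoad.LIGHT
    if ratio < 2.0:
        return SchedulerLoad.NORMAL
    if ratio < 8.0:
        return SchedulerLoad.BUSY
    return SchedulerLoad.HEAVY


def next_heartbeat_interval_ms(samples):
    """Interval the RM would return to an NM or AM in its heartbeat response."""
    return HB_INTERVAL_MS[classify_load(samples)]
```

For example, `next_heartbeat_interval_ms([EventStats(5, 50)])` classifies the scheduler as HEAVY and returns 8000, telling the NM or AM to back off. Using the wait/processing ratio rather than raw queue length is only one possible signal; it rises as soon as events start waiting longer than they take to handle.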

2. Event-based heartbeat:
 * Send an out-of-band heartbeat for urgent events, such as new resource requests or container completions, before the heartbeat interval indicated by the RM elapses.
 * The RM can notify the AM when containers have been allocated, so the AM does not have to wait for its scheduled heartbeat to get resources.
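A client-side sketch of the event-based heartbeat, assuming a simple model: the NM or AM buffers events, sends a scheduled heartbeat once the RM-indicated interval has elapsed, and sends an out-of-band heartbeat immediately for urgent events. The event names and the `HeartbeatClient` class are hypothetical, and time is passed in explicitly instead of using real timers so the behavior is easy to follow.

```python
from dataclasses import dataclass, field

# Hypothetical event kinds treated as urgent enough for an out-of-band heartbeat.
URGENT_EVENTS = {"NEW_RESOURCE_REQUEST", "CONTAINER_COMPLETED"}


@dataclass
class HeartbeatClient:
    """NM/AM-side heartbeat logic, driven by an explicit clock (ms)."""
    interval_ms: int                              # interval last indicated by the RM
    last_sent_ms: int = 0
    pending: list = field(default_factory=list)   # events not yet reported
    sent: list = field(default_factory=list)      # log of (time_ms, reason, events)

    def on_event(self, now_ms, event):
        self.pending.append(event)
        if event in URGENT_EVENTS:
            # Do not wait for the scheduled heartbeat: report right away.
            self._send(now_ms, "out-of-band")

    def tick(self, now_ms):
        # Regular heartbeat once the RM-indicated interval has elapsed.
        if now_ms - self.last_sent_ms >= self.interval_ms:
            self._send(now_ms, "scheduled")

    def on_response(self, next_interval_ms):
        # Adopt the next interval the RM indicated in its response.
        self.interval_ms = next_interval_ms

    def _send(self, now_ms, reason):
        # Flush all buffered events and restart the interval timer.
        self.sent.append((now_ms, reason, list(self.pending)))
        self.pending.clear()
        self.last_sent_ms = now_ms
```

For example, with `interval_ms=3000`, a completed container reported at t=2000 ms triggers an out-of-band heartbeat that also flushes any buffered non-urgent events, and the next scheduled heartbeat is then due at t=5000 ms. Restarting the timer on out-of-band sends is a design choice that avoids a redundant scheduled heartbeat right after an urgent one.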

Reference: https://www.slideshare.net/vsaxenavarun/venturing-into-large-hadoop-clusters

[~Jim_Brennan] )

> Add cpu and memory utilization per node and cluster-wide metrics
> ----------------------------------------------------------------
>
>                 Key: YARN-10450
>                 URL: https://issues.apache.org/jira/browse/YARN-10450
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>    Affects Versions: 3.3.1
>            Reporter: Jim Brennan
>            Assignee: Jim Brennan
>            Priority: Minor
>             Fix For: 3.2.2, 3.4.0, 3.3.1, 3.1.5, 2.10.2, 3.2.3
>
>         Attachments: NodesPage.png, YARN-10450-branch-2.10.003.patch, YARN-10450-branch-3.1.003.patch, YARN-10450-branch-3.2.003.patch, YARN-10450.001.patch, YARN-10450.002.patch, YARN-10450.003.patch
>
>
> Add metrics to show actual CPU and memory utilization for each node, and aggregated for the entire cluster. This information is already passed from the NM to the RM in the node status update.
> We have been running with this internally for quite a while and have found it useful to be able to quickly see the actual CPU/memory utilization on the node/cluster. It's especially useful if some form of overcommit is used.
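The cluster-wide aggregation described above could be computed from the per-node figures roughly like this. This is a sketch, not the YARN-10450 patch: `NodeUtilization` and `cluster_utilization` are hypothetical names, and it assumes the used amounts come from each node status update while the totals come from the node's registered capacity (memory in MB, CPU in vcores).

```python
from dataclasses import dataclass


@dataclass
class NodeUtilization:
    """Hypothetical per-node snapshot: usage from the NM status update,
    totals from the node's registered capacity."""
    node_id: str
    used_mem_mb: float
    total_mem_mb: float
    used_vcores: float
    total_vcores: float

    @property
    def mem_pct(self):
        return 100.0 * self.used_mem_mb / self.total_mem_mb if self.total_mem_mb else 0.0

    @property
    def cpu_pct(self):
        return 100.0 * self.used_vcores / self.total_vcores if self.total_vcores else 0.0


def cluster_utilization(nodes):
    """Cluster-wide utilization as total used / total capacity,
    so larger nodes carry proportionally more weight."""
    used_mem = sum(n.used_mem_mb for n in nodes)
    total_mem = sum(n.total_mem_mb for n in nodes)
    used_cpu = sum(n.used_vcores for n in nodes)
    total_cpu = sum(n.total_vcores for n in nodes)
    return {
        "mem_pct": 100.0 * used_mem / total_mem if total_mem else 0.0,
        "cpu_pct": 100.0 * used_cpu / total_cpu if total_cpu else 0.0,
    }
```

Aggregating as total used over total capacity, rather than averaging per-node percentages, keeps a small idle node from masking a large busy one: for a 16 GB node at 50% memory and a 32 GB node at 75%, the cluster figure is about 66.7%, not 62.5%.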



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
