Posted to yarn-issues@hadoop.apache.org by "Wangda Tan (JIRA)" <ji...@apache.org> on 2018/03/08 18:23:00 UTC

[jira] [Commented] (YARN-7844) Expose metrics for scheduler operation (allocate, schedulerEvent) to JMX

    [ https://issues.apache.org/jira/browse/YARN-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391672#comment-16391672 ] 

Wangda Tan commented on YARN-7844:
----------------------------------

Thanks [~ywskycn]. I took a brief look at the patch, and I like the idea of recording scheduler events, but I wonder whether we should record frequency in addition to (or instead of) per-invocation latency. In many cases, lock contention inside the scheduler has less impact on actual container allocation latency, especially since 3.0, where we fixed many scheduler-related locking and performance issues. Frequency could be very useful for analyzing scheduler performance, for example the frequency of allocation calls. I think we already have some frequency metrics.

In addition to scheduler events, some other operations could be recorded, such as getQueueInfo/appInfo calls. We have seen many customers' production deployments impacted by a huge number of read calls, since those calls grab scheduler locks.
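
To make the frequency-versus-latency point concrete, here is a minimal sketch of a recorder that keeps both a call count and a cumulative latency per operation. It is plain Java and not taken from the patch: the class name SchedulerOpStats, its methods, and the operation names passed in as strings are all hypothetical. A monitoring system could derive frequency by sampling the call counter at intervals, and average latency from the two counters together.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

/** Hypothetical helper (not part of YARN): per-operation call count and total latency. */
class SchedulerOpStats {
    /** Counters for one operation; LongAdder keeps updates cheap under contention. */
    private static final class Stat {
        final LongAdder calls = new LongAdder();
        final LongAdder totalNanos = new LongAdder();
    }

    private final Map<String, Stat> stats = new ConcurrentHashMap<>();

    /** Record one invocation of {@code op} that took {@code elapsedNanos}. */
    void record(String op, long elapsedNanos) {
        Stat s = stats.computeIfAbsent(op, k -> new Stat());
        s.calls.increment();
        s.totalNanos.add(elapsedNanos);
    }

    /** Run {@code body}, recording its count and latency even if it throws. */
    <T> T time(String op, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            record(op, System.nanoTime() - start);
        }
    }

    /** Total number of recorded invocations; sample over time to get a rate. */
    long calls(String op) {
        Stat s = stats.get(op);
        return s == null ? 0L : s.calls.sum();
    }

    /** Mean latency in nanoseconds, or 0 if the operation was never recorded. */
    double avgNanos(String op) {
        Stat s = stats.get(op);
        long n = (s == null) ? 0L : s.calls.sum();
        return n == 0 ? 0.0 : (double) s.totalNanos.sum() / n;
    }
}
```

Wrapping a read path such as a getQueueInfo handler in time("getQueueInfo", ...) would give a count of read calls alongside the allocate count, which is the kind of comparison described above. Whether these counters belong in the existing metrics classes is left open here.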

> Expose metrics for scheduler operation (allocate, schedulerEvent) to JMX
> ------------------------------------------------------------------------
>
>                 Key: YARN-7844
>                 URL: https://issues.apache.org/jira/browse/YARN-7844
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: scheduler
>            Reporter: Wei Yan
>            Assignee: Wei Yan
>            Priority: Major
>         Attachments: YARN-7844.000.patch, YARN-7844.001.patch
>
>
> Currently FairScheduler's FSOpDurations records some scheduler operation metrics: nodeUpdateCall, preemptCall, etc. We may need something similar for CapacityScheduler, and we also need to add more metrics there. This could help monitor RM scheduler performance and give more insight into whether the scheduler is under pressure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
