Posted to mapreduce-issues@hadoop.apache.org by "Devaraj K (Commented) (JIRA)" <ji...@apache.org> on 2011/10/11 18:29:11 UTC

[jira] [Commented] (MAPREDUCE-3059) QueueMetrics do not have metrics for aggregate containers-allocated and aggregate containers-released

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125166#comment-13125166 ] 

Devaraj K commented on MAPREDUCE-3059:
--------------------------------------

I think we can address this issue at a broader level.

Metrics are definitely one of the important aspects.

As Karam Singh mentioned, we already have metrics at the NodeManager level.

What if we gathered the metrics from all NodeManagers at a central point, the ResourceManager?

The following points can be considered:
1. Minimal processing and communication overhead on the cluster.
2. Ability to add more metrics in the future.
3. Configurability, e.g. on demand (the admin should be able to trigger it from the ResourceManager web UI) or a periodic refresh.
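To make the idea concrete, here is a minimal Java sketch of the ResourceManager-side aggregation. It assumes each NodeManager reports a flat map of cumulative counters. The class name ClusterMetricsAggregator and metric names such as "containersLaunched" are hypothetical, not existing Hadoop APIs. Keeping only each node's latest snapshot keeps the RM cost proportional to the number of nodes (point 1), and a name-to-value map lets new metrics be added without changing the report format (point 2).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: the ResourceManager keeps the most recent metrics
 * snapshot reported by each NodeManager and sums them on demand.
 * Class and metric names are illustrative, not Hadoop APIs.
 */
final class ClusterMetricsAggregator {
    // nodeId -> (metric name -> cumulative value reported by that node)
    private final Map<String, Map<String, Long>> latestByNode = new ConcurrentHashMap<>();

    /** Replaces the node's previous snapshot, so re-reporting never double-counts. */
    void report(String nodeId, Map<String, Long> snapshot) {
        latestByNode.put(nodeId, Map.copyOf(snapshot));
    }

    /** Drops a node's contribution, e.g. when it is decommissioned or lost. */
    void removeNode(String nodeId) {
        latestByNode.remove(nodeId);
    }

    /** Sums every metric across all nodes that have reported so far. */
    Map<String, Long> clusterTotals() {
        Map<String, Long> totals = new HashMap<>();
        for (Map<String, Long> snapshot : latestByNode.values()) {
            snapshot.forEach((name, value) -> totals.merge(name, value, Long::sum));
        }
        return totals;
    }
}
```

For example, if nm1 reports 5 and later 6 launched containers while nm2 reports 3, the cluster total is 9, not 14, because nm1's second report replaces its first.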

Here are two solutions I could think of.

Solution 1:
We can provide a trigger point, either a configuration, a link on the ResourceManager web UI, or a JMX operation, to start gathering metrics from each NodeManager.
This involves a service on the ResourceManager side, possibly an RPC service, that accepts metrics update requests from all NodeManagers.
When the administrator triggers the gathering of metrics, each NodeManager is told through its heartbeat response to report its metrics to the ResourceManager.
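A rough Java sketch of how the Solution 1 trigger could be tracked on the ResourceManager side, assuming a boolean "report metrics" flag is added to the heartbeat response. MetricsCollectionCoordinator and its methods are hypothetical names; the RPC service and the heartbeat plumbing are not shown.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch of Solution 1: each admin trigger starts a new
 * collection "round"; a NodeManager's heartbeat response carries a flag
 * asking it to push its metrics until its report for that round arrives.
 */
final class MetricsCollectionCoordinator {
    private final AtomicLong currentRound = new AtomicLong(0);
    // nodeId -> latest round for which that node's metrics were received
    private final Map<String, Long> reportedRound = new ConcurrentHashMap<>();

    /** Called from a web UI link, a JMX operation or a periodic timer. */
    long triggerCollection() {
        return currentRound.incrementAndGet();
    }

    /** Value of the flag to set in this node's heartbeat response. */
    boolean shouldReportMetrics(String nodeId) {
        long round = currentRound.get();
        // Stays true on every heartbeat until the report arrives,
        // so a lost report is requested again on a later heartbeat.
        return round > 0 && reportedRound.getOrDefault(nodeId, 0L) < round;
    }

    /** Called by the RM-side RPC service when a node's metrics arrive. */
    void onMetricsReceived(String nodeId) {
        reportedRound.merge(nodeId, currentRound.get(), Math::max);
    }
}
```

Between triggers no flag is set, so the normal heartbeat carries no extra load. The price of re-asking until a report lands is a possible duplicate report from a slow node, which is harmless if each report replaces, rather than adds to, that node's previous values.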

Solution 2:
ResourceManager, NodeManager and MRAppMaster all support org.apache.hadoop.metrics.MetricsServlet by default, which can return the data in JSON format.
The ResourceManager can have a service that connects to every NodeManager's MetricsServlet and uses the JSON data to prepare the metrics information.
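A minimal Java sketch of the Solution 2 polling pass, assuming the RM already knows each NodeManager's address. MetricsServletPoller is a hypothetical name, and the HTTP request to MetricsServlet plus the JSON parsing are deliberately left to an injected fetcher function rather than guessed at. Unreachable nodes are recorded instead of failing the whole pass.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch of Solution 2: the ResourceManager pulls each
 * NodeManager's metrics and sums them. The fetcher is injected; a real one
 * would GET the node's MetricsServlet output as JSON and parse it into a
 * metric-name -> value map (not shown here).
 */
final class MetricsServletPoller {
    private final Function<String, Map<String, Long>> fetcher;
    // Nodes whose fetch failed in the last pass; missing from the totals.
    private final List<String> unreachable = new ArrayList<>();

    MetricsServletPoller(Function<String, Map<String, Long>> fetcher) {
        this.fetcher = fetcher;
    }

    /** One pass over all nodes: one fetch per NodeManager, results summed. */
    Map<String, Long> pollOnce(List<String> nodeAddresses) {
        unreachable.clear();
        Map<String, Long> totals = new HashMap<>();
        for (String address : nodeAddresses) {
            try {
                Map<String, Long> metrics = fetcher.apply(address);
                metrics.forEach((name, value) -> totals.merge(name, value, Long::sum));
            } catch (RuntimeException e) {
                // One slow or dead NodeManager should not abort the whole pass.
                unreachable.add(address);
            }
        }
        return totals;
    }

    List<String> lastUnreachable() {
        return List.copyOf(unreachable);
    }
}
```

Each pass opens one connection per NodeManager, so on a large cluster it costs more than piggybacking on heartbeats as in Solution 1, which matters for point 1 above.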

Please add your views.
                
> QueueMetrics do not have metrics for aggregate containers-allocated and aggregate containers-released
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3059
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3059
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Karam Singh
>            Assignee: Devaraj K
>            Priority: Blocker
>             Fix For: 0.23.0
>
>
> QueueMetrics for the ResourceManager do not have any metrics for aggregate containers-allocated and containers-released.
> We need the aggregates of containers-allocated and containers-released to figure out the rate at which the RM is dishing out containers. NodeManagers do have containers-launched and containers-released metrics, but these are per node; to get the cluster-level aggregate, we need to preprocess the NM metrics from all nodes, which is troublesome.
> Currently, we do have AllocatedContainers and PendingContainers, which reflect the running containers given out to AMs and the containers waiting for allocation, respectively.
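The difference between the existing gauges and the requested aggregates can be sketched in Java as below. ContainerCounters and its method names are hypothetical, not the actual QueueMetrics code. The aggregate counters only ever increase, so the allocation rate can be derived from two samples, whereas an AllocatedContainers-style gauge falls back to zero once containers are released.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch: two monotonically increasing counters next to a
 * "currently allocated" gauge. Names are illustrative only.
 */
final class ContainerCounters {
    private final AtomicLong aggregateAllocated = new AtomicLong(); // never decreases
    private final AtomicLong aggregateReleased = new AtomicLong();  // never decreases
    private final AtomicLong allocatedNow = new AtomicLong();       // gauge, like AllocatedContainers

    void onAllocate(int n) {
        aggregateAllocated.addAndGet(n);
        allocatedNow.addAndGet(n);
    }

    void onRelease(int n) {
        aggregateReleased.addAndGet(n);
        allocatedNow.addAndGet(-n);
    }

    long aggregateAllocated() { return aggregateAllocated.get(); }
    long aggregateReleased()  { return aggregateReleased.get(); }
    long allocatedNow()       { return allocatedNow.get(); }

    /** Rate between two samples of a cumulative counter, in units per second. */
    static double ratePerSecond(long earlierCount, long laterCount, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return (laterCount - earlierCount) * 1000.0 / elapsedMillis;
    }
}
```

For example, allocating 10 containers and then releasing all 10 leaves the gauge at 0 while both aggregates read 10; an aggregate that moves from 100 to 400 over 60 seconds gives 5 containers per second.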
