Posted to yarn-issues@hadoop.apache.org by "Varun Saxena (JIRA)" <ji...@apache.org> on 2016/10/16 17:16:20 UTC

[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

    [ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15580259#comment-15580259 ] 

Varun Saxena commented on YARN-3816:
------------------------------------

[~gtCarrera9], [~sjlee0], a few questions.
# We do not aggregate the entities reported since the last aggregation run when the app collector finishes. Is this intentional? We would only miss the last set of metrics, though, which should be fine.
# The aggregation interval is fixed at 15 sec. Was it left non-configurable out of concern that somebody might set it too low or too high?
# Would it be better to use a time-weighted average for aggregated metrics? For instance, we aggregate metrics every 15 sec, and in that period container metrics are reported 4-5 times. Right now we take the latest reported value, which means a momentary spike or dip can skew the aggregated metric. A time-weighted average per container may keep application-level aggregated metrics from being influenced by momentary blips in CPU usage. In a real scenario, however, this may balance out when multiple containers are running concurrently.
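
To make the third question concrete, here is a minimal sketch of a per-container time-weighted average over one aggregation window. It is not taken from any YARN-3816 patch: the class {{TimeWeightedAverage}}, the {{Sample}} type, and the method names are hypothetical, and it assumes each reported value holds until the next report (or the end of the window).

```java
import java.util.List;

/** Hypothetical helper for illustration only; not part of the YARN code base. */
final class TimeWeightedAverage {

  /** One reported metric value and the time (in ms) at which it was reported. */
  static final class Sample {
    final long timestampMs;
    final double value;

    Sample(long timestampMs, double value) {
      this.timestampMs = timestampMs;
      this.value = value;
    }
  }

  private TimeWeightedAverage() {
  }

  /**
   * Averages samples (sorted by ascending timestamp) up to windowEndMs.
   * Assumption: a value stays in effect until the next sample arrives,
   * and the last value stays in effect until the window ends.
   */
  static double average(List<Sample> samples, long windowEndMs) {
    if (samples.isEmpty()) {
      throw new IllegalArgumentException("no samples in window");
    }
    double weightedSum = 0.0;
    long totalMs = 0;
    for (int i = 0; i < samples.size(); i++) {
      Sample current = samples.get(i);
      long nextMs = (i + 1 < samples.size())
          ? samples.get(i + 1).timestampMs
          : windowEndMs;
      long heldMs = nextMs - current.timestampMs;
      if (heldMs < 0) {
        throw new IllegalArgumentException("samples unsorted or after window end");
      }
      weightedSum += current.value * heldMs;
      totalMs += heldMs;
    }
    // All samples share one instant at the window end: fall back to the latest value.
    if (totalMs == 0) {
      return latest(samples);
    }
    return weightedSum / totalMs;
  }

  /** The "latest reported value" approach described above, for comparison. */
  static double latest(List<Sample> samples) {
    return samples.get(samples.size() - 1).value;
  }
}
```

For example, with five CPU samples reported 3 sec apart (20, 20, 20, 20, 80) and a window ending at 15 sec, {{average}} yields 32 whereas {{latest}} yields 80, so a spike in the final report no longer dominates the aggregated value.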

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> ----------------------------------------------------------------------------
>
>                 Key: YARN-3816
>                 URL: https://issues.apache.org/jira/browse/YARN-3816
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Junping Du
>            Assignee: Li Lu
>              Labels: yarn-2928-1st-milestone
>             Fix For: 3.0.0-alpha1
>
>         Attachments: Application Level Aggregation of Timeline Data.pdf, YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, YARN-3816-YARN-2928-v5.patch, YARN-3816-YARN-2928-v6.patch, YARN-3816-YARN-2928-v7.patch, YARN-3816-YARN-2928-v8.patch, YARN-3816-YARN-2928-v9.patch, YARN-3816-feature-YARN-2928.v4.1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end users with aggregated states for each application, including resource (CPU, memory) consumption across all containers, number of containers launched/completed/failed, etc. We need this for apps while they are running as well as when they are done.
> - Framework-specific metrics, e.g. HDFS_BYTES_READ, should also be aggregated to show details of states at the framework level.
> - Aggregation at other levels (Flow/User/Queue) can be more efficient if based on application-level aggregations rather than raw entity-level data, as far fewer rows need to be scanned (after filtering out non-aggregated entities such as events, configurations, etc.).


