You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Linlin Zhou (JIRA)" <ji...@apache.org> on 2017/10/13 06:40:00 UTC
[jira] [Commented] (YARN-7109) Extend aggregation operation for new ATS design

    [ https://issues.apache.org/jira/browse/YARN-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16203115#comment-16203115 ] 

Linlin Zhou commented on YARN-7109:
-----------------------------------

Totally 7 aggregation functions are added in the patch, they are:
Min, Avg, Count, Median, TopKMin, TopKMax, MaxFreq.
Some comments and questions on the patch:
1. What's the expected functionality of the third parameter 'state'
in below method of TimelineMetricOperation?
abstract TimelineMetric exec(TimelineMetric incoming, TimelineMetric base,
   Map<Object, Object> state);

All the newly added functions except for Min are 'stateful',
e.g. Avg needs to record both the current average value
and the number of metrics which produced the value,
Median needs to store all the metrics fed to it.
But I don’t prefer to put the extra data structures into ‘state’ object.
In the patch, I only use 'state' to hold the k value of TopKMin and TopKMax and the capacity param of Median and MaxFreq.

2. Introduction of TimelineMetricAggregator
This is an abstract base class which encapsulatesthe data and operations needed by an aggregated function. It offers two methods: update(TimelineMetric incoming) and List<Number> getAggregatedValues().
Each aggregation function has a corresponding subclass, e.g. TimelineMetricAggregatorMedian maps to the Median function. 
When we need to add a new aggregation function, we add a new subclass of TimelineMetricAggregator. 

3. Does aggregation only apply to SINGEL_VALUE type? Do we need to support TIME_SERIES?
Implementation of ‘Max’ and ‘Sum’ function treat the incoming metric as SINGLE_VALUE.
In the modified version, I do not make such assumption. Each function will aggregate all the value in values(TreeMap), that is I take TIME_SERIES as multiple SINGLE_VALUE metrics and aggregate them one by one.

4. MaxFreq and Median store all the metrics feeded bounded with or without capacity limit?
To avoid out of memory, user is allowed to set a capacity to restrict the max number of metrics stored.  When metrics reach the capacity, the oldest metrics will be removed.
If user do not specified the capacity, we will store all the metrics.

5. The capacity and k parameters can only be set at the initial time.
For simplicity and elegance, currently the two parameters can only set once.

6. ‘Max’ and ‘Sum’ function will change the first incoming metric, is it ok?
Originally Max and Sum will take the first incoming metric as the aggregated metric, this metric will be changed after more aggregations. In the new version, the aggregated metric is always a new instance and won’t change the first incoming metric.

7. TimelineMetric offers getAggregatedValues() method.
I assume we only care value in an aggregated metric, timestamp can be omitted. TopKMax, TopKMin and MaxFreq may return multiple values, for convenience and meaningful I added the method getAggregatedValues(). For compatibility, we can still use getSingelDateValue and getValues() to get the aggregated results.

8. The result of TopK function does not guarantee any particular order.

9. Do I need to set aggregator as xmlElement?
“TimelineMetricAggregator aggregator;” is a new field in TimelineMetric, I see other fields like type, id, values all have annotation of ‘@XmlElement’, should aggregator do the same setting?

10. TimelineMetricCalculator do not support comparasion between Integer/Long and Float/Double, is this by design?

11. Do we really need the replace aggregation function?
Currently I do not modify it as ‘Max’ and ‘Sum’, and I can’t think of a scenario where this function is needed. 

11. Is there any other aggregation function need to add?[^attachment-name.zip]

> Extend aggregation operation for new ATS design
> -----------------------------------------------
>
>                 Key: YARN-7109
>                 URL: https://issues.apache.org/jira/browse/YARN-7109
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zian Chen
>            Assignee: Linlin Zhou
>              Labels: patch
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org