Posted to issues@tez.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2018/03/13 19:51:00 UTC

[jira] [Comment Edited] (TEZ-2161) Support CRDT aggregation models for counters

    [ https://issues.apache.org/jira/browse/TEZ-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397546#comment-16397546 ] 

Gopal V edited comment on TEZ-2161 at 3/13/18 7:50 PM:
-------------------------------------------------------

bq. DAG aggregates Vertices which aggregates Tasks which chooses "bestAttempt". And the whole thing runs in various locks. This getAllCounters() flow executes locally on the AM.

The CRDT idea was more about a PN-counter implementation for aggregates, one that stores both the negative and the positive movements of the counter.

This is useful for things like CPU time, where a single counter can hold both the "wasted CPU" and the "spent CPU" in the same structure.
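
A minimal sketch of what such a PN counter could look like, for illustration only. The class name PNCounter, its methods, and the replica ids are hypothetical and not part of the Tez counter API. Mapping "spent CPU" to the positive side and "wasted CPU" to the negative side is an assumption about how the structure would be used.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical PN-counter sketch; not an existing Tez class. */
final class PNCounter {
    // Per-replica totals (for example, one entry per task attempt).
    // Each entry only ever grows, which is what makes the merge safe.
    private final Map<String, Long> positive = new HashMap<>();
    private final Map<String, Long> negative = new HashMap<>();

    /** Records a positive movement, e.g. CPU millis spent (assumed usage). */
    void increment(String replicaId, long amount) {
        requireNonNegative(amount);
        positive.merge(replicaId, amount, Long::sum);
    }

    /** Records a negative movement, e.g. CPU millis wasted (assumed usage). */
    void decrement(String replicaId, long amount) {
        requireNonNegative(amount);
        negative.merge(replicaId, amount, Long::sum);
    }

    /** Merges by per-replica max: commutative, associative, idempotent. */
    void merge(PNCounter other) {
        other.positive.forEach((id, v) -> positive.merge(id, v, Long::max));
        other.negative.forEach((id, v) -> negative.merge(id, v, Long::max));
    }

    long positiveTotal() {
        return positive.values().stream().mapToLong(Long::longValue).sum();
    }

    long negativeTotal() {
        return negative.values().stream().mapToLong(Long::longValue).sum();
    }

    /** Net value: positive movements minus negative movements. */
    long value() {
        return positiveTotal() - negativeTotal();
    }

    private static void requireNonNegative(long amount) {
        if (amount < 0) {
            throw new IllegalArgumentException("amount must be >= 0");
        }
    }
}
```

Because both halves are kept, a reader can ask for the net value or for either side on its own, and merging the same update twice does not double count.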

bq. My plan is to add "aggregateAllCounters" to the CounterGroup classes, which will be used similarly to "incrAllCounters", except instead of only doing SUM, it also does MIN, AVG, MAX.

The Counter needs subclasses that declare what they aggregate on; adding fields to every counter will break everything downstream that exists today.

Adding a MAX_GC_MILLIS counter with new semantics explicitly is better than messing with the existing GC_MILLIS counter.
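
A rough sketch of the subclass approach, assuming a simplified counter type. SketchCounter, SumCounter, MaxCounter, and MinCounter are hypothetical names used only for illustration; they are not the real Tez counter classes. The existing GC_MILLIS counter keeps its SUM behaviour, while MAX_GC_MILLIS is a separate counter with MAX semantics.

```java
/** Hypothetical stand-in for a counter; not the real Tez counter interface. */
abstract class SketchCounter {
    private final String name;
    private long value;

    SketchCounter(String name, long initial) {
        this.name = name;
        this.value = initial;
    }

    String getName() {
        return name;
    }

    long getValue() {
        return value;
    }

    protected void setValue(long value) {
        this.value = value;
    }

    /** Folds another task's value into this one; each subclass declares how. */
    abstract void aggregate(long other);
}

/** Existing behaviour: values from all tasks are summed. */
final class SumCounter extends SketchCounter {
    SumCounter(String name) {
        super(name, 0L);
    }

    @Override
    void aggregate(long other) {
        setValue(getValue() + other);
    }
}

/** New semantics under a new name: keeps the largest value seen. */
final class MaxCounter extends SketchCounter {
    MaxCounter(String name) {
        super(name, Long.MIN_VALUE);
    }

    @Override
    void aggregate(long other) {
        setValue(Math.max(getValue(), other));
    }
}

/** Keeps the smallest value seen, e.g. for a first-event time. */
final class MinCounter extends SketchCounter {
    MinCounter(String name) {
        super(name, Long.MAX_VALUE);
    }

    @Override
    void aggregate(long other) {
        setValue(Math.min(getValue(), other));
    }
}
```

With this shape, code that only reads a name and a long value keeps working unchanged, and the aggregation rule lives in the counter type instead of in extra fields on every counter.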


> Support CRDT aggregation models for counters 
> ---------------------------------------------
>
>                 Key: TEZ-2161
>                 URL: https://issues.apache.org/jira/browse/TEZ-2161
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Assignee: Eric Wohlstadter
>            Priority: Major
>
> Some counters, such as last event received time, need to be handled differently from, say, bytes read counters. Bytes read requires a summation across all tasks within a vertex. The received time requires doing a max() across all the tasks. First event received time would likely need a min().


