You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "Eric Wohlstadter (JIRA)" <ji...@apache.org> on 2018/03/13 19:39:00 UTC

[jira] [Commented] (TEZ-2161) Support CRDT aggregation models for counters

    [ https://issues.apache.org/jira/browse/TEZ-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397524#comment-16397524 ] 

Eric Wohlstadter commented on TEZ-2161:
---------------------------------------

[~gopalv]

I'm not planning to do an actual CRDT implementation for this ticket. I do think that would be really valuable (I have used Akka's CRDT in a previous project), but it's not needed for the use-case where counters are aggregated in the getAllCounters() flow:

i.e. DAG aggregates Vertices which aggregates Tasks which chooses "bestAttempt". And the whole thing runs in various locks. This getAllCounters() flow executes locally on the AM.
----
My plan is to add "aggregateAllCounters" to the CounterGroup classes, which will be used similarly to "incrAllCounters", except instead of only doing SUM, it also does MIN, AVG, MAX. 

Since MIN, AVG, MAX have very efficient incremental algorithms ("embarrassingly incremental"), the overhead should be negligible (but this will need to be profiled to verify my claim). 
----
Since I'm not doing CRDT, do you think we should rename this ticket or should I open a different ticket?

> Support CRDT aggregation models for counters 
> ---------------------------------------------
>
>                 Key: TEZ-2161
>                 URL: https://issues.apache.org/jira/browse/TEZ-2161
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Assignee: Eric Wohlstadter
>            Priority: Major
>
> Some counters such as last event received time need to be handled different to say bytes read counters. Bytes reads requires a summation across all tasks within a vertex. The received time requires doing a max() across all the tasks. First event received time would likely need a min().



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)