Posted to issues@flink.apache.org by "Gabor Gevay (JIRA)" <ji...@apache.org> on 2015/09/26 11:29:04 UTC

[jira] [Commented] (FLINK-2237) Add hash-based Aggregation

    [ https://issues.apache.org/jira/browse/FLINK-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14909174#comment-14909174 ] 

Gabor Gevay commented on FLINK-2237:
------------------------------------

Couldn't the CompactingHashTable be reused here? The driver would obtain a prober from it and, for each incoming record, call getMatchFor, perform one step of the reduce, and write the result back with updateMatch.
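
Roughly what I have in mind (just a sketch: a plain java.util.HashMap stands in for the CompactingHashTable and its prober, and the Reducer interface stands in for the user's ReduceFunction; the real driver would of course work on managed memory):

    import java.util.HashMap;
    import java.util.Map;

    public class HashAggregationSketch {

        // Stand-in for the user's reduce function: combines two values of the same key.
        interface Reducer<V> {
            V reduce(V a, V b);
        }

        // For each incoming record, probe the table: if the key is already present,
        // do one step of the reduce and write the result back (what updateMatch would
        // do on the CompactingHashTable); otherwise insert the record as the initial
        // aggregate for that key.
        static <K, V> Map<K, V> aggregate(Iterable<Map.Entry<K, V>> records, Reducer<V> reducer) {
            Map<K, V> table = new HashMap<>();           // plays the role of CompactingHashTable
            for (Map.Entry<K, V> record : records) {
                V match = table.get(record.getKey());    // ~ prober.getMatchFor(record)
                if (match == null) {
                    table.put(record.getKey(), record.getValue());
                } else {
                    table.put(record.getKey(), reducer.reduce(match, record.getValue()));  // ~ updateMatch
                }
            }
            return table;
        }
    }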

I'm interested in this feature because I have a computation where it would make a huge difference: I call flatMap on an already large data set, which blows it up to about 10 times its size, and then do a groupBy and reduce. If Flink had hash-based aggregation, the result of the flatMap wouldn't need to be materialized, which would cut the memory requirement to roughly one tenth.
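
For reference, the shape of my job is roughly the following (a hypothetical word-count-like pipeline with a placeholder input path; the point is only that the flatMap output, about 10x the input, currently has to be materialized and sorted before the reduce):

    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.common.functions.ReduceFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.util.Collector;

    public class FlatMapThenAggregate {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // flatMap blows the input up to roughly 10x its size in my computation
            DataSet<Tuple2<String, Long>> expanded = env
                .readTextFile("hdfs:///path/to/input")   // placeholder path
                .flatMap(new FlatMapFunction<String, Tuple2<String, Long>>() {
                    @Override
                    public void flatMap(String line, Collector<Tuple2<String, Long>> out) {
                        for (String token : line.split("\\s+")) {
                            out.collect(new Tuple2<>(token, 1L));
                        }
                    }
                });

            // with sort-based aggregation, 'expanded' is materialized and sorted here;
            // a hash-based combiner/reducer could aggregate it on the fly instead
            DataSet<Tuple2<String, Long>> aggregated = expanded
                .groupBy(0)
                .reduce(new ReduceFunction<Tuple2<String, Long>>() {
                    @Override
                    public Tuple2<String, Long> reduce(Tuple2<String, Long> a, Tuple2<String, Long> b) {
                        return new Tuple2<>(a.f0, a.f1 + b.f1);
                    }
                });

            aggregated.print();
        }
    }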

> Add hash-based Aggregation
> --------------------------
>
>                 Key: FLINK-2237
>                 URL: https://issues.apache.org/jira/browse/FLINK-2237
>             Project: Flink
>          Issue Type: New Feature
>            Reporter: Rafiullah Momand
>            Priority: Minor
>
> Aggregation functions are currently implemented in a sort-based way.
> How can we implement hash-based aggregation for Flink?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)