Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/03/18 16:21:33 UTC

[jira] [Commented] (FLINK-3477) Add hash-based combine strategy for ReduceFunction

    [ https://issues.apache.org/jira/browse/FLINK-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201616#comment-15201616 ] 

ASF GitHub Bot commented on FLINK-3477:
---------------------------------------

Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/1517#issuecomment-198406921
  
    Hi @ggevay, how are things going? 
    I would like to add this feature soon and will do another review next week.
    Have you started on the benchmarks, or will you have time to do them in the next one to two weeks?



> Add hash-based combine strategy for ReduceFunction
> --------------------------------------------------
>
>                 Key: FLINK-3477
>                 URL: https://issues.apache.org/jira/browse/FLINK-3477
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Local Runtime
>            Reporter: Fabian Hueske
>
> This issue is about adding a hash-based combine strategy for ReduceFunctions.
> The interface of the {{reduce()}} method is as follows:
> {code}
> public T reduce(T v1, T v2)
> {code}
> The input and output types are identical, and the function returns only a single value. A ReduceFunction is applied incrementally to compute a final aggregated value. This makes it possible to hold the pre-aggregated value in a hash table and update it with each function call.
> The hash-based strategy requires a special implementation of an in-memory hash table. The hash table should support in-place updates of elements (if the new value has the same binary length as the old value) as well as appending updates that invalidate the old value (if the binary length of the new value differs). The hash table must be able to evict and emit all elements if it runs out of memory.
> We should also add {{HASH}} and {{SORT}} compiler hints to {{DataSet.reduce()}} and {{Grouping.reduce()}} to allow users to pick the execution strategy.
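
The combine behaviour described in the issue can be sketched roughly as follows. This is a minimal illustration, not Flink's runtime code: the class name HashCombinerSketch, the keySelector and maxEntries parameters, and the use of java.util.HashMap and BinaryOperator in place of a managed-memory hash table and Flink's ReduceFunction are all assumptions made for the example. A fixed entry limit stands in for running out of memory, and the in-place versus appending update distinction is not modelled, because values are held as Java objects rather than serialized bytes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

/** Hypothetical sketch of a hash-based combiner; not Flink's implementation. */
class HashCombinerSketch<K, T> {
    private final Function<T, K> keySelector;   // assumed: extracts the grouping key
    private final BinaryOperator<T> reducer;    // stands in for reduce(T v1, T v2)
    private final int maxEntries;               // stand-in for a memory budget
    private final Map<K, T> table = new HashMap<>();
    private final List<T> emitted = new ArrayList<>();

    HashCombinerSketch(Function<T, K> keySelector, BinaryOperator<T> reducer, int maxEntries) {
        this.keySelector = keySelector;
        this.reducer = reducer;
        this.maxEntries = maxEntries;
    }

    /** Folds a value into the pre-aggregate for its key (values are assumed non-null). */
    void add(T value) {
        K key = keySelector.apply(value);
        T current = table.get(key);
        if (current != null) {
            // Key already present: update the held pre-aggregated value.
            table.put(key, reducer.apply(current, value));
            return;
        }
        if (table.size() >= maxEntries) {
            // "Out of memory": evict and emit all partial aggregates.
            flush();
        }
        table.put(key, value);
    }

    /** Emits every partial aggregate and clears the table. */
    void flush() {
        emitted.addAll(table.values());
        table.clear();
    }

    List<T> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        // Sum integers grouped by parity, with room for only one key at a time.
        HashCombinerSketch<Integer, Integer> c =
                new HashCombinerSketch<>(v -> v % 2, Integer::sum, 1);
        c.add(1);
        c.add(3);   // same key as 1, combined to 4
        c.add(2);   // new key, table full: 4 is emitted first
        c.add(4);   // combined with 2 to 6
        c.flush();
        System.out.println(c.emitted()); // prints [4, 6]
    }
}
```

Because a combiner only pre-aggregates, emitting a partial result early on eviction does not affect correctness: the same key may be emitted more than once, and the final reduce step merges those partial values.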



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)