You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by "Max Michels (JIRA)" <ji...@apache.org> on 2015/03/02 10:39:05 UTC

[jira] [Created] (FLINK-1621) Create a generalized combine function

Max Michels created FLINK-1621:
----------------------------------

             Summary: Create a generalized combine function
                 Key: FLINK-1621
                 URL: https://issues.apache.org/jira/browse/FLINK-1621
             Project: Flink
          Issue Type: Improvement
          Components: Distributed Runtime
    Affects Versions: 0.9
            Reporter: Max Michels
             Fix For: 0.9


Flink allows combiners which accept a type {{I}} and combine the values of this type into type {{O}}. In Google Dataflow, combiners are more generalized. They accept an Input {{I}}, produce an intermediate combine value of {{T}}, and finally an output {{O}}. Flink's combiners are like the {{SimpleCombineFn}} in Google Dataflow.

Right now, we translate the {{KeyedCombineFn}} into a {{SortPartition}} followed by a {{MapPartition}} to emulate the Combiner's behavior. Rudimentary performance tests showed that this behavior causes a significant increase in run time compared to the proper Combine implementation.

Let's implement a more generalized Combiner to create a better mapping from Google Dataflow to Flink.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)