You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Alieh <sa...@informatik.uni-leipzig.de> on 2017/04/19 12:42:40 UTC

Flink groupBy

Hi All

Is there anyway in Flink to send a process to a reducer?

If I do "test.groupby(1).reduceGroup", each group is processed on one 
reducer? And if the number of groups is more than the number of task 
slots we have, does Flink distribute the process evenly? I mean if we 
have for example groups of size 10, 5, 5 and we have two task slots, is 
the process distributed in this way?

task slot1: group of size 10

task slot2: two groups of size 5

Best,

Alieh


Re: Flink groupBy

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Alieh,

Flink uses hash partitioning to assign grouping keys to parallel tasks by
default.
You can implement a custom partitioner or use range partitioning (which has
some overhead) to control the skew.

There is no automatic load balancing happening.

Best, Fabian

2017-04-19 14:42 GMT+02:00 Alieh <sa...@informatik.uni-leipzig.de>:

> Hi All
>
> Is there anyway in Flink to send a process to a reducer?
>
> If I do "test.groupby(1).reduceGroup", each group is processed on one
> reducer? And if the number of groups is more than the number of task slots
> we have, does Flink distribute the process evenly? I mean if we have for
> example groups of size 10, 5, 5 and we have two task slots, is the process
> distributed in this way?
>
> task slot1: group of size 10
>
> task slot2: two groups of size 5
>
> Best,
>
> Alieh
>
>