Posted to user@flink.apache.org by Alieh <sa...@informatik.uni-leipzig.de> on 2017/04/19 12:42:40 UTC
Flink groupBy
Hi All
Is there any way in Flink to control which reducer processes a group?
If I call "test.groupBy(1).reduceGroup", is each group processed on a single
reducer? And if there are more groups than task slots, does Flink
distribute the work evenly? For example, with groups of size 10, 5, and 5
and two task slots, is the work distributed like this?
task slot 1: group of size 10
task slot 2: two groups of size 5
Best,
Alieh
Re: Flink groupBy
Posted by Fabian Hueske <fh...@gmail.com>.
Hi Alieh,
Flink uses hash partitioning to assign grouping keys to parallel tasks by
default.
You can implement a custom partitioner or use range partitioning (which has
some overhead) to control the skew.
There is no automatic load balancing happening.
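To make the difference concrete, here is a minimal sketch in plain Java (no Flink dependency) using the 10/5/5 example from the question. It is only a model: `hashTask` is a simplified stand-in for hash partitioning and is not Flink's actual hash function, and the class and method names are made up for illustration. The greedy placement assumes the group sizes are known up front. In a real job, a table like the one it computes could be returned from the `partition(key, numPartitions)` method of a custom `Partitioner` passed to `partitionCustom`.

```java
import java.util.Arrays;

public class GroupPlacementSketch {

    // Simplified stand-in for hash partitioning: the task depends only on the key,
    // never on how large the group is. Flink's real hash function differs.
    static int hashTask(int key, int parallelism) {
        return Math.floorMod(Integer.hashCode(key), parallelism);
    }

    // Number of records each task receives when groups are placed by key hash.
    static long[] hashLoad(int[] keys, long[] sizes, int parallelism) {
        long[] load = new long[parallelism];
        for (int i = 0; i < keys.length; i++) {
            load[hashTask(keys[i], parallelism)] += sizes[i];
        }
        return load;
    }

    // Greedy placement (assumes sizes are known in advance):
    // take groups from largest to smallest and put each on the least loaded task.
    static int[] greedyAssignment(long[] sizes, int parallelism) {
        Integer[] order = new Integer[sizes.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Long.compare(sizes[b], sizes[a]));

        long[] load = new long[parallelism];
        int[] taskOf = new int[sizes.length];
        for (int group : order) {
            int best = 0;
            for (int t = 1; t < parallelism; t++) {
                if (load[t] < load[best]) {
                    best = t;
                }
            }
            taskOf[group] = best;
            load[best] += sizes[group];
        }
        return taskOf;
    }

    // Number of records each task receives for an explicit group-to-task table.
    static long[] loadOf(int[] taskOf, long[] sizes, int parallelism) {
        long[] load = new long[parallelism];
        for (int i = 0; i < taskOf.length; i++) {
            load[taskOf[i]] += sizes[i];
        }
        return load;
    }

    public static void main(String[] args) {
        int[] keys = {0, 1, 2};      // hypothetical grouping keys
        long[] sizes = {10, 5, 5};   // group sizes from the question
        int parallelism = 2;

        // Keys 0 and 2 land on the same task in this model: prints [15, 5]
        System.out.println(Arrays.toString(hashLoad(keys, sizes, parallelism)));

        // Size-aware placement: prints [10, 10]
        int[] taskOf = greedyAssignment(sizes, parallelism);
        System.out.println(Arrays.toString(loadOf(taskOf, sizes, parallelism)));
    }
}
```

With these particular keys the hash-style placement happens to put the group of 10 and one group of 5 on the same task, while the size-aware table reproduces the split asked about. Other keys could just as well give a balanced result by chance, which is the point: with hash partitioning the outcome does not depend on the group sizes.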
Best, Fabian