You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Aeden Jameson <ae...@gmail.com> on 2021/03/18 17:28:07 UTC

Eliminating Shuffling Under FlinkSQL

It's my understanding that a group by is also a key by under the hood.
As a result that will cause a shuffle operation to happen. Our source
is a Kafka topic that is keyed so that any give partition contains all
the data that is needed for any given consuming TM. Is there a way
using FlinkSQL to eliminate the shuffle operation? Or I'm missing
details other details that would make such a change undesirable?

Thank you,
Aeden

Re: Eliminating Shuffling Under FlinkSQL

Posted by Dawid Wysakowicz <dw...@apache.org>.
Your understanding of a group by is correct. It is equivalent to a key
by. I agree it would be a great feature to keep the Source's
partitioning but unfortunately as of now it is not yet supported.

Best,

Dawid

On 18/03/2021 18:28, Aeden Jameson wrote:
> It's my understanding that a group by is also a key by under the hood.
> As a result that will cause a shuffle operation to happen. Our source
> is a Kafka topic that is keyed so that any give partition contains all
> the data that is needed for any given consuming TM. Is there a way
> using FlinkSQL to eliminate the shuffle operation? Or I'm missing
> details other details that would make such a change undesirable?
>
> Thank you,
> Aeden