You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Will Lauer <wl...@yahooinc.com> on 2022/02/14 15:08:40 UTC
How to cogroup multiple streams?
OK, here's what I hope is a stupid question: what's the most efficient way
to co-group more than 2 DataStreams together? I'm looking at porting a
pipeline from pig to flink, and in a couple of places I use Pig's COGROUP
functionality to simultaneously group 3 or 4 and sometimes even more
datasets on the same key simultaneously. Looking at the Datastream API, I
see how to group 2 datastreams, but I don't see anything obvious for
processing more than two simultaneously. Obviously I could cogroup two,
then cogroup the result with the next one, etc adding each stream serially
to the result, but that seems inefficient. Is there a better way?
Will
Will Lauer
Senior Principal Architect, Audience & Advertising Reporting
Data Platforms & Systems Engineering
M 508 561 6427
Champaign Office
1908 S. First St
Champaign, IL 61822
Re: How to cogroup multiple streams?
Posted by Chesnay Schepler <ch...@apache.org>.
You could first transform each stream to a common format (in the worst
case, an ugly Either-like capturing all possible types), union those
streams, and then do a keyBy + window function.
This is how coGroup is implemented internally.
On 14/02/2022 16:08, Will Lauer wrote:
> OK, here's what I hope is a stupid question: what's the most efficient
> way to co-group more than 2 DataStreams together? I'm looking at
> porting a pipeline from pig to flink, and in a couple of places I use
> Pig's COGROUP functionality to simultaneously group 3 or 4 and
> sometimes even more datasets on the same key simultaneously. Looking
> at the Datastream API, I see how to group 2 datastreams, but I don't
> see anything obvious for processing more than two simultaneously.
> Obviously I could cogroup two, then cogroup the result with the next
> one, etc adding each stream serially to the result, but that seems
> inefficient. Is there a better way?
>
> Will
>
>
> *
> *
>
> Will Lauer
>
> *
> *
>
> Senior Principal Architect, Audience & Advertising Reporting
>
> Data Platforms & Systems Engineering
>
> *
> *
>
> M 508 561 6427
>
> Champaign Office
>
> 1908 S. First St
>
> Champaign, IL 61822
>