You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Will Lauer <wl...@yahooinc.com> on 2022/02/14 15:08:40 UTC

How to cogroup multiple streams?

OK, here's what I hope is a stupid question: what's the most efficient way
to co-group more than 2 DataStreams together? I'm looking at porting a
pipeline from pig to flink, and in a couple of places I use Pig's COGROUP
functionality to simultaneously group 3 or 4 and sometimes even more
datasets on the same key simultaneously. Looking at the Datastream API, I
see how to group 2 datastreams, but I don't see anything obvious for
processing more than two simultaneously. Obviously I could cogroup two,
then cogroup the result with the next one, etc adding each stream serially
to the result, but that seems inefficient. Is there a better way?

Will



Will Lauer


Senior Principal Architect, Audience & Advertising Reporting

Data Platforms & Systems Engineering


M 508 561 6427

Champaign Office

1908 S. First St

Champaign, IL 61822

Re: How to cogroup multiple streams?

Posted by Chesnay Schepler <ch...@apache.org>.
You could first transform each stream to a common format (in the worst 
case, an ugly Either-like capturing all possible types), union those 
streams, and then do a keyBy + window function.

This is how coGroup is implemented internally.

On 14/02/2022 16:08, Will Lauer wrote:
> OK, here's what I hope is a stupid question: what's the most efficient 
> way to co-group more than 2 DataStreams together? I'm looking at 
> porting a pipeline from pig to flink, and in a couple of places I use 
> Pig's COGROUP functionality to simultaneously group 3 or 4 and 
> sometimes even more datasets on the same key simultaneously. Looking 
> at the Datastream API, I see how to group 2 datastreams, but I don't 
> see anything obvious for processing more than two simultaneously. 
> Obviously I could cogroup two, then cogroup the result with the next 
> one, etc adding each stream serially to the result, but that seems 
> inefficient. Is there a better way?
>
> Will
>
>
> *
> *
>
> Will Lauer
>
> *
> *
>
> Senior Principal Architect, Audience & Advertising Reporting
>
> Data Platforms & Systems Engineering
>
> *
> *
>
> M 508 561 6427
>
> Champaign Office
>
> 1908 S. First St
>
> Champaign, IL 61822
>