You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by John Tipper <jo...@hotmail.com> on 2019/06/14 07:24:25 UTC

How to join/group 2 streams by key?

Hi All,


I have 2 streams of events that relate to a common base event, where one stream is the result of a flatmap. I want to join all events that share a common identifier.

Thus I have something that looks like:

DataStream<TypeA> streamA = ...
DataStream<TypeB> streamB = someDataStream.flatMap(...) // produces stream of TypeB for each item in someDataStream


Both TypeA and TypeB share an identifier and I know how many TypeB objects there are in the parent object. I want to perform some processing when all of the events associated with a particular identifier have arrived, i.e. when I basically can create a Tuple3<id, TypeA, List<TypeB>> object.

Is this best done with a WindowJoin and a GlobalWindow, a Window CoGroup and a GlobalWindow or by connecting the 2 streams into a ConnectedStream then performing the joining inside a CoProcessFunction?

Many thanks,

John

Re: How to join/group 2 streams by key?

Posted by Congxian Qiu <qc...@gmail.com>.
Hi John
I've seen other people have the same problem to solve,  the following is
their solution:
union the two Datastreams, then use ProcsssFunction[1] to solve this, will
also register timers to do GC things.

[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/process_function.html
Best,
Congxian


John Tipper <jo...@hotmail.com> 于2019年6月14日周五 下午3:24写道:

> Hi All,
>
>
> I have 2 streams of events that relate to a common base event, where one
> stream is the result of a flatmap. I want to join all events that share a
> common identifier.
>
> Thus I have something that looks like:
>
> DataStream<TypeA> streamA = ...
> DataStream<TypeB> streamB = someDataStream.flatMap(...) // produces stream of TypeB for each item in someDataStream
>
> Both TypeA and TypeB share an identifier and I know how many TypeB objects
> there are in the parent object. I want to perform some processing when all
> of the events associated with a particular identifier have arrived, i.e.
> when I basically can create a Tuple3<id, TypeA, List<TypeB>> object.
>
> Is this best done with a WindowJoin and a GlobalWindow, a Window CoGroup and
> a GlobalWindow or by connecting the 2 streams into a ConnectedStream then
> performing the joining inside a CoProcessFunction?
>
> Many thanks,
>
> John
>