You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Andy Hoang <an...@parcelperform.com> on 2019/06/20 10:39:57 UTC

CoFlatMapFunction vs BroadcastProcessFunction

Hi guys,

I read about
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Dynamic-Rule-Evaluation-in-Flink-td21125.html#a21241 <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Dynamic-Rule-Evaluation-in-Flink-td21125.html#a21241>
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/BroadcastStream-vs-Broadcasted-DataStream-td23712.html <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/BroadcastStream-vs-Broadcasted-DataStream-td23712.html>

I tried to use those 2 classes for my problem: One stream as config stream to change behavior on another event stream http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Weird-behavior-with-CoFlatMapFunction-td28140.html <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Weird-behavior-with-CoFlatMapFunction-td28140.html> (solved, thanks to Fabian)

2 of that implementation basically the same, each of classes we have to implement to method:
flatmap1 vs processElement: process the “event” stream 
flatmap2 vs processBroadcastElement: process the “config” stream

While those implementation is quite similar, I’m not sure which one I should pick.
My gut make me feel like I haven’t harness all the angles of BroadcastProcessFunction yet. I’m curious in which situation we should use this classes, because even with the example in Doc page: https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/state/broadcast_state.html#important-considerations <https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/state/broadcast_state.html#important-considerations> we can still use CoFlatMapFunction to do it.

There come another sample I got from google which is using both: https://gist.github.com/syhily/932e0d1e0f12b3e951236d6c36e5a7ed#file-twostreamingjoining-scala-L178 <https://gist.github.com/syhily/932e0d1e0f12b3e951236d6c36e5a7ed#file-twostreamingjoining-scala-L178> I haven’t got the idea what would it want to do:

rule.broadcast.connect…
and connect again
.connect(broadcastStream)

Maybe this is the missing piece that I haven’t understand about BroadcastProcessFunction

I hope you guys can point me some direction on how/when I would choose which classes.

Thanks,

Andy


Re: CoFlatMapFunction vs BroadcastProcessFunction

Posted by Dawid Wysakowicz <dw...@apache.org>.
Hi Andy,

I think the second link you posted describes the differences between the
CoFlatMap vs BroadcastStream approach very well. I will try to summarize
them again. The are two main differences:

1. With CoFlatMap either both inputs can be keyed or none. You cannot
have one input keyed and the other one non keyed. This means also the
state will be stored in operator state.

2. The second difference is how the state is handled. The broadcast
stream implements the broadcast state handling for you. It will ensure
that upon restore all operators will have the same broadcast state, even
if you rescale. I think you can say in this approach not only the stream
is broadcasted, but also the state is broadcasted (on the broadcast side).

As for the example you linked, I don't know enough about it to tell
anything about what it does and why it does that way.

Best,

Dawid

On 20/06/2019 12:39, Andy Hoang wrote:

> Hi guys,
>
> I read about
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Dynamic-Rule-Evaluation-in-Flink-td21125.html#a21241
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/BroadcastStream-vs-Broadcasted-DataStream-td23712.html
>
> I tried to use those 2 classes for my problem: One stream as config
> stream to change behavior on another event
> stream http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Weird-behavior-with-CoFlatMapFunction-td28140.html (solved,
> thanks to Fabian)
>
> 2 of that implementation basically the same, each of classes we have
> to implement to method:
> flatmap1 vs processElement: process the “event” stream 
> flatmap2 vs processBroadcastElement: process the “config” stream
>
> While those implementation is quite similar, I’m not sure which one I
> should pick.
> My gut make me feel like I haven’t harness all the angles
> of BroadcastProcessFunction yet. I’m curious in which situation we
> should use this classes, because even with the example in Doc
> page: https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/state/broadcast_state.html#important-considerations we
> can still use CoFlatMapFunction to do it.
>
> There come another sample I got from google which is using
> both: https://gist.github.com/syhily/932e0d1e0f12b3e951236d6c36e5a7ed#file-twostreamingjoining-scala-L178 I
> haven’t got the idea what would it want to do:
>
> rule.broadcast.connect…
> and connect again
> .connect(broadcastStream)
>
> Maybe this is the missing piece that I haven’t understand
> about BroadcastProcessFunction
>
> I hope you guys can point me some direction on how/when I would choose
> which classes.
>
> Thanks,
>
> Andy
>