You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Ken Krugler <kk...@transpac.com> on 2018/05/04 22:46:54 UTC

Use of AggregateFunction's merge() method

I’m trying to figure out when/why the AggregateFunction.merge() <https://ci.apache.org/projects/flink/flink-docs-release-1.4/api/java/org/apache/flink/api/common/functions/AggregateFunction.html#merge-ACC-ACC-> method is called in a streaming job, to ensure I’ve implemented it properly.

The documentation for AggregateFunction says "Merging intermediate aggregates (partial aggregates) means merging the accumulators.”

But that sounds more like a combiner in batch processing, not streaming.

From the code, it seems like this could be called if a MergingWindowAssigner is used, right?

And is there any other situation in streaming where merge() could be called?

Thanks,

— Ken

--------------------------------------------
http://about.me/kkrugler
+1 530-210-6378


Re: Use of AggregateFunction's merge() method

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Ken,

You are right. The merge() method combines partial aggregates, similar to a
combinable reducer.

The only situation when merge() is called in a DataStream job (that I am
aware of) is when session windows get merged.
For example when you define a session window with 30 minute gap and you
receive the following records
R1, 12:00:00
R2, 12:05:00
R3, 12:40:00
R4, 12:20:00

In this case, Flink R1 will create a new window W1, R2 will be assigned to
W1, R3 creates a new window W2, and R4 connects and merges W1 and W2.

Best, Fabian

2018-05-05 0:46 GMT+02:00 Ken Krugler <kk...@transpac.com>:

> I’m trying to figure out when/why the AggregateFunction.merge()
> <https://ci.apache.org/projects/flink/flink-docs-release-1.4/api/java/org/apache/flink/api/common/functions/AggregateFunction.html#merge-ACC-ACC->
>  method is called in a streaming job, to ensure I’ve implemented it
> properly.
>
> The documentation for AggregateFunction says "Merging intermediate
> aggregates (partial aggregates) means merging the accumulators.”
>
> But that sounds more like a combiner in batch processing, not streaming.
>
> From the code, it seems like this could be called if a
> MergingWindowAssigner is used, right?
>
> And is there any other situation in streaming where merge() could be
> called?
>
> Thanks,
>
> — Ken
>
> --------------------------------------------
> http://about.me/kkrugler
> +1 530-210-6378
>
>