You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by "Vararu, Vadim" <va...@adswizz.com> on 2022/09/29 13:36:13 UTC

Global window in batch mode

Hi all,

I need to configure a keyed global window that would trigger a reduce function for all the events in each key group before the processing finishes and the job closes.


I have something similar for the realtime(streaming) version of the job, configured with a processing time gap:

.keyBy (new MyKeySelector ())
.window (ProcessingTimeSessionWindows.withGap (timeWindowInSeconds))
.reduce (new MyReduceFunction ())

However for the batch version of the job (reprocessing of the data), I would not use any time gap but rather a global window that would close and reduce just before the job ends.

Is that possible? I’ve seen that the global window does not have any trigger implemented by default. How can I implement/use a trigger that would trigger before job finishes?

Re: Global window in batch mode

Posted by Yunfeng Zhou <fl...@gmail.com>.
Hi Vararu,

Flink ML has a custom implementation of WindowAssigner, called
EndOfStreamWindows, that might help solve your problem. Please have a check
to see if this meets your requirements.
https://github.com/apache/flink-ml/blob/master/flink-ml-core/src/main/java/org/apache/flink/ml/common/datastream/EndOfStreamWindows.java

The usage would be like:
.keyBy(new MyKeySelector())
.window(EndOfStreamWindows.get())
.reduce(new MyReduceFunction())

Best,
Yunfeng Zhou


On Thu, Sep 29, 2022 at 9:36 PM Vararu, Vadim <va...@adswizz.com>
wrote:

> Hi all,
>
>
>
> I need to configure a keyed global window that would trigger a reduce
> function for all the events in each key group before the processing
> finishes and the job closes.
>
>
>
>
>
> I have something similar for the realtime(streaming) version of the job,
> configured with a processing time gap:
>
>
>
> .keyBy (new MyKeySelector ())
> .window (ProcessingTimeSessionWindows.*withGap *(timeWindowInSeconds))
> .reduce (new MyReduceFunction ())
>
>
>
> However for the batch version of the job (reprocessing of the data), I
> would not use any time gap but rather a global window that would close and
> reduce just before the job ends.
>
>
>
> Is that possible? I’ve seen that the global window does not have any
> trigger implemented by default. How can I implement/use a trigger that
> would trigger before job finishes?
>