You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Demin Alexey <di...@gmail.com> on 2016/11/06 12:31:58 UTC

Timer and Window behavior

Hi

I read Unbound stream (read from kafka) and grouped by value,
but on low-throughput streams I have strange behavior:

stream.apply(Window.into(FixedWindows.of(Duration.millis(10)))).apply(GroupByKey.create())
or
stream.apply(
 Window.into(FixedWindows.of(Duration.millis(10)))

 .triggering(Repeatedly.forever(AfterWatermark.pastEndOfWindow()))
.apply(GroupByKey.create())

1ms  event1
3ms  event2
11ms event3 - (triger window)
12ms event4
13ms event5
21ms event6 - (triger window)
22ms event7
23ms event8

<nothing next 5 min>

5m00ms  event9

As result event7 and event8 stay in windows without processing next 5 min
Window and GroupBy will create only on event9

Behavior can reproduce on DirectRunner and FlinkRunner.

This is bug or incorrect using API from my side?

Re: Timer and Window behavior

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Demin,

I remember to have seen an improvement about watermark in KafkaIO 
(BEAM-591).
I advise you to take a look there.

Regards
JB

On 11/06/2016 01:31 PM, Demin Alexey wrote:
> Hi
>
> I read Unbound stream (read from kafka) and grouped by value,
> but on low-throughput streams I have strange behavior:
>
> stream.apply(Window.into(FixedWindows.of(Duration.millis(10)))).apply(GroupByKey.create())
> or
> stream.apply(
>  Window.into(FixedWindows.of(Duration.millis(10)))
>
>  .triggering(Repeatedly.forever(AfterWatermark.pastEndOfWindow()))
> .apply(GroupByKey.create())
>
> 1ms  event1
> 3ms  event2
> 11ms event3 - (triger window)
> 12ms event4
> 13ms event5
> 21ms event6 - (triger window)
> 22ms event7
> 23ms event8
>
> <nothing next 5 min>
>
> 5m00ms  event9
>
> As result event7 and event8 stay in windows without processing next 5 min
> Window and GroupBy will create only on event9
>
> Behavior can reproduce on DirectRunner and FlinkRunner.
>
> This is bug or incorrect using API from my side?
>

-- 
Jean-Baptiste Onofr�
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Timer and Window behavior

Posted by Raghu Angadi <ra...@google.com.INVALID>.
On Sun, Nov 6, 2016 at 4:31 AM, Demin Alexey <di...@gmail.com> wrote:

> This is bug or incorrect using API from my side?


This is a bug in KafkaIO. It should advance the watermark when there are no
messages to read. https://issues.apache.org/jira/browse/BEAM-591

I want to fix it, but may not get to it until Thanksgiving, will see.