Posted to user@flink.apache.org by Juan Rodríguez Hortalá <ju...@gmail.com> on 2016/11/20 04:31:00 UTC

Early events

Hi,

Maybe this is already in the documentation, sorry if I'm asking something
obvious. I was thinking that if you have event time then you can also have
early events, which would be events whose extracted timestamp is in the
future. This might happen in practice, for example, with sensors that have a
skewed clock and assign future timestamps to their events. I have made a
simple test with a time window
(https://github.com/juanrh/flink-state-eviction/commit/09c2c1fe1e6068b0703c0833b8a574313cdca5a2),
and it looks like Flink treats early events like events generated at the
current processing time. What is the expected behaviour of Flink for
early events?

Early events might be useful for generating test data, if Flink were
able to buffer early events until their actual time arrives, although I
guess implementing that would probably impact performance in
production. But as I said, early events might happen in production because
the devices that generate the events can have wrong clocks, or buggy code in
general. Maybe a fallback to ingestion time would make sense,
and an approximation to that might be implemented with a timestamp
extractor that overrides future timestamps with System.currentTimeMillis.
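A minimal sketch of the clamping idea from the paragraph above, in plain Java. The class `FutureTimestampClamp` and its `toleranceMs` parameter are hypothetical (not a Flink API); the clock is injected as a `LongSupplier` so the logic can be tested, and `systemClock` uses `System.currentTimeMillis`. In a Flink job this check would presumably be called from the `extractTimestamp` method of a timestamp assigner, but that wiring is an assumption and is not shown.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not part of Flink): if an extracted event timestamp
 * lies too far in the future, fall back to the current wall-clock time,
 * approximating ingestion time for events from devices with skewed clocks.
 */
final class FutureTimestampClamp {
    private final LongSupplier clock;  // source of "now", in epoch milliseconds
    private final long toleranceMs;    // assumed: how far ahead of "now" we still trust

    FutureTimestampClamp(LongSupplier clock, long toleranceMs) {
        this.clock = clock;
        this.toleranceMs = toleranceMs;
    }

    /** Variant that reads "now" from System.currentTimeMillis. */
    static FutureTimestampClamp systemClock(long toleranceMs) {
        return new FutureTimestampClamp(System::currentTimeMillis, toleranceMs);
    }

    /**
     * Returns the extracted timestamp unchanged unless it is more than
     * toleranceMs ahead of the clock; in that case returns "now" instead.
     */
    long clamp(long extractedTimestampMs) {
        long now = clock.getAsLong();
        return extractedTimestampMs > now + toleranceMs ? now : extractedTimestampMs;
    }
}
```

For example, `new FutureTimestampClamp(() -> 1_000L, 0L).clamp(5_000L)` returns 1000, while past or present timestamps pass through unchanged. With `toleranceMs = 0` this is exactly "override future timestamps with the current time". One trade-off: clamped timestamps depend on when the job runs, so replaying the same input will not reproduce them. A related reason to clamp is that if watermarks are derived from the maximum timestamp seen so far (as in Flink's bounded-out-of-orderness extractors), a single far-future timestamp would push the watermark ahead and make subsequent on-time events late.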

Greetings,

Juan

Re: Early events

Posted by Juan Rodríguez Hortalá <ju...@gmail.com>.
That makes sense, thanks for your answer.

Greetings,

Juan

On Mon, Nov 21, 2016 at 9:11 AM, Aljoscha Krettek <al...@apache.org> wrote:

> Hi,
> yes, Flink is expected to buffer those until the watermark catches up with
> their timestamp.
>
> Cheers,
> Aljoscha

Re: Early events

Posted by Aljoscha Krettek <al...@apache.org>.
Hi,
yes, Flink is expected to buffer those until the watermark catches up with
their timestamp.

Cheers,
Aljoscha
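A minimal toy model of the behaviour described in this answer, assuming only what it states: an element whose timestamp is ahead of the current watermark is held back until the watermark reaches that timestamp. The class `WatermarkBuffer` is hypothetical and is not how Flink's operators are implemented; in a windowed job the element actually waits until the watermark passes the end of its window, and late elements (timestamp already behind the watermark) are not modelled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Toy model (not Flink code): buffers element timestamps and releases them,
 * in timestamp order, only once the watermark has caught up with them.
 */
final class WatermarkBuffer {
    private final PriorityQueue<Long> pending = new PriorityQueue<>();
    private long watermark = Long.MIN_VALUE;

    /** Buffers an element; it cannot be emitted before the watermark reaches it. */
    void add(long timestampMs) {
        pending.add(timestampMs);
    }

    /** Advances the watermark and returns every buffered element that is now due. */
    List<Long> advanceWatermark(long newWatermarkMs) {
        watermark = Math.max(watermark, newWatermarkMs); // watermarks never move backwards
        List<Long> released = new ArrayList<>();
        while (!pending.isEmpty() && pending.peek() <= watermark) {
            released.add(pending.poll());
        }
        return released;
    }

    /** Number of elements still waiting for the watermark. */
    int pendingCount() {
        return pending.size();
    }
}
```

For instance, after adding timestamps 300, 100 and 200, advancing the watermark to 150 releases only 100; the two "early" elements stay buffered until a later watermark of at least 300 releases both.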

On Sun, 20 Nov 2016 at 06:18 Juan Rodríguez Hortalá <juan.rodriguez.hortala@gmail.com> wrote:

> Anyway, is this the expected behaviour for early events? Is Flink
> buffering early events until their future timestamp arrives?

Re: Early events

Posted by Juan Rodríguez Hortalá <ju...@gmail.com>.
Hi,

There was a bug in my code: I was assigning the timestamps incorrectly, and that
is why it looked like early events were assigned processing time.
Surprisingly enough, my test works fine with early events. In fact, I
have modified my test data generator to generate either early or late
events, and both seem to work with my test
(https://github.com/juanrh/flink-state-eviction/blob/293fe1cf972b2e4bc6fb4e874eb8ba70c78f7894/src/main/java/com/github/juanrh/streaming/source/EventTimeDelayedElementsSource.java,
https://github.com/juanrh/flink-state-eviction/blob/293fe1cf972b2e4bc6fb4e874eb8ba70c78f7894/src/test/java/com/github/juanrh/streaming/source/EventTimeDelayedElementsSourceTest.java).

Anyway, is this the expected behaviour for early events? Is Flink buffering
early events until their future timestamp arrives?

Thanks,

Juan

