Posted to user@flink.apache.org by Soheil Pourbafrani <so...@gmail.com> on 2018/07/30 12:22:48 UTC

Detect late data in processing time

In event time, we can collect late data using an OutputTag, because the
watermark lets us detect which data is late. In processing-time mode there
is no watermark, so we cannot detect bad data that way. Can we set a
watermark (for example, based on the TaskManager's timestamp) while still
using processing time to create time windows?

Re: Detect late data in processing time

Posted by vino yang <ya...@gmail.com>.
Hi Soheil,

A watermark indicates the progress of event time. It exists because there
is a time skew between event time and processing time. Hequn is correct:
watermarks cannot be used with processing time, which is based on the
TaskManager's local system clock. Usually, when your events carry a time
field that indicates when they actually happened, we choose event time.
When we choose processing time, we don't rely on the time information
carried by the data itself, so the question is: how do you define
"bad data"?
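To make the distinction concrete, here is a minimal plain-Java sketch of
the lateness check. It is a model of the idea, not Flink's API: the class
and method names are made up for illustration, timestamps are assumed to be
non-negative milliseconds, and allowed lateness is assumed to be zero.

```java
/** Plain-Java model (not Flink API) of when a tumbling-window element counts as late. */
final class WatermarkLatenessSketch {

    /** Exclusive end of the tumbling window containing a non-negative timestamp. */
    static long windowEnd(long timestampMs, long windowSizeMs) {
        return timestampMs - (timestampMs % windowSizeMs) + windowSizeMs;
    }

    /** Event time: late once the watermark has reached the last timestamp of the window. */
    static boolean isLateInEventTime(long eventTimestampMs, long watermarkMs, long windowSizeMs) {
        long lastTimestampInWindow = windowEnd(eventTimestampMs, windowSizeMs) - 1;
        return lastTimestampInWindow <= watermarkMs;
    }

    /** Processing time: the window is chosen from the arrival clock, so it is still open. */
    static boolean isLateInProcessingTime(long arrivalClockMs, long windowSizeMs) {
        long lastTimestampInWindow = windowEnd(arrivalClockMs, windowSizeMs) - 1;
        return lastTimestampInWindow < arrivalClockMs; // cannot hold: arrival lies inside its window
    }

    public static void main(String[] args) {
        long windowSizeMs = 10_000L;
        long watermarkMs = 12_000L; // assumed current event-time progress

        System.out.println(isLateInEventTime(8_500L, watermarkMs, windowSizeMs));  // true
        System.out.println(isLateInEventTime(11_000L, watermarkMs, windowSizeMs)); // false
        System.out.println(isLateInProcessingTime(11_000L, windowSizeMs));         // false
    }
}
```

In this model, the event-time check needs two independent inputs (the
element's own timestamp and the watermark), while the processing-time check
has only the arrival clock, so there is nothing for an element to be late
against.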

Thanks, vino.

2018-07-30 22:29 GMT+08:00 Hequn Cheng <ch...@gmail.com>:

> Hi Soheil,
>
> No, we can't set a watermark when using processing time, and there is no
> late data for a processing-time window.
> So the question is: what counts as bad data when you use processing time?
> Maybe there are other ways to solve your problem.
>
> Best, Hequn
>

Re: Detect late data in processing time

Posted by Hequn Cheng <ch...@gmail.com>.
Hi Soheil,

No, we can't set a watermark when using processing time, and there is no
late data for a processing-time window.
So the question is: what counts as bad data when you use processing time?
Maybe there are other ways to solve your problem.
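As one possible reading of "other ways", here is a minimal plain-Java
sketch that defines bad data without any watermark. It rests on assumptions
that are not in this thread: that each record carries a creation time set
by its producer, and that "bad" means "older than some threshold when it
arrives". The Reading class, the threshold, and the method names are all
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch: split records by age on arrival instead of by watermark. */
final class StaleRecordSplitSketch {

    /** Hypothetical record carrying the time its producer created it. */
    static final class Reading {
        final String payload;
        final long createdAtMs;

        Reading(String payload, long createdAtMs) {
            this.payload = payload;
            this.createdAtMs = createdAtMs;
        }
    }

    /** A reading is "bad" here if it is older than maxAgeMs when it arrives. */
    static boolean isStale(Reading reading, long arrivalClockMs, long maxAgeMs) {
        return arrivalClockMs - reading.createdAtMs > maxAgeMs;
    }

    /** Returns the stale readings; in a Flink job they could be routed to a side output. */
    static List<Reading> collectStale(List<Reading> readings, long arrivalClockMs, long maxAgeMs) {
        List<Reading> stale = new ArrayList<>();
        for (Reading reading : readings) {
            if (isStale(reading, arrivalClockMs, maxAgeMs)) {
                stale.add(reading);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        long arrivalClockMs = 60_000L; // stand-in for the TaskManager clock
        long maxAgeMs = 5_000L;        // hypothetical threshold

        List<Reading> readings = new ArrayList<>();
        readings.add(new Reading("fresh", 58_000L));
        readings.add(new Reading("old", 40_000L));

        for (Reading reading : collectStale(readings, arrivalClockMs, maxAgeMs)) {
            System.out.println("stale: " + reading.payload);
        }
    }
}
```

Whether this fits depends on what "bad data" means for the job in question;
it only works if producer and TaskManager clocks are reasonably in sync.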

Best, Hequn


Re: Detect late data in processing time

Posted by vino yang <ya...@gmail.com>.
Hi Averell,

I personally don't recommend this.
Processing time uses the local physical clock of the node where the
specific task runs; it is not a timestamp set upstream in advance.
Setting it upstream is a bit like another time concept provided by
Flink: ingestion time.
So if you are not using event time, do not set watermarks.

Thanks, vino.

2018-07-31 12:03 GMT+08:00 Averell <lv...@gmail.com>:

> Hi Soheil,
>
> Why don't you just use the processing time as event time? Simply override
> extractTimestamp to return the current processing time.
>
> Regards,
> Averell

Re: Detect late data in processing time

Posted by Averell <lv...@gmail.com>.
Hi Soheil,

Why don't you just use the processing time as event time? Simply override
extractTimestamp to return the current processing time.
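A rough plain-Java model of that idea, for illustration only. This is not
Flink's interface: the class below just borrows the extractTimestamp name,
the injectable clock is an assumption added so the sketch can be run
deterministically, and the "largest timestamp minus one" watermark rule is
one simple choice among several.

```java
import java.util.function.LongSupplier;

/** Plain-Java model (not Flink API) of stamping elements with the wall clock. */
final class ProcessingTimeAsEventTimeSketch {

    private final LongSupplier clock;
    private long maxTimestampSeen = Long.MIN_VALUE;

    ProcessingTimeAsEventTimeSketch(LongSupplier clock) {
        this.clock = clock;
    }

    /** Ignores the element's content and returns the clock reading as its timestamp. */
    long extractTimestamp(Object element) {
        long now = clock.getAsLong();
        maxTimestampSeen = Math.max(maxTimestampSeen, now);
        return now;
    }

    /** Watermark trails the largest timestamp handed out so far by one millisecond. */
    long currentWatermark() {
        return maxTimestampSeen == Long.MIN_VALUE ? Long.MIN_VALUE : maxTimestampSeen - 1;
    }

    public static void main(String[] args) {
        long[] fakeClockMs = {1_000L}; // stand-in for System.currentTimeMillis()
        ProcessingTimeAsEventTimeSketch assigner =
                new ProcessingTimeAsEventTimeSketch(() -> fakeClockMs[0]);

        long first = assigner.extractTimestamp("a");
        fakeClockMs[0] = 1_250L;
        long second = assigner.extractTimestamp("b");

        System.out.println(first + " " + second + " watermark=" + assigner.currentWatermark());
    }
}
```

With a real clock you would pass System::currentTimeMillis. Note that in
this model every timestamp is at or ahead of the watermark at the point
where it is assigned, so elements can only become late if they are delayed
or reordered after stamping.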

Regards,
Averell



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/