Posted to user@flink.apache.org by "G.S.Vijay Raajaa" <gs...@gmail.com> on 2017/07/26 02:37:06 UTC

Purging Late stream data

Hi,

I have 3 streams that are merged via a union of Kafka topics on a given
timestamp. The problem I am facing is that if there is a delay in one of the
streams and the data in that particular stream arrives at a later point in
time, the merge happens in a delayed fashion. The way I want to solve this is
to drop data that arrives after a delay (say 5 sec). Kindly direct me on how
to go about it.

Will watermarking (to process in event time) + allowed lateness help solve
this?

Regards,
Vijay Raajaa G S
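
A minimal sketch of the watermark + allowed-lateness idea the question asks
about, against the Flink 1.3 DataStream API: with event-time windows, anything
arriving after the watermark has passed the end of its window plus the allowed
lateness is dropped by the window operator. The Event type, its "id" key field
and MergeFunction below are hypothetical placeholders for the actual records
and merge logic, not taken from this thread.

// Sketch only: assumes env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
// and that timestamps/watermarks are assigned on the unioned stream
// (e.g. with the extractor discussed later in this thread).
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.time.Time;

DataStream<Event> merged = streamA.union(streamB, streamC);

merged
    .keyBy("id")                       // hypothetical key field of the Event POJO
    .timeWindow(Time.seconds(5))       // event-time windows, closed by the watermark
    .allowedLateness(Time.seconds(0))  // 0 is the default: late elements are dropped
    .apply(new MergeFunction());       // placeholder for the actual merge step

// If the dropped elements should be inspected rather than discarded silently,
// WindowedStream#sideOutputLateData(OutputTag) can route them to a side output.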

Re: Purging Late stream data

Posted by "G.S.Vijay Raajaa" <gs...@gmail.com>.
Sure, let me try that out. On the same note, does
BoundedOutOfOrdernessTimestampExtractor serve the purpose too?


Regards,
Vijay Raajaa GS
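
For reference, a BoundedOutOfOrdernessTimestampExtractor only assigns
timestamps and emits watermarks that lag the largest timestamp seen by a fixed
bound; whether late elements are actually dropped still depends on the
downstream operator (window allowed lateness, or the ProcessFunction timers
quoted below). A minimal sketch, assuming a hypothetical Event type with an
epoch-millisecond getEventTime() accessor:

import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

// Emits watermarks that trail the maximum timestamp seen so far by 5 seconds.
public class EventTimeExtractor
        extends BoundedOutOfOrdernessTimestampExtractor<Event> {

    public EventTimeExtractor() {
        super(Time.seconds(5));
    }

    @Override
    public long extractTimestamp(Event element) {
        return element.getEventTime();  // hypothetical epoch-millis accessor
    }
}

// usage on the unioned stream:
// merged.assignTimestampsAndWatermarks(new EventTimeExtractor())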

On Wed, Jul 26, 2017 at 9:22 AM, Kien Truong <du...@gmail.com>
wrote:

> Hi,
>
> One method you can use is a ProcessFunction.
>
> In the process function, you get the timer service through the function
> context, which can then be used to schedule a task to clean up late data.
>
> Check out the docs for ProcessFunction
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/process_function.html
>
> Regards,
>
> Kien
>
>
>
> On 7/26/2017 9:37 AM, G.S.Vijay Raajaa wrote:
>
>> Hi,
>>
>> I have 3 streams that are merged via a union of Kafka topics on a given
>> timestamp. The problem I am facing is that if there is a delay in one of
>> the streams and the data in that particular stream arrives at a later point
>> in time, the merge happens in a delayed fashion. The way I want to solve
>> this is to drop data that arrives after a delay (say 5 sec). Kindly direct
>> me on how to go about it.
>>
>> Will watermarking (to process in event time) + allowed lateness help solve
>> this?
>>
>> Regards,
>> Vijay Raajaa G S
>>
>
>

Re: Purging Late stream data

Posted by Kien Truong <du...@gmail.com>.
Hi,

One method you can use is a ProcessFunction.

In the process function, you get the timer service through the function
context, which can then be used to schedule a task to clean up late data.

Check out the docs for ProcessFunction

https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/process_function.html

Regards,

Kien
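
To make that concrete, here is a minimal sketch (not from this thread) of a
ProcessFunction that drops elements arriving more than 5 seconds behind the
newest element seen for their key and uses an event-time timer to clean up its
bookkeeping state. It assumes a keyed stream (timers and keyed state require
keyBy) and a hypothetical Event type with an epoch-millisecond getEventTime()
accessor.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

public class DropLateEvents extends ProcessFunction<Event, Event> {

    private static final long MAX_DELAY_MS = 5_000L;

    // largest event timestamp seen so far for the current key
    private transient ValueState<Long> maxTimestamp;

    @Override
    public void open(Configuration parameters) {
        maxTimestamp = getRuntimeContext().getState(
                new ValueStateDescriptor<>("max-event-timestamp", Long.class));
    }

    @Override
    public void processElement(Event value, Context ctx, Collector<Event> out) throws Exception {
        Long maxTs = maxTimestamp.value();
        if (maxTs != null && value.getEventTime() < maxTs - MAX_DELAY_MS) {
            // more than 5 seconds behind the newest element for this key: drop it
            return;
        }
        out.collect(value);
        if (maxTs == null || value.getEventTime() > maxTs) {
            maxTimestamp.update(value.getEventTime());
        }
        // schedule cleanup of the bookkeeping state once the watermark has
        // passed this element's timestamp by the allowed delay
        ctx.timerService().registerEventTimeTimer(value.getEventTime() + MAX_DELAY_MS);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Event> out) throws Exception {
        Long maxTs = maxTimestamp.value();
        if (maxTs != null && maxTs + MAX_DELAY_MS <= timestamp) {
            // no newer element extended the deadline: clear the per-key state
            maxTimestamp.clear();
        }
    }
}

// usage: streamA.union(streamB, streamC).keyBy("id").process(new DropLateEvents())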


On 7/26/2017 9:37 AM, G.S.Vijay Raajaa wrote:
> Hi,
>
> I have 3 streams that are merged via a union of Kafka topics on a given 
> timestamp. The problem I am facing is that if there is a delay in one of 
> the streams and the data in that particular stream arrives at a later 
> point in time, the merge happens in a delayed fashion. The way I want to 
> solve this is to drop data that arrives after a delay (say 5 sec). Kindly 
> direct me on how to go about it.
>
> Will watermarking (to process in event time) + allowed lateness help 
> solve this?
>
> Regards,
> Vijay Raajaa G S