Posted to user@flink.apache.org by Marco Villalobos <mv...@kineteque.com> on 2022/01/26 17:47:04 UTC

Is it possible to support many different windows for many different keys?

Hi,

I am working with time series data in the form of (timestamp, name, value),
with an event time that is the timestamp when the data was published onto
Kafka. I have a business requirement in which each stream element is
enriched, and then different time series names must be processed in
different windows with different time averages.

For example, the time series with name "a" might require a one-minute
window and a five-minute window.

The time series with name "b" requires no windowing.

The time series with name "c" requires a two-minute window and a
ten-minute window.

Does Flink support this style of windowing?  I think it doesn't.  Also,
does any streaming platform support that type of windowing?

I was thinking that this type of windowing support might require a
separate Flink deployment for each window.  Would that scale, though, if
there are tens of thousands of time series names / windows?

Any help or advice would be appreciated. Thank you.

Marco A. Villalobos

Re: Is it possible to support many different windows for many different keys?

Posted by Alexander Fedulov <al...@ververica.com>.
> Again, thank you for your input.

You are welcome.

> I want the stream element to define the window.

Got it, that was the missing bit of detail. That is also doable - not with
the Window API, but with the lower-level ProcessFunction.
Check out my blog post [1], especially its third part [2]. Window
handling in that case is driven by external rules rather than by the
original events themselves, but this material should give you
enough inspiration to implement your required custom logic.

[1] https://flink.apache.org/news/2020/01/15/demo-fraud-detection.html
[2] https://flink.apache.org/news/2020/07/30/demo-fraud-detection-3.html
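
To make the idea of element-defined windows more concrete, here is a
minimal plain-Java simulation of the bookkeeping such a function might do.
It is only a sketch under stated assumptions, not code from the blog posts:
it has no Flink dependency, and the class, field, and method names
(ElementDrivenWindows, Reading, Bucket, onElement, onWatermark) are all
hypothetical. It assumes each enriched element carries its own list of
tumbling window sizes. In an actual job the same logic would presumably
live in a KeyedProcessFunction, with the buckets held in keyed state and
event-time timers used in place of the explicit onWatermark call.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java simulation of element-driven tumbling windows; every name is hypothetical. */
final class ElementDrivenWindows {

    /** An enriched reading whose own metadata lists the window sizes it needs. */
    static final class Reading {
        final long timestampMs;
        final String name;
        final double value;
        final long[] windowSizesMs; // an empty array means "no windowing"

        Reading(long timestampMs, String name, double value, long[] windowSizesMs) {
            this.timestampMs = timestampMs;
            this.name = name;
            this.value = value;
            this.windowSizesMs = windowSizesMs;
        }
    }

    /** Running aggregate for one (name, window size, window start) bucket. */
    static final class Bucket {
        final String name;
        final long sizeMs;
        final long startMs;
        double sum;
        long count;

        Bucket(String name, long sizeMs, long startMs) {
            this.name = name;
            this.sizeMs = sizeMs;
            this.startMs = startMs;
        }

        double average() {
            return sum / count;
        }
    }

    // In a KeyedProcessFunction this would be keyed state rather than a plain field.
    private final Map<String, Bucket> buckets = new LinkedHashMap<>();

    /** Adds the reading to every tumbling window that its own metadata asks for. */
    void onElement(Reading r) {
        for (long sizeMs : r.windowSizesMs) {
            // Align the window start to a multiple of the window size.
            long startMs = r.timestampMs - Math.floorMod(r.timestampMs, sizeMs);
            String key = r.name + "|" + sizeMs + "|" + startMs;
            Bucket b = buckets.computeIfAbsent(key, k -> new Bucket(r.name, sizeMs, startMs));
            b.sum += r.value;
            b.count++;
        }
    }

    /** Emits and clears every bucket whose window end is at or before the watermark. */
    List<Bucket> onWatermark(long watermarkMs) {
        List<Bucket> fired = new ArrayList<>();
        Iterator<Bucket> it = buckets.values().iterator();
        while (it.hasNext()) {
            Bucket b = it.next();
            if (b.startMs + b.sizeMs <= watermarkMs) {
                fired.add(b);
                it.remove();
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        ElementDrivenWindows w = new ElementDrivenWindows();
        long[] oneAndFiveMinutes = {60_000L, 300_000L};
        w.onElement(new Reading(10_000L, "a", 1.0, oneAndFiveMinutes));
        w.onElement(new Reading(50_000L, "a", 3.0, oneAndFiveMinutes));
        w.onElement(new Reading(70_000L, "a", 5.0, oneAndFiveMinutes));
        for (Bucket b : w.onWatermark(300_000L)) {
            System.out.println(b.name + " size=" + b.sizeMs + " start=" + b.startMs
                    + " avg=" + b.average());
        }
    }
}
```

An element with an empty size list (such as series "b" in the original
question) creates no bucket at all, so it can simply be forwarded
downstream unchanged.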

Best,
Alexander Fedulov

Re: Is it possible to support many different windows for many different keys?

Posted by Marco Villalobos <mv...@kineteque.com>.
Hi Alexander,

Thank you for responding. The solution you proposed uses statically defined windows. What I need are dynamically created windows determined by metadata in the stream element.

I want the stream element to define the window.

That’s what I’m trying to research, or an alternative solution.

Again, thank you for your input.

Re: Is it possible to support many different windows for many different keys?

Posted by Alexander Fedulov <al...@ververica.com>.
Hi Marco,

Not sure if I get your problem correctly, but you can process those windows
on data "split" from the same input within the same Flink job.
Something along these lines:

DataStream<SomePojo> stream = ...
DataStream<SomePojo> a = stream.filter( /* time series name == "a" */);
// A windowed stream needs an aggregation (elided here) to produce results.
DataStream<SomePojo> aResults =
    a.keyBy(...).window(TumblingEventTimeWindows.of(Time.minutes(1))).reduce(...);

DataStream<SomePojo> b = stream.filter( /* time series name == "b" */);
DataStream<SomePojo> bResults =
    b.keyBy(...).window(TumblingEventTimeWindows.of(Time.minutes(5))).reduce(...);

If needed, you can then union all of the separate result streams together:
aResults.union(bResults /* , cResults, ... */);

There is no need for separate Flink deployments to create such a pipeline.
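
On the question of scale, one possible refinement of the split approach
is to filter by window size rather than by name, so that the number of
windowed branches depends on the number of distinct window sizes and not
on the number of series. The plain-Java sketch below only shows the
bookkeeping for that idea and rests on assumptions: the name-to-window
table is known when the job is built, and the class and method names
(WindowBranchPlanner, namesByWindowSize, exampleTable) are hypothetical.
The entries for "a", "b" and "c" follow the original question; "d" is
made up to show two names sharing one branch.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Plain-Java sketch: invert a name-to-window-sizes table; all names are hypothetical. */
final class WindowBranchPlanner {

    /** Groups series names by window size, so one windowed branch per size is enough. */
    static Map<Duration, Set<String>> namesByWindowSize(Map<String, List<Duration>> windowsByName) {
        Map<Duration, Set<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<Duration>> e : windowsByName.entrySet()) {
            for (Duration size : e.getValue()) {
                result.computeIfAbsent(size, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        return result;
    }

    /** Example table: "a", "b", "c" follow the original question; "d" is invented. */
    static Map<String, List<Duration>> exampleTable() {
        Map<String, List<Duration>> table = new LinkedHashMap<>();
        table.put("a", List.of(Duration.ofMinutes(1), Duration.ofMinutes(5)));
        table.put("b", List.<Duration>of()); // no windowing
        table.put("c", List.of(Duration.ofMinutes(2), Duration.ofMinutes(10)));
        table.put("d", List.of(Duration.ofMinutes(1)));
        return table;
    }

    public static void main(String[] args) {
        // Each entry corresponds to one filter + keyBy + window branch in the job.
        namesByWindowSize(exampleTable()).forEach((size, names) ->
                System.out.println(size.toMinutes() + " min -> " + names));
    }
}
```

Each resulting entry would map to one filter whose predicate checks
membership in the name set, followed by a keyBy on the series name and a
tumbling window of that size. Names with no window sizes never appear in
the plan and can bypass windowing altogether.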

Best,
Alexander Fedulov
