You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@streampipes.apache.org by Grainier Perera <gr...@apache.org> on 2020/07/15 05:06:53 UTC

Adding window semantics to the sdk/core

Hi all,

Hope you are doing well. I'm starting this thread to discuss $title. As
we discussed in the mail thread [1], having a common window semantic that
can be leveraged by every PE can be very useful.

I'm thinking of two ways to achieve the same;

   1. Introduce a dedicated Window PE which can be used before any existing
   PE.
      - No need for API changes.
      - But, might have to flag event in a way for the next processor to
      identify, to which window the event belongs to (i.e with sliding windows,
      etc...).
      - So, have to change the processing logic of existing PEs to check
      event flag and process accordingly.
      - So all the existing PEs might not work with window semantics. In
      that case, we need a way to show window compatible PEs (because, having a
      window before a normal PE might result in un-expected outputs)
   2. Introduce windowed EventProcessor/EventSink APIs which allows users
   to write their own windowed extensions (i.e aggregators, etc...)
      - No need for event flagging. Can introduce API methods like
      onCurrentEvent, onExpiredEvent, onResetEvent, etc...
      - Need API changes/refactoring in EventProcessor/EventSink, as well
      as in existing PEs.
      - Need a way to expose the window related parameters through existing
      PEs DataProcessorDescription.

Since there're pros/cons to both, what do you think is the best approach?
Or is there any other approach that we can try out?

[1] PE to rate-limit events

Grainier Perera.

Re: Adding window semantics to the sdk/core

Posted by Philipp Zehnder <ze...@apache.org>.
Hi,

thanks for creating the wiki page Dominik.
I think this is a good place to discuss the solution.

Philipp

> On 19. Jul 2020, at 22:05, Dominik Riemer <ri...@apache.org> wrote:
> 
> Hi,
> 
> I created a wiki page and added our initial thoughts:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158868087
> 
> I'll go through the existing wrapper implementation and add any ideas to the wiki. If you have any ideas, feel free to edit the page!
> 
> Dominik
> 
> -----Original Message-----
> From: Patrick Wiener <wi...@apache.org> 
> Sent: Friday, July 17, 2020 7:35 AM
> To: dev@streampipes.apache.org
> Subject: Re: Adding window semantics to the sdk/core
> 
> I also tend towards option 2. It makes it more intuative to develop windowed operations by having a dedicated windowed event processor.
> 
> We can collect the design choices etc in the wiki and discussion points in the wiki.
> 
> Patrick
> 
>> Am 16.07.2020 um 22:55 schrieb Philipp Zehnder <ze...@apache.org>:
>> 
>> Hi,
>> 
>> +1 for creating a wiki page and collecting all the requirements and functionalities.
>> 
>> Philipp
>> 
>>> On 16. Jul 2020, at 20:00, Dominik Riemer <ri...@apache.org> wrote:
>>> 
>>> Hi,
>>> 
>>> I'm also leaning towards option 2 - we could introduce new engine and controller classes for windowed event processors (e.g., WindowedDataProcessor) which handle the windowing logic internally and only expose the higher-level methods to users (as you proposed, onCurrentEvent...). Window parameters could be automatically added to the controller without further configuration. 
>>> Probably the biggest hurdle is to introduce an API design that works for all available wrappers, without re-implementing a new higher-level engine. But as Philipp said, it probably makes sense to start with one of the lightweight wrappers such as Java or Siddhi. 
>>> Should we create a wiki page to collect design requirements for such a feature?
>>> 
>>> Dominik
>>> 
>>> -----Original Message-----
>>> From: Philipp Zehnder <ze...@apache.org>
>>> Sent: Wednesday, July 15, 2020 4:13 PM
>>> To: Grainier Perera <gr...@apache.org>
>>> Cc: dev@streampipes.apache.org
>>> Subject: Re: Adding window semantics to the sdk/core
>>> 
>>> Hi Grainier,
>>> 
>>> That's a very good point.
>>> I am leaning towards proposal 2, because if we have a separate PE, the user must be aware of windowing and how to use windows. 
>>> I think it is easier for the user to configure the relevant parameters directly in the processor configuration itself. Whats your opinion on that?
>>> However, I'm afraid that we will then have to implement all functionalities twice, one version for single values and one version for the windowed version (e.g. numerical filter).
>>> Perhaps we can find a good solution for this. (e.g. make the window function optional).
>>> 
>>> Maybe we can also make a list of processors that we think need  this functionality?
>>> The first thing that came to my mind was the aggregation component. Currently we only have a Flink version of it, and I think it would be very helpful to have a lighter version as well (e.g. Java, or Siddhi).
>>> 
>>> Philipp
>>> PS: The JS Evaluator is really awesome, I use it a lot and it’s a big time saver!
>>> 
>>>> On 15. Jul 2020, at 07:06, Grainier Perera <gr...@apache.org> wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> Hope you are doing well. I'm starting this thread to discuss $title. As we discussed in the mail thread [1], having a common window semantic that can be leveraged by every PE can be very useful.
>>>> 
>>>> I'm thinking of two ways to achieve the same; Introduce a dedicated 
>>>> Window PE which can be used before any existing PE.
>>>> No need for API changes.
>>>> But, might have to flag event in a way for the next processor to identify, to which window the event belongs to (i.e with sliding windows, etc...).
>>>> So, have to change the processing logic of existing PEs to check event flag and process accordingly. 
>>>> So all the existing PEs might not work with window semantics. In 
>>>> that case, we need a way to show window compatible PEs (because, 
>>>> having a window before a normal PE might result in un-expected outputs) Introduce windowed EventProcessor/EventSink APIs which allows users to write their own windowed extensions (i.e aggregators, etc...) No need for event flagging. Can introduce API methods like onCurrentEvent, onExpiredEvent, onResetEvent, etc...
>>>> Need API changes/refactoring in EventProcessor/EventSink, as well as in existing PEs.
>>>> Need a way to expose the window related parameters through existing PEs DataProcessorDescription.
>>>> Since there're pros/cons to both, what do you think is the best approach? Or is there any other approach that we can try out?
>>>> 
>>>> [1] PE to rate-limit events
>>>> 
>>>> Grainier Perera.
>>> 
>>> 
>> 
> 
> 


RE: Adding window semantics to the sdk/core

Posted by Dominik Riemer <ri...@apache.org>.
Hi,

I created a wiki page and added our initial thoughts:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158868087

I'll go through the existing wrapper implementation and add any ideas to the wiki. If you have any ideas, feel free to edit the page!

Dominik

-----Original Message-----
From: Patrick Wiener <wi...@apache.org> 
Sent: Friday, July 17, 2020 7:35 AM
To: dev@streampipes.apache.org
Subject: Re: Adding window semantics to the sdk/core

I also tend towards option 2. It makes it more intuative to develop windowed operations by having a dedicated windowed event processor.

We can collect the design choices etc in the wiki and discussion points in the wiki.

Patrick

> Am 16.07.2020 um 22:55 schrieb Philipp Zehnder <ze...@apache.org>:
> 
> Hi,
> 
> +1 for creating a wiki page and collecting all the requirements and functionalities.
> 
> Philipp
> 
>> On 16. Jul 2020, at 20:00, Dominik Riemer <ri...@apache.org> wrote:
>> 
>> Hi,
>> 
>> I'm also leaning towards option 2 - we could introduce new engine and controller classes for windowed event processors (e.g., WindowedDataProcessor) which handle the windowing logic internally and only expose the higher-level methods to users (as you proposed, onCurrentEvent...). Window parameters could be automatically added to the controller without further configuration. 
>> Probably the biggest hurdle is to introduce an API design that works for all available wrappers, without re-implementing a new higher-level engine. But as Philipp said, it probably makes sense to start with one of the lightweight wrappers such as Java or Siddhi. 
>> Should we create a wiki page to collect design requirements for such a feature?
>> 
>> Dominik
>> 
>> -----Original Message-----
>> From: Philipp Zehnder <ze...@apache.org>
>> Sent: Wednesday, July 15, 2020 4:13 PM
>> To: Grainier Perera <gr...@apache.org>
>> Cc: dev@streampipes.apache.org
>> Subject: Re: Adding window semantics to the sdk/core
>> 
>> Hi Grainier,
>> 
>> That's a very good point.
>> I am leaning towards proposal 2, because if we have a separate PE, the user must be aware of windowing and how to use windows. 
>> I think it is easier for the user to configure the relevant parameters directly in the processor configuration itself. Whats your opinion on that?
>> However, I'm afraid that we will then have to implement all functionalities twice, one version for single values and one version for the windowed version (e.g. numerical filter).
>> Perhaps we can find a good solution for this. (e.g. make the window function optional).
>> 
>> Maybe we can also make a list of processors that we think need  this functionality?
>> The first thing that came to my mind was the aggregation component. Currently we only have a Flink version of it, and I think it would be very helpful to have a lighter version as well (e.g. Java, or Siddhi).
>> 
>> Philipp
>> PS: The JS Evaluator is really awesome, I use it a lot and it’s a big time saver!
>> 
>>> On 15. Jul 2020, at 07:06, Grainier Perera <gr...@apache.org> wrote:
>>> 
>>> Hi all,
>>> 
>>> Hope you are doing well. I'm starting this thread to discuss $title. As we discussed in the mail thread [1], having a common window semantic that can be leveraged by every PE can be very useful.
>>> 
>>> I'm thinking of two ways to achieve the same; Introduce a dedicated 
>>> Window PE which can be used before any existing PE.
>>> No need for API changes.
>>> But, might have to flag event in a way for the next processor to identify, to which window the event belongs to (i.e with sliding windows, etc...).
>>> So, have to change the processing logic of existing PEs to check event flag and process accordingly. 
>>> So all the existing PEs might not work with window semantics. In 
>>> that case, we need a way to show window compatible PEs (because, 
>>> having a window before a normal PE might result in un-expected outputs) Introduce windowed EventProcessor/EventSink APIs which allows users to write their own windowed extensions (i.e aggregators, etc...) No need for event flagging. Can introduce API methods like onCurrentEvent, onExpiredEvent, onResetEvent, etc...
>>> Need API changes/refactoring in EventProcessor/EventSink, as well as in existing PEs.
>>> Need a way to expose the window related parameters through existing PEs DataProcessorDescription.
>>> Since there're pros/cons to both, what do you think is the best approach? Or is there any other approach that we can try out?
>>> 
>>> [1] PE to rate-limit events
>>> 
>>> Grainier Perera.
>> 
>> 
> 



Re: Adding window semantics to the sdk/core

Posted by Patrick Wiener <wi...@apache.org>.
I also tend towards option 2. It makes it more intuative to develop windowed operations by having
a dedicated windowed event processor.

We can collect the design choices etc in the wiki and discussion points in the wiki.

Patrick

> Am 16.07.2020 um 22:55 schrieb Philipp Zehnder <ze...@apache.org>:
> 
> Hi,
> 
> +1 for creating a wiki page and collecting all the requirements and functionalities.
> 
> Philipp
> 
>> On 16. Jul 2020, at 20:00, Dominik Riemer <ri...@apache.org> wrote:
>> 
>> Hi,
>> 
>> I'm also leaning towards option 2 - we could introduce new engine and controller classes for windowed event processors (e.g., WindowedDataProcessor) which handle the windowing logic internally and only expose the higher-level methods to users (as you proposed, onCurrentEvent...). Window parameters could be automatically added to the controller without further configuration. 
>> Probably the biggest hurdle is to introduce an API design that works for all available wrappers, without re-implementing a new higher-level engine. But as Philipp said, it probably makes sense to start with one of the lightweight wrappers such as Java or Siddhi. 
>> Should we create a wiki page to collect design requirements for such a feature?
>> 
>> Dominik
>> 
>> -----Original Message-----
>> From: Philipp Zehnder <ze...@apache.org> 
>> Sent: Wednesday, July 15, 2020 4:13 PM
>> To: Grainier Perera <gr...@apache.org>
>> Cc: dev@streampipes.apache.org
>> Subject: Re: Adding window semantics to the sdk/core
>> 
>> Hi Grainier,
>> 
>> That's a very good point.
>> I am leaning towards proposal 2, because if we have a separate PE, the user must be aware of windowing and how to use windows. 
>> I think it is easier for the user to configure the relevant parameters directly in the processor configuration itself. Whats your opinion on that?
>> However, I'm afraid that we will then have to implement all functionalities twice, one version for single values and one version for the windowed version (e.g. numerical filter).
>> Perhaps we can find a good solution for this. (e.g. make the window function optional).
>> 
>> Maybe we can also make a list of processors that we think need  this functionality?
>> The first thing that came to my mind was the aggregation component. Currently we only have a Flink version of it, and I think it would be very helpful to have a lighter version as well (e.g. Java, or Siddhi).
>> 
>> Philipp
>> PS: The JS Evaluator is really awesome, I use it a lot and it’s a big time saver!
>> 
>>> On 15. Jul 2020, at 07:06, Grainier Perera <gr...@apache.org> wrote:
>>> 
>>> Hi all,
>>> 
>>> Hope you are doing well. I'm starting this thread to discuss $title. As we discussed in the mail thread [1], having a common window semantic that can be leveraged by every PE can be very useful.
>>> 
>>> I'm thinking of two ways to achieve the same; Introduce a dedicated 
>>> Window PE which can be used before any existing PE.
>>> No need for API changes.
>>> But, might have to flag event in a way for the next processor to identify, to which window the event belongs to (i.e with sliding windows, etc...).
>>> So, have to change the processing logic of existing PEs to check event flag and process accordingly. 
>>> So all the existing PEs might not work with window semantics. In that 
>>> case, we need a way to show window compatible PEs (because, having a 
>>> window before a normal PE might result in un-expected outputs) Introduce windowed EventProcessor/EventSink APIs which allows users to write their own windowed extensions (i.e aggregators, etc...) No need for event flagging. Can introduce API methods like onCurrentEvent, onExpiredEvent, onResetEvent, etc...
>>> Need API changes/refactoring in EventProcessor/EventSink, as well as in existing PEs.
>>> Need a way to expose the window related parameters through existing PEs DataProcessorDescription.
>>> Since there're pros/cons to both, what do you think is the best approach? Or is there any other approach that we can try out?
>>> 
>>> [1] PE to rate-limit events
>>> 
>>> Grainier Perera.
>> 
>> 
> 


Re: Adding window semantics to the sdk/core

Posted by Philipp Zehnder <ze...@apache.org>.
Hi,

+1 for creating a wiki page and collecting all the requirements and functionalities.

Philipp

> On 16. Jul 2020, at 20:00, Dominik Riemer <ri...@apache.org> wrote:
> 
> Hi,
> 
> I'm also leaning towards option 2 - we could introduce new engine and controller classes for windowed event processors (e.g., WindowedDataProcessor) which handle the windowing logic internally and only expose the higher-level methods to users (as you proposed, onCurrentEvent...). Window parameters could be automatically added to the controller without further configuration. 
> Probably the biggest hurdle is to introduce an API design that works for all available wrappers, without re-implementing a new higher-level engine. But as Philipp said, it probably makes sense to start with one of the lightweight wrappers such as Java or Siddhi. 
> Should we create a wiki page to collect design requirements for such a feature?
> 
> Dominik
> 
> -----Original Message-----
> From: Philipp Zehnder <ze...@apache.org> 
> Sent: Wednesday, July 15, 2020 4:13 PM
> To: Grainier Perera <gr...@apache.org>
> Cc: dev@streampipes.apache.org
> Subject: Re: Adding window semantics to the sdk/core
> 
> Hi Grainier,
> 
> That's a very good point.
> I am leaning towards proposal 2, because if we have a separate PE, the user must be aware of windowing and how to use windows. 
> I think it is easier for the user to configure the relevant parameters directly in the processor configuration itself. Whats your opinion on that?
> However, I'm afraid that we will then have to implement all functionalities twice, one version for single values and one version for the windowed version (e.g. numerical filter).
> Perhaps we can find a good solution for this. (e.g. make the window function optional).
> 
> Maybe we can also make a list of processors that we think need  this functionality?
> The first thing that came to my mind was the aggregation component. Currently we only have a Flink version of it, and I think it would be very helpful to have a lighter version as well (e.g. Java, or Siddhi).
> 
> Philipp
> PS: The JS Evaluator is really awesome, I use it a lot and it’s a big time saver!
> 
>> On 15. Jul 2020, at 07:06, Grainier Perera <gr...@apache.org> wrote:
>> 
>> Hi all,
>> 
>> Hope you are doing well. I'm starting this thread to discuss $title. As we discussed in the mail thread [1], having a common window semantic that can be leveraged by every PE can be very useful.
>> 
>> I'm thinking of two ways to achieve the same; Introduce a dedicated 
>> Window PE which can be used before any existing PE.
>> No need for API changes.
>> But, might have to flag event in a way for the next processor to identify, to which window the event belongs to (i.e with sliding windows, etc...).
>> So, have to change the processing logic of existing PEs to check event flag and process accordingly. 
>> So all the existing PEs might not work with window semantics. In that 
>> case, we need a way to show window compatible PEs (because, having a 
>> window before a normal PE might result in un-expected outputs) Introduce windowed EventProcessor/EventSink APIs which allows users to write their own windowed extensions (i.e aggregators, etc...) No need for event flagging. Can introduce API methods like onCurrentEvent, onExpiredEvent, onResetEvent, etc...
>> Need API changes/refactoring in EventProcessor/EventSink, as well as in existing PEs.
>> Need a way to expose the window related parameters through existing PEs DataProcessorDescription.
>> Since there're pros/cons to both, what do you think is the best approach? Or is there any other approach that we can try out?
>> 
>> [1] PE to rate-limit events
>> 
>> Grainier Perera.
> 
> 


RE: Adding window semantics to the sdk/core

Posted by Dominik Riemer <ri...@apache.org>.
Hi,

I'm also leaning towards option 2 - we could introduce new engine and controller classes for windowed event processors (e.g., WindowedDataProcessor) which handle the windowing logic internally and only expose the higher-level methods to users (as you proposed, onCurrentEvent...). Window parameters could be automatically added to the controller without further configuration. 
Probably the biggest hurdle is to introduce an API design that works for all available wrappers, without re-implementing a new higher-level engine. But as Philipp said, it probably makes sense to start with one of the lightweight wrappers such as Java or Siddhi. 
Should we create a wiki page to collect design requirements for such a feature?

Dominik

-----Original Message-----
From: Philipp Zehnder <ze...@apache.org> 
Sent: Wednesday, July 15, 2020 4:13 PM
To: Grainier Perera <gr...@apache.org>
Cc: dev@streampipes.apache.org
Subject: Re: Adding window semantics to the sdk/core

Hi Grainier,

That's a very good point.
I am leaning towards proposal 2, because if we have a separate PE, the user must be aware of windowing and how to use windows. 
I think it is easier for the user to configure the relevant parameters directly in the processor configuration itself. Whats your opinion on that?
However, I'm afraid that we will then have to implement all functionalities twice, one version for single values and one version for the windowed version (e.g. numerical filter).
Perhaps we can find a good solution for this. (e.g. make the window function optional).

Maybe we can also make a list of processors that we think need  this functionality?
The first thing that came to my mind was the aggregation component. Currently we only have a Flink version of it, and I think it would be very helpful to have a lighter version as well (e.g. Java, or Siddhi).

Philipp
PS: The JS Evaluator is really awesome, I use it a lot and it’s a big time saver!

> On 15. Jul 2020, at 07:06, Grainier Perera <gr...@apache.org> wrote:
> 
> Hi all,
> 
> Hope you are doing well. I'm starting this thread to discuss $title. As we discussed in the mail thread [1], having a common window semantic that can be leveraged by every PE can be very useful.
> 
> I'm thinking of two ways to achieve the same; Introduce a dedicated 
> Window PE which can be used before any existing PE.
> No need for API changes.
> But, might have to flag event in a way for the next processor to identify, to which window the event belongs to (i.e with sliding windows, etc...).
> So, have to change the processing logic of existing PEs to check event flag and process accordingly. 
> So all the existing PEs might not work with window semantics. In that 
> case, we need a way to show window compatible PEs (because, having a 
> window before a normal PE might result in un-expected outputs) Introduce windowed EventProcessor/EventSink APIs which allows users to write their own windowed extensions (i.e aggregators, etc...) No need for event flagging. Can introduce API methods like onCurrentEvent, onExpiredEvent, onResetEvent, etc...
> Need API changes/refactoring in EventProcessor/EventSink, as well as in existing PEs.
> Need a way to expose the window related parameters through existing PEs DataProcessorDescription.
> Since there're pros/cons to both, what do you think is the best approach? Or is there any other approach that we can try out?
> 
> [1] PE to rate-limit events
> 
> Grainier Perera.



Re: Adding window semantics to the sdk/core

Posted by Philipp Zehnder <ze...@apache.org>.
Hi Grainier,

That's a very good point.
I am leaning towards proposal 2, because if we have a separate PE, the user must be aware of windowing and how to use windows. 
I think it is easier for the user to configure the relevant parameters directly in the processor configuration itself. Whats your opinion on that?
However, I'm afraid that we will then have to implement all functionalities twice, one version for single values and one version for the windowed version (e.g. numerical filter).
Perhaps we can find a good solution for this. (e.g. make the window function optional).

Maybe we can also make a list of processors that we think need  this functionality?
The first thing that came to my mind was the aggregation component. Currently we only have a Flink version of it, and I think it would be very helpful to have a lighter version as well (e.g. Java, or Siddhi).

Philipp
PS: The JS Evaluator is really awesome, I use it a lot and it’s a big time saver!

> On 15. Jul 2020, at 07:06, Grainier Perera <gr...@apache.org> wrote:
> 
> Hi all,
> 
> Hope you are doing well. I'm starting this thread to discuss $title. As we discussed in the mail thread [1], having a common window semantic that can be leveraged by every PE can be very useful.
> 
> I'm thinking of two ways to achieve the same;
> Introduce a dedicated Window PE which can be used before any existing PE.
> No need for API changes.
> But, might have to flag event in a way for the next processor to identify, to which window the event belongs to (i.e with sliding windows, etc...).
> So, have to change the processing logic of existing PEs to check event flag and process accordingly. 
> So all the existing PEs might not work with window semantics. In that case, we need a way to show window compatible PEs (because, having a window before a normal PE might result in un-expected outputs)
> Introduce windowed EventProcessor/EventSink APIs which allows users to write their own windowed extensions (i.e aggregators, etc...)
> No need for event flagging. Can introduce API methods like onCurrentEvent, onExpiredEvent, onResetEvent, etc...
> Need API changes/refactoring in EventProcessor/EventSink, as well as in existing PEs.
> Need a way to expose the window related parameters through existing PEs DataProcessorDescription.
> Since there're pros/cons to both, what do you think is the best approach? Or is there any other approach that we can try out?
> 
> [1] PE to rate-limit events
> 
> Grainier Perera.