Posted to user@flink.apache.org by Emmanuel <el...@msn.com> on 2015/03/27 21:36:20 UTC

streaming window operations

Hello,
Looking at the window operators, I see things like sum, min, max, but they're always for a single 'field'.
Is there an easy way to do stats like min, max, average on a window but on many different fields at once?
Should I split the stream into many parallel streams with single fields to achieve that?
It sounds like it would be more efficient to parse the many fields and do the stats in parallel within the same stream, I guess then with a custom window operator.

Your thoughts on this?
Thanks

Re: streaming window operations

Posted by Gyula Fóra <gy...@apache.org>.
Hello,

Some time ago there was an effort to implement the functionality you want
(not just for windows), namely applying multiple aggregations at once. At
some point it will be in there (unfortunately it's not high on the priority
list at the moment).

There are different ways of achieving this:

1. Just take your windowed data stream and apply all your transformations
on it. If you are using a standard policy like count or time, or any
tumbling eviction policy, this should in fact be very efficient. With these
policies, data is not replicated over the network, because we reuse the
discretizers and also do local pre-reduces.

2. The most efficient way of doing this would of course be to write a
simple reduce function that computes all the aggregates you need in one
pass. For the basic aggregation types, this is a trivial task.

3. You could of course project the data stream to different fields and apply
windowing and transformations on them, but this incurs a large runtime
overhead because the window discretization operators have to be replicated.
I would only do this if you have some user-defined trigger and eviction
policy.
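
To illustrate option 2, here is a minimal sketch in plain Java of a reduce
step that tracks count, min, max and sum for several fields at once. The
names FieldStats, of, combine and average are made up for this example and
are not part of Flink. It assumes every record carries the same number of
numeric fields. In a Flink job, the body of combine is the kind of logic you
would put inside a reduce function applied to the windowed stream.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical accumulator: per-field min, max and sum, plus a record count.
final class FieldStats {
    final long count;
    final double[] min;
    final double[] max;
    final double[] sum;

    FieldStats(long count, double[] min, double[] max, double[] sum) {
        this.count = count;
        this.min = min;
        this.max = max;
        this.sum = sum;
    }

    // Wrap a single record as the statistics of a one-element window.
    static FieldStats of(double... fields) {
        return new FieldStats(1, fields.clone(), fields.clone(), fields.clone());
    }

    // Associative combine step: this is the "reduce function".
    // Assumes a and b describe the same number of fields.
    static FieldStats combine(FieldStats a, FieldStats b) {
        int n = a.min.length;
        double[] min = new double[n];
        double[] max = new double[n];
        double[] sum = new double[n];
        for (int i = 0; i < n; i++) {
            min[i] = Math.min(a.min[i], b.min[i]);
            max[i] = Math.max(a.max[i], b.max[i]);
            sum[i] = a.sum[i] + b.sum[i];
        }
        return new FieldStats(a.count + b.count, min, max, sum);
    }

    // The average is derived from sum and count only when it is needed.
    double average(int field) {
        return sum[field] / count;
    }

    public static void main(String[] args) {
        // Stand-in for the contents of one window: three records, two fields each.
        List<double[]> window = Arrays.asList(
                new double[] {1.0, 10.0},
                new double[] {3.0, 30.0},
                new double[] {2.0, 20.0});

        FieldStats stats = window.stream()
                .map(r -> FieldStats.of(r))
                .reduce(FieldStats::combine)
                .get();

        for (int i = 0; i < stats.min.length; i++) {
            System.out.println("field " + i
                    + " min=" + stats.min[i]
                    + " max=" + stats.max[i]
                    + " avg=" + stats.average(i));
        }
    }
}
```

Keeping sum and count rather than a running average keeps the combine step
associative, so partial results from local pre-reduces can be merged safely.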

I hope this helped.

Cheers,
Gyula

On Fri, Mar 27, 2015 at 9:36 PM, Emmanuel <el...@msn.com> wrote:

> Hello,
>
> Looking at the window operators, I see things like sum, min, max but
> they're always for a single 'field'.
> Is there an easy way to do stats like min, max, average on a window but on
> many different fields at once?
> Should I split the stream into many parallel streams with single fields to
> achieve that?
> it sounds like it would be more efficient to parse the many fields and do
> the stats in parallel within the same stream, I guess then with a custom
> window operator
>
> Your thoughts on this?
> Thanks
>