You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@flink.apache.org by Gyula Fóra <gy...@apache.org> on 2015/06/22 17:32:21 UTC

Removing reduce/aggregations from non-grouped data streams

Hey all,
Currently we have reduce and aggregation methods for non-grouped
DataStreams as well, which will produce local aggregates depending on the
parallelism of the operator.

This behaviour is neither intuitive nor useful as it only produces sensible
results if the user specifically sets the parallelism to 1 which should not
be encouraged.

I would like to remove these methods from the DataStream api and only keep
it for GroupedDataStreams and WindowedDataStream where the aggregation is
either executed per-key or per-window.

Cheers,
Gyula

Re: Removing reduce/aggregations from non-grouped data streams

Posted by Gyula Fóra <gy...@gmail.com>.

I opened a PR <https://github.com/apache/flink/pull/860> for this.

Stephan Ewen <se...@apache.org> ezt írta (időpont: 2015. jún. 22., H,
19:25):

> +1 totally agreed
>
> On Mon, Jun 22, 2015 at 5:32 PM, Gyula Fóra <gy...@apache.org> wrote:
>
> > Hey all,
> > Currently we have reduce and aggregation methods for non-grouped
> > DataStreams as well, which will produce local aggregates depending on the
> > parallelism of the operator.
> >
> > This behaviour is neither intuitive nor useful as it only produces
> sensible
> > results if the user specifically sets the parallelism to 1 which should
> not
> > be encouraged.
> >
> > I would like to remove these methods from the DataStream api and only
> keep
> > it for GroupedDataStreams and WindowedDataStream where the aggregation is
> > either executed per-key or per-window.
> >
> > Cheers,
> > Gyula
> >
>

Re: Removing reduce/aggregations from non-grouped data streams

Posted by Stephan Ewen <se...@apache.org>.

+1 totally agreed

On Mon, Jun 22, 2015 at 5:32 PM, Gyula Fóra <gy...@apache.org> wrote:

> Hey all,
> Currently we have reduce and aggregation methods for non-grouped
> DataStreams as well, which will produce local aggregates depending on the
> parallelism of the operator.
>
> This behaviour is neither intuitive nor useful as it only produces sensible
> results if the user specifically sets the parallelism to 1 which should not
> be encouraged.
>
> I would like to remove these methods from the DataStream api and only keep
> it for GroupedDataStreams and WindowedDataStream where the aggregation is
> either executed per-key or per-window.
>
> Cheers,
> Gyula
>