You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by M Singh <ma...@yahoo.com> on 2017/12/31 21:28:38 UTC

Apache Flink - Question about rolling window function on KeyedStream

Hi:
Apache Flink documentation (https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/index.html) indicates that a reduce function on a KeyedStream  as follows:
A "rolling" reduce on a keyed data stream. Combines the current element with the last reduced value and emits the new value. 

A reduce function that creates a stream of partial sums:keyedStream.reduce(new ReduceFunction<Integer>() {
    @Override
    public Integer reduce(Integer value1, Integer value2)
    throws Exception {
        return value1 + value2;
    }
});
The KeyedStream is not windowed, so when does the reduce function kick in to produce the DataStream (ie, is there a default time out, or collection size that triggers it, since we have not defined any window on it).
Thanks
Mans

Re: Apache Flink - Question about rolling window function on KeyedStream

Posted by M Singh <ma...@yahoo.com>.
Hi Fabian:
Thanks for your answer - it is starting to make sense to me now. 

    On Thursday, January 4, 2018 12:58 AM, Fabian Hueske <fh...@gmail.com> wrote:
 

 Hi,

the ReduceFunction holds the last emitted record as state. When a new record arrives, it reduces the new record and last emitted record, updates its state, and emits the new result.
Therefore, a ReduceFunction emits one output record for each input record, i.e., it is triggered for each input record. The output of the ReduceFunction should be treated as a stream of updates not of final results.

Best, Fabian

2018-01-03 18:46 GMT+01:00 M Singh <ma...@yahoo.com>:

Hi Stefan:
Thanks for your response.
A follow up question - In a streaming environment, we invoke the operation reduce and then output results to the sink. Does this mean reduce will be executed once on every trigger per partition with all the items in each partition ?

Thanks 

    On Wednesday, January 3, 2018 2:46 AM, Stefan Richter <s....@data-artisans.com> wrote:
 

 Hi,
I would interpret this as: the reduce produces an output for every new reduce call, emitting the updated value. There is no need for a window because it kicks in on every single invocation.
Best,Stefan


Am 31.12.2017 um 22:28 schrieb M Singh <ma...@yahoo.com>:
Hi:
Apache Flink documentation (https://ci.apache.org/ projects/flink/flink-docs- release-1.4/dev/stream/ operators/index.html) indicates that a reduce function on a KeyedStream  as follows:
A "rolling" reduce on a keyed data stream. Combines the current element with the last reduced value and emits the new value. 

A reduce function that creates a stream of partial sums:keyedStream.reduce(new ReduceFunction<Integer>() {
    @Override
    public Integer reduce(Integer value1, Integer value2)
    throws Exception {
        return value1 + value2;
    }
});
The KeyedStream is not windowed, so when does the reduce function kick in to produce the DataStream (ie, is there a default time out, or collection size that triggers it, since we have not defined any window on it).
Thanks
Mans



   



   

Re: Apache Flink - Question about rolling window function on KeyedStream

Posted by Fabian Hueske <fh...@gmail.com>.
Hi,

the ReduceFunction holds the last emitted record as state. When a new
record arrives, it reduces the new record and last emitted record, updates
its state, and emits the new result.
Therefore, a ReduceFunction emits one output record for each input record,
i.e., it is triggered for each input record. The output of the
ReduceFunction should be treated as a stream of updates not of final
results.

Best, Fabian

2018-01-03 18:46 GMT+01:00 M Singh <ma...@yahoo.com>:

> Hi Stefan:
>
> Thanks for your response.
>
> A follow up question - In a streaming environment, we invoke the operation
> reduce and then output results to the sink. Does this mean reduce will be
> executed once on every trigger per partition with all the items in each
> partition ?
>
> Thanks
>
>
> On Wednesday, January 3, 2018 2:46 AM, Stefan Richter <
> s.richter@data-artisans.com> wrote:
>
>
> Hi,
>
> I would interpret this as: the reduce produces an output for every new
> reduce call, emitting the updated value. There is no need for a window
> because it kicks in on every single invocation.
>
> Best,
> Stefan
>
>
> Am 31.12.2017 um 22:28 schrieb M Singh <ma...@yahoo.com>:
>
> Hi:
>
> Apache Flink documentation (https://ci.apache.org/
> projects/flink/flink-docs-release-1.4/dev/stream/operators/index.html)
> indicates that a reduce function on a KeyedStream  as follows:
>
> A "rolling" reduce on a keyed data stream. Combines the current element
> with the last reduced value and emits the new value.
>
> A reduce function that creates a stream of partial sums:
>
> keyedStream.reduce(new ReduceFunction<Integer>() {
>     @Override
>     public Integer reduce(Integer value1, Integer value2)
>     throws Exception {
>         return value1 + value2;
>     }});
>
>
> The KeyedStream is not windowed, so when does the reduce function kick in
> to produce the DataStream (ie, is there a default time out, or collection
> size that triggers it, since we have not defined any window on it).
>
> Thanks
>
> Mans
>
>
>
>
>

Re: Apache Flink - Question about rolling window function on KeyedStream

Posted by M Singh <ma...@yahoo.com>.
Hi Stefan:
Thanks for your response.
A follow up question - In a streaming environment, we invoke the operation reduce and then output results to the sink. Does this mean reduce will be executed once on every trigger per partition with all the items in each partition ?

Thanks 

    On Wednesday, January 3, 2018 2:46 AM, Stefan Richter <s....@data-artisans.com> wrote:
 

 Hi,
I would interpret this as: the reduce produces an output for every new reduce call, emitting the updated value. There is no need for a window because it kicks in on every single invocation.
Best,Stefan


Am 31.12.2017 um 22:28 schrieb M Singh <ma...@yahoo.com>:
Hi:
Apache Flink documentation (https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/index.html) indicates that a reduce function on a KeyedStream  as follows:
A "rolling" reduce on a keyed data stream. Combines the current element with the last reduced value and emits the new value. 

A reduce function that creates a stream of partial sums:keyedStream.reduce(new ReduceFunction<Integer>() {
    @Override
    public Integer reduce(Integer value1, Integer value2)
    throws Exception {
        return value1 + value2;
    }
});
The KeyedStream is not windowed, so when does the reduce function kick in to produce the DataStream (ie, is there a default time out, or collection size that triggers it, since we have not defined any window on it).
Thanks
Mans



   

Re: Apache Flink - Question about rolling window function on KeyedStream

Posted by Stefan Richter <s....@data-artisans.com>.
Hi,

I would interpret this as: the reduce produces an output for every new reduce call, emitting the updated value. There is no need for a window because it kicks in on every single invocation.

Best,
Stefan


> Am 31.12.2017 um 22:28 schrieb M Singh <ma...@yahoo.com>:
> 
> Hi:
> 
> Apache Flink documentation (https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/index.html) indicates that a reduce function on a KeyedStream  as follows:
> 
> A "rolling" reduce on a keyed data stream. Combines the current element with the last reduced value and emits the new value. 
> 
> A reduce function that creates a stream of partial sums:
> keyedStream.reduce(new ReduceFunction<Integer>() {
>     @Override
>     public Integer reduce(Integer value1, Integer value2)
>     throws Exception {
>         return value1 + value2;
>     }
> });
> 
> The KeyedStream is not windowed, so when does the reduce function kick in to produce the DataStream (ie, is there a default time out, or collection size that triggers it, since we have not defined any window on it).
> 
> Thanks
> 
> Mans