You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Teena Kappen // BPRISE <te...@bprise.com> on 2018/04/19 06:15:46 UTC

Efficiency with different approaches of aggregation in Flink

Hi,

If I have to aggregate a value in a stream of records, which one of the below approaches will be the most/least efficient?


  1.  Using a Global Window to aggregate the value and emit the record when it reaches a particular threshold value.
  2.  Using a FlatMap with a State Variable which gets updated with each incoming record and emit the record when it reaches the threshold value.
  3.  Using a FlatMap to store the aggregated value in an in-memory DB like Redis and query the value and update it with each incoming record, and emit the record when it reaches the threshold value.

Please rate the three approaches according to their efficiency.

Regards,
Teena

Re: Efficiency with different approaches of aggregation in Flink

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Teena,

I'd go with approach 2. The performance difference shouldn't be significant
compared to 1. but it is much easier to implement, IMO.

Avoid approach 3. It will be much slower because you need at least one call
to an external data store and more difficult to implement.
Flink's checkpointing mechanism (as used by 1. and 2. ) gives you better
consistency and protection against failures than what you can achieve with
3.

Cheers, Fabian

2018-04-19 8:42 GMT+02:00 Puneet Kinra <pu...@customercentria.com>:

> Hi Teena
>
> If you are proceeding with point 3, no doubt it will add some overhead but
> major significance is that you are persisting the state as per
> some key. so there will not be data loss in case of the job failure.
>
>
>
> On Thu, Apr 19, 2018 at 11:45 AM, Teena Kappen // BPRISE <
> teena.kappen@bprise.com> wrote:
>
>> Hi,
>>
>>
>>
>> If I have to aggregate a value in a stream of records, which one of the
>> below approaches will be the most/least efficient?
>>
>>
>>
>>    1. Using a Global Window to aggregate the value and emit the record
>>    when it reaches a particular threshold value.
>>    2. Using a FlatMap with a State Variable which gets updated with each
>>    incoming record and emit the record when it reaches the threshold value.
>>    3. Using a FlatMap to store the aggregated value in an in-memory DB
>>    like Redis and query the value and update it with each incoming record, and
>>    emit the record when it reaches the threshold value.
>>
>>
>>
>> Please rate the three approaches according to their efficiency.
>>
>>
>>
>> Regards,
>>
>> Teena
>>
>
>
>
> --
> *Cheers *
>
> *Puneet Kinra*
>
> *Mobile:+918800167808 | Skype : puneet.kinra@customercentria.com
> <pu...@customercentria.com>*
>
> *e-mail :puneet.kinra@customercentria.com
> <pu...@customercentria.com>*
>
>
>

Re: Efficiency with different approaches of aggregation in Flink

Posted by Puneet Kinra <pu...@customercentria.com>.
Hi Teena

If you are proceeding with point 3, no doubt it will add some overhead but
major significance is that you are persisting the state as per
some key. so there will not be data loss in case of the job failure.



On Thu, Apr 19, 2018 at 11:45 AM, Teena Kappen // BPRISE <
teena.kappen@bprise.com> wrote:

> Hi,
>
>
>
> If I have to aggregate a value in a stream of records, which one of the
> below approaches will be the most/least efficient?
>
>
>
>    1. Using a Global Window to aggregate the value and emit the record
>    when it reaches a particular threshold value.
>    2. Using a FlatMap with a State Variable which gets updated with each
>    incoming record and emit the record when it reaches the threshold value.
>    3. Using a FlatMap to store the aggregated value in an in-memory DB
>    like Redis and query the value and update it with each incoming record, and
>    emit the record when it reaches the threshold value.
>
>
>
> Please rate the three approaches according to their efficiency.
>
>
>
> Regards,
>
> Teena
>



-- 
*Cheers *

*Puneet Kinra*

*Mobile:+918800167808 | Skype : puneet.kinra@customercentria.com
<pu...@customercentria.com>*

*e-mail :puneet.kinra@customercentria.com
<pu...@customercentria.com>*