You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@flink.apache.org by 杨力 <bi...@gmail.com> on 2018/04/16 03:21:58 UTC

User-defined aggregation function and parallelism

I am running flink SQL in streaming mode and implemented a UDAGG, which is
used in keyed HOP windows. But I found that the throughput decreases
dramatically when the function is used. Does UDAGG run in parallell? Or
does it run only in one thread?

Regards,
Bill

Re: User-defined aggregation function and parallelism

Posted by Fabian Hueske <fh...@gmail.com>.

Hi Bill,

Flink's built-in aggregation functions are implemented against the same
interface as UDAGGs and are applied in parallel.
The performance depends of course on the implementation of the UDAGG. For
example, you should try to keep the size of the accumulator as small as
possible because it will be stored in the state backend.
If you are using the RocksDBStatebackend, this means that the accumulator
is de/serialized for every records.

Best, Fabian

2018-04-16 5:21 GMT+02:00 杨力 <bi...@gmail.com>:

> I am running flink SQL in streaming mode and implemented a UDAGG, which is
> used in keyed HOP windows. But I found that the throughput decreases
> dramatically when the function is used. Does UDAGG run in parallell? Or
> does it run only in one thread?
>
> Regards,
> Bill
>