Posted to issues@flink.apache.org by "Fabian Hueske (JIRA)" <ji...@apache.org> on 2017/05/20 21:04:04 UTC

[jira] [Commented] (FLINK-6649) Improve Non-window group aggregate with configurable `earlyFire`.

    [ https://issues.apache.org/jira/browse/FLINK-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16018621#comment-16018621 ] 

Fabian Hueske commented on FLINK-6649:
--------------------------------------

I agree that we need to provide a mechanism to reduce the number of rows emitted by non-windowed aggregates.

However, I have two concerns about the proposed method:

1. IMO, the mechanism should not be called {{Early Fire}}. In my understanding, early firing refers to enabling the emission of incomplete results from a certain point in time, for example, emitting early results for an hourly tumbling window after the first 15 minutes have passed. In this example, {{Early Fire = -45 Minutes}} would specify that the first results for the window may be emitted after 15 minutes (60 minutes - 45 minutes). By definition, a non-windowed aggregation is never complete: whenever a new record for the group arrives, it has to be added to the result. So non-windowed aggregates are always early firing, because the result never completes and we would otherwise never emit a result. Instead of {{Early Fire}}, I would call this mechanism {{Update Rate}} to specify how often a result is emitted.
2. The {{Update Rate}} should not be defined in terms of record count, but only in terms of time. Given a non-windowed grouped aggregate with {{Update Rate = 10 Rows}}, we would never emit a result for a group that only received 9 rows. By defining the update rate as a time interval, we can effectively reduce the number of outgoing records and still ensure that updates are eventually propagated.
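
To illustrate the second point, here is a minimal sketch in plain Python (not Flink code). The function names {{count_based}} and {{time_based}}, the {{(timestamp, key, value)}} event layout, and the use of a running sum as the aggregate are all assumptions made for this example; it only models the emission behavior, not how Flink would implement it.

```python
from collections import defaultdict


def count_based(events, every_n):
    """Emit a group's running sum only after every `every_n` rows of that group."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    emitted = []
    for ts, key, value in events:
        sums[key] += value
        counts[key] += 1
        if counts[key] % every_n == 0:
            emitted.append((ts, key, sums[key]))
    return emitted


def time_based(events, interval, end_time):
    """Emit the running sum of every changed group once per `interval` time units.

    `events` must be sorted by timestamp; `end_time` stands in for the timers
    that would keep firing after the last input row has arrived.
    """
    sums = defaultdict(int)
    dirty = set()  # groups updated since the last emission
    emitted = []
    next_fire = interval

    def flush(fire_time):
        for key in sorted(dirty):
            emitted.append((fire_time, key, sums[key]))
        dirty.clear()

    for ts, key, value in events:
        # Fire all timers that are due before this row is processed.
        while ts >= next_fire:
            flush(next_fire)
            next_fire += interval
        sums[key] += value
        dirty.add(key)

    # Fire the remaining timers up to and including end_time.
    while next_fire <= end_time:
        flush(next_fire)
        next_fire += interval
    return emitted


# Group "a" receives 20 rows, group "b" only 9 rows (one row per time unit).
events = sorted(
    [(t, "a", 1) for t in range(20)] + [(t, "b", 1) for t in range(9)]
)

by_count = count_based(events, every_n=10)
by_time = time_based(events, interval=10, end_time=20)

print(by_count)  # group "b" never appears
print(by_time)   # group "b" is emitted at the first timer
```

With these inputs, the count-based variant never emits a result for group "b", while the time-based variant emits three rows for 29 input rows and still propagates the latest value of every group that changed.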

What do you think [~sunjincheng121]?

> Improve Non-window group aggregate with configurable `earlyFire`.
> -----------------------------------------------------------------
>
>                 Key: FLINK-6649
>                 URL: https://issues.apache.org/jira/browse/FLINK-6649
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>    Affects Versions: 1.4.0
>            Reporter: sunjincheng
>            Assignee: sunjincheng
>
> Currently, the non-windowed group aggregate is early firing at count(1), that is, every row emits an aggregate result. But sometimes users want to configure the count (`early firing with count[N]`) to reduce downstream pressure. This JIRA will enable configuring `earlyFiring` for the non-windowed group aggregate.


