You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Kurt Young (JIRA)" <ji...@apache.org> on 2019/04/13 12:05:00 UTC
[jira] [Closed] (FLINK-12161) Supports partial-final optimization
for stream group aggregate
[ https://issues.apache.org/jira/browse/FLINK-12161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kurt Young closed FLINK-12161.
------------------------------
Resolution: Fixed
Fix Version/s: 1.9.0
fixed in f1a45157238d2f76c4879d953ccd3877958da7c8
> Supports partial-final optimization for stream group aggregate
> ---------------------------------------------------------------
>
> Key: FLINK-12161
> URL: https://issues.apache.org/jira/browse/FLINK-12161
> Project: Flink
> Issue Type: New Feature
> Components: Table SQL / Planner
> Reporter: godfrey he
> Assignee: godfrey he
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.9.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> To resolve data-skew for distinct aggregates on stream, we introduce a rule named {{SplitAggregateRule}} which rewrites an aggregate query with distinct aggregations into an expanded double aggregations. The first aggregation compute the results in sub-partition(with bucket) and the results are combined by the second aggregation.
> if two-stage aggregation is also enabled, we find that many plans have common pattern, looks like:
> {code}
> ... (output)
> StreamExecGlobalGroupAggregate (final global agg)
> +- StreamExecExchange
> +- StreamExecLocalGroupAggregate (final local agg)
> +- StreamExecGlobalGroupAggregate (partial global agg)
> +- .... (input)
> {code}
> There is no exchange between the final local aggregate and the partial global aggregate, so they will be executed in a same JobVertex, and could share state. We introduce a rule named {{IncrementalAggregateRule}} to do that optimization.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)