Posted to issues@flink.apache.org by "Fabian Hueske (JIRA)" <ji...@apache.org> on 2017/01/20 23:41:26 UTC

[jira] [Commented] (FLINK-5564) User Defined Aggregates

    [ https://issues.apache.org/jira/browse/FLINK-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832615#comment-15832615 ] 

Fabian Hueske commented on FLINK-5564:
--------------------------------------

Thanks for the great proposal! 
User defined aggregation functions are a very important feature for the Table API and SQL.

Can you add sub-issues to this JIRA to break it down into individual steps, such as:

1. add the new UDAGG interface and migrate existing aggregation functions to it
2. use the new aggregation function for batch tables
3. use the new aggregation function for streaming tables (depends on FLINK-5582)
4. add an API to register user-defined aggregation functions
5. add support for retraction (streaming only)
6. add support for local/global aggregation (streaming only)

Thanks, Fabian

> User Defined Aggregates
> -----------------------
>
>                 Key: FLINK-5564
>                 URL: https://issues.apache.org/jira/browse/FLINK-5564
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>            Reporter: Shaoxuan Wang
>            Assignee: Shaoxuan Wang
>
> User-defined aggregates would be a great addition to the Table API / SQL.
> The current aggregate interface is not well suited to external users. This issue proposes redesigning the aggregate interface so that we can expose a better external UDAGG interface to users. The detailed design proposal can be found here: https://docs.google.com/document/d/19JXK8jLIi8IqV9yf7hOs_Oz67yXOypY7Uh5gIOK2r-U/edit
> Motivation:
> 1. The current aggregate interface is not very concise for users. One needs to know the design details of the intermediate Row buffer before implementing an Aggregate. Seven functions are needed even for a simple Count aggregate.
> 2. Another limitation of the current aggregate function is that it can only be applied to a single column. Many scenarios require an aggregate function that takes multiple columns as input.
> 3. “Retraction” is not considered or covered in the current Aggregate.
> 4. A local/global aggregate query plan optimization would be valuable, as it is very promising for improving UDAGG performance in some scenarios.
> Proposed Changes:
> 1. Implement an aggregate dataStream API
> 2. Update all the existing aggregates to use the new aggregate dataStream API
> 3. Provide a better User-Defined Aggregate interface
> 4. Add retraction support
> 5. Add local/global aggregate
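
To make the motivation points in the quoted issue concrete, here is a minimal sketch of what an accumulator-based UDAGG interface could look like. It is not taken from the design document and is not Flink's API: the interface name UserDefinedAggregate, its method names, and the WeightedAvg example are all hypothetical and chosen only for illustration. The sketch shows an aggregate over two input columns, a retract method for retraction, and a merge method of the kind a local/global aggregation plan would need to combine partial results.

```java
import java.util.List;

// Hypothetical interface, for illustration only; this is NOT Flink's API.
// IN is the input type, ACC the intermediate accumulator, OUT the result type.
interface UserDefinedAggregate<IN, ACC, OUT> {
    ACC createAccumulator();              // fresh, empty intermediate state
    void accumulate(ACC acc, IN input);   // add one input to the state
    void retract(ACC acc, IN input);      // undo a previously accumulated input
    ACC merge(List<ACC> partials);        // combine local (partial) states
    OUT getValue(ACC acc);                // produce the final result
}

// Two input columns (value, weight) bundled into one input object (assumed shape).
final class WeightedInput {
    final long value;
    final long weight;

    WeightedInput(long value, long weight) {
        this.value = value;
        this.weight = weight;
    }
}

// Accumulator for the weighted average: running weighted sum and total weight.
final class WeightedAvgAcc {
    long weightedSum = 0;
    long weightSum = 0;
}

final class WeightedAvg
        implements UserDefinedAggregate<WeightedInput, WeightedAvgAcc, Double> {

    @Override
    public WeightedAvgAcc createAccumulator() {
        return new WeightedAvgAcc();
    }

    @Override
    public void accumulate(WeightedAvgAcc acc, WeightedInput in) {
        acc.weightedSum += in.value * in.weight;
        acc.weightSum += in.weight;
    }

    @Override
    public void retract(WeightedAvgAcc acc, WeightedInput in) {
        // Exact inverse of accumulate.
        acc.weightedSum -= in.value * in.weight;
        acc.weightSum -= in.weight;
    }

    @Override
    public WeightedAvgAcc merge(List<WeightedAvgAcc> partials) {
        WeightedAvgAcc merged = createAccumulator();
        for (WeightedAvgAcc p : partials) {
            merged.weightedSum += p.weightedSum;
            merged.weightSum += p.weightSum;
        }
        return merged;
    }

    @Override
    public Double getValue(WeightedAvgAcc acc) {
        if (acc.weightSum == 0) {
            return null; // no input (or everything retracted)
        }
        return (double) acc.weightedSum / acc.weightSum;
    }
}
```

With this sketch, accumulating (value 10, weight 1) and (value 20, weight 3) yields 17.5, and retracting the second row brings the result back to 10.0. Keeping all state in a separate accumulator object is one possible way to hide the intermediate Row buffer from the user; the actual design may differ.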



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)