Posted to issues@flink.apache.org by "Fabian Hueske (JIRA)" <ji...@apache.org> on 2016/05/20 09:16:12 UTC

[jira] [Commented] (FLINK-3475) DISTINCT aggregate function support

    [ https://issues.apache.org/jira/browse/FLINK-3475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293052#comment-15293052 ] 

Fabian Hueske commented on FLINK-3475:
--------------------------------------

DISTINCT aggregates can be computed by sorting the reduce group on the distinct attribute (secondary sort) and skipping duplicate values. A first step would be to add support for a single distinct attribute, since a group can only be sorted primarily on one attribute.
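
To illustrate the sort-based idea, here is a minimal sketch in plain Python. It does not use any Flink API; the function name distinct_aggregate, its parameters, and the sample rows are all made up for this example. It sorts on the grouping key first and the distinct attribute second, then drops duplicates by comparing each value with the previous one.

```python
from itertools import groupby
from operator import itemgetter


def distinct_aggregate(records, key, distinct_attr, agg):
    """Sort-based DISTINCT aggregate (hypothetical helper, not a Flink API).

    records: iterable of tuples; key and distinct_attr are tuple positions.
    agg: function applied to the de-duplicated values of one group.
    """
    # Primary sort on the grouping key, secondary sort on the distinct attribute.
    ordered = sorted(records, key=itemgetter(key, distinct_attr))
    result = []
    for k, group in groupby(ordered, key=itemgetter(key)):
        distinct_values = []
        previous = object()  # sentinel that compares unequal to any real value
        for rec in group:
            value = rec[distinct_attr]
            # After sorting, duplicates are adjacent, so one comparison suffices.
            if value != previous:
                distinct_values.append(value)
                previous = value
        result.append((k, agg(distinct_values)))
    return result


rows = [("a", 1), ("b", 2), ("a", 1), ("a", 3), ("b", 2)]
count_distinct = distinct_aggregate(rows, 0, 1, len)  # COUNT(DISTINCT value)
sum_distinct = distinct_aggregate(rows, 0, 1, sum)    # SUM(DISTINCT value)
```

Because the ordinary aggregate (len, sum) only ever sees de-duplicated input, it can be reused unchanged, which is the point of pushing duplicate handling into the sort.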

If there are multiple distinct aggregates, we have to split the aggregation into several group reduce operators and join the results afterwards. The join can be done locally as a streamed merge join, because partitioning and sorting will be preserved.
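
The split-and-join plan could look roughly like the following plain-Python sketch. Again, nothing here is Flink API: count_distinct and merge_join are hypothetical helpers, and partitioning is not modeled. Each distinct attribute gets its own sort-based pass, and because every pass emits one row per key in key order, the partial results can be combined with a single forward scan.

```python
from itertools import groupby
from operator import itemgetter


def count_distinct(records, key, attr):
    """One 'group reduce': COUNT(DISTINCT attr) per key, output sorted by key."""
    ordered = sorted(records, key=itemgetter(key, attr))
    out = []
    for k, group in groupby(ordered, key=itemgetter(key)):
        # Grouping the sorted attribute collapses adjacent duplicates.
        out.append((k, sum(1 for _ in groupby(group, key=itemgetter(attr)))))
    return out


def merge_join(left, right):
    """Streamed merge join of two key-sorted lists with one row per key."""
    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk == rk:
            joined.append((lk, left[i][1], right[j][1]))
            i += 1
            j += 1
        elif lk < rk:
            i += 1  # key only on the left side; advance left
        else:
            j += 1  # key only on the right side; advance right
    return joined


rows = [("a", 1, "x"), ("a", 1, "y"), ("a", 2, "z"), ("b", 5, "z"), ("b", 5, "z")]
by_second = count_distinct(rows, 0, 1)   # distinct on the second field
by_third = count_distinct(rows, 0, 2)    # distinct on the third field
result = merge_join(by_second, by_third)
```

Both inputs to the join come from the same grouped data, so every key appears on both sides and no re-sorting is needed before merging.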

> DISTINCT aggregate function support
> -----------------------------------
>
>                 Key: FLINK-3475
>                 URL: https://issues.apache.org/jira/browse/FLINK-3475
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>
> A DISTINCT aggregate function may be able to reuse the existing aggregate function instead of requiring a separate implementation, and let the Flink runtime take care of duplicate records.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)