Posted to issues@flink.apache.org by "Fabian Hueske (JIRA)" <ji...@apache.org> on 2018/01/09 15:41:03 UTC

[jira] [Commented] (FLINK-8355) DataSet Should not union a NULL row for AGG without GROUP BY clause.

    [ https://issues.apache.org/jira/browse/FLINK-8355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318619#comment-16318619 ] 

Fabian Hueske commented on FLINK-8355:
--------------------------------------

The motivation for the {{DataSetAggregateWithNullValuesRule}} is to prevent incorrect aggregation results for empty tables. For instance, the query {{SELECT COUNT(*) FROM mytable}} should return a row {{(0)}} and not an empty result.

Until now, the built-in aggregations worked correctly because they ignored {{null}} values. However, UDAGGs might compute incorrect results if they do not ignore {{null}} values. Hence, it definitely makes sense to remove the rule.
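
To make the failure mode concrete, here is a small plain-Python model (not Flink code) of what unioning a NULL row does to a groupless aggregate. All function names are made up for illustration; rows are modeled as tuples and the NULL row as a tuple of {{None}} values.

```python
# Plain-Python model, not the Flink API. All names here are hypothetical.

def union_null_row(rows, arity):
    """Mimic the rule: append one all-NULL row to the input."""
    return list(rows) + [(None,) * arity]

def count_column(rows, idx):
    """COUNT(col): skips NULL values, so the extra row is harmless."""
    return sum(1 for r in rows if r[idx] is not None)

def count_star(rows):
    """COUNT(*): counts every row, including the injected NULL row."""
    return sum(1 for _ in rows)

table = [(1,), (2,), (3,)]
padded = union_null_row(table, arity=1)

print(count_column(padded, 0))                 # 3
print(count_star(padded))                      # 4, one too many
print(count_column(union_null_row([], 1), 0))  # 0 for an empty table
```

An aggregate that skips NULL values still yields the expected result on the padded input, while one that looks at every row does not.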

A solution would be to add a {{MapPartitionFunction}} with parallelism 1 after a groupless aggregation. The {{MapPartitionFunction}} would simply forward all input data. If the input is empty, it emits a single result row with all aggregates in their initial state.
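
The proposed operator could look roughly like the following sketch. It is a plain-Python model of the logic, not the Flink {{MapPartitionFunction}} interface; {{forward_or_default}} and {{initial_row}} are hypothetical names, and the single call stands in for the one parallelism-1 instance that sees the complete aggregation output.

```python
from typing import Iterable, Iterator, Tuple

Row = Tuple[object, ...]

def forward_or_default(rows: Iterable[Row], initial_row: Row) -> Iterator[Row]:
    """Forward every input row; if there were none, emit initial_row once."""
    seen_any = False
    for row in rows:
        seen_any = True
        yield row
    if not seen_any:
        # Empty input: fall back to the aggregates' initial state.
        yield initial_row

# Non-empty table: the aggregate result passes through unchanged.
print(list(forward_or_default([(3,)], initial_row=(0,))))  # [(3,)]
# Empty table: nothing was aggregated, so COUNT(*) falls back to (0,).
print(list(forward_or_default([], initial_row=(0,))))      # [(0,)]
```

Parallelism 1 matters here because only a task that sees the whole input can decide that it is empty; with several parallel instances, each empty partition would emit its own default row.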

> DataSet Should not union a NULL row for AGG without GROUP BY clause.
> --------------------------------------------------------------------
>
>                 Key: FLINK-8355
>                 URL: https://issues.apache.org/jira/browse/FLINK-8355
>             Project: Flink
>          Issue Type: Bug
>          Components: Table API & SQL
>    Affects Versions: 1.5.0
>            Reporter: sunjincheng
>
> Currently {{DataSetAggregateWithNullValuesRule}} will UNION a NULL row for a non-grouped aggregate query. Once {{CountAggFunction}} supports {{COUNT(*)}} (FLINK-8325), the result will be incorrect.
> For example, if table {{T1}} has 3 records and we run the following SQL on a DataSet:
> {code}
> SELECT COUNT(*) AS cnt FROM T1 // cnt = 4 (incorrect).
> {code}


