Posted to issues@flink.apache.org by "Fabian Hueske (JIRA)" <ji...@apache.org> on 2018/01/09 15:48:00 UTC
[jira] [Comment Edited] (FLINK-8355) DataSet Should not union a NULL row for AGG without GROUP BY clause.
[ https://issues.apache.org/jira/browse/FLINK-8355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318619#comment-16318619 ]
Fabian Hueske edited comment on FLINK-8355 at 1/9/18 3:47 PM:
--------------------------------------------------------------
The motivation for the {{DataSetAggregateWithNullValuesRule}} is to prevent incorrect aggregation results for empty tables. For instance, the query {{SELECT COUNT(*) FROM mytable}} should return a single row {{(0)}} rather than an empty result.
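The expected semantics can be checked against any standard SQL engine. The sketch below uses Python's built-in {{sqlite3}} module as a stand-in (it is not Flink) on an empty table named {{mytable}}, as in the query above; the column {{a}} is made up for illustration. A groupless aggregate yields exactly one row, while a grouped aggregate yields none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INTEGER)")  # empty table

# Groupless aggregate: always exactly one result row, even on empty input.
groupless = conn.execute("SELECT COUNT(*) FROM mytable").fetchall()

# Grouped aggregate: one row per group, so no rows on empty input.
grouped = conn.execute("SELECT a, COUNT(*) FROM mytable GROUP BY a").fetchall()

print(groupless)  # [(0,)]
print(grouped)    # []
```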
Until now, the built-in aggregations worked correctly because they ignore {{null}} values. However, user-defined aggregate functions (UDAGGs) that do not ignore {{null}} values might compute incorrect results. Hence, it definitely makes sense to remove the rule.
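A plain-Python model (not Flink code) of why unioning a NULL row was harmless for the built-in aggregates but not for UDAGGs. {{union_null_row}}, {{count_non_null}} and {{count_all}} are hypothetical helpers; {{count_all}} stands in for a UDAGG that does not skip {{null}} inputs.

```python
from typing import Iterable, List, Optional

# A single nullable column; None models SQL NULL.
Row = Optional[int]


def union_null_row(rows: List[Row]) -> List[Row]:
    """Models the rule's effect: one NULL row is appended to the input."""
    return rows + [None]


def count_non_null(rows: Iterable[Row]) -> int:
    """Built-in style aggregate that ignores NULLs, like COUNT(a)."""
    return sum(1 for r in rows if r is not None)


def count_all(rows: Iterable[Row]) -> int:
    """Hypothetical UDAGG that does not ignore NULLs: counts every row."""
    return sum(1 for _ in rows)


empty: List[Row] = []
print(count_non_null(union_null_row(empty)))  # 0, correct
print(count_all(union_null_row(empty)))       # 1, should be 0
```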
A solution would be to add a {{MapPartitionFunction}} with parallelism 1 after a groupless aggregation. The {{MapPartitionFunction}} would simply forward all input data. If the input is empty, it emits a single result row with all aggregates in their initial state.
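A minimal sketch of the proposed forwarding step, modeled as a Python generator rather than Flink's actual {{MapPartitionFunction}} interface. The name {{forward_or_default}} and the {{initial_row}} callback are hypothetical, and the initial result of {{COUNT(*)}} is assumed to be {{(0,)}}.

```python
from typing import Iterable, Iterator, Tuple, Callable

Row = Tuple[int, ...]


def forward_or_default(rows: Iterable[Row], initial_row: Callable[[], Row]) -> Iterator[Row]:
    """Hypothetical model of the proposed step.

    Forwards every input row unchanged; if there was no input at all, emits
    one row built by initial_row (all aggregates in their initial state).
    """
    saw_input = False
    for row in rows:
        saw_input = True
        yield row
    if not saw_input:
        yield initial_row()


def count_initial() -> Row:
    # Assumed initial result of COUNT(*).
    return (0,)


print(list(forward_or_default([], count_initial)))      # [(0,)]
print(list(forward_or_default([(3,)], count_initial)))  # [(3,)]
```

Running the step with parallelism 1 matters: a single instance then sees the whole output of the groupless aggregation, so the default row is emitted at most once. With higher parallelism, every instance that received an empty partition would emit its own default row.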
> DataSet Should not union a NULL row for AGG without GROUP BY clause.
> --------------------------------------------------------------------
>
> Key: FLINK-8355
> URL: https://issues.apache.org/jira/browse/FLINK-8355
> Project: Flink
> Issue Type: Bug
> Components: Table API & SQL
> Affects Versions: 1.5.0
> Reporter: sunjincheng
>
> Currently, {{DataSetAggregateWithNullValuesRule}} UNIONs a NULL row into the input of non-grouped aggregate queries. Once {{CountAggFunction}} supports {{COUNT(*)}} (FLINK-8325), the result will be incorrect.
> For example, if table {{T1}} has 3 records and we run the following SQL against the DataSet API:
> {code}
> SELECT COUNT(*) AS cnt FROM T1 -- cnt = 4 (incorrect)
> {code}
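The reported behavior can be reproduced in plain SQL by emulating the rule's effect with a {{UNION ALL}} of one NULL row. This sketch uses Python's {{sqlite3}} as a stand-in engine (not Flink's actual plan); the column {{a}} and its values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (a INTEGER)")
conn.executemany("INSERT INTO T1 VALUES (?)", [(1,), (2,), (3,)])

# Expected result: COUNT(*) counts the 3 records of T1.
correct = conn.execute("SELECT COUNT(*) AS cnt FROM T1").fetchone()[0]

# Emulated rewrite: a NULL row is unioned into the input before aggregating,
# and COUNT(*) counts it like any other row.
with_null_row = conn.execute(
    "SELECT COUNT(*) AS cnt FROM (SELECT a FROM T1 UNION ALL SELECT NULL) AS t"
).fetchone()[0]

print(correct, with_null_row)  # 3 4
```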
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)