Posted to issues@hive.apache.org by "honghui.Liu (Jira)" <ji...@apache.org> on 2022/05/18 02:49:00 UTC
[jira] [Assigned] (HIVE-26236) count(1) with subquery count(distinct) gives wrong results with hive.optimize.distinct.rewrite=true and cbo on
[ https://issues.apache.org/jira/browse/HIVE-26236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
honghui.Liu reassigned HIVE-26236:
----------------------------------
> count(1) with subquery count(distinct) gives wrong results with hive.optimize.distinct.rewrite=true and cbo on
> --------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-26236
> URL: https://issues.apache.org/jira/browse/HIVE-26236
> Project: Hive
> Issue Type: Bug
> Components: CBO, Logical Optimizer
> Affects Versions: All Versions
> Reporter: honghui.Liu
> Assignee: honghui.Liu
> Priority: Major
>
> {code:java}
> create table count_distinct(a int, b int);
> insert into table count_distinct values (1,2),(2,3);
> set hive.execution.engine=tez;
> set hive.cbo.enable=true;
> set hive.optimize.distinct.rewrite=true;
> select count(1) from (
> select count(distinct a) from count_distinct
> ) tmp; {code}
> This query returns a wrong result when hive.optimize.distinct.rewrite is true, which is the default in all 3.x versions. The query returns 2; the expected result is 1.
> Before CBO optimization, the RelNode tree looks like this:
>
> {code:java}
> HiveProject(_o__c0=[$0])
>   HiveAggregate(group=[{}], agg#0=[count($0)])
>     HiveProject($f0=[1])
>       HiveProject(_o__c0=[$0])
>         HiveAggregate(group=[{}], agg#0=[count(DISTINCT $0)])
>           HiveProject($f0=[$0])
>             HiveTableScan(table=[[default.count_distinct]], table:alias=[count_distinct]) {code}
>
> After optimization by HiveExpandDistinctAggregatesRule, the RelNode tree looks like this:
>
> {code:java}
> HiveProject(_o__c0=[$0])
>   HiveAggregate(group=[{}], agg#0=[count($0)])
>     HiveProject($f0=[1])
>       HiveProject(_o__c0=[$0])
>         HiveAggregate(group=[{}], agg#0=[count($0)])
>           HiveAggregate(group=[{0}])
>             HiveProject($f0=[$0])
>               HiveProject($f0=[$0])
>                 HiveTableScan(table=[[default.count_distinct]], table:alias=[count_distinct]) {code}
> That is, count(distinct xx) is converted to count(xx) from (select xx from table_name group by xx).
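
The rewrite described above is result-preserving on its own, which can be checked on any SQL engine. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Hive (an assumption made only for illustration; it does not exercise Hive's optimizer). It runs the direct form, the rewritten form, and the outer query from the report against the same two rows.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("create table count_distinct(a int, b int)")
conn.execute("insert into count_distinct values (1,2),(2,3)")

# Direct form: count(distinct a).
direct = conn.execute(
    "select count(distinct a) from count_distinct"
).fetchall()

# Rewritten form: group by a first, then a plain count over the groups.
rewritten = conn.execute(
    "select count(a) from (select a from count_distinct group by a) t"
).fetchall()

# Outer query from the report: the subquery is a global aggregate,
# so it yields a single row and count(1) over it should be 1.
outer = conn.execute(
    "select count(1) from (select count(distinct a) from count_distinct) tmp"
).fetchone()[0]

print(direct, rewritten, outer)
```

Both inner forms agree on this data, and the outer count is 1 on the stand-in engine, which matches the expected result stated in the report.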
>
> After projection pruning, the RelNode tree looks like this:
> {code:java}
> HiveAggregate(group=[{}], agg#0=[count()])
>   HiveProject(DUMMY=[0])
>     HiveAggregate(group=[{}])
>       HiveAggregate(group=[{0}])
>         HiveProject(a=[$0])
>           HiveTableScan(table=[[default.count_distinct]], table:alias=[count_distinct]) {code}
> At this point the execution plan is wrong: the query returns 2 instead of the expected 1.
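
To make the pruned plan easier to reason about, here is a small Python sketch that evaluates it bottom-up over the two sample rows. The `aggregate` helper is hypothetical and models only the usual relational reading, in which an aggregate with an empty group set emits exactly one row. The report does not say where execution goes wrong; the second half of the sketch assumes, purely for illustration, that the column-less `HiveAggregate(group=[{}])` is lost or treated as a pass-through, which happens to reproduce the reported value of 2.

```python
def aggregate(rows, group_keys, count_star=False):
    """Hypothetical model of an Aggregate node over a list of tuples."""
    if not group_keys:
        # Empty group set: exactly one output row, whatever the input size.
        return [(len(rows),)] if count_star else [()]
    # Non-empty group set: one output row per distinct key, in first-seen order.
    seen = []
    for row in rows:
        key = tuple(row[i] for i in group_keys)
        if key not in seen:
            seen.append(key)
    return seen


scan = [(1, 2), (2, 3)]                      # HiveTableScan
project_a = [(row[0],) for row in scan]      # HiveProject(a=[$0])
distinct_a = aggregate(project_a, [0])       # HiveAggregate(group=[{0}])

# Plan evaluated as written.
single_row = aggregate(distinct_a, [])       # HiveAggregate(group=[{}])
dummy = [(0,) for _ in single_row]           # HiveProject(DUMMY=[0])
expected = aggregate(dummy, [], count_star=True)[0][0]

# Assumed failure mode: the empty aggregate is skipped.
dummy_skipped = [(0,) for _ in distinct_a]
observed = aggregate(dummy_skipped, [], count_star=True)[0][0]

print(expected, observed)
```

Under these assumptions the plan as written yields 1, while skipping the empty aggregate yields 2.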
--
This message was sent by Atlassian Jira
(v8.20.7#820007)