Posted to issues@hive.apache.org by "honghui.Liu (Jira)" <ji...@apache.org> on 2022/05/18 02:49:00 UTC

[jira] [Assigned] (HIVE-26236) count(1) with subquery count(distinct) gives wrong results with hive.optimize.distinct.rewrite=true and cbo on

     [ https://issues.apache.org/jira/browse/HIVE-26236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

honghui.Liu reassigned HIVE-26236:
----------------------------------


> count(1) with subquery count(distinct) gives wrong results with hive.optimize.distinct.rewrite=true and cbo on
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-26236
>                 URL: https://issues.apache.org/jira/browse/HIVE-26236
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO, Logical Optimizer
>    Affects Versions: All Versions
>            Reporter: honghui.Liu
>            Assignee: honghui.Liu
>            Priority: Major
>
> {code:java}
> create table count_distinct(a int, b int);
> insert into table count_distinct values (1,2),(2,3);
> set hive.execution.engine=tez;
> set hive.cbo.enable=true;
> set hive.optimize.distinct.rewrite=true;
> select count(1) from ( 
>       select count(distinct a) from count_distinct
> ) tmp; {code}
> This query gives a wrong result when hive.optimize.distinct.rewrite is true, which is the default in all 3.x versions. The query returns 2; the expected result is 1.
> Before CBO optimization, the RelNode tree is as follows:
>  
> {code:java}
> HiveProject(_o__c0=[$0])
>   HiveAggregate(group=[{}], agg#0=[count($0)])
>     HiveProject($f0=[1])
>       HiveProject(_o__c0=[$0])
>         HiveAggregate(group=[{}], agg#0=[count(DISTINCT $0)])
>           HiveProject($f0=[$0])
>             HiveTableScan(table=[[default.count_distinct]], table:alias=[count_distinct]) {code}
>  
> After HiveExpandDistinctAggregatesRule is applied, the RelNode tree is as follows:
>  
> {code:java}
> HiveProject(_o__c0=[$0])
>   HiveAggregate(group=[{}], agg#0=[count($0)])
>     HiveProject($f0=[1])
>       HiveProject(_o__c0=[$0])
>         HiveAggregate(group=[{}], agg#0=[count($0)])
>           HiveAggregate(group=[{0}])
>             HiveProject($f0=[$0])
>               HiveProject($f0=[$0])
>                 HiveTableScan(table=[[default.count_distinct]], table:alias=[count_distinct]) {code}
> That is, count(distinct xx) is converted to count(xx) from (select xx from table_name group by xx).
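>  
> Applied to the table from this report, the conversion can be sketched in SQL as follows (the alias t is only illustrative). Run on their own, both statements should return 2 for the two rows inserted above:
> {code:sql}
> -- original form: count the distinct non-null values of a
> select count(distinct a) from count_distinct;
> -- rewritten form: group by a first, then count the resulting groups
> select count(a) from (select a from count_distinct group by a) t;
> {code}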
>  
> After projection pruning, the RelNode tree is as follows:
> {code:java}
> HiveAggregate(group=[{}], agg#0=[count()])
>   HiveProject(DUMMY=[0])
>     HiveAggregate(group=[{}])
>       HiveAggregate(group=[{0}])
>         HiveProject(a=[$0])
>           HiveTableScan(table=[[default.count_distinct]], table:alias=[count_distinct]) {code}
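> In this case, the execution plan is wrong: the aggregate that computed count($0) has been pruned to HiveAggregate(group=[{}]) with no aggregate calls, so the outer count() presumably counts the two grouped rows instead of a single row.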
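>  
> A possible workaround sketch, assuming the wrong result only appears when the rewrite is enabled as described above (an inference from this report, not a verified fix): disable the rewrite for the session and re-run the query.
> {code:sql}
> -- assumption: with the rewrite off, count(distinct a) is not expanded into two aggregates
> set hive.optimize.distinct.rewrite=false;
> select count(1) from (
>       select count(distinct a) from count_distinct
> ) tmp;
> -- the report gives 1 as the expected result of this query
> {code}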



--
This message was sent by Atlassian Jira
(v8.20.7#820007)