You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Ashutosh Chauhan (JIRA)" <ji...@apache.org> on 2018/04/27 04:51:00 UTC

[jira] [Commented] (HIVE-19294) grouping sets when contains a constant column

    [ https://issues.apache.org/jira/browse/HIVE-19294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16455755#comment-16455755 ] 

Ashutosh Chauhan commented on HIVE-19294:
-----------------------------------------

Sounds like a bug in Hive's constant folding rule. a can either be 'all' or null after grouping set inner query so outer query case statement folding to  CASE WHEN (true) THEN ('x') is incorrect.

> grouping sets when contains a constant column
> ---------------------------------------------
>
>                 Key: HIVE-19294
>                 URL: https://issues.apache.org/jira/browse/HIVE-19294
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>    Affects Versions: 2.3.2
>            Reporter: Song Jun
>            Priority: Major
>
> We have different results between Hive-1.2.2 and Hive-2.3.2, SQL like this:
> {code:java}
> select 
>     case when a='all' then 'x' 
>          when b=1 then 'y' 
>          else 'z' 
>     end, c 
> from ( 
>     select 
>          a,b,count(1) as c
>     from ( 
>          select 
>             'all' as a,b 
>          from test 
>     ) t1 group by a,b grouping sets(a,b) 
> ) t2;
> {code}
> We have a grouping sets using the column a which is a contant value 'all' in its subquery.
>  
> The result of Hive 1.2.2(same result when set hive.cbo.enable to true or false):
> {code:java}
> x 3
> y 2
> z 1   {code}
> The result of Hive 2.3.2(same result when set hive.cbo.enable to true or false):
> {code:java}
> x 3
> x 2
> x 1{code}
> I dig it out for Hive 2.3.2 and set hive.cbo.enable=false, I found it that the optimizer 
> ConstantPropagate optimize the code according to the constant column value 'all' in the subquery:
> {code:java}
> case when a='all' then 'x' 
>          when b=1 then 'y' 
>          else 'z' 
> end
> {code}
> to 
> {code:java}
> Select Operator
> expressions: CASE WHEN (true) THEN ('x') WHEN ((_col1 = 1)) THEN ('y') ELSE ('z') END (type: string), _col3 (type: bigint)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 3 Data size: 3 Basic stats: COMPLETE Column stats: NONE
> {code}
> That is case when a = 'all'  explained as case when (true), so we always has the value of 'x'.
>  
> So, which should be right for the above query case? 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)