Posted to issues@hive.apache.org by "Ashutosh Chauhan (JIRA)" <ji...@apache.org> on 2018/04/27 04:51:00 UTC
[jira] [Commented] (HIVE-19294) grouping sets when contains a constant column
[ https://issues.apache.org/jira/browse/HIVE-19294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16455755#comment-16455755 ]
Ashutosh Chauhan commented on HIVE-19294:
-----------------------------------------
Sounds like a bug in Hive's constant folding rule. After the grouping sets in the inner query, a can be either 'all' or NULL, so folding the outer query's CASE statement to CASE WHEN (true) THEN ('x') is incorrect.
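To illustrate the point, here is a minimal Python sketch that emulates the semantics of the query from the issue quoted below. It is not Hive code. The contents of the test table are not given in the issue, so the three b values used here (1, 1, 2) are an assumption chosen to match the row counts in the reported Hive 1.2.2 output, and the function names are made up for the illustration.

```python
# Hypothetical contents of table `test`; only column b matters here.
# Assumed values, chosen so the counts match the Hive 1.2.2 output in the issue.
TEST_ROWS = [{"b": 1}, {"b": 1}, {"b": 2}]


def inner_query(rows):
    """Emulate: select a, b, count(1) as c
    from (select 'all' as a, b from test) t1
    group by a, b grouping sets (a, b)."""
    t1 = [{"a": "all", "b": r["b"]} for r in rows]
    out = []
    for keys in (("a",), ("b",)):  # one aggregation pass per grouping set
        counts = {}
        for r in t1:
            # Columns outside the current grouping set become NULL (None).
            group = tuple(r[k] if k in keys else None for k in ("a", "b"))
            counts[group] = counts.get(group, 0) + 1
        out.extend({"a": a, "b": b, "c": c} for (a, b), c in counts.items())
    return out


def case_correct(row):
    # a = 'all' is not true when a is NULL, so the later branches still apply.
    if row["a"] == "all":
        return "x"
    if row["b"] == 1:
        return "y"
    return "z"


def case_folded(row):
    # What the folded plan computes: CASE WHEN (true) THEN ('x') ...
    return "x"


t2 = inner_query(TEST_ROWS)
correct = [(case_correct(r), r["c"]) for r in t2]
folded = [(case_folded(r), r["c"]) for r in t2]
print(correct)
print(folded)
```

Under these assumptions the unfolded CASE gives x 3, y 2, z 1, while the folded one gives x for every row. The rows produced by the grouping set on b carry a NULL in column a, which is why a cannot be treated as the constant 'all' above the grouping sets.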
> grouping sets when contains a constant column
> ---------------------------------------------
>
> Key: HIVE-19294
> URL: https://issues.apache.org/jira/browse/HIVE-19294
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Affects Versions: 2.3.2
> Reporter: Song Jun
> Priority: Major
>
> We get different results on Hive 1.2.2 and Hive 2.3.2 for SQL like this:
> {code:java}
> select
>   case when a='all' then 'x'
>        when b=1 then 'y'
>        else 'z'
>   end, c
> from (
>   select
>     a, b, count(1) as c
>   from (
>     select
>       'all' as a, b
>     from test
>   ) t1 group by a, b grouping sets(a, b)
> ) t2;
> {code}
> The grouping sets use column a, which is the constant value 'all' in the subquery.
>
> The result on Hive 1.2.2 (the same whether hive.cbo.enable is set to true or false):
> {code:java}
> x 3
> y 2
> z 1
> {code}
> The result on Hive 2.3.2 (the same whether hive.cbo.enable is set to true or false):
> {code:java}
> x 3
> x 2
> x 1
> {code}
> I dug into this on Hive 2.3.2 with hive.cbo.enable=false and found that the ConstantPropagate
> optimizer rewrites the expression based on the constant value 'all' of column a in the subquery, from:
> {code:java}
> case when a='all' then 'x'
>      when b=1 then 'y'
>      else 'z'
> end
> {code}
> to
> {code:java}
> Select Operator
>   expressions: CASE WHEN (true) THEN ('x') WHEN ((_col1 = 1)) THEN ('y') ELSE ('z') END (type: string), _col3 (type: bigint)
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 3 Data size: 3 Basic stats: COMPLETE Column stats: NONE
> {code}
> That is, case when a = 'all' is folded to case when (true), so the result is always 'x'.
>
> So which result is correct for the above query?
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)