You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Zoltan Haindrich (Jira)" <ji...@apache.org> on 2021/11/30 08:56:00 UTC

[jira] [Resolved] (HIVE-25734) Wrongly-typed constant in case expression leads to incorrect empty result

     [ https://issues.apache.org/jira/browse/HIVE-25734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltan Haindrich resolved HIVE-25734.
-------------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

merged into master. Thank you [~asolimando]!

> Wrongly-typed constant in case expression leads to incorrect empty result
> -------------------------------------------------------------------------
>
>                 Key: HIVE-25734
>                 URL: https://issues.apache.org/jira/browse/HIVE-25734
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO, Query Planning
>    Affects Versions: 4.0.0
>            Reporter: Alessandro Solimando
>            Assignee: Alessandro Solimando
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
>  
> The type of constants in case expressions should be inferred, if possible, by the "surrounding" input reference columns, if any.
> Consider the following table and query: 
> {code:java}
> create external table test_case (row_seq smallint, row_desc string) stored as parquet;
> insert into test_case values (1, 'a');
> insert into test_case values (2, 'aa');
> insert into test_case values (6, 'aaaaaa');
> with base_t as (select row_seq, row_desc,
>   case row_seq
>     when 1 then '34'
>     when 6 then '35'
>     when 2 then '36'
>   end as zb from test_case where row_seq in (1,2,6))
> select row_seq, row_desc, zb from base_t where zb <> '34';{code}
> The aforementioned query fails by returning an empty results, while "1 a 34" is expected.
>  
> To understand the root cause, let's consider the debug input and output of some related CBO rules which are triggered during the evaluation of the query: 
>  
> {noformat}
> --$0 is the column 'row_seq'
> 1. HiveReduceExpressionsRule
> Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), <>(CASE(=($0, 1:INTEGER), '34':VARCHAR, =($0, 6:INTEGER), '35':VARCHAR, =($0, 2:INTEGER), '36':VARCHAR, null:VARCHAR), '34':CHAR(2)))
> Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER), =($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER)))
> 2. HivePointLookupOptimizerRule.RexTransformIntoInClause
> Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER), =($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER)))
> Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER, 2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER)))
> 3. HivePointLookupOptimizerRule.RexMergeInClause
> Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER, 2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER)))
> Output: false{noformat}
> In the first part, we can see that the constants are correctly typed as "SMALLINT" in the first part of the "AND" operand, while they are typed as "INTEGER" for the "CASE" expression, despite the input reference "$0" being available for inferring a more precise type.
> This type difference makes "HivePointLookupOptimizerRule.RexMergeInClause" missing the commonality between the two "IN" expressions, whose intersection is considered empty, hence the empty result.
> Providing a more refined type inference for "case" expressions should fix the issue.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)