You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/11/24 08:19:00 UTC

[jira] [Work logged] (HIVE-25734) Wrongly-typed constant in case expression leads to incorrect empty result

     [ https://issues.apache.org/jira/browse/HIVE-25734?focusedWorklogId=685703&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-685703 ]

ASF GitHub Bot logged work on HIVE-25734:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 24/Nov/21 08:18
            Start Date: 24/Nov/21 08:18
    Worklog Time Spent: 10m 
      Work Description: kgyrtkirk commented on a change in pull request #2811:
URL: https://github.com/apache/hive/pull/2811#discussion_r755789803



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java
##########
@@ -985,6 +986,11 @@ protected T getXpathOrFuncExprNodeDesc(ASTNode node,
             children.set(constIdx, newChild);
           }
         }
+        if(funcText.equalsIgnoreCase("case")) {
+          ListMultimap<TypeInfo, T> expressions = getTypeInfoTListMultimap((List<T>) children);

Review comment:
       making this for CASE doesn't look right - this `getTypeInfoTListMultimap` which was extracted from the IN part below - main purpose was to make all the IN values conform to the type of the value being looked up;
   
   in case of CASE the 1st argument is a boolean condition - forcing all the other childs in that node to be boolean doesn't make sense in all cases.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java
##########
@@ -74,6 +74,7 @@
 import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
 import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
 import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
+import org.jetbrains.annotations.NotNull;

Review comment:
       `javax.annotation.NotNull`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 685703)
    Remaining Estimate: 0h
            Time Spent: 10m

> Wrongly-typed constant in case expression leads to incorrect empty result
> -------------------------------------------------------------------------
>
>                 Key: HIVE-25734
>                 URL: https://issues.apache.org/jira/browse/HIVE-25734
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>    Affects Versions: 4.0.0
>            Reporter: Alessandro Solimando
>            Assignee: Ayush Saxena
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
>  
> The type of constants in case expressions should be inferred, if possible, by the "surrounding" input reference columns, if any.
> Consider the following table and query: 
> {code:java}
> create external table test_case (row_seq smallint, row_desc string) stored as parquet;
> insert into test_case values (1, 'a');
> insert into test_case values (2, 'aa');
> insert into test_case values (6, 'aaaaaa');
> with base_t as (select row_seq, row_desc,
>   case row_seq
>     when 1 then '34'
>     when 6 then '35'
>     when 2 then '36'
>   end as zb from test_case where row_seq in (1,2,6))
> select row_seq, row_desc, zb from base_t where zb <> '34';{code}
> The aforementioned query fails by returning an empty results, while "1 a 34" is expected.
>  
> To understand the root cause, let's consider the debug input and output of some related CBO rules which are triggered during the evaluation of the query: 
>  
> {noformat}
> --$0 is the column 'row_seq'
> 1. HiveReduceExpressionsRule
> Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), <>(CASE(=($0, 1:INTEGER), '34':VARCHAR, =($0, 6:INTEGER), '35':VARCHAR, =($0, 2:INTEGER), '36':VARCHAR, null:VARCHAR), '34':CHAR(2)))
> Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER), =($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER)))
> 2. HivePointLookupOptimizerRule.RexTransformIntoInClause
> Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER), =($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER)))
> Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER, 2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER)))
> 3. HivePointLookupOptimizerRule.RexMergeInClause
> Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER, 2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER)))
> Output: false{noformat}
> In the first part, we can see that the constants are correctly typed as "SMALLINT" in the first part of the "AND" operand, while they are typed as "INTEGER" for the "CASE" expression, despite the input reference "$0" being available for inferring a more precise type.
> This type difference makes "HivePointLookupOptimizerRule.RexMergeInClause" missing the commonality between the two "IN" expressions, whose intersection is considered empty, hence the empty result.
> Providing a more refined type inference for "case" expressions should fix the issue.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)