You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Vlad Gudikov (JIRA)" <ji...@apache.org> on 2017/08/03 06:06:00 UTC
[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

     [ https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vlad Gudikov updated HIVE-17148:
--------------------------------
    Attachment: HIVE-17148.1.patch

> Incorrect result for Hive join query with COALESCE in WHERE condition
> ---------------------------------------------------------------------
>
>                 Key: HIVE-17148
>                 URL: https://issues.apache.org/jira/browse/HIVE-17148
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>    Affects Versions: 2.1.1
>            Reporter: Vlad Gudikov
>            Assignee: Vlad Gudikov
>         Attachments: HIVE-17148.1.patch, HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1       NULL    1
> {code}
> The issue seems to be because of the incorrect query plan. In the plan we can see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it is filtering out all the rows is any column mentioned in the COALESCE has null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
>     limit:-1
>     Stage-1
>       Map 1
>       File Output Operator [FS_10]
>         Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>           Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
>         <-Map 2 [BROADCAST_EDGE]
>           BROADCAST [RS_7]
>             PartitionCols:_col0
>             Select Operator [SEL_5] (rows=1 width=1)
>               Output:["_col0"]
>               Filter Operator [FIL_14] (rows=1 width=1)
>                 predicate:a2 is not null
>                 TableScan [TS_3] (rows=1 width=1)
>                   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
>         <-Select Operator [SEL_2] (rows=1 width=4)
>             Output:["_col0","_col1"]
>             Filter Operator [FIL_13] (rows=1 width=4)
>               predicate:(a1 is not null and b1 is not null)
>               TableScan [TS_0] (rows=1 width=4)
>                 default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if join is inner type, otherwise HiveJoinAddNotRule which creates this problem is skipped.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)