You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "zengxl (JIRA)" <ji...@apache.org> on 2019/03/15 08:28:00 UTC

[jira] [Created] (HIVE-21452) Loss of query condition when exist exists

zengxl created HIVE-21452:
-----------------------------

             Summary: Loss of query condition when exist exists
                 Key: HIVE-21452
                 URL: https://issues.apache.org/jira/browse/HIVE-21452
             Project: Hive
          Issue Type: Bug
    Affects Versions: 1.1.0
            Reporter: zengxl


In our production environment, there are four tables to do association queries. There are exists in the conditions, and we found that the first two conditions were lost.

The following two conditions are missing:

{color:#f79232}t2.cust_no is null and t4.cust_level not in ('4','5'){color}
 
In the test environment, I prepared the data of the following four tables, and a condition was lost in the simulation.

tables:

test_table1
cust_no,name
60001,lisa
60002,tina
60003,kylin
60004,jeny
60005,john
60006,jamse

test_table2
cust_no,acct_type
60001,1
60001,1
60001,2
60002,1
60003,2
60003,3

test_table3
cust_no
60001
60002
60003
60004
60005
60007

test_table4
cust_no,cust_level
60001,1
60002,2
60003,3
60004,4
60005,5

 

create table tmp.test_table1(cust_no string,name string);
create table tmp.test_table2(cust_no string,acct_type string);
create table tmp.test_table3(cust_no string);
create table tmp.test_table4(cust_no string,cust_level string);

insert into tmp.test_table1 select '60001','lisa';
insert into tmp.test_table1 select '60002','tina';
insert into tmp.test_table1 select '60003','kylin';
insert into tmp.test_table1 select '60004','jeny';
insert into tmp.test_table1 select '60005','john';
insert into tmp.test_table1 select '60006','jamse';

insert into tmp.test_table2 select '60001','1';
insert into tmp.test_table2 select '60001','1';
insert into tmp.test_table2 select '60001','2';
insert into tmp.test_table2 select '60002','1';
insert into tmp.test_table2 select '60003','2';
insert into tmp.test_table2 select '60002','3';

insert into tmp.test_table3 select '60001';
insert into tmp.test_table3 select '60002';
insert into tmp.test_table3 select '60003';
insert into tmp.test_table3 select '60004';
insert into tmp.test_table3 select '60005';
insert into tmp.test_table3 select '60007';

insert into tmp.test_table4 select '60001','1';
insert into tmp.test_table4 select '60002','2';
insert into tmp.test_table4 select '60003','3';
insert into tmp.test_table4 select '60004','4';
insert into tmp.test_table4 select '60005','5';
 
Here is my query SQL And shut down mapjoin:

set hive.auto.convert.join=false;

select t1.cust_no as cust_no,t2.cust_no as custNO,t1.name from tmp.test_table1 t1
left join tmp.test_table2 t2 on t1.cust_no=t2.cust_no
and t2.acct_type='1'
left join tmp.test_table4 t4 on t1.cust_no=t4.cust_no
where t2.cust_no is null and t4.cust_level not in ('4','5') and exists (select 1 from tmp.test_table3 t3 where t1.cust_no=t3.cust_no)

 

All I want is to include cust_no for 6003,But the result is inclusive 6004 and 6005,this wrong 。{color:#f79232}In my production environment, 6001 came out。Loss of condition because cust_no is  null。{color}

{color:#f6c342}View the execution plan, t4.cust_level not in ('4','5') condition missing{color}

*result:*

60003 NULL kylin
60003 NULL kylin
60003 NULL kylin
60004 NULL jeny
60005 NULL john

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)