You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "James Norvell (JIRA)" <ji...@apache.org> on 2019/02/25 22:38:00 UTC
[jira] [Created] (HIVE-21322) Multiple table LEFT OUTER JOIN
results are incorrect when 'is not null' used in WHERE clause.
James Norvell created HIVE-21322:
------------------------------------
Summary: Multiple table LEFT OUTER JOIN results are incorrect when 'is not null' used in WHERE clause.
Key: HIVE-21322
URL: https://issues.apache.org/jira/browse/HIVE-21322
Project: Hive
Issue Type: Bug
Components: CBO
Affects Versions: 2.3.4
Environment: Hive 2.3.4 (emr-5.21.0)
Reporter: James Norvell
Attachments: explain-plans.txt
Reproduction:
Create tables:
{code:java}
create table a (id string); insert into a values (1),(2),(3);
create table b (id string, name string); insert into b values (1,'a'),(2,'b'),(3,null);
create table c (id string); insert into c values (11),(22),(33);
{code}
When joining a -> b -> c on id, the following query is correct:
{code:java}
select a.id, b.name from a left outer join b on a.id = b.id left outer join c on a.id = c.id where b.name is not null;
OK
1 a
2 b
Time taken: 10.231 seconds, Fetched: 2 row(s)
{code}
Switching the join order from a -> c -> b results in incorrect results:
{code:java}
select a.id, b.name from a
left outer join c on a.id = c.id
left outer join b on a.id = b.id
where b.name is not null;
OK
2 b
Time taken: 10.321 seconds, Fetched: 1 row(s)
{code}
Disabling hive.cbo.enable or changing execution engine to mr avoids the issue:
{code:java}
set hive.cbo.enable=false;
select a.id, b.name from a left outer join c on a.id = c.id left outer join b on a.id = b.id where b.name is not null;
OK
1 a
2 b
Time taken: 9.614 seconds, Fetched: 2 row(s)
set hive.cbo.enable=true;
set hive.execution.engine=mr;
select a.id, b.name from a left outer join c on a.id = c.id left outer join b on a.id = b.id where b.name is not null;
OK
1 a
2 b
Time taken: 29.377 seconds, Fetched: 2 row(s)
{code}
Issue doesn't occur when using 'is null':
{code:java}
select a.id, b.name from a left outer join c on a.id = c.id left outer join b on a.id = b.id where b.name is null;
OK
3 NULL
Time taken: 9.673 seconds, Fetched: 1 row(s)
{code}
Explain plans for queries attached.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)