You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Alexander Behm (JIRA)" <ji...@apache.org> on 2017/09/15 04:01:00 UTC
[jira] [Resolved] (IMPALA-5856) Queries with full outer and left
join miss result rows
[ https://issues.apache.org/jira/browse/IMPALA-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexander Behm resolved IMPALA-5856.
------------------------------------
Resolution: Fixed
Fix Version/s: Impala 2.11.0
commit 032dee28de2fe95f80d9284eb41b64fd12a56c86
Author: Alex Behm <al...@cloudera.com>
Date: Mon Sep 11 10:33:39 2017 -0700
IMPALA-5856: Fix outer join predicate assignment.
Fixes incorrect assignment of join predicates with
the following properties:
- from the On-clause of a left outer join
- only references the left-hand side tuples (not the
right hand side tuple)
- references full-outer joined tuples; the full outer
join appears on the left
Testing:
- a core/hdfs run passed
- added new regression test
Change-Id: I93db34d988cb66e00aa05d7dc161e0ca47042acb
Reviewed-on: http://gerrit.cloudera.org:8080/8039
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins
> Queries with full outer and left join miss result rows
> ------------------------------------------------------
>
> Key: IMPALA-5856
> URL: https://issues.apache.org/jira/browse/IMPALA-5856
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
> Environment: CDH 5.10.0
> Reporter: Julian Eberius
> Assignee: Alexander Behm
> Priority: Blocker
> Labels: correctness
> Fix For: Impala 2.11.0
>
>
> When combining a full outer join with a left join, some of the left join predicates seem to be treated as general WHERE-clauses, which leads to missing rows. Minimal working example:
> {code:sql}
> create table A (a int, av int);
> create table B (a int, bv int);
> create table C (a int, cv int);
> insert into A values (1,1), (2,2), (3,3);
> insert into B values (2,22),(4,44);
> insert into C values (2,222);
> -- all results are returned as expected
> select * from A full outer join B on a.a=b.a left join C on (coalesce(a.a, b.a)=c.a);
> -- only one row is returned, as if the last clause was a WHERE clause
> select * from A full outer join B on a.a=b.a left join C on (coalesce(a.a, b.a)=c.a and coalesce(a.av,b.bv)=2);
> -- no rows are returned at all, even though only the columns of C should be affected
> select * from A full outer join B on a.a=b.a left join C on (coalesce(a.a, b.a)=c.a and coalesce(a.av,b.bv)=100);
> -- removing the full outer join leads to the expected result
> select * from A left join C on (coalesce(a.a)=c.a and coalesce(a.av)=100);
> {code}
> Running the exact same SQL in PostgreSQL, only the columns of C are ever affected by the left join ON condition, the number of rows never changes. As far as we understand, this should be the expected behaviour.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)