You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Alexander Behm (JIRA)" <ji...@apache.org> on 2017/09/15 04:01:00 UTC

[jira] [Resolved] (IMPALA-5856) Queries with full outer and left join miss result rows

     [ https://issues.apache.org/jira/browse/IMPALA-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Behm resolved IMPALA-5856.
------------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.11.0

commit 032dee28de2fe95f80d9284eb41b64fd12a56c86
Author: Alex Behm <al...@cloudera.com>
Date:   Mon Sep 11 10:33:39 2017 -0700

    IMPALA-5856: Fix outer join predicate assignment.
    
    Fixes incorrect assignment of join predicates with
    the following properties:
    - from the On-clause of a left outer join
    - only references the left-hand side tuples (not the
      right hand side tuple)
    - references full-outer joined tuples; the full outer
      join appears on the left
    
    Testing:
    - a core/hdfs run passed
    - added new regression test
    
    Change-Id: I93db34d988cb66e00aa05d7dc161e0ca47042acb
    Reviewed-on: http://gerrit.cloudera.org:8080/8039
    Reviewed-by: Alex Behm <al...@cloudera.com>
    Tested-by: Impala Public Jenkins


> Queries with full outer and left join miss result rows
> ------------------------------------------------------
>
>                 Key: IMPALA-5856
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5856
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
>         Environment: CDH 5.10.0
>            Reporter: Julian Eberius
>            Assignee: Alexander Behm
>            Priority: Blocker
>              Labels: correctness
>             Fix For: Impala 2.11.0
>
>
> When combining a full outer join with a left join, some of the left join predicates seem to be treated as general WHERE-clauses, which leads to missing rows. Minimal working example:
> {code:sql}
> create table A (a int, av int);
> create table B (a int, bv int);
> create table C (a int, cv int);
> insert into A values (1,1), (2,2), (3,3);
> insert into B values (2,22),(4,44);
> insert into C values (2,222);
> -- all results are returned as expected
> select * from A full outer join B on a.a=b.a left join C on (coalesce(a.a, b.a)=c.a);
> -- only one row is returned, as if the last clause was a WHERE clause
> select * from A full outer join B on a.a=b.a left join C on (coalesce(a.a, b.a)=c.a and coalesce(a.av,b.bv)=2);
> -- no rows are returned at all, even though only the columns of C should be affected
> select * from A full outer join B on a.a=b.a left join C on (coalesce(a.a, b.a)=c.a and coalesce(a.av,b.bv)=100);
> -- removing the full outer join leads to the expected result
> select * from A left join C on (coalesce(a.a)=c.a and coalesce(a.av)=100);
> {code}
> Running the exact same SQL in PostgreSQL, only the columns of C are ever affected by the left join ON condition, the number of rows never changes. As far as we understand, this should be the expected behaviour. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)