You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/01/27 17:31:00 UTC

[jira] [Commented] (IMPALA-10382) Predicate with coalesce on both sides of LOJ isn't NULL filtering

    [ https://issues.apache.org/jira/browse/IMPALA-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17273029#comment-17273029 ] 

ASF subversion and git services commented on IMPALA-10382:
----------------------------------------------------------

Commit 4ae847bf94e0a1e07860c9c7aa3f0dfcdf548fac in impala's branch refs/heads/master from xqhe
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4ae847b ]

IMPALA-10382: fix invalid outer join simplification

When set ENABLE_OUTER_JOIN_TO_INNER_TRANSFORMATION = true, the planner
will simplify outer joins if the predicate with case expr or conditional
function on both sides of outer join.
However, the predicate maybe not null-rejecting, if simplify the outer
join, the result is incorrect. E.g. t1.b > coalesce(t1.c, t2.c) can
return true if t2.c is null, so it is not null-rejecting predicate
for t2.

The fix is simply to support the case that the predicate with two
operands and the operator is one of (=, !=, >, <, >=, <=),
1. one of the operands or
2. if the operand is arithmetic expression and one of the children
does not contain conditional builtin function or case expr and has
tuple id in outer joined tuples.
E.g. t1.b > coalesce(t2.c, t1.c) or t1.b + coalesce(t2.c, t1.c) >
coalesce(t2.c, t1.c) is null-rejecting predicate for t1.

Testing:
* Add new plan tests in outer-to-inner-joins.test
* Add new query tests to verify the correctness on transformation

Change-Id: I84a3812f4212fa823f3d1ced6e12f2df05aedb2b
Reviewed-on: http://gerrit.cloudera.org:8080/16845
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Tim Armstrong <ta...@cloudera.com>


> Predicate with coalesce on both sides of LOJ isn't NULL filtering
> -----------------------------------------------------------------
>
>                 Key: IMPALA-10382
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10382
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Shant Hovsepian
>            Assignee: Xianqing He
>            Priority: Critical
>              Labels: correctness
>
> A query like the below will have the outer join simplified to an inner join when the predicate with coalesce isn't always NULL filtering.
> {code:sql}
> select t1.int_col from alltypestiny t1 left outer join alltypesagg t2 on t1.tinyint_col = t2.tinyint_col left outer join alltypes t3 on t1.int_col = t3.int_col where t2.tinyint_col >= coalesce(t1.int_col, t2.int_col);
> {code}
> {noformat}
> functional> set ENABLE_OUTER_JOIN_TO_INNER_TRANSFORMATION=true;
> functional> explain select t1.int_col from alltypestiny t1 left outer join alltypesagg t2 on t1.tinyint_col = t2.tinyint_col left outer join alltypes t3 on t1.int_col = t3.int_col where t2.tinyint_col >= coalesce(t1.int_col, t2.int_col);
> Query: explain select t1.int_col from alltypestiny t1 left outer join alltypesagg t2 on t1.tinyint_col = t2.tinyint_col left outer join alltypes t3 on t1.int_col = t3.int_col where t2.tinyint_col >= coalesce(t1.int_col, t2.int_col)
> +------------------------------------------------------------+
> | Explain String                                             |
> +------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=5.04MB Threads=8 |
> | Per-Host Resource Estimates: Memory=287MB                  |
> |                                                            |
> | PLAN-ROOT SINK                                             |
> | 08:EXCHANGE [UNPARTITIONED]                                |
> | 04:HASH JOIN [LEFT OUTER JOIN, PARTITIONED]                |
> | |--07:EXCHANGE [HASH(t3.int_col)]                          |
> | |  02:SCAN HDFS [functional.alltypes t3]                   |
> | 06:EXCHANGE [HASH(t1.int_col)]                             |
> | 03:HASH JOIN [INNER JOIN, BROADCAST]                       |
> | |--05:EXCHANGE [BROADCAST]                                 |
> | |  00:SCAN HDFS [functional.alltypestiny t1]               |
> | 01:SCAN HDFS [functional.alltypesagg t2]                   |
> +------------------------------------------------------------+
> Fetched 13 row(s) in 0.02s
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org