You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Alexander Behm (JIRA)" <ji...@apache.org> on 2017/12/06 21:55:00 UTC

[jira] [Created] (IMPALA-6286) Wrong results with outer join and RUNTIME_FILTER_MODE=GLOBAL

Alexander Behm created IMPALA-6286:
--------------------------------------

             Summary: Wrong results with outer join and RUNTIME_FILTER_MODE=GLOBAL
                 Key: IMPALA-6286
                 URL: https://issues.apache.org/jira/browse/IMPALA-6286
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 2.10.0, Impala 2.9.0, Impala 2.8.0, Impala 2.7.0, Impala 2.6.0, Impala 2.5.0
            Reporter: Alexander Behm
            Priority: Blocker


Queries with the following characteristics may produce wrong results due to an incorrectly assigned runtime filter:
* The query option RUNTIME_FILTER_MODE is set to GLOBAL
* The query has an outer join
* A scan on the nullable side of that outer join has a runtime filter with a NULL-checking expression such as COALESCE/IFNULL/CASE
* The latter point imples that there is another join above the outer join with a NULL-checking expression in it's join condition

Reproduction:
{code}
select count(*) from functional.alltypestiny t1
left outer join functional.alltypestiny t2
  on t1.id = t2.id
where coalesce(t2.id + 10, 100) in (select 100)
+----------+
| count(*) |
+----------+
| 8        |
+----------+
{code}
We expect a count of 0. A count of 8 is incorrect. 

Query plan:
{code}
+---------------------------------------------------------------+
| Explain String                                                |
+---------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=3.88MB              |
| Per-Host Resource Estimates: Memory=87.88MB                   |
| Codegen disabled by planner                                   |
|                                                               |
| PLAN-ROOT SINK                                                |
| |                                                             |
| 10:AGGREGATE [FINALIZE]                                       |
| |  output: count:merge(*)                                     |
| |                                                             |
| 09:EXCHANGE [UNPARTITIONED]                                   |
| |                                                             |
| 05:AGGREGATE                                                  |
| |  output: count(*)                                           |
| |                                                             |
| 04:HASH JOIN [LEFT SEMI JOIN, BROADCAST]                      |
| |  hash predicates: coalesce(t2.id + 10, 100) = `$a$1`.`$c$1` |
| |  runtime filters: RF000 <- `$a$1`.`$c$1`                    |
| |                                                             |
| |--08:EXCHANGE [BROADCAST]                                    |
| |  |                                                          |
| |  02:UNION                                                   |
| |     constant-operands=1                                     |
| |                                                             |
| 03:HASH JOIN [LEFT OUTER JOIN, PARTITIONED]                   |
| |  hash predicates: t1.id = t2.id                             |
| |                                                             |
| |--07:EXCHANGE [HASH(t2.id)]                                  |
| |  |                                                          |
| |  01:SCAN HDFS [functional.alltypestiny t2]                  |
| |     partitions=4/4 files=4 size=460B                        |
| |     runtime filters: RF000 -> coalesce(t2.id + 10, 100)  <--- This runtime filter is not correct   |
| |                                                             |
| 06:EXCHANGE [HASH(t1.id)]                                     |
| |                                                             |
| 00:SCAN HDFS [functional.alltypestiny t1]                     |
|    partitions=4/4 files=4 size=460B                           |
+---------------------------------------------------------------+
{code}

Explanation:
* RF000 filters out all rows in the scan
* In join 03 there are no join matches since the right-hand is empty. All rows from the right-hand side are nulled.
* The join condition in join 04 now satisfies all input rows because every "t2.id" is NULL, so after the COALESCE() the join condition becomes 100 = 100

*Workaround*
* Set RUNTIME_FILTER_MODE to LOCAL or OFF



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)