You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Quanlong Huang (JIRA)" <ji...@apache.org> on 2019/06/11 06:05:00 UTC

[jira] [Resolved] (IMPALA-8386) Incorrect predicate in a left outer join query

     [ https://issues.apache.org/jira/browse/IMPALA-8386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang resolved IMPALA-8386.
------------------------------------
    Resolution: Fixed

> Incorrect predicate in a left outer join query
> ----------------------------------------------
>
>                 Key: IMPALA-8386
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8386
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.12.0, Impala 3.2.0
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>              Labels: correctness
>             Fix For: Impala 3.3.0
>
>
> skyyws <sk...@163.com> reported a bug [in the mailing list|https://lists.apache.org/thread.html/0bdbbaa6bb35b552f050ae30587b7d75b78a72ec60007a8bc0a4a8a9@%3Cdev.impala.apache.org%3E] on the following data set:
> {code:java}
> table A
> +------+
> | a_id |
> +------+
> | 1    |
> | 2    |
> +------+
> table B
> +------+--------+
> | b_id | amount |
> +------+--------+
> | 1    | 10     |
> | 1    | 20     |
> | 2    | NULL   |
> +------+--------+
> table C
> +------+------+
> | a_id | b_id |
> +------+------+
> | 1    | 1    |
> | 2    | 2    |
> +------+------+{code}
> The following query returns a wrong result "1":
> {code:java}
> select count(1) from (
>     select t2.a_id,t2.amount1,t2.amount2
>     from( select a_id from a) t1
>     left outer join (
>         select c.a_id,sum(amount) as amount1,sum(amount) as amount2
>         from b join c  on b.b_id = c.b_id group by c.a_id) t2
>     on t1.a_id = t2.a_id
> ) t;
> {code}
> Removing "t2.amount2" can get the right result "2":
> {code:java}
> select count(1) from (
>     select t2.a_id,t2.amount1
>     from( select a_id from a) t1
>     left outer join (
>         select c.a_id,sum(amount) as amount1,sum(amount) as amount2
>         from b join c  on b.b_id = c.b_id group by c.a_id) t2
>     on t1.a_id = t2.a_id
> ) t;
> {code}
> The problem is that in query 1, Impala generates a wrong predicate "sum(amount) = sum(amount)" which rejects nulls:
> {code:java}
> +-------------------------------------------------------------+
> | Explain String                                              |
> +-------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=10.84MB Threads=8 |
> | Per-Host Resource Estimates: Memory=143MB                   |
> | Codegen disabled by planner                                 |
> |                                                             |
> | PLAN-ROOT SINK                                              |
> | |                                                           |
> | 12:AGGREGATE [FINALIZE]                                     |
> | |  output: count:merge(*)                                   |
> | |  row-size=8B cardinality=1                                |
> | |                                                           |
> | 11:EXCHANGE [UNPARTITIONED]                                 |
> | |                                                           |
> | 06:AGGREGATE                                                |
> | |  output: count(*)                                         |
> | |  row-size=8B cardinality=1                                |
> | |                                                           |
> | 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]                |
> | |  hash predicates: c.a_id = a_id                           |
> | |  other predicates: sum(amount) = sum(amount)      <---------- Wrong inferred predicate which incorrectly reject nulls
> | |  runtime filters: RF000 <- a_id                           |
> | |  row-size=16B cardinality=2                               |
> | |                                                           |
> | |--10:EXCHANGE [HASH(a_id)]                                 |
> | |  |                                                        |
> | |  00:SCAN HDFS [default.a]                                 |
> | |     partitions=1/1 files=1 size=4B                        |
> | |     row-size=4B cardinality=2                             |
> | |                                                           |
> | 09:AGGREGATE [FINALIZE]                                     |
> | |  output: sum:merge(amount)                                |
> | |  group by: c.a_id                                         |
> | |  row-size=12B cardinality=2                               |
> | |                                                           |
> | 08:EXCHANGE [HASH(c.a_id)]                                  |
> | |                                                           |
> | 04:AGGREGATE [STREAMING]                                    |
> | |  output: sum(amount)                                      |
> | |  group by: c.a_id                                         |
> | |  row-size=12B cardinality=2                               |
> | |                                                           |
> | 03:HASH JOIN [INNER JOIN, BROADCAST]                        |
> | |  hash predicates: b.b_id = c.b_id                         |
> | |  runtime filters: RF002 <- c.b_id                         |
> | |  row-size=16B cardinality=3                               |
> | |                                                           |
> | |--07:EXCHANGE [BROADCAST]                                  |
> | |  |                                                        |
> | |  02:SCAN HDFS [default.c]                                 |
> | |     partitions=1/1 files=1 size=8B                        |
> | |     runtime filters: RF000 -> default.c.a_id              |
> | |     row-size=8B cardinality=2                             |
> | |                                                           |
> | 01:SCAN HDFS [default.b]                                    |
> |    partitions=1/1 files=1 size=15B                          |
> |    runtime filters: RF002 -> b.b_id                         |
> |    row-size=8B cardinality=3                                |
> +-------------------------------------------------------------+
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org