You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Quanlong Huang (JIRA)" <ji...@apache.org> on 2019/06/11 06:05:00 UTC
[jira] [Resolved] (IMPALA-8386) Incorrect predicate in a left outer
join query
[ https://issues.apache.org/jira/browse/IMPALA-8386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Quanlong Huang resolved IMPALA-8386.
------------------------------------
Resolution: Fixed
> Incorrect predicate in a left outer join query
> ----------------------------------------------
>
> Key: IMPALA-8386
> URL: https://issues.apache.org/jira/browse/IMPALA-8386
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 2.12.0, Impala 3.2.0
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
> Labels: correctness
> Fix For: Impala 3.3.0
>
>
> skyyws <sk...@163.com> reported a bug [in the mailing list|https://lists.apache.org/thread.html/0bdbbaa6bb35b552f050ae30587b7d75b78a72ec60007a8bc0a4a8a9@%3Cdev.impala.apache.org%3E] on the following data set:
> {code:java}
> table A
> +------+
> | a_id |
> +------+
> | 1 |
> | 2 |
> +------+
> table B
> +------+--------+
> | b_id | amount |
> +------+--------+
> | 1 | 10 |
> | 1 | 20 |
> | 2 | NULL |
> +------+--------+
> table C
> +------+------+
> | a_id | b_id |
> +------+------+
> | 1 | 1 |
> | 2 | 2 |
> +------+------+{code}
> The following query returns a wrong result "1":
> {code:java}
> select count(1) from (
> select t2.a_id,t2.amount1,t2.amount2
> from( select a_id from a) t1
> left outer join (
> select c.a_id,sum(amount) as amount1,sum(amount) as amount2
> from b join c on b.b_id = c.b_id group by c.a_id) t2
> on t1.a_id = t2.a_id
> ) t;
> {code}
> Removing "t2.amount2" can get the right result "2":
> {code:java}
> select count(1) from (
> select t2.a_id,t2.amount1
> from( select a_id from a) t1
> left outer join (
> select c.a_id,sum(amount) as amount1,sum(amount) as amount2
> from b join c on b.b_id = c.b_id group by c.a_id) t2
> on t1.a_id = t2.a_id
> ) t;
> {code}
> The problem is that in query 1, Impala generates a wrong predicate "sum(amount) = sum(amount)" which rejects nulls:
> {code:java}
> +-------------------------------------------------------------+
> | Explain String |
> +-------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=10.84MB Threads=8 |
> | Per-Host Resource Estimates: Memory=143MB |
> | Codegen disabled by planner |
> | |
> | PLAN-ROOT SINK |
> | | |
> | 12:AGGREGATE [FINALIZE] |
> | | output: count:merge(*) |
> | | row-size=8B cardinality=1 |
> | | |
> | 11:EXCHANGE [UNPARTITIONED] |
> | | |
> | 06:AGGREGATE |
> | | output: count(*) |
> | | row-size=8B cardinality=1 |
> | | |
> | 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] |
> | | hash predicates: c.a_id = a_id |
> | | other predicates: sum(amount) = sum(amount) <---------- Wrong inferred predicate which incorrectly reject nulls
> | | runtime filters: RF000 <- a_id |
> | | row-size=16B cardinality=2 |
> | | |
> | |--10:EXCHANGE [HASH(a_id)] |
> | | | |
> | | 00:SCAN HDFS [default.a] |
> | | partitions=1/1 files=1 size=4B |
> | | row-size=4B cardinality=2 |
> | | |
> | 09:AGGREGATE [FINALIZE] |
> | | output: sum:merge(amount) |
> | | group by: c.a_id |
> | | row-size=12B cardinality=2 |
> | | |
> | 08:EXCHANGE [HASH(c.a_id)] |
> | | |
> | 04:AGGREGATE [STREAMING] |
> | | output: sum(amount) |
> | | group by: c.a_id |
> | | row-size=12B cardinality=2 |
> | | |
> | 03:HASH JOIN [INNER JOIN, BROADCAST] |
> | | hash predicates: b.b_id = c.b_id |
> | | runtime filters: RF002 <- c.b_id |
> | | row-size=16B cardinality=3 |
> | | |
> | |--07:EXCHANGE [BROADCAST] |
> | | | |
> | | 02:SCAN HDFS [default.c] |
> | | partitions=1/1 files=1 size=8B |
> | | runtime filters: RF000 -> default.c.a_id |
> | | row-size=8B cardinality=2 |
> | | |
> | 01:SCAN HDFS [default.b] |
> | partitions=1/1 files=1 size=15B |
> | runtime filters: RF002 -> b.b_id |
> | row-size=8B cardinality=3 |
> +-------------------------------------------------------------+
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org