Posted to reviews@impala.apache.org by "Shant Hovsepian (Code Review)" <ge...@cloudera.org> on 2020/12/07 19:48:11 UTC

[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 )

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
......................................................................


Patch Set 6: Code-Review+1

> Patch Set 5:
> 
> Updated the commit message as requested.
> 
> Shant, I think hasNullRejectingConjucts (sp) in Analyzer.java handles at least this case correctly - it does call isTrueWithNullSlots() on the expression. I guess it's possible that it might handle more complex expressions incorrectly, e.g. if the expression has slots from both sides of the join and is false when all slots are null but true if a subset of slots is null.
> 
> 
> 
>   [localhost.EXAMPLE.COM:21050] default> set ENABLE_OUTER_JOIN_TO_INNER_TRANSFORMATION=1;
>   ENABLE_OUTER_JOIN_TO_INNER_TRANSFORMATION set to 1
>   [localhost.EXAMPLE.COM:21050] default> explain select * from functional.alltypes t1 left outer join functional.alltypestiny t2 on  t1.id = t2.id where zeroifnull(t2.int_col) = 0;
>   Query: explain select * from functional.alltypes t1 left outer join functional.alltypestiny t2 on  t1.id = t2.id where zeroifnull(t2.int_col) = 0
>   +------------------------------------------------------------+
>   | Explain String                                             |
>   +------------------------------------------------------------+
>   | Max Per-Host Resource Reservation: Memory=1.98MB Threads=5 |
>   | Per-Host Resource Estimates: Memory=163MB                  |
>   | Codegen disabled by planner                                |
>   |                                                            |
>   | PLAN-ROOT SINK                                             |
>   | |                                                          |
>   | 04:EXCHANGE [UNPARTITIONED]                                |
>   | |                                                          |
>   | 02:HASH JOIN [LEFT OUTER JOIN, BROADCAST]                  |
>   | |  hash predicates: t1.id = t2.id                          |
>   | |  other predicates: zeroifnull(t2.int_col) = 0            |
>   | |  row-size=178B cardinality=7.30K                         |
>   | |                                                          |
>   | |--03:EXCHANGE [BROADCAST]                                 |
>   | |  |                                                       |
>   | |  01:SCAN HDFS [functional.alltypestiny t2]               |
>   | |     HDFS partitions=4/4 files=4 size=460B                |
>   | |     row-size=89B cardinality=8                           |
>   | |                                                          |
>   | 00:SCAN HDFS [functional.alltypes t1]                      |
>   |    HDFS partitions=24/24 files=24 size=478.45KB            |
>   |    row-size=89B cardinality=7.30K                          |
>   +------------------------------------------------------------+
>   Fetched 22 row(s) in 0.05s
>   [localhost.EXAMPLE.COM:21050] default> explain select * from functional.alltypes t1 left outer join functional.alltypestiny t2 on  t1.id = t2.id where t2.int_col = 0;
>   Query: explain select * from functional.alltypes t1 left outer join functional.alltypestiny t2 on  t1.id = t2.id where t2.int_col = 0
>   +------------------------------------------------------------+
>   | Explain String                                             |
>   +------------------------------------------------------------+
>   | Max Per-Host Resource Reservation: Memory=2.98MB Threads=5 |
>   | Per-Host Resource Estimates: Memory=163MB                  |
>   | Codegen disabled by planner                                |
>   |                                                            |
>   | PLAN-ROOT SINK                                             |
>   | |                                                          |
>   | 04:EXCHANGE [UNPARTITIONED]                                |
>   | |                                                          |
>   | 02:HASH JOIN [INNER JOIN, BROADCAST]                       |
>   | |  hash predicates: t1.id = t2.id                          |
>   | |  runtime filters: RF000 <- t2.id                         |
>   | |  row-size=178B cardinality=4                             |
>   | |                                                          |
>   | |--03:EXCHANGE [BROADCAST]                                 |
>   | |  |                                                       |
>   | |  01:SCAN HDFS [functional.alltypestiny t2]               |
>   | |     HDFS partitions=4/4 files=4 size=460B                |
>   | |     predicates: t2.int_col = 0                           |
>   | |     row-size=89B cardinality=4                           |
>   | |                                                          |
>   | 00:SCAN HDFS [functional.alltypes t1]                      |
>   |    HDFS partitions=24/24 files=24 size=478.45KB            |
>   |    runtime filters: RF000 -> t1.id                         |
>   |    row-size=89B cardinality=7.30K                          |
>   +------------------------------------------------------------+
>   Fetched 24 row(s) in 0.05s

Yeah, exactly. I was thinking of something like:

t1 join t2 join t3 join t4 with a coalesce(t1, t2, t3, t4). I'll try to mock up an example, but it isn't relevant for this fix.
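
To make the concern concrete, here is a rough Python sketch of the three-valued logic involved. It is not Impala's implementation: the helper names, the column t1.int_col in the mixed predicate, and the two test rows are assumptions made up for illustration. It evaluates each predicate once with every slot set to NULL and once with only the t2 slot set to NULL, which is what a non-matching row of the left outer join looks like.

```python
# Illustrative model only; None stands in for SQL NULL.

def sql_eq(a, b):
    """SQL '=' under three-valued logic: NULL if either operand is NULL."""
    if a is None or b is None:
        return None
    return a == b

def zeroifnull(x):
    """Models zeroifnull(): NULL becomes 0, anything else is unchanged."""
    return 0 if x is None else x

def coalesce(*args):
    """Models coalesce(): first non-NULL argument, or NULL if there is none."""
    for a in args:
        if a is not None:
            return a
    return None

# Predicates from the two EXPLAIN plans above, plus a hypothetical one
# that mixes slots from both sides of the join.
predicates = {
    "t2.int_col = 0": lambda r: sql_eq(r["t2.int_col"], 0),
    "zeroifnull(t2.int_col) = 0": lambda r: sql_eq(zeroifnull(r["t2.int_col"]), 0),
    "coalesce(t1.int_col, t2.int_col) = 0": lambda r: sql_eq(
        coalesce(r["t1.int_col"], r["t2.int_col"]), 0
    ),
}

# Every slot NULL, versus only the nullable (t2) side NULL.
all_null = {"t1.int_col": None, "t2.int_col": None}
only_t2_null = {"t1.int_col": 0, "t2.int_col": None}

# For each predicate: (TRUE with all slots NULL?, TRUE with only t2 NULL?)
results = {
    name: (pred(all_null) is True, pred(only_t2_null) is True)
    for name, pred in predicates.items()
}

for name, (with_all_null, with_t2_null) in results.items():
    print(f"{name}: all NULL -> {with_all_null}, only t2 NULL -> {with_t2_null}")
```

In this model the first predicate is never true once t2 is NULL, so the join can become an inner join, and the zeroifnull() predicate is true in both cases, so the join stays outer, matching the two plans. The coalesce() predicate is the interesting one: it is not true when all slots are NULL, yet it is true for a null-extended t2 row, so a check that only nulls every slot at once would treat it as null-rejecting.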

Nice job!


-- 
To view, visit http://gerrit.cloudera.org:8080/16622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Gerrit-Change-Number: 16622
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Shant Hovsepian <sh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 07 Dec 2020 19:48:11 +0000
Gerrit-HasComments: No