Posted to reviews@impala.apache.org by "Yida Wu (Code Review)" <ge...@cloudera.org> on 2023/05/03 21:09:01 UTC

[Impala-ASF-CR] IMPALA-10861: Optimize the plan for identical predicates

Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/19511 )

Change subject: IMPALA-10861: Optimize the plan for identical predicates
......................................................................


Patch Set 3:

(2 comments)

Sorry for the late feedback. This looks good to me; I just have a couple of questions.

http://gerrit.cloudera.org:8080/#/c/19511/3/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/19511/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@1265
PS3, Line 1265:     for (C expr: origList) {
The time complexity can be O(n^2) in the worst case because, if my understanding is correct, every conjunct would need to call removeDuplicates(). Do you think it is necessary to optimize this?
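To illustrate the kind of optimization I have in mind, here is a minimal sketch of a single-pass, hash-based de-duplication that preserves the original order. The class DedupSketch and its removeDuplicates() are hypothetical and are not the method in this patch. The sketch also assumes the element type has equals() and hashCode() that agree with each other; I have not checked whether that holds for the expression classes here, so this may not apply directly.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not the code under review.
final class DedupSketch {
  private DedupSketch() {}

  /**
   * Returns a new list with duplicates removed, keeping the first occurrence
   * of each element in its original position. Expected O(n) time, assuming
   * the elements implement equals() and hashCode() consistently.
   */
  static <C> List<C> removeDuplicates(List<C> origList) {
    // LinkedHashSet drops repeated elements and remembers insertion order.
    Set<C> seen = new LinkedHashSet<>(origList);
    return new ArrayList<>(seen);
  }
}
```

If the element type cannot be hashed, the pairwise comparison stays quadratic, and the question becomes whether n is ever large enough to matter.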


http://gerrit.cloudera.org:8080/#/c/19511/3/testdata/workloads/functional-planner/queries/PlannerTest/joins.test
File testdata/workloads/functional-planner/queries/PlannerTest/joins.test:

http://gerrit.cloudera.org:8080/#/c/19511/3/testdata/workloads/functional-planner/queries/PlannerTest/joins.test@3122
PS3, Line 3122: ON c.c_custkey = l.l_orderkey and c.c_custkey = l.l_orderkey
I tried the query below. I expected the duplicates in "other predicates" to be removed, but both predicates still appear in the plan.
Query: explain SELECT c_custkey
from tpch.customer c
left outer join tpch.lineitem l
ON c.c_custkey = l.l_orderkey
where l.l_discount > c.c_acctbal and c.c_acctbal < l.l_discount
+-----------------------------------------------------------------------------+
| Explain String                                                              |
+-----------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=30.50MB Threads=6                 |
| Per-Host Resource Estimates: Memory=819MB                                   |
|                                                                             |
| PLAN-ROOT SINK                                                              |
| |                                                                           |
| 05:EXCHANGE [UNPARTITIONED]                                                 |
| |                                                                           |
| 02:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]                                |
| |  hash predicates: l.l_orderkey = c.c_custkey                              |
| |  other predicates: c.c_acctbal < l.l_discount, l.l_discount > c.c_acctbal |
| |  runtime filters: RF000 <- c.c_custkey                                    |
| |  row-size=32B cardinality=575.77K                                         |
| |                                                                           |
| |--04:EXCHANGE [HASH(c.c_custkey)]                                          |
| |  |                                                                        |
| |  00:SCAN HDFS [tpch.customer c]                                           |
| |     HDFS partitions=1/1 files=1 size=23.08MB                              |
| |     row-size=16B cardinality=150.00K                                      |
| |                                                                           |
| 03:EXCHANGE [HASH(l.l_orderkey)]                                            |
| |                                                                           |
| 01:SCAN HDFS [tpch.lineitem l]                                              |
|    HDFS partitions=1/1 files=1 size=718.94MB                                |
|    runtime filters: RF000 -> l.l_orderkey                                   |
|    row-size=16B cardinality=6.00M                                           |
+-----------------------------------------------------------------------------+
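My guess is that the two predicates are not treated as identical because their operands are written in opposite order. One possible approach, sketched below under that assumption, is to bring each comparison into a canonical form before comparing. The Cmp class, canonical(), and dedup() are made-up names for illustration and do not correspond to any existing Impala class or method.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical toy model for illustration only; not Impala code.
final class CommutedPredicateSketch {
  private CommutedPredicateSketch() {}

  /** A binary comparison "left op right" over column names. */
  static final class Cmp {
    final String left;
    final String op;
    final String right;

    Cmp(String left, String op, String right) {
      this.left = left;
      this.op = op;
      this.right = right;
    }

    /** Returns the operator to use when the two operands are swapped. */
    static String flip(String op) {
      switch (op) {
        case "<": return ">";
        case ">": return "<";
        case "<=": return ">=";
        case ">=": return "<=";
        default: return op; // "=" and "!=" are symmetric
      }
    }

    /** Orders the operands lexicographically, flipping the operator on a swap. */
    Cmp canonical() {
      if (left.compareTo(right) <= 0) return this;
      return new Cmp(right, flip(op), left);
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Cmp)) return false;
      Cmp other = (Cmp) o;
      return left.equals(other.left) && op.equals(other.op)
          && right.equals(other.right);
    }

    @Override
    public int hashCode() {
      return Objects.hash(left, op, right);
    }

    @Override
    public String toString() {
      return left + " " + op + " " + right;
    }
  }

  /** Removes conjuncts that are equal after canonicalization, keeping order. */
  static List<Cmp> dedup(List<Cmp> conjuncts) {
    Set<Cmp> seen = new LinkedHashSet<>();
    for (Cmp c : conjuncts) {
      seen.add(c.canonical());
    }
    return new ArrayList<>(seen);
  }
}
```

In this toy model, l.l_discount > c.c_acctbal and c.c_acctbal < l.l_discount both canonicalize to the same comparison, so dedup() keeps only one of them.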



-- 
To view, visit http://gerrit.cloudera.org:8080/19511

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia249c8146215fad602e9310bf922c6bfa050b96b
Gerrit-Change-Number: 19511
Gerrit-PatchSet: 3
Gerrit-Owner: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Baike Xia <xi...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wy...@gmail.com>
Gerrit-Comment-Date: Wed, 03 May 2023 21:09:01 +0000
Gerrit-HasComments: Yes