You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Aman Sinha (Code Review)" <ge...@cloudera.org> on 2021/02/20 23:51:30 UTC

[Impala-ASF-CR] IMPALA-9356: fix the predicate with empty tuple id invalid pushed down

Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16915 )

Change subject: IMPALA-9356: fix the predicate with empty tuple id invalid pushed down
......................................................................


Patch Set 2:

(7 comments)

Thanks for the patch ! I have been meaning to review it sooner. I left some comments about the plan correctness in the JIRA. It might be easier to discuss it there.

http://gerrit.cloudera.org:8080/#/c/16915/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16915/2//COMMIT_MSG@7
PS2, Line 7: IMPALA-9356: fix the predicate with empty tuple id invalid pushed down
The 'empty tuple id' is an internal detail of the implementation.  Since this issue can occur for any non-deterministic predicate such as rand(),  can we call it 'Fix the predicate pushdown for non-deterministic predicates for outer joins'


http://gerrit.cloudera.org:8080/#/c/16915/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/16915/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2655
PS2, Line 2655: unssigned
typo: unssigned-> unassigned


http://gerrit.cloudera.org:8080/#/c/16915/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2655
PS2, Line 2655: can reassigned
nit: 'can be'


http://gerrit.cloudera.org:8080/#/c/16915/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2655
PS2, Line 2655: node
nit: 'nodes'


http://gerrit.cloudera.org:8080/#/c/16915/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16915/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1767
PS2, Line 1767:     if (analyzer.isOuterJoined(tid)) {
Could you move this check into a separate Analyzer method ?


http://gerrit.cloudera.org:8080/#/c/16915/2/testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
File testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test:

http://gerrit.cloudera.org:8080/#/c/16915/2/testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test@1608
PS2, Line 1608: |     predicates: rand() = 2
I think for non-deterministic functions such as rand(), it would be wrong to push to both the scans in case of the Full OJ.  I think it should only be evaluated by the Hash Join above.  I left some comments in the JIRA instead of re-typing it here.


http://gerrit.cloudera.org:8080/#/c/16915/2/testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test@1744
PS2, Line 1744: |     predicates: rand() = 2
For ROJ or even for LOJ, for reasons mentioned above,  it would be wrong to push to both sides of the join.  Either we push only to 1 side (the non-null producing side) or keep it at the outer join node as other predicate.  Specifically for this query, since ROJ will produce non-null rows from the right child, we should only push to the right child which is the 00:SCAN HDFS node.



-- 
To view, visit http://gerrit.cloudera.org:8080/16915
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50ae71ef31ff138adeeb57008a01eb02e7bb976f
Gerrit-Change-Number: 16915
Gerrit-PatchSet: 2
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Sat, 20 Feb 2021 23:51:30 +0000
Gerrit-HasComments: Yes