You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@impala.apache.org by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org> on 2016/09/01 19:44:55 UTC

[Impala-CR] IMPALA-3063: Separate join inversion from join ordering.

Hello Internal Jenkins, Alex Behm,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/4239

to review the following change.

Change subject: IMPALA-3063: Separate join inversion from join ordering.
......................................................................

IMPALA-3063: Separate join inversion from join ordering.

Before this change joins were inverted while doing join ordering.
That approach was unnecessarily complex because it required
modifying the global analysis state for correct conjunct
placement, etc. However, join inversion is independent of join
ordering, and the existing approach could lead to generating
invalid plans with distributed non-equi right outer/semi joins,
which we cannot execute in the backend.

After this change joins are inverted in a separate pass over
the single-node plan. This simplifies the inversion
logic and allows us to avoid generating those invalid plans.

Note that this change is not only a separation of functionality
for the following reasons:
1. Our join cardinality estimation is not symmetric, i.e., A JOIN B
may not give the same estimate as B JOIN A due to our FK/PK detection
heuristic. In the context of this patch this means that an inverted
join may have a different cardinality estimate, so plans may change
depending on whether the inversion is done during join ordering of after.
2. We currently only invert outer/semi/anti joins based on the rhs table
ref join op. In this patch I want to preserve the existing behavior as
much as possible, but when doing the join ordering in a separate pass we
may see a join opn in a JoinNode that is different from the rhs table ref.
So in some situations the inversion behavior based on the join op could be
different and there are some examples in this patch.

This patch also moves the logic of converting hash joins to
nested-loop joins into a separate pass over the single-node plan.

Change-Id: If86db7753fc585bb4c69612745ec0103278888a4
Reviewed-on: http://gerrit.cloudera.org:8080/3846
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Internal Jenkins
(cherry picked from commit 532b1fe1186725b8e81fff93b59fc7cebf563c8b)
---
M be/src/exec/blocking-join-node.h
M fe/src/main/java/com/cloudera/impala/analysis/Analyzer.java
M fe/src/main/java/com/cloudera/impala/analysis/BinaryPredicate.java
M fe/src/main/java/com/cloudera/impala/analysis/SelectList.java
M fe/src/main/java/com/cloudera/impala/analysis/TableRef.java
M fe/src/main/java/com/cloudera/impala/planner/AggregationNode.java
M fe/src/main/java/com/cloudera/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java
M fe/src/main/java/com/cloudera/impala/planner/ExchangeNode.java
M fe/src/main/java/com/cloudera/impala/planner/HashJoinNode.java
M fe/src/main/java/com/cloudera/impala/planner/HdfsScanNode.java
M fe/src/main/java/com/cloudera/impala/planner/JoinNode.java
M fe/src/main/java/com/cloudera/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/com/cloudera/impala/planner/PlanNode.java
M fe/src/main/java/com/cloudera/impala/planner/Planner.java
M fe/src/main/java/com/cloudera/impala/planner/SelectNode.java
M fe/src/main/java/com/cloudera/impala/planner/SingleNodePlanner.java
M fe/src/main/java/com/cloudera/impala/planner/SingularRowSrcNode.java
M fe/src/main/java/com/cloudera/impala/planner/SortNode.java
M fe/src/main/java/com/cloudera/impala/planner/SubplanNode.java
M fe/src/main/java/com/cloudera/impala/planner/UnionNode.java
M fe/src/main/java/com/cloudera/impala/planner/UnnestNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-loop-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
29 files changed, 661 insertions(+), 363 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/39/4239/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4239
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: If86db7753fc585bb4c69612745ec0103278888a4
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>

[Impala-CR] IMPALA-3063: Separate join inversion from join ordering.

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has abandoned this change.

Change subject: IMPALA-3063: Separate join inversion from join ordering.
......................................................................


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/4239
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: If86db7753fc585bb4c69612745ec0103278888a4
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>