You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@impala.apache.org by ta...@apache.org on 2017/05/08 21:46:14 UTC

[5/7] incubator-impala git commit: IMPALA-5120: Default to partitioned join when stats are missing

IMPALA-5120: Default to partitioned join when stats are missing

Previously, we defaulted to broadcast join when stats were
missing, but this can lead to disastrous plans when the
right hand side is actually large.

Its always difficult to make good plans when stats are missing,
but defaulting to partitioned joins should reduce the risk of
disastrous plans.

Testing:
- Added a planner test that joins a table with no stats.

Change-Id: Ie168ecfcd5e7c5d3c60d16926c151f8f134c81e0
Reviewed-on: http://gerrit.cloudera.org:8080/6803
Reviewed-by: Thomas Tauber-Marshall <tm...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/aca07ee8
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/aca07ee8
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/aca07ee8

Branch: refs/heads/master
Commit: aca07ee8160bbea0812dc4ba3c08dff818240d22
Parents: 374f112
Author: Thomas Tauber-Marshall <tm...@cloudera.com>
Authored: Thu May 4 13:51:08 2017 -0700
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Mon May 8 19:05:11 2017 +0000

----------------------------------------------------------------------
 .../impala/planner/DistributedPlanner.java      |  3 ++-
 .../queries/PlannerTest/joins.test              | 22 ++++++++++++++++++++
 2 files changed, 24 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/aca07ee8/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java b/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
index e0e325c..83c8ccb 100644
--- a/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
+++ b/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
@@ -458,7 +458,8 @@ public class DistributedPlanner {
     // repartition: both left- and rightChildFragment are partitioned on the
     // join exprs, and a hash table is built with the rightChildFragment's output.
     PlanNode lhsTree = leftChildFragment.getPlanRoot();
-    long partitionCost = Long.MAX_VALUE;
+    // Subtract 1 here so that if stats are missing we default to partitioned.
+    long partitionCost = Long.MAX_VALUE - 1;
     List<Expr> lhsJoinExprs = Lists.newArrayList();
     List<Expr> rhsJoinExprs = Lists.newArrayList();
     for (Expr joinConjunct: node.getEqJoinConjuncts()) {

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/aca07ee8/testdata/workloads/functional-planner/queries/PlannerTest/joins.test
----------------------------------------------------------------------
diff --git a/testdata/workloads/functional-planner/queries/PlannerTest/joins.test b/testdata/workloads/functional-planner/queries/PlannerTest/joins.test
index 0fdb19d..26b8c64 100644
--- a/testdata/workloads/functional-planner/queries/PlannerTest/joins.test
+++ b/testdata/workloads/functional-planner/queries/PlannerTest/joins.test
@@ -2519,3 +2519,25 @@ PLAN-ROOT SINK
 00:SCAN HDFS [tpch.customer a]
    partitions=1/1 files=1 size=23.08MB
 ====
+# If stats aren't available, default to partitioned join.
+select * from functional.tinytable x, functional.tinytable y where x.a = y.a
+---- DISTRIBUTEDPLAN
+PLAN-ROOT SINK
+|
+05:EXCHANGE [UNPARTITIONED]
+|
+02:HASH JOIN [INNER JOIN, PARTITIONED]
+|  hash predicates: x.a = y.a
+|  runtime filters: RF000 <- y.a
+|
+|--04:EXCHANGE [HASH(y.a)]
+|  |
+|  01:SCAN HDFS [functional.tinytable y]
+|     partitions=1/1 files=1 size=38B
+|
+03:EXCHANGE [HASH(x.a)]
+|
+00:SCAN HDFS [functional.tinytable x]
+   partitions=1/1 files=1 size=38B
+   runtime filters: RF000 -> x.a
+====