Posted to commits@spark.apache.org by sr...@apache.org on 2020/04/06 14:03:56 UTC

[spark] branch branch-3.0 updated: [SPARK-31333][SQL][DOCS] Document Join Hints

This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 95b3e01  [SPARK-31333][SQL][DOCS] Document Join Hints
95b3e01 is described below

commit 95b3e01bc4a650019c66bc0c286ac984b5c30524
Author: Huaxin Gao <hu...@us.ibm.com>
AuthorDate: Mon Apr 6 09:02:22 2020 -0500

    [SPARK-31333][SQL][DOCS] Document Join Hints
    
    ### What changes were proposed in this pull request?
    Document Join Hints
    
    ### Why are the changes needed?
    To make SQL Reference complete
    
    ### Does this PR introduce any user-facing change?
    Yes
    
    <img width="1049" alt="Screen Shot 2020-04-03 at 9 20 15 AM" src="https://user-images.githubusercontent.com/13592258/78382976-7c546b80-758c-11ea-9a8e-e46cfb7106f5.png">
    
    <img width="1051" alt="Screen Shot 2020-04-03 at 10 39 55 AM" src="https://user-images.githubusercontent.com/13592258/78389778-356c7300-7598-11ea-8e6c-3742dadda11c.png">
    
    ### How was this patch tested?
    Manually built the docs and checked the rendered page
    
    Closes #28113 from huaxingao/join-hints.
    
    Authored-by: Huaxin Gao <hu...@us.ibm.com>
    Signed-off-by: Sean Owen <sr...@gmail.com>
    (cherry picked from commit 44d37efba2606217d131038ad81b61fedb16cb07)
    Signed-off-by: Sean Owen <sr...@gmail.com>
---
 docs/_data/menu-sql.yaml                |  2 +
 docs/sql-ref-syntax-qry-select-hints.md | 71 ++++++++++++++++++++++++++++++---
 2 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index 500895a..787ee3e 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -154,6 +154,8 @@
                   url: sql-ref-syntax-qry-select-distribute-by.html
                 - text: LIMIT Clause 
                   url: sql-ref-syntax-qry-select-limit.html
+                - text: Join Hints
+                  url: sql-ref-syntax-qry-select-hints.html
             - text: EXPLAIN
               url: sql-ref-syntax-qry-explain.html
         - text: Auxiliary Statements
diff --git a/docs/sql-ref-syntax-qry-select-hints.md b/docs/sql-ref-syntax-qry-select-hints.md
index eb9655f..4713ea5 100644
--- a/docs/sql-ref-syntax-qry-select-hints.md
+++ b/docs/sql-ref-syntax-qry-select-hints.md
@@ -1,7 +1,7 @@
 ---
 layout: global
-title: Query Hints
-displayTitle: Query Hints
+title: Join Hints
+displayTitle: Join Hints
 license: |
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
@@ -9,14 +9,75 @@ license: |
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
- 
+
      http://www.apache.org/licenses/LICENSE-2.0
- 
+
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
 ---
+### Description
+
+Join Hints allow users to suggest the join strategy that Spark should use. Prior to Spark 3.0, only the `BROADCAST` Join Hint was supported. Support for the `MERGE`, `SHUFFLE_HASH` and `SHUFFLE_REPLICATE_NL` Join Hints was added in 3.0. When different join strategy hints are specified on both sides of a join, Spark prioritizes hints in the following order: `BROADCAST` over `MERGE` over `SHUFFLE_HASH` over `SHUFFLE_REPLICATE_NL`. When both sides are specified with the `BROADCAST` hint or the `SH [...]
+
+### Join Hints Types
+
+<dl>
+  <dt><code><em>BROADCAST</em></code></dt>
+  <dd>
+    Suggests that Spark use broadcast join. The join side with the hint will be broadcast regardless of <code>spark.sql.autoBroadcastJoinThreshold</code>. If both sides of the join carry the broadcast hint, the one with the smaller size (based on stats) will be broadcast. The aliases for <code>BROADCAST</code> are <code>BROADCASTJOIN</code> and <code>MAPJOIN</code>.
+  </dd>
+</dl>
+
+<dl>
+  <dt><code><em>MERGE</em></code></dt>
+  <dd>
+     Suggests that Spark use shuffle sort merge join. The aliases for <code>MERGE</code> are <code>SHUFFLE_MERGE</code> and <code>MERGEJOIN</code>.
+  </dd>
+</dl>
+
+<dl>
+  <dt><code><em>SHUFFLE_HASH</em></code></dt>
+  <dd>
+     Suggests that Spark use shuffle hash join. If both sides carry the shuffle hash hint, Spark chooses the smaller side (based on stats) as the build side.
+  </dd>
+</dl>
+
+<dl>
+  <dt><code><em>SHUFFLE_REPLICATE_NL</em></code></dt>
+  <dd>
+    Suggests that Spark use shuffle-and-replicate nested loop join.
+  </dd>
+</dl>
+
+### Examples
+
+{% highlight sql %}
+
+-- Join Hints for broadcast join
+SELECT /*+ BROADCAST(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
+SELECT /*+ BROADCASTJOIN(t1) */ * FROM t1 LEFT JOIN t2 ON t1.key = t2.key;
+SELECT /*+ MAPJOIN(t2) */ * FROM t1 RIGHT JOIN t2 ON t1.key = t2.key;
+
+-- Join Hints for shuffle sort merge join
+SELECT /*+ SHUFFLE_MERGE(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
+SELECT /*+ MERGEJOIN(t2) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
+SELECT /*+ MERGE(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
+
+-- Join Hints for shuffle hash join
+SELECT /*+ SHUFFLE_HASH(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
+
+-- Join Hints for shuffle-and-replicate nested loop join
+SELECT /*+ SHUFFLE_REPLICATE_NL(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
+
+-- When different join strategy hints are specified on both sides of a join, Spark
+-- prioritizes the BROADCAST hint over the MERGE hint over the SHUFFLE_HASH hint
+-- over the SHUFFLE_REPLICATE_NL hint.
+-- Spark issues the following warning for the example below:
+-- org.apache.spark.sql.catalyst.analysis.HintErrorLogger: Hint (strategy=merge)
+-- is overridden by another hint and will not take effect.
+SELECT /*+ BROADCAST(t1) */ /*+ MERGE(t1, t2) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
 
-**This page is under construction**
+{% endhighlight %}
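
The priority rule documented in the patch above (BROADCAST over MERGE over SHUFFLE_HASH over SHUFFLE_REPLICATE_NL, with the listed aliases) can be sketched as a small lookup. This is only a toy model of the documented ordering, not Spark's implementation: the names `PRIORITY`, `ALIASES`, `canonical` and `resolve` are hypothetical, and the sketch ignores table names, statistics and whether a strategy is actually applicable to a given join.

```python
# Toy model of the documented join-hint priority; NOT Spark's actual code.
# All names here (PRIORITY, ALIASES, canonical, resolve) are illustrative.

# Highest priority first, as stated in the documentation text.
PRIORITY = ["BROADCAST", "MERGE", "SHUFFLE_HASH", "SHUFFLE_REPLICATE_NL"]

# Aliases listed in the documentation, mapped to their canonical hint name.
ALIASES = {
    "BROADCASTJOIN": "BROADCAST",
    "MAPJOIN": "BROADCAST",
    "SHUFFLE_MERGE": "MERGE",
    "MERGEJOIN": "MERGE",
}


def canonical(hint):
    """Normalize a hint name (case-insensitive) and resolve aliases."""
    name = hint.strip().upper()
    name = ALIASES.get(name, name)
    if name not in PRIORITY:
        raise ValueError(f"unknown join hint: {hint!r}")
    return name


def resolve(hints):
    """Return (winner, overridden) for the hints given on one join.

    winner is the highest-priority strategy, or None if no hints were given;
    overridden lists the distinct lower-priority strategies that lose.
    """
    names = [canonical(h) for h in hints]
    if not names:
        return None, []
    winner = min(names, key=PRIORITY.index)
    overridden = sorted({n for n in names if n != winner}, key=PRIORITY.index)
    return winner, overridden


if __name__ == "__main__":
    # Mirrors the last SQL example: BROADCAST wins, MERGE is overridden.
    print(resolve(["BROADCAST", "MERGE"]))
```

Under this model, the final SQL example in the patch corresponds to `resolve(["BROADCAST", "MERGE"])`, where the overridden entry plays the role of the hint that the logged warning says will not take effect.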

