Posted to issues@spark.apache.org by "Kousuke Saruta (Jira)" <ji...@apache.org> on 2020/09/08 09:26:00 UTC
[jira] [Created] (SPARK-32820) Remove redundant shuffle exchanges inserted by EnsureRequirements
Kousuke Saruta created SPARK-32820:
--------------------------------------
Summary: Remove redundant shuffle exchanges inserted by EnsureRequirements
Key: SPARK-32820
URL: https://issues.apache.org/jira/browse/SPARK-32820
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.1.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta
Redundant repartition operations are removed by the CollapseRepartition rule, but EnsureRequirements can insert another HashPartitioning or RangePartitioning immediately after the repartition, leaving adjacent ShuffleExchanges in the physical plan.
{code:java}
val ordered = spark.range(1, 100).repartitionByRange(10, $"id".desc).orderBy($"id")
ordered.explain(true)
...
== Physical Plan ==
*(2) Sort [id#0L ASC NULLS FIRST], true, 0
+- Exchange rangepartitioning(id#0L ASC NULLS FIRST, 200), true, [id=#15]
+- Exchange rangepartitioning(id#0L DESC NULLS LAST, 10), false, [id=#14]
+- *(1) Range (1, 100, step=1, splits=12){code}
{code:java}
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 0)
val left = Seq(1,2,3).toDF.repartition(10, $"value")
val right = Seq(1,2,3).toDF
val joined = left.join(right, left("value") + 1 === right("value"))
joined.explain(true)
...
== Physical Plan ==
*(3) SortMergeJoin [(value#7 + 1)], [value#12], Inner
:- *(1) Sort [(value#7 + 1) ASC NULLS FIRST], false, 0
: +- Exchange hashpartitioning((value#7 + 1), 200), true, [id=#67]
: +- Exchange hashpartitioning(value#7, 10), false, [id=#63]
: +- LocalTableScan [value#7]
+- *(2) Sort [value#12 ASC NULLS FIRST], false, 0
+- Exchange hashpartitioning(value#12, 200), true, [id=#68]
+- LocalTableScan [value#12]{code}
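In both plans the inner Exchange is redundant because the outer Exchange's partitioning fully overwrites it. The idea of the proposed rule can be sketched on a toy plan tree; the names below (PlanNode, Exchange, Leaf, collapse) are illustrative only, not Spark's real physical-plan classes:

{code:java}
// Toy model: when an Exchange's direct child is another Exchange,
// the child's shuffle is redundant and can be dropped, since the
// parent repartitions the data anyway.
sealed trait PlanNode
case class Exchange(partitioning: String, child: PlanNode) extends PlanNode
case class Leaf(name: String) extends PlanNode

def collapse(plan: PlanNode): PlanNode = plan match {
  // Adjacent exchanges: keep the outer one, skip the inner one.
  case Exchange(p, Exchange(_, grandchild)) => collapse(Exchange(p, grandchild))
  case Exchange(p, child)                   => Exchange(p, collapse(child))
  case other                                => other
}

val plan = Exchange("rangepartitioning(id ASC, 200)",
  Exchange("rangepartitioning(id DESC, 10)", Leaf("Range(1, 100)")))
println(collapse(plan))
// Exchange(rangepartitioning(id ASC, 200),Leaf(Range(1, 100)))
{code}

In Spark itself this would presumably be a rule over SparkPlan that removes a ShuffleExchangeExec whose child is another ShuffleExchangeExec, subject to the usual caveats (e.g. the inner exchange being user-specified vs. planner-inserted).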
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org