Posted to issues@spark.apache.org by "Artsiom Yudovin (Jira)" <ji...@apache.org> on 2019/09/18 11:31:00 UTC
[jira] [Created] (SPARK-29147) Spark sortMergeJoin is not changing to shuffleHashJoin
Artsiom Yudovin created SPARK-29147:
---------------------------------------
Summary: Spark sortMergeJoin is not changing to shuffleHashJoin
Key: SPARK-29147
URL: https://issues.apache.org/jira/browse/SPARK-29147
Project: Spark
Issue Type: Question
Components: Spark Core, SQL
Affects Versions: 2.4.4, 2.4.3
Reporter: Artsiom Yudovin
I run the following code:
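{code:scala}
import org.apache.spark.sql.SparkSession

object ShuffleHashJoinRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ShuffleHashJoin")
      .master("local[*]")
      // Intended to rule out an automatic broadcast join (-1 is the documented "disable" value).
      .config("spark.sql.autoBroadcastJoinThreshold", 0)
      // Ask the planner not to prefer sort-merge join.
      .config("spark.sql.join.preferSortMergeJoin", value = false)
      .getOrCreate()
    import spark.implicits._

    val dataset = Seq(
      ("1", "playing"),
      ("2", "with"),
      ("3", "ShuffledHashJoinExec")
    ).toDF("id", "token")
    val dataset1 = Seq(
      ("1", "playing"),
      ("2", "with"),
      ("3", "ShuffledHashJoinExec")
    ).toDF("id1", "token")

    val joined = dataset.join(dataset1, $"id" === $"id1", "inner")
    joined.explain() // prints the physical plan, showing which join operator was chosen
    joined.foreach(t => println(t))

    spark.stop()
  }
}
{code}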
I expected Spark to use 'shuffleHashJoin', but the Spark UI and explain() both show that it uses 'sortMergeJoin'.
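This report does not confirm the following explanation. It comes from my reading of the Spark 2.4 planner, specifically the JoinSelection strategy in SparkStrategies.scala.

Setting spark.sql.join.preferSortMergeJoin to false appears to be necessary but not sufficient for a shuffled hash join. The planner also requires the following of the build side:

* Its estimated size must be below autoBroadcastJoinThreshold times spark.sql.shuffle.partitions, so that it fits a per-partition hash map.
* It must be at least three times smaller than the other side.

If these checks fail, the planner falls back to sort-merge join unless the join keys are not orderable. Broadcast joins are considered before either of these.

The sketch below models only that shuffled-hash-join condition as a pure function. It is not Spark API: the object and parameter names are hypothetical, and the byte sizes are made-up inputs rather than statistics from the reporter's plan.

```scala
// Simplified, hypothetical model of the shuffled-hash-join condition in the
// Spark 2.4 JoinSelection strategy. Not Spark API: Spark reads these values
// from SQLConf and from the logical plan's size statistics.
object ShuffledHashJoinModel {

  // A side can back a per-partition hash map only if its estimated size is
  // below autoBroadcastJoinThreshold * spark.sql.shuffle.partitions.
  def canBuildLocalHashMap(sizeInBytes: BigInt,
                           autoBroadcastJoinThreshold: Long,
                           numShufflePartitions: Int): Boolean =
    sizeInBytes < BigInt(autoBroadcastJoinThreshold) * numShufflePartitions

  // The build side must be at least 3x smaller than the other side.
  def muchSmaller(a: BigInt, b: BigInt): Boolean = a * 3 <= b

  // True if the modeled rule would choose a shuffled hash join with `buildSize`
  // as the build side. Join-type checks (an inner join can build either side)
  // are assumed to pass; broadcast joins, checked earlier, are not modeled.
  def prefersShuffledHashJoin(preferSortMergeJoin: Boolean,
                              buildSize: BigInt,
                              otherSize: BigInt,
                              autoBroadcastJoinThreshold: Long,
                              numShufflePartitions: Int,
                              keysOrderable: Boolean): Boolean =
    (!preferSortMergeJoin &&
      canBuildLocalHashMap(buildSize, autoBroadcastJoinThreshold, numShufflePartitions) &&
      muchSmaller(buildSize, otherSize)) || !keysOrderable
}

object ShuffledHashJoinDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical, equal size estimates for the two identical inputs.
    val size = BigInt(100)
    val shj = ShuffledHashJoinModel.prefersShuffledHashJoin(
      preferSortMergeJoin = false,
      buildSize = size,
      otherSize = size,
      autoBroadcastJoinThreshold = 0L,
      numShufflePartitions = 200,
      keysOrderable = true)
    println(shj) // prints false: 100 < 0 * 200 fails, so the model picks sort-merge join
  }
}
```

Under this model, the threshold of 0 used in the code above makes the local-hash-map check fail for any positive size estimate. Two equally sized inputs would also fail the 3x check. So a sort-merge join is what the model predicts for this reproduction.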