Posted to issues@spark.apache.org by "Artsiom Yudovin (Jira)" <ji...@apache.org> on 2019/09/18 11:31:00 UTC

[jira] [Created] (SPARK-29147) Spark sortMergeJoin is not changing to shuffleHashJoin

Artsiom Yudovin created SPARK-29147:
---------------------------------------

             Summary: Spark sortMergeJoin is not changing to shuffleHashJoin
                 Key: SPARK-29147
                 URL: https://issues.apache.org/jira/browse/SPARK-29147
             Project: Spark
          Issue Type: Question
          Components: Spark Core, SQL
    Affects Versions: 2.4.4, 2.4.3
            Reporter: Artsiom Yudovin


I run the following code:
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ShuffleHashJoin")
  .master("local[*]")
  // Threshold of 0 bytes: effectively disables broadcast joins
  .config("spark.sql.autoBroadcastJoinThreshold", 0)
  // Should allow the planner to pick a shuffled hash join over sort-merge join
  .config("spark.sql.join.preferSortMergeJoin", value = false)
  .getOrCreate()

import spark.implicits._

val dataset = Seq(
  ("1", "playing"),
  ("2", "with"),
  ("3", "ShuffledHashJoinExec")
).toDF("id", "token")

val dataset1 = Seq(
  ("1", "playing"),
  ("2", "with"),
  ("3", "ShuffledHashJoinExec")
).toDF("id1", "token")

val joined = dataset.join(dataset1, $"id" === $"id1", "inner")
joined.explain() // prints the physical plan chosen for the join
joined.foreach(t => println(t))
{code}
I expected Spark to use 'shuffleHashJoin', because spark.sql.join.preferSortMergeJoin is set to false, but both the Spark UI and explain() show that Spark uses 'sortMergeJoin'.
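For reference, below is a simplified model of how I understand the Spark 2.4.x planner (the JoinSelection strategy) to choose between the two joins. It is a sketch under that assumption, not Spark's code: the object and method names are hypothetical, and it ignores broadcast joins and join-type checks. If this reading is right, a shuffled hash join needs the build side to be smaller than autoBroadcastJoinThreshold × spark.sql.shuffle.partitions and at least three times smaller than the other side. With the threshold set to 0 the first condition can never hold, and the two datasets above are the same size, so the second fails as well.

{code:scala}
// Hypothetical, simplified model of the Spark 2.4.x heuristic for choosing
// ShuffledHashJoin over SortMergeJoin. Assumption: not Spark's actual code.
object JoinChoiceModel {

  // Stand-ins for spark.sql.join.preferSortMergeJoin,
  // spark.sql.autoBroadcastJoinThreshold and spark.sql.shuffle.partitions.
  final case class Conf(
      preferSortMergeJoin: Boolean,
      autoBroadcastJoinThreshold: Long,
      numShufflePartitions: Int)

  // Build side must fit under threshold * number of shuffle partitions.
  def canBuildLocalHashMap(buildSizeInBytes: BigInt, conf: Conf): Boolean =
    buildSizeInBytes <
      BigInt(conf.autoBroadcastJoinThreshold) * BigInt(conf.numShufflePartitions)

  // Build side must be at least 3x smaller than the streamed side.
  def muchSmaller(buildSizeInBytes: BigInt, streamSizeInBytes: BigInt): Boolean =
    buildSizeInBytes * BigInt(3) <= streamSizeInBytes

  // Returns the join the model predicts for an equi-join.
  def choose(
      buildSize: BigInt,
      streamSize: BigInt,
      keysOrderable: Boolean,
      conf: Conf): String = {
    val hashJoinAllowed =
      !conf.preferSortMergeJoin &&
        canBuildLocalHashMap(buildSize, conf) &&
        muchSmaller(buildSize, streamSize)
    // Non-orderable keys cannot be sort-merged, so hash join is the fallback.
    if (hashJoinAllowed || !keysOrderable) "ShuffledHashJoin" else "SortMergeJoin"
  }
}
{code}

With the configuration from the snippet above (threshold 0, default 200 shuffle partitions) and orderable string keys, the model returns "SortMergeJoin" for any sizes passed in.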



--
This message was sent by Atlassian Jira
(v8.3.4#803005)