Posted to issues@spark.apache.org by "Yuming Wang (Jira)" <ji...@apache.org> on 2023/01/11 11:57:00 UTC
[jira] [Created] (SPARK-41986) Introduce shuffle on SinglePartition

Yuming Wang created SPARK-41986:
-----------------------------------

             Summary: Introduce shuffle on SinglePartition
                 Key: SPARK-41986
                 URL: https://issues.apache.org/jira/browse/SPARK-41986
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Yuming Wang


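{code:scala}
// Run in spark-shell, where `spark` is predefined and `spark.sql` is imported as `sql`.
// Create a 100,000,000-row table with two identical bigint columns.
spark.range(100000000L).selectExpr("id as a", "id as b").write.saveAsTable("t1")

// ROW_NUMBER() OVER(ORDER BY a) has no PARTITION BY, so each reference to `base`
// gathers all of t1 into one partition (the Exchange SinglePartition in the plan).
sql(
  """
    |WITH base
    |     AS (select *, ROW_NUMBER() OVER(ORDER BY a) AS new_a FROM t1)
    |SELECT * FROM base t1 JOIN base t2 ON t1.a = t2.b
    |""".stripMargin).explain()
{code}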

The output of {{explain()}}:
{noformat}
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- SortMergeJoin [a#10L], [b#26L], Inner
   :- Filter isnotnull(a#10L)
   :  +- Window [row_number() windowspecdefinition(a#10L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS new_a#8], [a#10L ASC NULLS FIRST]
   :     +- Sort [a#10L ASC NULLS FIRST], false, 0
   :        +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=50]
   :           +- FileScan parquet spark_catalog.default.t1[a#10L,b#11L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark...., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:bigint,b:bigint>
   +- Sort [b#26L ASC NULLS FIRST], false, 0
      +- Filter isnotnull(b#26L)
         +- Window [row_number() windowspecdefinition(a#25L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS new_a#27], [a#25L ASC NULLS FIRST]
            +- Sort [a#25L ASC NULLS FIRST], false, 0
               +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=54]
                  +- FileScan parquet spark_catalog.default.t1[a#25L,b#26L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark...., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:bigint,b:bigint>

{noformat}
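In the plan above, neither input of the {{SortMergeJoin}} has an {{Exchange}} directly below it. Each {{Window}} has no {{PARTITION BY}}, so it is fed by {{Exchange SinglePartition}}. A single partition already co-locates every row, so it satisfies the join's clustering requirement on {{a}} / {{b}}; no further shuffle is planned, and the join over all 100000000 rows runs in one task. The sketch below is a toy model, not Spark's classes: {{ToyPlanner}}, {{plan}} and {{planWithProposal}} are hypothetical names, the partitioning and distribution types only borrow Spark's names as simplified stand-ins, and {{planWithProposal}} encodes an assumed reading of this issue's title (re-shuffle a {{SinglePartition}} child into {{spark.sql.shuffle.partitions}} hash partitions, 200 by default, when the parent only needs clustering).

{code:scala}
// Toy model of the distribution check behind the plan above.
// NOT Spark's API: all names are hypothetical, simplified stand-ins.

sealed trait Partitioning { def numPartitions: Int }
case object SinglePartition extends Partitioning { val numPartitions = 1 }
final case class HashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning

sealed trait Distribution
// All rows in one partition, e.g. what a Window without PARTITION BY needs.
case object AllTuples extends Distribution
// Rows with equal key values co-located, e.g. what a sort-merge join input needs.
final case class ClusteredDistribution(keys: Seq[String]) extends Distribution

object ToyPlanner {
  // A single partition co-locates every row, so it satisfies any requirement here.
  def satisfies(p: Partitioning, d: Distribution): Boolean = (p, d) match {
    case (SinglePartition, _) => true
    case (HashPartitioning(hashKeys, _), ClusteredDistribution(reqKeys)) =>
      hashKeys.nonEmpty && hashKeys.forall(reqKeys.contains)
    case (HashPartitioning(_, n), AllTuples) => n == 1
  }

  // Current behavior in this model: shuffle only when the requirement is not met.
  def plan(child: Partitioning, required: Distribution, shufflePartitions: Int): Partitioning =
    if (satisfies(child, required)) child
    else required match {
      case AllTuples                   => SinglePartition
      case ClusteredDistribution(keys) => HashPartitioning(keys, shufflePartitions)
    }

  // Assumed reading of the proposal: also re-shuffle a SinglePartition child
  // when the parent only needs clustering, to regain parallelism.
  def planWithProposal(child: Partitioning, required: Distribution, shufflePartitions: Int): Partitioning =
    (child, required) match {
      case (SinglePartition, ClusteredDistribution(keys)) if shufflePartitions > 1 =>
        HashPartitioning(keys, shufflePartitions)
      case _ => plan(child, required, shufflePartitions)
    }
}

object ToyPlannerDemo {
  def main(args: Array[String]): Unit = {
    // Left join input: the Window output is SinglePartition; the join needs clustering on "a".
    val required = ClusteredDistribution(Seq("a"))
    println(ToyPlanner.plan(SinglePartition, required, 200))             // SinglePartition
    println(ToyPlanner.planWithProposal(SinglePartition, required, 200)) // HashPartitioning(List(a),200)
  }
}
{code}

In this model the current rule leaves each join input in one partition, while the assumed proposal trades one extra shuffle per join side for {{shufflePartitions}}-way parallelism in the join.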




--
This message was sent by Atlassian Jira
(v8.20.10#820010)