Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2023/01/11 12:32:00 UTC
[jira] [Commented] (SPARK-41986) Introduce shuffle on SinglePartition

    [ https://issues.apache.org/jira/browse/SPARK-41986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17675546#comment-17675546 ] 

Apache Spark commented on SPARK-41986:
--------------------------------------

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/39512

> Introduce shuffle on SinglePartition
> ------------------------------------
>
>                 Key: SPARK-41986
>                 URL: https://issues.apache.org/jira/browse/SPARK-41986
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> {code:scala}
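> // Create a 100M-row table t1 with two identical bigint columns, a and b.
> spark.range(100000000L).selectExpr("id as a", "id as b").write.saveAsTable("t1")
> // Self-join a CTE whose window has ORDER BY but no PARTITION BY,
> // so the window requires all rows to be in a single partition.
> sql(
>   """
>     |WITH base
>     |     AS (SELECT *, ROW_NUMBER() OVER (ORDER BY a) AS new_a FROM t1)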
>     |SELECT * FROM base t1 JOIN base t2 ON t1.a = t2.b
>     |""".stripMargin).explain()
> {code}
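> The output (note that both join inputs sit behind Exchange SinglePartition, with no further exchange before the SortMergeJoin):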
> {noformat}
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- SortMergeJoin [a#10L], [b#26L], Inner
>    :- Filter isnotnull(a#10L)
>    :  +- Window [row_number() windowspecdefinition(a#10L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS new_a#8], [a#10L ASC NULLS FIRST]
>    :     +- Sort [a#10L ASC NULLS FIRST], false, 0
>    :        +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=50]
>    :           +- FileScan parquet spark_catalog.default.t1[a#10L,b#11L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark...., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:bigint,b:bigint>
>    +- Sort [b#26L ASC NULLS FIRST], false, 0
>       +- Filter isnotnull(b#26L)
>          +- Window [row_number() windowspecdefinition(a#25L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS new_a#27], [a#25L ASC NULLS FIRST]
>             +- Sort [a#25L ASC NULLS FIRST], false, 0
>                +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=54]
>                   +- FileScan parquet spark_catalog.default.t1[a#25L,b#26L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark...., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:bigint,b:bigint>
> {noformat}