Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2023/01/11 12:32:00 UTC
[jira] [Commented] (SPARK-41986) Introduce shuffle on SinglePartition
[ https://issues.apache.org/jira/browse/SPARK-41986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17675546#comment-17675546 ]
Apache Spark commented on SPARK-41986:
--------------------------------------
User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/39512
> Introduce shuffle on SinglePartition
> ------------------------------------
>
> Key: SPARK-41986
> URL: https://issues.apache.org/jira/browse/SPARK-41986
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Yuming Wang
> Priority: Major
>
> {code:scala}
> spark.range(100000000L).selectExpr("id as a", "id as b").write.saveAsTable("t1")
> sql(
>   """
>     |WITH base
>     |  AS (select *, ROW_NUMBER() OVER(ORDER BY a) AS new_a FROM t1)
>     |SELECT * FROM base t1 JOIN base t2 ON t1.a = t2.b
>     |""".stripMargin).explain()
> {code}
> The output:
> {noformat}
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- SortMergeJoin [a#10L], [b#26L], Inner
>    :- Filter isnotnull(a#10L)
>    :  +- Window [row_number() windowspecdefinition(a#10L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS new_a#8], [a#10L ASC NULLS FIRST]
>    :     +- Sort [a#10L ASC NULLS FIRST], false, 0
>    :        +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=50]
>    :           +- FileScan parquet spark_catalog.default.t1[a#10L,b#11L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark...., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:bigint,b:bigint>
>    +- Sort [b#26L ASC NULLS FIRST], false, 0
>       +- Filter isnotnull(b#26L)
>          +- Window [row_number() windowspecdefinition(a#25L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS new_a#27], [a#25L ASC NULLS FIRST]
>             +- Sort [a#25L ASC NULLS FIRST], false, 0
>                +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=54]
>                   +- FileScan parquet spark_catalog.default.t1[a#25L,b#26L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark...., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:bigint,b:bigint>
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)