Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2022/11/22 10:31:00 UTC

[jira] [Commented] (SPARK-41220) Range partitioner sample supports column pruning

    [ https://issues.apache.org/jira/browse/SPARK-41220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17637148#comment-17637148 ] 

Apache Spark commented on SPARK-41220:
--------------------------------------

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/38756

> Range partitioner sample supports column pruning
> ------------------------------------------------
>
>                 Key: SPARK-41220
>                 URL: https://issues.apache.org/jira/browse/SPARK-41220
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: XiDuo You
>            Priority: Major
>
> When doing a global sort, we first sample the data to compute the range bounds, then use the range partitioner for the shuffle exchange.
> The issue is that the sample plan is coupled with the shuffle plan, so we cannot optimize the sample plan on its own. The sample plan only needs the sort-order columns, but the shuffle plan contains all data columns. So, at the very least, we can apply column pruning to the sample plan so that it fetches only the ordering columns.
> A common example is: `OPTIMIZE table ZORDER BY columns`
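
To make the idea in the description concrete, here is a minimal sketch in plain Python of a sampling step that reads only the ordering columns, followed by range-bound selection and partition assignment. This is not Spark's implementation: the function names (sample_sort_keys, range_bounds, partition_id) are hypothetical, rows are modeled as dicts, and simple Bernoulli sampling stands in for whatever sampling Spark's range partitioner actually performs.

```python
import bisect
import random


def sample_sort_keys(rows, order_cols, fraction, seed=0):
    """Sample rows, keeping only the ordering columns (the pruned sample plan)."""
    rng = random.Random(seed)
    return [
        tuple(row[c] for c in order_cols)  # wide, non-ordering columns are never copied
        for row in rows
        if rng.random() < fraction
    ]


def range_bounds(sampled_keys, num_partitions):
    """Pick up to num_partitions - 1 strictly increasing upper bounds from the sample."""
    keys = sorted(sampled_keys)
    if not keys:
        return []
    step = len(keys) / num_partitions
    bounds = []
    for i in range(1, num_partitions):
        candidate = keys[min(int(i * step), len(keys) - 1)]
        if not bounds or candidate > bounds[-1]:
            bounds.append(candidate)
    return bounds


def partition_id(row, order_cols, bounds):
    """Assign a full row to a range partition; keys equal to a bound go to its partition."""
    key = tuple(row[c] for c in order_cols)
    return bisect.bisect_left(bounds, key)


# Usage: the shuffle still moves full rows, but the sample only touched "id".
rows = [{"id": i, "payload": "x" * 100} for i in range(100)]
keys = sample_sort_keys(rows, ["id"], fraction=1.0)
bounds = range_bounds(keys, num_partitions=4)
partitions = [partition_id(r, ["id"], bounds) for r in rows]
```

In this sketch the sampling pass never materializes the payload column, which is the saving the issue is after; the full rows are only needed when partition_id is applied during the shuffle.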



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
