Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2022/04/05 02:32:00 UTC

[jira] [Resolved] (SPARK-37377) Refactor V2 Partitioning interface and remove deprecated usage of Distribution

     [ https://issues.apache.org/jira/browse/SPARK-37377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-37377.
-----------------------------------
    Fix Version/s: 3.4.0
         Assignee: Chao Sun
       Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/35657

> Refactor V2 Partitioning interface and remove deprecated usage of Distribution
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-37377
>                 URL: https://issues.apache.org/jira/browse/SPARK-37377
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Major
>             Fix For: 3.4.0
>
>
> Currently {{Partitioning}} is defined as follows:
> {code:java}
> @Evolving
> public interface Partitioning {
>   int numPartitions();
>   boolean satisfy(Distribution distribution);
> }
> {code}
> There are two issues with the interface: 1) it uses a deprecated {{Distribution}} interface and should switch to {{org.apache.spark.sql.connector.distributions.Distribution}}; 2) there is currently no way to use it in a join, where we want to compare the partitionings reported by both sides and decide whether they are "compatible" (and thus allow Spark to eliminate the shuffle).
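
To illustrate the second issue, here is a minimal sketch of what comparing the partitionings reported by the two sides of a join could look like. It is not the interface introduced by the linked pull request: SketchPartitioning, KeyedSketchPartitioning, JoinPlannerSketch, isCompatibleWith and needsShuffle are hypothetical names, and the compatibility rule (same partition count and the same transforms in the same order) is an assumption chosen for the example.

```java
import java.util.List;

// Hypothetical stand-in for a connector-reported partitioning; NOT Spark's API.
interface SketchPartitioning {
  int numPartitions();

  // Assumed rule: two partitionings are compatible when co-partitioned rows
  // from both sides are guaranteed to land in partitions with the same index.
  boolean isCompatibleWith(SketchPartitioning other);
}

// Rows are clustered by a list of transforms applied to the join keys,
// e.g. "bucket(8)". Transforms are plain strings here for simplicity.
final class KeyedSketchPartitioning implements SketchPartitioning {
  private final List<String> transforms;
  private final int numPartitions;

  KeyedSketchPartitioning(List<String> transforms, int numPartitions) {
    this.transforms = List.copyOf(transforms);
    this.numPartitions = numPartitions;
  }

  @Override
  public int numPartitions() {
    return numPartitions;
  }

  @Override
  public boolean isCompatibleWith(SketchPartitioning other) {
    if (!(other instanceof KeyedSketchPartitioning)) {
      return false;
    }
    KeyedSketchPartitioning that = (KeyedSketchPartitioning) other;
    // Same number of partitions and the same transforms, position by position.
    return numPartitions == that.numPartitions && transforms.equals(that.transforms);
  }
}

final class JoinPlannerSketch {
  // A shuffle is needed unless both sides report compatible partitionings.
  static boolean needsShuffle(SketchPartitioning left, SketchPartitioning right) {
    return !left.isCompatibleWith(right);
  }
}
```

With this sketch, two sides that both report a single "bucket(8)" transform over 8 partitions would not need a shuffle, while a side bucketed into 16 partitions would.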


