Posted to issues@spark.apache.org by "Wenchen Fan (Jira)" <ji...@apache.org> on 2021/03/29 14:34:00 UTC

[jira] [Assigned] (SPARK-34255) DataSource V2: support static partitioning on required distribution and ordering

     [ https://issues.apache.org/jira/browse/SPARK-34255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-34255:
-----------------------------------

    Assignee: Jungtaek Lim

> DataSource V2: support static partitioning on required distribution and ordering
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-34255
>                 URL: https://issues.apache.org/jira/browse/SPARK-34255
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Major
>
> SPARK-34026 added the ability for a data source to require a repartition and a sort order, but left the number of partitions used in that repartition dependent on the config (the default number of shuffle partitions).
> Some special data sources may require a "static number of partitions" during the repartition - the state data source, for example. Spark places state rows by "hash(group key) % default number of shuffle partitions", so the state data source must apply the same partitioning when it rewrites state data. The data source cannot rely on the current default number of shuffle partitions: the value is not guaranteed to be the same across runs, and the number of partitions may even become non-static (e.g. when AQE decides it, SPARK-34230).
> This issue tracks the effort to support a static number of partitions during the repartition.
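
The placement rule quoted above can be sketched as follows. This is a minimal illustration of the "hash(group key) % numPartitions" invariant, not Spark's actual implementation; the class and method names are hypothetical, and the checkpointed partition count of 200 is only an example.

```java
// Hypothetical sketch of the state placement rule described in the issue:
// a row for a given group key must always land in the same partition index,
// so a rewrite of state data must reuse the ORIGINAL partition count rather
// than the current value of the shuffle-partitions config.
public class StatePartitioning {
    // Computes the partition index for a group key's hash.
    // Math.floorMod keeps the result in [0, numPartitions) even when the
    // hash is negative, unlike the plain % operator in Java.
    public static int partitionFor(int groupKeyHash, int numPartitions) {
        return Math.floorMod(groupKeyHash, numPartitions);
    }

    public static void main(String[] args) {
        // The count must come from the existing state (e.g. a checkpoint),
        // NOT from the current default number of shuffle partitions.
        int numPartitions = 200;
        System.out.println(partitionFor("key-1".hashCode(), numPartitions));
    }
}
```

Because the config value can change between runs (or be chosen dynamically by AQE), only a statically declared partition count guarantees the rewrite reproduces the original placement.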



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org