Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2023/03/14 17:16:00 UTC

[jira] [Assigned] (SPARK-42779) Allow V2 writes to indicate advisory partition size

     [ https://issues.apache.org/jira/browse/SPARK-42779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42779:
------------------------------------

    Assignee: Apache Spark

> Allow V2 writes to indicate advisory partition size
> ---------------------------------------------------
>
>                 Key: SPARK-42779
>                 URL: https://issues.apache.org/jira/browse/SPARK-42779
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Anton Okolnychyi
>            Assignee: Apache Spark
>            Priority: Major
>
> Data sources may request a particular distribution and ordering of data for V2 writes. If AQE is enabled, Spark uses the default session advisory partition size (64MB) as guidance when sizing the shuffle partitions for the write. Unfortunately, this default can still produce small files, because columnar file formats often compress the written data to well below its shuffle size. Spark should allow data sources to indicate the advisory shuffle partition size, just like it lets data sources request a particular number of partitions.
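
To make the small-file concern concrete, here is a rough back-of-the-envelope sketch in Python. It is not Spark code and does not use any Spark API: the function names, the 8x compression ratio, and the 128MB target file size are all assumptions chosen for illustration. It only shows how a 64MB advisory partition size can translate into a much smaller file once a columnar format compresses the data, and how large an advisory size a data source might want to request instead.

```python
MB = 1024 * 1024

def estimated_file_size(advisory_partition_bytes, compression_ratio):
    """Approximate on-disk file size for one shuffle partition.

    compression_ratio is shuffle bytes divided by written bytes; the
    value used below is a made-up example, not a measured number.
    """
    return advisory_partition_bytes / compression_ratio

def advisory_size_for_target(target_file_bytes, compression_ratio):
    """Advisory partition size that would roughly yield the target file size."""
    return int(target_file_bytes * compression_ratio)

default_advisory = 64 * MB   # session default mentioned in the issue
ratio = 8.0                  # hypothetical compression ratio

# With the default, each partition would be written as a file of about 8MB.
print(estimated_file_size(default_advisory, ratio) / MB)

# To aim for files of about 128MB, the source would ask for about 1024MB.
print(advisory_size_for_target(128 * MB, ratio) // MB)
```

The real ratio depends on the data and the file format, so a data source is in a better position than a session-wide default to pick this number, which is the motivation given in the issue.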



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
