Posted to issues@spark.apache.org by "XiDuo You (Jira)" <ji...@apache.org> on 2021/11/17 13:47:00 UTC

[jira] [Created] (SPARK-37357) Add merged last partition factor for rebalance

XiDuo You created SPARK-37357:
---------------------------------

             Summary: Add merged last partition factor for rebalance
                 Key: SPARK-37357
                 URL: https://issues.apache.org/jira/browse/SPARK-37357
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: XiDuo You


`Rebalance` splits large reduce partitions into smaller ones. However, we have seen many SQL queries produce small files because of the last partition.

Let's say we have one reduce partition and three map partitions, the blocks are [40, 60, 10, 10], and the target size is 100. We will get two files, of sizes 110 and 10. It gets worse when there are thousands of reduce partitions.

It would be helpful if we could merge the last small partition into the previous one.
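
A minimal sketch of the idea in Python, for illustration only; it is not Spark's implementation. The function name `merge_small_last_partition` and the `factor` threshold (0.2 here) are hypothetical, and the input is the split result quoted in the example (110 and 10 with a target size of 100).

```python
from typing import List


def merge_small_last_partition(
    sizes: List[int], target_size: int, factor: float
) -> List[int]:
    """Fold the last partition into the previous one when it is small.

    `factor` is a hypothetical threshold: the last partition counts as
    "small" when its size is below factor * target_size.
    """
    if len(sizes) < 2:
        return list(sizes)  # no previous partition to merge into
    *head, prev, last = sizes
    if last < factor * target_size:
        return head + [prev + last]
    return list(sizes)


# Split result from the example: target size 100, partitions of 110 and 10.
print(merge_small_last_partition([110, 10], target_size=100, factor=0.2))  # [120]
```

With a factor of 0.2 the trailing partition of size 10 is below the threshold of 20, so one file of 120 is written instead of two. A last partition at or above the threshold is left as is.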
