
Posted to issues@spark.apache.org by "thomasgx (Jira)" <ji...@apache.org> on 2023/03/24 06:54:00 UTC

[jira] [Resolved] (SPARK-42912) Some cases do not take effect when using OptimizeSkewInRebalancePartitions

     [ https://issues.apache.org/jira/browse/SPARK-42912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

thomasgx resolved SPARK-42912.
------------------------------
    Resolution: Fixed

> Some cases do not take effect when using OptimizeSkewInRebalancePartitions
> --------------------------------------------------------------------------
>
>                 Key: SPARK-42912
>                 URL: https://issues.apache.org/jira/browse/SPARK-42912
>             Project: Spark
>          Issue Type: Question
>          Components: Spark Core
>    Affects Versions: 3.3.0
>         Environment: spark3.3.0
>            Reporter: thomasgx
>            Priority: Major
>         Attachments: image-2023-03-24-11-30-42-239.png, image-2023-03-24-11-31-42-564.png, image-2023-03-24-11-34-34-070.png, image-2023-03-24-11-36-54-539.png, image-2023-03-24-11-37-42-289.png
>
>
> Question:
> When using OptimizeSkewInRebalancePartitions to insert into dynamic partitions (three-level partitions) of a Hive table whose partitions are skewed, I found that when spark.sql.shuffle.partitions is set to a relatively large value (10000), the written files do not follow the configured advisoryPartitionSizeInBytes: the skewed partition's data is processed by a single task and written into one file. When I reduce spark.sql.shuffle.partitions to 2000, the skewed partition is optimized by OptimizeSkewInRebalancePartitions as expected: its data is processed in batches and written to files.
>  
> Spark AQE config:
> spark.sql.adaptive.coalescePartitions.enabled true
> spark.sql.adaptive.skewedJoin.enabled true
> spark.sql.adaptive.advisoryPartitionSizeInBytes 128M
> spark.sql.finalStage.adaptive.advisoryPartitionSizeInBytes 512M
> spark.sql.finalStage.adaptive.coalescePartitions.minPartitionSize 128M
> spark.sql.finalStage.adaptive.coalescePartitions.parallelismFirst false
> spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes  1024M
>  
> 10000 partitions:
> !image-2023-03-24-11-30-42-239.png|width=929,height=150!
>  
>  
> 2000 partitions:
> !image-2023-03-24-11-31-42-564.png|width=936,height=172!
>  
>  
> SQL time:
> !image-2023-03-24-11-34-34-070.png|width=962,height=220!
>  
>  
> plan:
> !image-2023-03-24-11-36-54-539.png|width=339,height=389!
>  
>  
>  
> !image-2023-03-24-11-37-42-289.png|width=334,height=306!
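
For readers following the description above, here is a minimal sketch of the behavior the reporter expected: one skewed reduce partition being cut into batches of roughly the advisory size. It is a simplified model, not Spark's implementation. It assumes the split happens along map-output boundaries, and the names parse_size and split_skewed_partition, as well as the 256M map-output sizes, are hypothetical and chosen only for illustration.

```python
def parse_size(text):
    """Convert a size string such as '512M' to bytes (hypothetical helper)."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    text = text.strip().upper()
    if text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def split_skewed_partition(map_output_sizes, target_size):
    """Group consecutive map outputs of ONE reduce partition into batches.

    Simplified model (an assumption, not Spark's code): walk the map
    outputs in order and start a new batch whenever adding the next
    output would push the current batch above target_size.
    Returns a list of (start_map_index, end_map_index_exclusive) ranges.
    """
    batches = []
    start = 0
    current = 0
    for i, size in enumerate(map_output_sizes):
        if current > 0 and current + size > target_size:
            batches.append((start, i))
            start = i
            current = 0
        current += size
    if start < len(map_output_sizes):
        batches.append((start, len(map_output_sizes)))
    return batches


# Advisory size taken from the config quoted above (final-stage value).
target = parse_size("512M")

# Hypothetical skewed partition: 8 map outputs of 256M each (2G in total).
skewed = [parse_size("256M")] * 8

batches = split_skewed_partition(skewed, target)
print(len(batches), "batches:", batches)
```

In this model each batch would be handled by its own task and written to its own file, which matches what the reporter observed with 2000 shuffle partitions. With 10000 shuffle partitions the whole partition stayed in a single batch; the sketch does not attempt to explain why.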



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
