Posted to user@spark.apache.org by Kapil Kumar Singh <ka...@microsoft.com.INVALID> on 2022/09/12 04:31:30 UTC

RE: [EXTERNAL] Re: Dynamic shuffle partitions in a single job

AFAIK there is no way to set it at each Exchange.
If you are using Spark 3.2+, Adaptive Query Execution (AQE)<https://spark.apache.org/docs/latest/sql-performance-tuning.html#adaptive-query-execution> is enabled by default. It can automatically adjust the post-shuffle partition number at each exchange, which should help your use case. The feature itself is available from Spark 3.0, but there it has to be enabled explicitly.
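For reference, a minimal sketch of the relevant settings (taken from the Spark SQL performance tuning docs linked above; the size/number values here are illustrative, not recommendations):

```
# Enable AQE (default true in Spark 3.2+, must be set explicitly on 3.0/3.1)
spark.sql.adaptive.enabled=true
# Let AQE coalesce small post-shuffle partitions at each exchange
spark.sql.adaptive.coalescePartitions.enabled=true
# Upper bound AQE starts from before coalescing; defaults to spark.sql.shuffle.partitions
spark.sql.adaptive.coalescePartitions.initialPartitionNum=2000
# Target size AQE aims for when coalescing shuffle partitions
spark.sql.adaptive.advisoryPartitionSizeInBytes=64MB
```

With these set, you can keep a high initial partition count for the big-table joins while AQE shrinks the partition count for the small-table exchanges at runtime.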

From: Anupam Singh <an...@gmail.com>
Sent: Saturday, September 10, 2022 10:23 PM
To: Vibhor Gupta <Vi...@walmart.com.invalid>
Cc: user@spark.apache.org
Subject: [EXTERNAL] Re: Dynamic shuffle partitions in a single job

Commenting for better reach :)

On Thu, Sep 8, 2022, 11:56 AM Vibhor Gupta <Vi...@walmart.com.invalid> wrote:
Hi Community,

Is it possible to set the number of shuffle partitions per exchange?

My Spark query contains a lot of joins/aggregations involving both big and small tables. Keeping a high value of spark.sql.shuffle.partitions helps with the big tables, but for the small tables it creates a lot of overhead on the scheduler.

Cheers,
Vibhor