Posted to issues@spark.apache.org by "Manu Zhang (Jira)" <ji...@apache.org> on 2020/08/25 12:48:00 UTC

[jira] [Created] (SPARK-32698) Do not fall back to default parallelism if the minimum number of coalesced partitions is not set in AQE

Manu Zhang created SPARK-32698:
----------------------------------

             Summary: Do not fall back to default parallelism if the minimum number of coalesced partitions is not set in AQE
                 Key: SPARK-32698
                 URL: https://issues.apache.org/jira/browse/SPARK-32698
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Manu Zhang


Currently in AQE, when coalescing shuffle partitions,
{quote}We fall back to Spark default parallelism if the minimum number of coalesced partitions is not set, so to avoid perf regressions compared to no coalescing.
{quote}
In our experience, this makes the number of tasks after coalescing hard to predict, especially with dynamic allocation, and it also leads to many small output files. The behavior is complex and hard to reason about.

Hence, I propose that when the minimum number of coalesced partitions is not set, AQE should not fall back to the default parallelism and should instead coalesce towards the target size.
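
To make the difference concrete, here is a minimal sketch of the arithmetic, written as a simplified model and not as Spark's actual implementation. It assumes the target size is capped so that at least a minimum number of partitions remains, and that adjacent partitions are merged greedily up to that target. The function names, the 16-byte floor, and the sample sizes are all hypothetical; `advisory_size` is meant to play the role of spark.sql.adaptive.advisoryPartitionSizeInBytes and `min_partition_num` the role of spark.sql.adaptive.coalescePartitions.minPartitionNum.

```python
import math

MIB = 1024 * 1024
MIN_TARGET_SIZE = 16  # bytes; hypothetical floor so the target is never zero


def effective_target_size(total_bytes, advisory_size, min_partition_num):
    """Cap the target size so that at least min_partition_num partitions remain."""
    max_target = max(math.ceil(total_bytes / min_partition_num), MIN_TARGET_SIZE)
    return min(max_target, advisory_size)


def num_coalesced_partitions(partition_sizes, advisory_size, min_partition_num):
    """Greedily merge adjacent partitions without exceeding the target size."""
    target = effective_target_size(sum(partition_sizes), advisory_size, min_partition_num)
    groups = 0
    current = 0
    for size in partition_sizes:
        # Close the current group if adding this partition would exceed the target.
        if current > 0 and current + size > target:
            groups += 1
            current = 0
        current += size
    if current > 0:
        groups += 1
    return groups


# Hypothetical workload: 200 shuffle partitions of 1 MiB each, 64 MiB advisory size.
sizes = [1 * MIB] * 200
advisory = 64 * MIB

# Current behavior (modeled): the minimum is unset, so it falls back to the
# default parallelism, which varies with the executors available at that moment.
with_100_cores = num_coalesced_partitions(sizes, advisory, min_partition_num=100)
with_8_cores = num_coalesced_partitions(sizes, advisory, min_partition_num=8)

# Proposed behavior (modeled): no fallback, so only the advisory size matters.
proposed = num_coalesced_partitions(sizes, advisory, min_partition_num=1)

print(with_100_cores, with_8_cores, proposed)  # prints "100 8 4"
```

In this model the same shuffle output yields 100 or 8 tasks depending on the default parallelism, while coalescing purely towards the target size yields 4 regardless of cluster size. That is the kind of uncertainty described above.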


