Posted to commits@spark.apache.org by do...@apache.org on 2022/07/23 22:48:12 UTC

[spark] branch master updated: [SPARK-39846][CORE] Enable `spark.dynamicAllocation.shuffleTracking.enabled` by default

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 1b6cdf10406 [SPARK-39846][CORE] Enable `spark.dynamicAllocation.shuffleTracking.enabled` by default
1b6cdf10406 is described below

commit 1b6cdf1040645486ae9b5cbb0247d8869f4f259f
Author: Dongjoon Hyun <do...@apache.org>
AuthorDate: Sat Jul 23 15:48:01 2022 -0700

    [SPARK-39846][CORE] Enable `spark.dynamicAllocation.shuffleTracking.enabled` by default
    
    ### What changes were proposed in this pull request?
    
    This PR aims to enable `spark.dynamicAllocation.shuffleTracking.enabled` by default in Apache Spark 3.4 when `spark.dynamicAllocation.enabled=true` and `spark.shuffle.service.enabled=false`.
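
    As a sketch, the configuration combination described above looks like the following on Kubernetes (the master URL and application path are illustrative placeholders, not part of this PR):

    ```shell
    # Dynamic allocation on, external shuffle service off (typical on K8s).
    # With this change, Spark 3.4+ tracks shuffle files by default under
    # this combination, so no extra flag is needed.
    spark-submit \
      --master k8s://https://example-cluster:6443 \
      --conf spark.dynamicAllocation.enabled=true \
      --conf spark.shuffle.service.enabled=false \
      local:///opt/spark/examples/src/main/python/pi.py
    # To restore the pre-3.4 behavior, add:
    #   --conf spark.dynamicAllocation.shuffleTracking.enabled=false
    ```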
    
    ### Why are the changes needed?
    
    Here is a brief history around `spark.dynamicAllocation.shuffleTracking.enabled`.
    - Apache Spark 3.0.0 added it via SPARK-27963 for K8s environment.
      > One immediate use case is the ability to use dynamic allocation on Kubernetes, which doesn't yet have that service.
    - Apache Spark 3.1.1 made K8s GA via SPARK-33005 and started to use it widely in K8s environments.
    - Apache Spark 3.2.0 started to support shuffle data recovery on reused PVCs via SPARK-35593.
    - Apache Spark 3.3.0 removed the `Experimental` tag from it via SPARK-39322.
    - Apache Spark 3.4.0 will enable it by default via SPARK-39846 (this PR) to help Spark K8s users adopt dynamic allocation more easily.
    
    ### Does this PR introduce _any_ user-facing change?
    
    The `Core` migration guide is updated.
    
    ### How was this patch tested?
    
    Pass the CIs including K8s IT GitHub Action job.
    
    Closes #37257 from dongjoon-hyun/SPARK-39846.
    
    Authored-by: Dongjoon Hyun <do...@apache.org>
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
 docs/configuration.md                                              | 2 +-
 docs/core-migration-guide.md                                       | 2 ++
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index 02a52e86454..72a03a4d1fb 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -646,7 +646,7 @@ package object config {
     ConfigBuilder("spark.dynamicAllocation.shuffleTracking.enabled")
       .version("3.0.0")
       .booleanConf
-      .createWithDefault(false)
+      .createWithDefault(true)
 
   private[spark] val DYN_ALLOCATION_SHUFFLE_TRACKING_TIMEOUT =
     ConfigBuilder("spark.dynamicAllocation.shuffleTracking.timeout")
diff --git a/docs/configuration.md b/docs/configuration.md
index 26addffe88b..957c430c37b 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -2760,7 +2760,7 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td><code>spark.dynamicAllocation.shuffleTracking.enabled</code></td>
-  <td><code>false</code></td>
+  <td><code>true</code></td>
   <td>
     Enables shuffle file tracking for executors, which allows dynamic allocation
     without the need for an external shuffle service. This option will try to keep alive executors
diff --git a/docs/core-migration-guide.md b/docs/core-migration-guide.md
index 1a16b8f112a..a4af47b016a 100644
--- a/docs/core-migration-guide.md
+++ b/docs/core-migration-guide.md
@@ -26,6 +26,8 @@ license: |
 
 - Since Spark 3.4, Spark driver will own `PersistentVolumnClaim`s and try to reuse if they are not assigned to live executors. To restore the behavior before Spark 3.4, you can set `spark.kubernetes.driver.ownPersistentVolumeClaim` to `false` and `spark.kubernetes.driver.reusePersistentVolumeClaim` to `false`.
 
+- Since Spark 3.4, Spark driver will track shuffle data when dynamic allocation is enabled without shuffle service. To restore the behavior before Spark 3.4, you can set `spark.dynamicAllocation.shuffleTracking.enabled` to `false`.
+
 ## Upgrading from Core 3.2 to 3.3
 
 - Since Spark 3.3, Spark migrates its log4j dependency from 1.x to 2.x because log4j 1.x has reached end of life and is no longer supported by the community. Vulnerabilities reported after August 2015 against log4j 1.x were not checked and will not be fixed. Users should rewrite original log4j properties files using log4j2 syntax (XML, JSON, YAML, or properties format). Spark rewrites the `conf/log4j.properties.template` which is included in Spark distribution, to `conf/log4j2.properties [...]

