Posted to commits@spark.apache.org by gu...@apache.org on 2023/02/28 02:19:47 UTC

[spark] branch branch-3.4 updated: [SPARK-42596][CORE][YARN] OMP_NUM_THREADS not set to number of executor cores by default

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 334e49b4553 [SPARK-42596][CORE][YARN] OMP_NUM_THREADS not set to number of executor cores by default
334e49b4553 is described below

commit 334e49b4553d912ac0013c8044bd9a3073907c5a
Author: John Zhuge <jz...@apache.org>
AuthorDate: Tue Feb 28 11:19:23 2023 +0900

    [SPARK-42596][CORE][YARN] OMP_NUM_THREADS not set to number of executor cores by default
    
    ### What changes were proposed in this pull request?
    
    This PR fixes a mistake in SPARK-41188, which removed the PythonRunner code that sets OMP_NUM_THREADS to the number of executor cores by default. The author and reviewers thought that code was a duplicate.
    
    ### Why are the changes needed?
    
    SPARK-41188 stopped setting OMP_NUM_THREADS to the number of executor cores by default when running Python UDFs on YARN.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No
    
    ### How was this patch tested?
    
    Manual testing
    
    Closes #40199 from jzhuge/SPARK-42596.
    
    Authored-by: John Zhuge <jz...@apache.org>
    Signed-off-by: Hyukjin Kwon <gu...@apache.org>
    (cherry picked from commit 43b15b31d26bbf1e539728e6c64aab4eda7ade62)
    Signed-off-by: Hyukjin Kwon <gu...@apache.org>
---
 core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
index cdb2c620656..14d5df14ed8 100644
--- a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
+++ b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
@@ -135,6 +135,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
     val execCoresProp = Option(context.getLocalProperty(EXECUTOR_CORES_LOCAL_PROPERTY))
     val memoryMb = Option(context.getLocalProperty(PYSPARK_MEMORY_LOCAL_PROPERTY)).map(_.toLong)
     val localdir = env.blockManager.diskBlockManager.localDirs.map(f => f.getPath()).mkString(",")
+    // if OMP_NUM_THREADS is not explicitly set, override it with the number of cores
+    if (conf.getOption("spark.executorEnv.OMP_NUM_THREADS").isEmpty) {
+      // SPARK-28843: limit the OpenMP thread pool to the number of cores assigned to this executor
+      // this avoids high memory consumption with pandas/numpy because of a large OpenMP thread pool
+      // see https://github.com/numpy/numpy/issues/10455
+      execCoresProp.foreach(envVars.put("OMP_NUM_THREADS", _))
+    }
     envVars.put("SPARK_LOCAL_DIRS", localdir) // it's also used in monitor thread
     if (reuseWorker) {
       envVars.put("SPARK_REUSE_WORKER", "1")


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org