You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/08/26 15:58:42 UTC

[GitHub] [spark] rdblue commented on a change in pull request #25545: [SPARK-28843][PYTHON] Set OMP_NUM_THREADS to executor cores for python

rdblue commented on a change in pull request #25545: [SPARK-28843][PYTHON] Set OMP_NUM_THREADS to executor cores for python
URL: https://github.com/apache/spark/pull/25545#discussion_r317672582
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
 ##########
 @@ -106,6 +106,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
     val startTime = System.currentTimeMillis
     val env = SparkEnv.get
     val localdir = env.blockManager.diskBlockManager.localDirs.map(f => f.getPath()).mkString(",")
+    // if OMP_NUM_THREADS is not explicitly set, override it with the number of cores
+    if (conf.getOption("spark.executorEnv.OMP_NUM_THREADS").isEmpty) {
+      // SPARK-28843: limit the OpenMP thread pool to the number of cores assigned to this executor
+      // this avoids high memory consumption with pandas/numpy because of a large OpenMP thread pool
+      // see https://github.com/numpy/numpy/issues/10455
 
 Review comment:
   The first line of the comment explains the fix, the second gives context about when this happens.
   
   If you would like to clarify further, please open a PR against my branch.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org