Posted to commits@spark.apache.org by sr...@apache.org on 2017/07/12 10:02:09 UTC

spark git commit: [SPARK-21305][ML][MLLIB] Add options to disable multi-threading of native BLAS

Repository: spark
Updated Branches:
  refs/heads/master f587d2e3f -> 5ed134ee2


[SPARK-21305][ML][MLLIB] Add options to disable multi-threading of native BLAS

## What changes were proposed in this pull request?

Many ML/MLlib algorithms use native BLAS (such as Intel MKL, ATLAS, or OpenBLAS) to improve performance.
Many popular native BLAS implementations, such as Intel MKL and OpenBLAS, are multi-threaded, which can conflict with Spark's execution model. Spark should provide options to disable multi-threading in native BLAS.

https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded
https://software.intel.com/en-us/articles/recommended-settings-for-calling-intel-mkl-routines-from-multi-threaded-applications
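
As a minimal sketch of what the options look like in practice, the two environment variables documented by this patch could be set in a local copy of `conf/spark-env.sh`. The value 1 assumes each Spark task is configured to use a single core, which is the default. The file placement is an assumption about a typical deployment, not something this commit enforces.

```shell
# Sketch only: lines one might add to conf/spark-env.sh (copied from spark-env.sh.template).
# Assumes each Spark task uses one core, so one BLAS thread per task is the matching value.

# Limit Intel MKL to a single thread per process.
export MKL_NUM_THREADS=1

# Limit OpenBLAS to a single thread per process.
export OPENBLAS_NUM_THREADS=1
```

Whether executor processes see these values depends on how the cluster manager launches them. Spark also documents the `spark.executorEnv.[EnvironmentVariableName]` property for setting executor environment variables explicitly.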

## How was this patch tested?
Existing unit tests.

Author: Peng Meng <pe...@intel.com>

Closes #18551 from mpjlu/optimzeBLAS.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5ed134ee
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5ed134ee
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5ed134ee

Branch: refs/heads/master
Commit: 5ed134ee213060882c6e3ed713473fa6cc158d36
Parents: f587d2e
Author: Peng Meng <pe...@intel.com>
Authored: Wed Jul 12 11:02:04 2017 +0100
Committer: Sean Owen <so...@cloudera.com>
Committed: Wed Jul 12 11:02:04 2017 +0100

----------------------------------------------------------------------
 conf/spark-env.sh.template | 4 ++++
 docs/ml-guide.md           | 6 ++++++
 2 files changed, 10 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/5ed134ee/conf/spark-env.sh.template
----------------------------------------------------------------------
diff --git a/conf/spark-env.sh.template b/conf/spark-env.sh.template
index b9aab5a..1663019 100755
--- a/conf/spark-env.sh.template
+++ b/conf/spark-env.sh.template
@@ -61,3 +61,7 @@
 # - SPARK_IDENT_STRING  A string representing this instance of spark. (Default: $USER)
 # - SPARK_NICENESS      The scheduling priority for daemons. (Default: 0)
 # - SPARK_NO_DAEMONIZE  Run the proposed command in the foreground. It will not output a PID file.
+# Options for native BLAS, like Intel MKL, OpenBLAS, and so on.
+# You might get better performance to enable these options if using native BLAS (see SPARK-21305).
+# - MKL_NUM_THREADS=1        Disable multi-threading of Intel MKL
+# - OPENBLAS_NUM_THREADS=1   Disable multi-threading of OpenBLAS

http://git-wip-us.apache.org/repos/asf/spark/blob/5ed134ee/docs/ml-guide.md
----------------------------------------------------------------------
diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index fb46213..adb1c9a 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -61,6 +61,12 @@ To configure `netlib-java` / Breeze to use system optimised binaries, include
 project and read the [netlib-java](https://github.com/fommil/netlib-java) documentation for your
 platform's additional installation instructions.
 
+The most popular native BLAS such as [Intel MKL](https://software.intel.com/en-us/mkl), [OpenBLAS](http://www.openblas.net), can use multiple threads in a single operation, which can conflict with Spark's execution model.
+
+Configuring these BLAS implementations to use a single thread for operations may actually improve performance (see [SPARK-21305](https://issues.apache.org/jira/browse/SPARK-21305)). It is usually optimal to match this to the number of cores each Spark task is configured to use, which is 1 by default and typically left at 1.
+
+Please refer to resources like the following to understand how to configure the number of threads these BLAS implementations use: [Intel MKL](https://software.intel.com/en-us/articles/recommended-settings-for-calling-intel-mkl-routines-from-multi-threaded-applications) and [OpenBLAS](https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded).
+
 To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.4 or newer.
 
 [^1]: To learn more about the benefits and background of system optimised natives, you may wish to

