You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/02/12 08:48:56 UTC

[GitHub] [spark] yma11 opened a new pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

yma11 opened a new pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546
 
 
   ### What changes were proposed in this pull request?
   Change BLAS for level-1 routines from java implementation to NativeBLAS
   
   ### Why are the changes needed?
   In current ML BLAS.scala, all level-1 routines are fixed to use java
   implementation. But NativeBLAS(intel MKL, OpenBLAS) can bring up to 11X
   performance improvement based on performance test which apply direct
   calls against these methods. We should provide a way to allow user take
   advantage of NativeBLAS for level-1 routines. Here we do it through
   switching to NativeBLAS for these methods from f2jBLAS.
   
   ### Does this PR introduce any user-facing change?
   Yes, level-1 routines will switch to NativeBLAS and will fallback to
   f2jBLAS if native BLAS is not properly configured in system.
   
   ### How was this patch tested?
   Perf test direct calls level-1 routines
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-600801614
 
 
   **[Test build #120005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120005/testReport)** for PR 27546 at commit [`04559df`](https://github.com/apache/spark/commit/04559df1615b9d6a97cd7fed731ca6cccf55bb6a).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-600839723
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120005/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-599619689
 
 
   **[Test build #119873 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119873/testReport)** for PR 27546 at commit [`abddf4c`](https://github.com/apache/spark/commit/abddf4c45e17f108ebf8f3c89300cfc444433b61).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-599581979
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24601/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] winningsix commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
winningsix commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r384282022
 
 

 ##########
 File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
 ##########
 @@ -27,15 +27,35 @@ private[spark] object BLAS extends Serializable {
 
   @transient private var _f2jBLAS: NetlibBLAS = _
   @transient private var _nativeBLAS: NetlibBLAS = _
+  @transient private val vectorSizeThreshold: Int = 256
 
 Review comment:
   Hint: Make this final and static

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-594342913
 
 
   @yma11 Yes, but there are still some algorithms have not been migrated to `.ml`.
   
   BTW, did you also change `.mllib.linalg.BLAS` in your KMeans tests? Since KMeans is actually implemented in the `.mllib` side, so `.mllib.linalg.BLAS` is used instead of `.ml.linalg.BLAS`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-597999782
 
 
   Perf data for nativeBLAS(mkl, openblas) and java implementation got collected on IA platform **SandyBridge**, a quite old generation of CPU.  Generally, 256 is still a safe value under this scenario. Please refer attached for details.
   [Level1-routines-perf - part-4.xlsx](https://github.com/apache/spark/files/4322257/Level1-routines-perf.-.part-4.xlsx)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r386032347
 
 

 ##########
 File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
 ##########
 @@ -20,22 +20,42 @@ package org.apache.spark.ml.linalg
 import com.github.fommil.netlib.{BLAS => NetlibBLAS, F2jBLAS}
 import com.github.fommil.netlib.BLAS.{getInstance => NativeBLAS}
 
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.config.NATIVE_L1_THRESHOLD
+
 /**
  * BLAS routines for MLlib's vectors and matrices.
  */
 private[spark] object BLAS extends Serializable {
 
   @transient private var _f2jBLAS: NetlibBLAS = _
   @transient private var _nativeBLAS: NetlibBLAS = _
+  private val nativeL1Threshold =
+    new SparkConf().getInt(NATIVE_L1_THRESHOLD.key, 256)
 
 Review comment:
   You should be able to access the value from the ConfigEntry directly, and not repeat the default

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r384959562
 
 

 ##########
 File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
 ##########
 @@ -27,15 +27,35 @@ private[spark] object BLAS extends Serializable {
 
   @transient private var _f2jBLAS: NetlibBLAS = _
   @transient private var _nativeBLAS: NetlibBLAS = _
+  @transient private val vectorSizeThreshold: Int = 256
 
-  // For level-1 routines, we use Java implementation.
+  // For level-3 routines, we use the native BLAS.
+  // if native BLAS is not properly configured in system, will automatically fallback to f2jBLAS
+  private[ml] def nativeBLAS: NetlibBLAS = {
 
 Review comment:
   I add a note in ml-guide to highlight nativeBLAS usage depends on proper configuration in system otherwise will fallback to java implementation.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-600801614
 
 
   **[Test build #120005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120005/testReport)** for PR 27546 at commit [`04559df`](https://github.com/apache/spark/commit/04559df1615b9d6a97cd7fed731ca6cccf55bb6a).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen closed pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen closed pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mengxr commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
mengxr commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-587132383
 
 
   @yma11 Thanks for re-running the benchmark! Could you confirm whether AVX is used by MKL/OpenBLAS? I think that is why it is faster on level-1 on large vectors.
   
   It seems reasonable to switch between f2jBLAS and nativeBLAS based on vector size (256?). 
   
   There is still one scenario we need to check. I don't think users will remember limiting native BLAS threads to 1. They might have 8 concurrent tasks running on a 8-core worker while each using multi-thread BLAS. I saw cases before that it significantly degrade the performance. It might be worth running a benchmark for this scenario.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-585096909
 
 
   Can one of the admins verify this patch?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r386789749
 
 

 ##########
 File path: mllib-local/pom.xml
 ##########
 @@ -61,13 +61,17 @@
       This spark-tags test-dep is needed even though it isn't used in this module, otherwise testing-cmds that exclude
       them will yield errors.
     -->
+    <dependency>
 
 Review comment:
   Ah, OK that is not even available, right. I think I'd just make it a hard-coded constant then. I don't think we can reasonably expect people to find and discover and env variable that probably isn't that meaningful to tune. Just set it conservatively. 512?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-597114153
 
 
   > I think the two remaining issues are:
   > 
   > * Should this threshold be configurable, only within `mllib` at least?
   > * Should the default / hard-coded value be larger, to be 'safer'?
   > 
   > Possibly still partly configurable, yes; I don't feel strongly about it.
   > @mengxr 's last question was, how does this compare on older hardware, like Ivy Bridge or similar, maybe? I think we're aiming for a value that should not cause a slowdown on hardware people might be realistically running now, especially if they haven't tuned BLAS threads. That number might be higher than 256 and I expect it is, but just guessing.
   
   I am trying to get perf data of an older IA platform to double check whether 256 should be a safe threshold, even it's not all the case but it can make us more confident. Not finished yet and will update you once get the result.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-585097459
 
 
   Can one of the admins verify this patch?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r386505447
 
 

 ##########
 File path: mllib-local/pom.xml
 ##########
 @@ -61,13 +61,17 @@
       This spark-tags test-dep is needed even though it isn't used in this module, otherwise testing-cmds that exclude
       them will yield errors.
     -->
+    <dependency>
 
 Review comment:
   A property file is even more convoluted. You can still access the property; you just can't use the config API or symbol.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-600802152
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24724/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-599363686
 
 
   @kiszk and @mengxr  any more comments? I can see there is an old change request from kiszk.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-594326474
 
 
   Should `mllib.linalg.BLAS` also be changed? Some algorithms now are implemented in the `.mllib` side.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r386429848
 
 

 ##########
 File path: mllib-local/pom.xml
 ##########
 @@ -61,13 +61,17 @@
       This spark-tags test-dep is needed even though it isn't used in this module, otherwise testing-cmds that exclude
       them will yield errors.
     -->
+    <dependency>
 
 Review comment:
   Ah, right. OK, in that case I would leave the property in `core`, but reference it only by name (unfortunately) here in `mllib-local`, with a comment about why. You can remove `.ml` from the name too; it's not specific to ML really.
   
   I don't think env variables are a good solution here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mengxr commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
mengxr commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-585864274
 
 
   Could you provide reproducible microbenchmark results for both OpenBLAS and MKL on arrays of different sizes? Last time we tested it it didn't help. But that was years ago.
   
   Note that JVM also improved SIMD. We should also check whether we can improve along that direction.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mengxr commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
mengxr commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-599577586
 
 
   jenkins, test this please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-592537637
 
 
   Yeah I was just thinking of keeping it conservative, where it should always be a win. OK add an .ml config somewhere, I suppose. Maybe `spark.blas.nativeL1Threshold`?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r386799727
 
 

 ##########
 File path: mllib-local/pom.xml
 ##########
 @@ -61,13 +61,17 @@
       This spark-tags test-dep is needed even though it isn't used in this module, otherwise testing-cmds that exclude
       them will yield errors.
     -->
+    <dependency>
 
 Review comment:
   Hi @srowen, how about 256 which we have finalized in previous discussions?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] winningsix commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
winningsix commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r386498502
 
 

 ##########
 File path: mllib-local/pom.xml
 ##########
 @@ -61,13 +61,17 @@
       This spark-tags test-dep is needed even though it isn't used in this module, otherwise testing-cmds that exclude
       them will yield errors.
     -->
+    <dependency>
 
 Review comment:
   hmm, instead env variables, we can pass this configuration via blas property file considering ```mllib-local``` package does not have access to ```core``` module.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-591289225
 
 
   Hi @srowen and @mengxr, I have updated this PR using 256 as the value for nativaBLAS and java implementation switch.  Pls take a further view. Thanks.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mengxr commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
mengxr commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-586387070
 
 
   Please also limit the blas threads to 1 because one core per task is the common setting.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mengxr commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
mengxr commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-595592042
 
 
   I'm okay with a hardcoded constant. We are start with a safer one, say 512, if there are concerns around environment differences.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-590792591
 
 
   @srowen In my kmeans test, each executor is configured with 6 cores and I can confirm it's fully utilized through web UI.  and under this condition, using default setting of MKL_NUM_THREADS is better than setting value to 1 but similar with setting value to 4. So I suggest to tune this env variable based on specific configuration of spark workloads.
   To determine the threshold for logic change in level-1 methods, I have collected perf data for more vector sizes.  Thus based on the data of vector size(10, 100, 128, 150, 256, 1000), seems 128 is the point that nativeBlas has equal performance with java implementation and after that it shows obvious gain. Will it be reasonable to use this value as the logic change bar?
   Attached includes perf data for 128/150 elements vectors.
   [Level1-routines-perf - part-3.xlsx](https://github.com/apache/spark/files/4249399/Level1-routines-perf.-.part-3.xlsx)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-600839710
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-597077280
 
 
   I think the two remaining issues are:
   - Should this threshold be configurable, only within `mllib` at least?
   - Should the default / hard-coded value be larger, to be 'safer'?
   
   Possibly still partly configurable, yes; I don't feel strongly about it.
   @mengxr 's last question was, how does this compare on older hardware, like Ivy Bridge or similar, maybe? I think we're aiming for a value that should not cause a slowdown on hardware people might be realistically running now, especially if they haven't tuned BLAS threads. That number might be higher than 256 and I expect it is, but just guessing.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r384959926
 
 

 ##########
 File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
 ##########
 @@ -220,20 +240,12 @@ private[spark] object BLAS extends Serializable {
       case sx: SparseVector =>
         f2jBLAS.dscal(sx.values.length, a, sx.values, 1)
 
 Review comment:
   done.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-592951889
 
 
   > Yeah I was just thinking of keeping it conservative, where it should always be a win. OK add an .ml config somewhere, I suppose. Maybe `spark.blas.nativeL1Threshold`?
   
   Hi @srowen @kiszk @mengxr,  I have add a new property in spark and also updated the configuration doc.  Please take a review.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-599581979
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24601/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mengxr commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
mengxr commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-595595323
 
 
   @yma11 You also need to try old generation CPUs / RAMs to confirm 256 is absolutely safe. We cannot assume all users are on the latest generation.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kiszk commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
kiszk commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-592561556
 
 
   The prop name sound good to me

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-600802141
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r387051234
 
 

 ##########
 File path: mllib-local/pom.xml
 ##########
 @@ -61,13 +61,17 @@
       This spark-tags test-dep is needed even though it isn't used in this module, otherwise testing-cmds that exclude
       them will yield errors.
     -->
+    <dependency>
 
 Review comment:
   Sure, I'm just trying to think of a threshold that should easily be a win regardless of other factors.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r386032330
 
 

 ##########
 File path: mllib-local/pom.xml
 ##########
 @@ -61,13 +61,17 @@
       This spark-tags test-dep is needed even though it isn't used in this module, otherwise testing-cmds that exclude
       them will yield errors.
     -->
+    <dependency>
 
 Review comment:
   This has to be removed; it can't depend on core.
   I think that means you need to move the property out of core.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-595003661
 
 
   > @yma11 Yes, but there are still some algorithms have not been migrated to `.ml`.
   > 
   > BTW, did you also change `.mllib.linalg.BLAS` in your KMeans tests? Since KMeans is actually implemented in the `.mllib` side, so `.mllib.linalg.BLAS` is used instead of `.ml.linalg.BLAS`.
   
   Yes,  in our Kmeans test, we modified the BLAS.scala in mllib and now trying to apply the change in mllib-local since mllib in maintenance mode. This change actually can't be leveraged by mllib Kmeans directly and if you want to do so, you need change the import location of BLAS.axpy to ml instead of mllib.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r386764934
 
 

 ##########
 File path: mllib-local/pom.xml
 ##########
 @@ -61,13 +61,17 @@
       This spark-tags test-dep is needed even though it isn't used in this module, otherwise testing-cmds that exclude
       them will yield errors.
     -->
+    <dependency>
 
 Review comment:
   You don't, you just need to reference the config key by name, without references to symbols in core. This is a fine compromise.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r384514174
 
 

 ##########
 File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
 ##########
 @@ -27,15 +27,35 @@ private[spark] object BLAS extends Serializable {
 
   @transient private var _f2jBLAS: NetlibBLAS = _
   @transient private var _nativeBLAS: NetlibBLAS = _
+  @transient private val vectorSizeThreshold: Int = 256
 
-  // For level-1 routines, we use Java implementation.
+  // For level-3 routines, we use the native BLAS.
+  // if native BLAS is not properly configured in system, will automatically fallback to f2jBLAS
+  private[ml] def nativeBLAS: NetlibBLAS = {
+    if (_nativeBLAS == null) {
+      _nativeBLAS = NativeBLAS
+    }
+    _nativeBLAS
+  }
+
+  // For level-1 functions scal(double, SparseVector) and dspmv, use f2jBLAS for better performance.
 
 Review comment:
   I think you can remove this comment

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-599620372
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-589079380
 
 
   In your k-means test, are the executors fairly fully utilized? like with 8 cores, do they have 8 tasks? that's the situation in which MKL_NUM_THREADS might make a difference, as trying to use 8*8=64 threads might hurt.
   
   I can imagine just changing the logic in level 1 methods to use native if the array is, say, >= 256 elements. How about that? this should be a win.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-599383599
 
 
   I have tested 128, 256 and 512 cases in Level1-routines-perf - part-4.xlsx.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-600839723
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120005/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-595595033
 
 
   > I'm okay with a hardcoded constant. We are start with a safer one, say 512, if there are concerns around environment differences.
   
   I believe 256 is a very safe threshold since based on our experiments on this value, nativeBLAS has quite obvious advantage compared with f2jBLAS(at least 1.5X in the methods), without detailed tuning on nativeBLAS. Make it configurable in mllib will provide direct and flxible help for workloads like Kmeans. With fixed threshold is also okay.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r386032174
 
 

 ##########
 File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
 ##########
 @@ -218,22 +234,14 @@ private[spark] object BLAS extends Serializable {
   def scal(a: Double, x: Vector): Unit = {
     x match {
       case sx: SparseVector =>
-        f2jBLAS.dscal(sx.values.length, a, sx.values, 1)
+        getBLAS(sx.size).dscal(sx.values.length, a, sx.values, 1)
 
 Review comment:
   Resolved in commit https://github.com/apache/spark/pull/27546/commits/118a0d8364641197ccf6e141ae5942309cd27909.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mengxr commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
mengxr commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-599368497
 
 
   LGTM. Could you add some tests to cover both <256 and >=256 cases?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-600839710
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-600802141
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r384513855
 
 

 ##########
 File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
 ##########
 @@ -27,15 +27,35 @@ private[spark] object BLAS extends Serializable {
 
   @transient private var _f2jBLAS: NetlibBLAS = _
   @transient private var _nativeBLAS: NetlibBLAS = _
+  @transient private val vectorSizeThreshold: Int = 256
 
-  // For level-1 routines, we use Java implementation.
+  // For level-3 routines, we use the native BLAS.
+  // if native BLAS is not properly configured in system, will automatically fallback to f2jBLAS
+  private[ml] def nativeBLAS: NetlibBLAS = {
+    if (_nativeBLAS == null) {
+      _nativeBLAS = NativeBLAS
+    }
+    _nativeBLAS
+  }
+
+  // For level-1 functions scal(double, SparseVector) and dspmv, use f2jBLAS for better performance.
   private[ml] def f2jBLAS: NetlibBLAS = {
     if (_f2jBLAS == null) {
       _f2jBLAS = new F2jBLAS
     }
     _f2jBLAS
   }
 
+  // For level-1 functions axpy(), dot() and scal(double, DenseVector), use nativeBLAS when
+  // vector size >= 256, otherwise use f2jBLAS instead
+  private[ml] def getBLAS(vectorSize: Int): NetlibBLAS = {
+    if (vectorSize < vectorSizeThreshold) {
+      return f2jBLAS
 
 Review comment:
   You can omit "return"

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-599581331
 
 
   **[Test build #119873 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119873/testReport)** for PR 27546 at commit [`abddf4c`](https://github.com/apache/spark/commit/abddf4c45e17f108ebf8f3c89300cfc444433b61).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 edited a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 edited a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-588608946
 
 
   Hi @srowen and @mengxr, 
   I tested vector size 256  and nativeBLAS shows obvious perf gain compared with f2jBLAS in methods axpy(~1.7X), dot(~2.8X) and scal(double, dense)(>1.5X).  For MKL, I can confirm it uses AVX in the methods from output but for OpenBLAS, seems it haven't used AVX in level-1 routines as based on the info from [https://github.com/xianyi/OpenBLAS/blob/develop/README.md](url).
   As to the MKL_NUM_THREADS and OPENBLAS_NUM_THREADS, limiting the threads to 1 doesn't always mean the best end-2-end performance. We used to test Kmeans using HiBench in a 1+4 cluster with MKL configured. Set the threads number to 1 or use default setting have no obvious performance change. By the way, using default MKL threads setting, MKL will bring 1.09X perf gain than java implementation in end-2-end.
   I also have updated this PR to revert back to use java implementation for scal(sparse) and dspmv(). Please take a further review.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r386032301
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
 ##########
 @@ -1541,4 +1541,10 @@ package object config {
     .bytesConf(ByteUnit.BYTE)
     .createOptional
 
+  private[spark] val NATIVE_L1_THRESHOLD =
+    ConfigBuilder("spark.ml.blas.nativeL1Threshold")
+      .doc("vector size for logic change between f2jBLAS and natveBLAS in " +
 
 Review comment:
   "Minimum vector size to use native BLAS libraries (if available) for Level 1 routines rather than Java implementation"
   I'd also actually remove the "ml" from the property name as it isn't in an ML package, and isn't specific to ML. However we can document it in the context of ML as that is its only real use right now.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-599620377
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119873/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r387454997
 
 

 ##########
 File path: mllib-local/pom.xml
 ##########
 @@ -61,13 +61,17 @@
       This spark-tags test-dep is needed even though it isn't used in this module, otherwise testing-cmds that exclude
       them will yield errors.
     -->
+    <dependency>
 
 Review comment:
   revert configurable related commits and now use 256 as the nativeL1Threshold.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-601761740
 
 
   Merged to master/3.0

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r384512680
 
 

 ##########
 File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
 ##########
 @@ -27,15 +27,35 @@ private[spark] object BLAS extends Serializable {
 
   @transient private var _f2jBLAS: NetlibBLAS = _
   @transient private var _nativeBLAS: NetlibBLAS = _
+  @transient private val vectorSizeThreshold: Int = 256
 
 Review comment:
   This does not need to be transient. It's an int.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 edited a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 edited a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-594335144
 
 
   mllib package is in maintenance mode that's why change applied in ml.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-599620377
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119873/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r386336068
 
 

 ##########
 File path: mllib-local/pom.xml
 ##########
 @@ -61,13 +61,17 @@
       This spark-tags test-dep is needed even though it isn't used in this module, otherwise testing-cmds that exclude
       them will yield errors.
     -->
+    <dependency>
 
 Review comment:
   Without dependency of core, ConfigBuilder/ConfigEntry won't be accessible. So I'd propose to add an environment variable in spark-env.sh. see new commit for details.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-595026452
 
 
   > Yes that's useful.
   
   Update PR with new commit for mllib BLAS. Please check. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mengxr commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
mengxr commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-600185011
 
 
   I'm fine with merging into 3.0 too.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-586835613
 
 
   Hi @srowen and @mengxr ,
   With MKL_NUM_THREADS/OPENBLAS_NUM_THREADS set 1, I have tested 10-element, 100-element, 1000-element vectors.  Within 10-element vector, nativeBLAS has big gap compared with java implementation, which is quite easy to understand. But when elements number increases from 10 to 100, the performance gap drops dramatically and when the vectors have 1000 elements, nativeBLAS has quite big advantage against f2jBLAS in some dense vector operations. Please see details in the attached excel.
   [Level1-routines-perf - part-2.xlsx](https://github.com/apache/spark/files/4212090/Level1-routines-perf.-.part-2.xlsx)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-586378579
 
 
   These are microbenchmarks only, and for large vector sizes. The problem is that I don't think there's much if any win at smaller sizes, and that's often what the usage in Spark is like. Like, what's the difference for a 10-element vector? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-587035245
 
 
   Yeah, the problem is that it could be a win in some cases, but harmful in others. It depends on what real-world Spark usage is like, and many of these ops are used for small arrays in general.
   
   It would be more persuasive to find what actual Spark usages benefit, and when, and by how much, from this change. It's possible some more targeted replacements seem to help much more than they may hurt.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-588608946
 
 
   Hi @srowen and @mengxr, 
   I tested vector size 256  and nativeBLAS shows obvious perf gain compared with f2jBLAS in methods axpy(~1.7X), dot(~2.8X) and scal(double, dense)(>1.5X).  For MKL, I can confirm it uses AVX in the methods from output but for OpenBLAS, seems it haven't used AVX in level-1 routines as based on the info from [https://github.com/xianyi/OpenBLAS/blob/develop/README.md](url).
   As to the MKL_NUM_THREADS and OPENBLAS_NUM_THREADS, limiting the threads to 1 doesn't always mean the best end-to-end performance. We used to test Kmeans using HiBench in a 1+4 cluster with MKL configured. Set the threads number to 1 or use default setting have no obvious performance change. By the way, using default MKL threads setting, MKL will bring 1.09X perf gain than java implementation in end-to-end.
   I also have updated this PR to revert back to use java implementation for scal(sparse) and dspmv(). Please take a further review.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kiszk commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
kiszk commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r384714926
 
 

 ##########
 File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
 ##########
 @@ -27,15 +27,35 @@ private[spark] object BLAS extends Serializable {
 
   @transient private var _f2jBLAS: NetlibBLAS = _
   @transient private var _nativeBLAS: NetlibBLAS = _
+  @transient private val vectorSizeThreshold: Int = 256
 
 Review comment:
   `256` depends on runtime environment such as CPU spec, JVM version, and so on.   
   It should be taken from the property.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-585828921
 
 
   I think the general problem is that Level 1 routines aren't faster. Are you sure it actually helps Spark? where and by how much? CC @mengxr 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r384531297
 
 

 ##########
 File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
 ##########
 @@ -27,15 +27,35 @@ private[spark] object BLAS extends Serializable {
 
   @transient private var _f2jBLAS: NetlibBLAS = _
   @transient private var _nativeBLAS: NetlibBLAS = _
+  @transient private val vectorSizeThreshold: Int = 256
 
-  // For level-1 routines, we use Java implementation.
+  // For level-3 routines, we use the native BLAS.
+  // if native BLAS is not properly configured in system, will automatically fallback to f2jBLAS
+  private[ml] def nativeBLAS: NetlibBLAS = {
 
 Review comment:
   I don't think you need to move this or expand its visibility?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-600924559
 
 
   > For the affected BLAS method, could you also add tests for size >= threshold?
   
   I added tests for size 128, 256, 512, should cover what you want me to do, right?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-590885475
 
 
   I think we could be conservative and let it switch over at operation on >= 256 elements? that seems like a win. It sounds like even when the cores are over-subscribed it could still be fast-er enough to justify than f2jblas

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r384532798
 
 

 ##########
 File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
 ##########
 @@ -123,7 +143,7 @@ private[spark] object BLAS extends Serializable {
    */
   private def dot(x: DenseVector, y: DenseVector): Double = {
     val n = x.size
-    f2jBLAS.ddot(n, x.values, 1, y.values, 1)
+    getBLAS(x.size).ddot(n, x.values, 1, y.values, 1)
 
 Review comment:
   just use n

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kiszk commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
kiszk commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-596350412
 
 
   Regarding configuration, since `mllib` has [a dependency on `core`](https://github.com/apache/spark/blob/master/mllib/pom.xml#L43), it looks ok that `mllib-local` also has a dependency on `core` newly.   
   Is it not fine?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kiszk commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
kiszk commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-595568767
 
 
   Umm, I believe that this value depends on the environment (CPU, lib. version, JIT compiler version, ...). According to the past experiments, in general, the constants will be replaced with the property later.   
   
   How about following [this approach](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala#L696
   ) in mllib?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r385115857
 
 

 ##########
 File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
 ##########
 @@ -218,22 +234,14 @@ private[spark] object BLAS extends Serializable {
   def scal(a: Double, x: Vector): Unit = {
     x match {
       case sx: SparseVector =>
-        f2jBLAS.dscal(sx.values.length, a, sx.values, 1)
+        getBLAS(sx.size).dscal(sx.values.length, a, sx.values, 1)
 
 Review comment:
   Not .size, but .values.length here. The actual array is smaller than the size, usually.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r394061413
 
 

 ##########
 File path: mllib-local/src/test/scala/org/apache/spark/ml/linalg/BLASSuite.scala
 ##########
 @@ -23,6 +23,15 @@ import org.apache.spark.ml.util.TestingUtils._
 
 class BLASSuite extends SparkMLFunSuite {
 
+  test("nativeL1Threshold") {
+    var vectorSize = 128
+    assert(getBLAS(vectorSize) == BLAS.f2jBLAS)
 
 Review comment:
   UT updated to use constant directly.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r386032364
 
 

 ##########
 File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
 ##########
 @@ -27,15 +27,35 @@ private[spark] object BLAS extends Serializable {
 
   @transient private var _f2jBLAS: NetlibBLAS = _
   @transient private var _nativeBLAS: NetlibBLAS = _
+  @transient private val vectorSizeThreshold: Int = 256
 
-  // For level-1 routines, we use Java implementation.
+  // For level-3 routines, we use the native BLAS.
+  // if native BLAS is not properly configured in system, will automatically fallback to f2jBLAS
+  private[ml] def nativeBLAS: NetlibBLAS = {
 
 Review comment:
   This still can move back to where it was

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-600802152
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24724/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] winningsix commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
winningsix commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r384282022
 
 

 ##########
 File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
 ##########
 @@ -27,15 +27,35 @@ private[spark] object BLAS extends Serializable {
 
   @transient private var _f2jBLAS: NetlibBLAS = _
   @transient private var _nativeBLAS: NetlibBLAS = _
+  @transient private val vectorSizeThreshold: Int = 256
 
 Review comment:
   Hint: Make this final and static

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kiszk commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
kiszk commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-599396322
 
 
   Sorry for being late. LGTM.
   I guess that @mengxr asked to add test cases into the Spark test cases or benchmark.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r386764088
 
 

 ##########
 File path: mllib-local/pom.xml
 ##########
 @@ -61,13 +61,17 @@
       This spark-tags test-dep is needed even though it isn't used in this module, otherwise testing-cmds that exclude
       them will yield errors.
     -->
+    <dependency>
 
 Review comment:
   If we leave the property in core, we need to import org.apache.spark.internal.config to reference it in mllib-local thus still can't avoid dependency of core module.  Another option is that we implement a logic to directly read this value from spark-defaults.conf through method like System.getProperties, which will need no modification and dependency of core module. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r384534200
 
 

 ##########
 File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
 ##########
 @@ -220,20 +240,12 @@ private[spark] object BLAS extends Serializable {
       case sx: SparseVector =>
         f2jBLAS.dscal(sx.values.length, a, sx.values, 1)
 
 Review comment:
   I think you can apply the optimization here too; just depends on the length of sx.values

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-595011039
 
 
   Yes that's useful.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-599581331
 
 
   **[Test build #119873 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119873/testReport)** for PR 27546 at commit [`abddf4c`](https://github.com/apache/spark/commit/abddf4c45e17f108ebf8f3c89300cfc444433b61).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-595591399
 
 
   Yeah, the point of the above discussion is unfortunately there is no SparkConf in mllib-local, so that's not possible. It is possible in mllib. We could make it configurable just for mllib?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kiszk commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
kiszk commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-592309688
 
 
   This performance experiment is done only one environment (one CPU, one JVM, and others). Then, we say `256` looks good threshold.
   
   I am afraid that other environments have other thresholds (e.g. 128, 1024, or others). This is why the threshold should be configurable.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-596520000
 
 
   I think the point of mllib-local was to contain code that can be used outside of a Spark app, so it can't have that dependency.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-599620372
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-599581959
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r386777198
 
 

 ##########
 File path: mllib-local/pom.xml
 ##########
 @@ -61,13 +61,17 @@
       This spark-tags test-dep is needed even though it isn't used in this module, otherwise testing-cmds that exclude
       them will yield errors.
     -->
+    <dependency>
 
 Review comment:
   Do you mean get the value using SparkConf by key? But SparkConf also has dependency on core module.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kiszk commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
kiszk commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-600223488
 
 
   While it is fine to merge it master now, I am not positive to merge it into 3.0. This is because it looks nice to have. 
   
   But, I do not disagree with merging to 3.0 if others agree with the merge.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-600118297
 
 
   @mengxr @kiszk do you have an opinion on merging this to 3.0? I don't feel strongly. Might be nice to get it in as a potential perf improvement.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-595010426
 
 
   > Ah yes we do want to apply this across the board to all BLAS support.
   
   So let's apply same change in mllib/BLAS?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mengxr commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
mengxr commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r393828874
 
 

 ##########
 File path: mllib-local/src/test/scala/org/apache/spark/ml/linalg/BLASSuite.scala
 ##########
 @@ -23,6 +23,15 @@ import org.apache.spark.ml.util.TestingUtils._
 
 class BLASSuite extends SparkMLFunSuite {
 
+  test("nativeL1Threshold") {
+    var vectorSize = 128
+    assert(getBLAS(vectorSize) == BLAS.f2jBLAS)
 
 Review comment:
   minor: you don't need `var`. just `assert(getBLAS(128) == BLAS.f2jBLAS)` is sufficient and cleaner.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-586294938
 
 
   Hi @srowen and @mengxr ,
   Thanks for your comments on this PR. Please refer the attached performance result. It includes the performance data using different BLAS implementation (f2jBLAS, Intel MKL and OpenBLAS). You can see nativeBLAS has quite big advantage in axpy(), dot() and scal(doule, dense) for both small and big vectors. As I tested, this advantage may can be improved more if do further tuning against the nativeBLAS variable such as MKL_NUM_THREADS/OPENBLAS_NUM_THREADS based on the data size, cores used, etc.
   I also attached the microbenchmark lib for you to reproduce the result. Here is my test details for your information:
   
   **Spark version:** 
   master branch with head commit “e2d984aa1c79eb389cc8d333f656196b17af1c32: [SPARK-30733][R][HOTFIX] Fix SparkR tests per testthat and R version upgrade, and disable CRAN”
   **Test environment:**
   CPU: Skylake 6252(Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz)
   OS: CentOS 7
   Kernel: 3.10.0-862.el7.x86_64
   Java version: 1.8.0_112
   MKL version: 2019
   OpenBLAS version: libopenblas_haswellp-r0.3.8.dev
   **Test command:**
   /opt/spark/bin/spark-submit --name test  --master local --num-executors 1 --executor-cores 4 --executor-memory 10g --class org.intel.spark.TestMlAxpy(TestMlDot/TestMlScal) microbenchmark.jar
   
   [Level1-routines-perf.xlsx](https://github.com/apache/spark/files/4204686/Level1-routines-perf.xlsx)
   [microbenchmark.zip](https://github.com/apache/spark/files/4204689/microbenchmark.zip)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-595009361
 
 
   Ah yes we do want to apply this across the board to all BLAS support.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-599457119
 
 
   @mengxr simple ut added for nativeL1Threshold.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-585096909
 
 
   Can one of the admins verify this patch?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
srowen commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-600798512
 
 
   Jenkins, test this please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-599581959
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-600838883
 
 
   **[Test build #120005 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120005/testReport)** for PR 27546 at commit [`04559df`](https://github.com/apache/spark/commit/04559df1615b9d6a97cd7fed731ca6cccf55bb6a).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-585097459
 
 
   Can one of the admins verify this patch?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on a change in pull request #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#discussion_r384968081
 
 

 ##########
 File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
 ##########
 @@ -27,15 +27,35 @@ private[spark] object BLAS extends Serializable {
 
   @transient private var _f2jBLAS: NetlibBLAS = _
   @transient private var _nativeBLAS: NetlibBLAS = _
+  @transient private val vectorSizeThreshold: Int = 256
 
 Review comment:
   @srowen I think @kiszk make a good point. With properly tuned based on specific workload and runtime environment, nativeBLAS may can bring much benefit than f2jBLAS even vector size < 256.  Making it configurable can provide user a simpler way to use nativeBLAS in Spark ML when vector size is not so big. Do you think so? if so, we need to add spark-core dependency in spark-mllib-local package.
   Thanks @kiszk for your comments.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kiszk commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
kiszk commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-597205916
 
 
   I see about configurable threshold. For now, it is fine with constant.  
   It is great to measure thresholds on multiple environments.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines

Posted by GitBox <gi...@apache.org>.
yma11 commented on issue #27546: [SPARK-30773][ML]Support NativeBlas for level-1 routines
URL: https://github.com/apache/spark/pull/27546#issuecomment-594335144
 
 
   > n the `.mllib` side.
   mllib package is in maintenance mode that why change applied in ml.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org