You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/05/06 12:52:09 UTC

[GitHub] [spark] zhengruifeng opened a new pull request, #36469: [SPARK-39114][ML] ml.optim.aggregator avoid re-allocating buffers

zhengruifeng opened a new pull request, #36469:
URL: https://github.com/apache/spark/pull/36469

   ### What changes were proposed in this pull request?
   
   ml.optim.aggregator avoid re-allocating buffers
   
   
   ### Why are the changes needed?
   
   in SPARK-30661 (KMeans blockify input vectors), I found that it is sometimes performance-crucial to avoid re-allocating buffers for each blocks.
   
   It should be nice to also apply this optimization to other algorithms.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   existing testsuites


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #36469: [SPARK-39114][ML] ml.optim.aggregator avoid re-allocating buffers

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #36469:
URL: https://github.com/apache/spark/pull/36469#issuecomment-1120174933

   Merged to master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #36469: [SPARK-39114][ML] ml.optim.aggregator avoid re-allocating buffers

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #36469:
URL: https://github.com/apache/spark/pull/36469#discussion_r867285049


##########
mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/MultinomialLogisticBlockAggregator.scala:
##########
@@ -151,15 +158,39 @@ private[ml] class MultinomialLogisticBlockAggregator(
       case dm: DenseMatrix =>
         // gradientSumArray[0 : F X C] += mat.T X dm
         BLAS.nativeBLAS.dgemm("T", "T", numClasses, numFeatures, size, 1.0,
-          mat.values, size, dm.values, numFeatures, 1.0, gradientSumArray, numClasses)
+          arr, size, dm.values, numFeatures, 1.0, gradientSumArray, numClasses)
 
       case sm: SparseMatrix =>
-        // TODO: convert Coefficients to row major order to simplify BLAS operations?
-        // linearGradSumMat = sm.T X mat
-        // existing BLAS.gemm requires linearGradSumMat is NOT Transposed.
-        val linearGradSumMat = DenseMatrix.zeros(numFeatures, numClasses)
-        BLAS.gemm(1.0, sm.transpose, mat, 0.0, linearGradSumMat)
-        linearGradSumMat.foreachActive { (i, j, v) => gradientSumArray(i * numClasses + j) += v }
+        // dedicated sparse GEMM implementation for transposed C: C += A * B, where:

Review Comment:
   current BLAS.gemm `C := alpha * A * B + beta * C` has two constraint:
   1, matrix B must be dense;
   2, matrix C must not be transposed;
   
   for the dense block above, direct use `BLAS.nativeBLAS.dgemm` to work around constraint 2;
   
   for the sparse block here, to minimize the changes, add a sparse GEMM for transposed C. Then we do not need to create `val linearGradSumMat = DenseMatrix.zeros(numFeatures, numClasses)` for each block



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #36469: [SPARK-39114][ML] ml.optim.aggregator avoid re-allocating buffers

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #36469:
URL: https://github.com/apache/spark/pull/36469#issuecomment-1120106344

   cc @srowen 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #36469: [SPARK-39114][ML][WIP] ml.optim.aggregator avoid re-allocating buffers

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #36469:
URL: https://github.com/apache/spark/pull/36469#discussion_r867285403


##########
mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/MultinomialLogisticBlockAggregator.scala:
##########
@@ -151,15 +158,39 @@ private[ml] class MultinomialLogisticBlockAggregator(
       case dm: DenseMatrix =>
         // gradientSumArray[0 : F X C] += mat.T X dm
         BLAS.nativeBLAS.dgemm("T", "T", numClasses, numFeatures, size, 1.0,
-          mat.values, size, dm.values, numFeatures, 1.0, gradientSumArray, numClasses)
+          arr, size, dm.values, numFeatures, 1.0, gradientSumArray, numClasses)
 
       case sm: SparseMatrix =>
-        // TODO: convert Coefficients to row major order to simplify BLAS operations?
-        // linearGradSumMat = sm.T X mat
-        // existing BLAS.gemm requires linearGradSumMat is NOT Transposed.
-        val linearGradSumMat = DenseMatrix.zeros(numFeatures, numClasses)
-        BLAS.gemm(1.0, sm.transpose, mat, 0.0, linearGradSumMat)
-        linearGradSumMat.foreachActive { (i, j, v) => gradientSumArray(i * numClasses + j) += v }
+        // dedicated sparse GEMM implementation for transposed C: C += A * B, where:

Review Comment:
   this impl refers to https://github.com/apache/spark/blob/master/mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala#L569-L586, execpt the indexing of C  https://github.com/apache/spark/blob/master/mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala#L580 .



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng closed pull request #36469: [SPARK-39114][ML] ml.optim.aggregator avoid re-allocating buffers

Posted by GitBox <gi...@apache.org>.
zhengruifeng closed pull request #36469: [SPARK-39114][ML] ml.optim.aggregator avoid re-allocating buffers
URL: https://github.com/apache/spark/pull/36469


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #36469: [SPARK-39114][ML][WIP] ml.optim.aggregator avoid re-allocating buffers

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #36469:
URL: https://github.com/apache/spark/pull/36469#discussion_r867285049


##########
mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/MultinomialLogisticBlockAggregator.scala:
##########
@@ -151,15 +158,39 @@ private[ml] class MultinomialLogisticBlockAggregator(
       case dm: DenseMatrix =>
         // gradientSumArray[0 : F X C] += mat.T X dm
         BLAS.nativeBLAS.dgemm("T", "T", numClasses, numFeatures, size, 1.0,
-          mat.values, size, dm.values, numFeatures, 1.0, gradientSumArray, numClasses)
+          arr, size, dm.values, numFeatures, 1.0, gradientSumArray, numClasses)
 
       case sm: SparseMatrix =>
-        // TODO: convert Coefficients to row major order to simplify BLAS operations?
-        // linearGradSumMat = sm.T X mat
-        // existing BLAS.gemm requires linearGradSumMat is NOT Transposed.
-        val linearGradSumMat = DenseMatrix.zeros(numFeatures, numClasses)
-        BLAS.gemm(1.0, sm.transpose, mat, 0.0, linearGradSumMat)
-        linearGradSumMat.foreachActive { (i, j, v) => gradientSumArray(i * numClasses + j) += v }
+        // dedicated sparse GEMM implementation for transposed C: C += A * B, where:

Review Comment:
   current BLAS.gemm `C := alpha * A * B + beta * C` has two constraint:
   1, matrix B must be dense;
   2, matrix C must not be transposed;
   
   for the dense block above, direct use `BLAS.nativeBLAS.dgemm` to work around constraint 2;
   
   for the sparse block here, to minimize the changes, add a sparse GEMM for transposed C.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org