Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/02/10 09:08:12 UTC

[GitHub] [spark] zhengruifeng opened a new pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

zhengruifeng opened a new pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519
 
 
   ### What changes were proposed in this pull request?
   The current implementation converts ml.Vector to breeze.Vector; this conversion can be skipped.
   
   
   
   ### Why are the changes needed?
   avoid unnecessary vector conversions
   
   
   ### Does this PR introduce any user-facing change?
   No
   
   
   ### How was this patch tested?
   Existing test suites.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#discussion_r386239057
 
 

 ##########
 File path: python/pyspark/ml/clustering.py
 ##########
 @@ -252,7 +252,7 @@ class GaussianMixture(JavaEstimator, _GaussianMixtureParams, JavaMLWritable, Jav
     >>> model.predict(df.head().features)
     2
     >>> model.predictProbability(df.head().features)
-    DenseVector([0.0, 0.4736, 0.5264])
+    DenseVector([0.0, 0.0, 1.0])
 
 Review comment:
   ![image](https://user-images.githubusercontent.com/7322292/75656008-01064e00-5c9e-11ea-945c-2c95daf1de2c.png)
   
   The result shown by `model.transform(df).show()` is not `DenseVector([0.0, 0.0, 1.0])`; it is about `[6.74824658670777...`,
   
   but `model.transform(df).head()` shows `DenseVector([0.0, 0.0, 1.0])`.
   
   Is this a kind of rounding in `Vector.toString`?
   
   ![image](https://user-images.githubusercontent.com/7322292/75656394-cc46c680-5c9e-11ea-8b0a-a83b4527bf0e.png)
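
One possible explanation for the question above, sketched in plain Python: if the Python-side `repr` rounds each entry to four decimal places while the tabular display prints full precision, a tiny nonzero probability would show up as `0.0` in one place and as a long decimal in the other. The helpers `format_float`, `rounded_repr`, and `full_str` are hypothetical stand-ins written for this illustration, not the library code, and the probabilities are made-up values.

```python
def format_float(value, digits=4):
    """Round a float to `digits` decimals and return it as a string."""
    text = str(round(value, digits))
    if "." in text:
        text = text[: text.index(".") + 1 + digits]
    return text


def rounded_repr(values):
    # Stand-in for a repr that rounds every entry (assumption, see lead-in).
    return "DenseVector([%s])" % ", ".join(format_float(v) for v in values)


def full_str(values):
    # Stand-in for a display path that keeps full precision.
    return "[" + ",".join(str(v) for v in values) + "]"


# Hypothetical membership probabilities: the first two are tiny but nonzero.
probs = [6.7e-10, 2.5e-9, 0.9999999968]

print(rounded_repr(probs))  # DenseVector([0.0, 0.0, 1.0])
print(full_str(probs))      # full-precision entries, not all 0.0 / 1.0
```

If this is what happens, the doctest output and the `show()` output describe the same underlying vector and differ only in formatting.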
   



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-590726295
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-593822186
 
 
   **[Test build #119212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119212/testReport)** for PR 27519 at commit [`936e8bf`](https://github.com/apache/spark/commit/936e8bfe02ee5119471c47194a5a3597f271fb06).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-584056097
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118149/
   Test FAILed.



[GitHub] [spark] zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#discussion_r380030567
 
 

 ##########
 File path: mllib-local/src/main/scala/org/apache/spark/ml/stat/distribution/MultivariateGaussian.scala
 ##########
 @@ -48,43 +48,40 @@ class MultivariateGaussian @Since("2.0.0") (
     this(Vectors.fromBreeze(mean), Matrices.fromBreeze(cov))
   }
 
-  @transient private lazy val breezeMu = mean.asBreeze.toDenseVector
-
   /**
    * Compute distribution dependent constants:
    *    rootSigmaInv = D^(-1/2)^ * U.t, where sigma = U * D * U.t
    *    u = log((2*pi)^(-k/2)^ * det(sigma)^(-1/2)^)
    */
-  @transient private lazy val tuple = calculateCovarianceConstants
-  @transient private lazy val rootSigmaInv = tuple._1
-  @transient private lazy val u = tuple._2
+  @transient private lazy val tuple3 = {
+    val (rootSigmaInv, u) = calculateCovarianceConstants
+    val rootSigmaInvMat = Matrices.fromBreeze(rootSigmaInv)
+    val rootSigmaInvMulMu = rootSigmaInvMat.multiply(mean)
+    (rootSigmaInvMat, u, rootSigmaInvMulMu)
+  }
+
+  @transient private lazy val rootSigmaInvMat = tuple3._1
+
+  @transient private lazy val u = tuple3._2
+
+  @transient private lazy val rootSigmaInvMulMu = tuple3._3
 
   /**
    * Returns density of this multivariate Gaussian at given point, x
    */
   @Since("2.0.0")
   def pdf(x: Vector): Double = {
-    pdf(x.asBreeze)
+    math.exp(logpdf(x))
   }
 
   /**
    * Returns the log-density of this multivariate Gaussian at given point, x
    */
   @Since("2.0.0")
   def logpdf(x: Vector): Double = {
-    logpdf(x.asBreeze)
-  }
-
-  /** Returns density of this multivariate Gaussian at given point, x */
-  private[ml] def pdf(x: BV[Double]): Double = {
-    math.exp(logpdf(x))
-  }
-
-  /** Returns the log-density of this multivariate Gaussian at given point, x */
-  private[ml] def logpdf(x: BV[Double]): Double = {
-    val delta = x - breezeMu
-    val v = rootSigmaInv * delta
-    u + v.t * v * -0.5
+    val v = rootSigmaInvMulMu.copy
+    BLAS.gemv(-1.0, rootSigmaInvMat, x, 1.0, v)
+    u - 0.5 * BLAS.dot(v, v)
 
 Review comment:
   `rootSigmaInv * delta = rootSigmaInv * (x - breezeMu) = - (rootSigmaInv * breezeMu - rootSigmaInv * x)`
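
The identity in this comment can be checked numerically. Below is a minimal pure-Python sketch, assuming a hypothetical 2-D Gaussian with diagonal covariance `diag(4, 0.25)`, so that `rootSigmaInv` is `diag(0.5, 2)` and `det(sigma)` is 1. The helper names (`matvec`, `logpdf_delta`, `logpdf_precomputed`) are made up for illustration. The second function mirrors the shape of the `gemv(-1.0, A, x, 1.0, v)` call in the diff: it starts from the precomputed `A * mu` and subtracts `A * x`.

```python
import math


def matvec(a, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def logpdf_delta(root_sigma_inv, mu, u, x):
    # Original form: v = A * (x - mu)
    delta = [xi - mi for xi, mi in zip(x, mu)]
    v = matvec(root_sigma_inv, delta)
    return u - 0.5 * sum(vi * vi for vi in v)


def logpdf_precomputed(root_sigma_inv, a_mu, u, x):
    # Rewritten form: v = A * mu - A * x, i.e. -(A * (x - mu)).
    # The sign flip is harmless because only v . v is used.
    a_x = matvec(root_sigma_inv, x)
    v = [am - ax for am, ax in zip(a_mu, a_x)]
    return u - 0.5 * sum(vi * vi for vi in v)


# Hypothetical distribution: mean (1, -2), covariance diag(4, 0.25).
root_sigma_inv = [[0.5, 0.0], [0.0, 2.0]]
mu = [1.0, -2.0]
k = 2
det_sigma = 4.0 * 0.25
u = -0.5 * (k * math.log(2.0 * math.pi) + math.log(det_sigma))

a_mu = matvec(root_sigma_inv, mu)  # computed once, reused for every x
x = [0.3, -1.1]

old_value = logpdf_delta(root_sigma_inv, mu, u, x)
new_value = logpdf_precomputed(root_sigma_inv, a_mu, u, x)
print(old_value, new_value)
```

The two values agree up to floating-point round-off; they are not guaranteed to be bitwise identical, because the subtraction happens at a different point in the computation.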



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-590696132
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118894/
   Test PASSed.



[GitHub] [spark] srowen commented on a change in pull request #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#discussion_r380228378
 
 

 ##########
 File path: mllib-local/src/main/scala/org/apache/spark/ml/stat/distribution/MultivariateGaussian.scala
 ##########
 @@ -48,43 +48,40 @@ class MultivariateGaussian @Since("2.0.0") (
     this(Vectors.fromBreeze(mean), Matrices.fromBreeze(cov))
   }
 
-  @transient private lazy val breezeMu = mean.asBreeze.toDenseVector
-
   /**
    * Compute distribution dependent constants:
    *    rootSigmaInv = D^(-1/2)^ * U.t, where sigma = U * D * U.t
    *    u = log((2*pi)^(-k/2)^ * det(sigma)^(-1/2)^)
    */
-  @transient private lazy val tuple = calculateCovarianceConstants
-  @transient private lazy val rootSigmaInv = tuple._1
-  @transient private lazy val u = tuple._2
+  @transient private lazy val tuple3 = {
+    val (rootSigmaInv, u) = calculateCovarianceConstants
+    val rootSigmaInvMat = Matrices.fromBreeze(rootSigmaInv)
+    val rootSigmaInvMulMu = rootSigmaInvMat.multiply(mean)
+    (rootSigmaInvMat, u, rootSigmaInvMulMu)
+  }
+
+  @transient private lazy val rootSigmaInvMat = tuple3._1
+
+  @transient private lazy val u = tuple3._2
+
+  @transient private lazy val rootSigmaInvMulMu = tuple3._3
 
   /**
    * Returns density of this multivariate Gaussian at given point, x
    */
   @Since("2.0.0")
   def pdf(x: Vector): Double = {
-    pdf(x.asBreeze)
+    math.exp(logpdf(x))
   }
 
   /**
    * Returns the log-density of this multivariate Gaussian at given point, x
    */
   @Since("2.0.0")
   def logpdf(x: Vector): Double = {
-    logpdf(x.asBreeze)
-  }
-
-  /** Returns density of this multivariate Gaussian at given point, x */
-  private[ml] def pdf(x: BV[Double]): Double = {
-    math.exp(logpdf(x))
-  }
-
-  /** Returns the log-density of this multivariate Gaussian at given point, x */
-  private[ml] def logpdf(x: BV[Double]): Double = {
-    val delta = x - breezeMu
-    val v = rootSigmaInv * delta
-    u + v.t * v * -0.5
+    val v = rootSigmaInvMulMu.copy
+    BLAS.gemv(-1.0, rootSigmaInvMat, x, 1.0, v)
+    u - 0.5 * BLAS.dot(v, v)
 
 Review comment:
   Is logpdf covered in MultivariateGaussianSuite? I didn't see it. Might be good to make some assertions about it and make sure it doesn't change much at all after this. I don't know enough to evaluate whether this loses any numeric precision (or is it from Breeze?)
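
As a rough illustration of the kind of assertions suggested here, the following pure-Python sketch checks a log-density against a closed-form reference and checks that `pdf` and `logpdf` stay consistent. It is not the Scala test suite; `logpdf_diag` and `pdf_diag` are hypothetical helpers restricted to diagonal covariance, chosen only because the expected values are easy to derive by hand.

```python
import math


def logpdf_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance (illustration only)."""
    k = len(x)
    log_det = sum(math.log(v) for v in var)
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return -0.5 * (k * math.log(2.0 * math.pi) + log_det + quad)


def pdf_diag(x, mean, var):
    """Density, defined through the log-density as in the diff."""
    return math.exp(logpdf_diag(x, mean, var))


# Reference value: a standard 2-D normal at its mean has log-density -log(2*pi).
at_mean = logpdf_diag([0.0, 0.0], [0.0, 0.0], [1.0, 1.0])
assert math.isclose(at_mean, -math.log(2.0 * math.pi), rel_tol=1e-12)

# Consistency: log(pdf(x)) should match logpdf(x) within a tight tolerance.
point = [0.5, -1.5]
log_of_pdf = math.log(pdf_diag(point, [0.0, 0.0], [1.0, 1.0]))
direct = logpdf_diag(point, [0.0, 0.0], [1.0, 1.0])
assert math.isclose(log_of_pdf, direct, rel_tol=1e-12)
```

Running the same assertions before and after a refactoring would show whether the change moves the results beyond the chosen tolerance.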



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-590696126
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586863051
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23304/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586816450
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#discussion_r386239057
 
 

 ##########
 File path: python/pyspark/ml/clustering.py
 ##########
 @@ -252,7 +252,7 @@ class GaussianMixture(JavaEstimator, _GaussianMixtureParams, JavaMLWritable, Jav
     >>> model.predict(df.head().features)
     2
     >>> model.predictProbability(df.head().features)
-    DenseVector([0.0, 0.4736, 0.5264])
+    DenseVector([0.0, 0.0, 1.0])
 
 Review comment:
   ![image](https://user-images.githubusercontent.com/7322292/75656008-01064e00-5c9e-11ea-945c-2c95daf1de2c.png)
   
   The result shown by `model.transform(df).show()` is not `DenseVector([0.0, 0.0, 1.0])`; it is about `[6.74824658670777...`,
   
   but the result in `model.transform(df).head()` is `DenseVector([0.0, 0.0, 1.0])`.
   
   Is this a kind of rounding?
   
   ![image](https://user-images.githubusercontent.com/7322292/75656118-40cd3580-5c9e-11ea-83aa-a7047fae06d6.png)
   



[GitHub] [spark] SparkQA removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-593822186
 
 
   **[Test build #119212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119212/testReport)** for PR 27519 at commit [`936e8bf`](https://github.com/apache/spark/commit/936e8bfe02ee5119471c47194a5a3597f271fb06).



[GitHub] [spark] SparkQA removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-590704587
 
 
   **[Test build #118900 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118900/testReport)** for PR 27519 at commit [`7686e04`](https://github.com/apache/spark/commit/7686e04c648384251b98c0c335c084b1f654188e).



[GitHub] [spark] SparkQA commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-593847229
 
 
   **[Test build #119212 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119212/testReport)** for PR 27519 at commit [`936e8bf`](https://github.com/apache/spark/commit/936e8bfe02ee5119471c47194a5a3597f271fb06).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#discussion_r386239057
 
 

 ##########
 File path: python/pyspark/ml/clustering.py
 ##########
 @@ -252,7 +252,7 @@ class GaussianMixture(JavaEstimator, _GaussianMixtureParams, JavaMLWritable, Jav
     >>> model.predict(df.head().features)
     2
     >>> model.predictProbability(df.head().features)
-    DenseVector([0.0, 0.4736, 0.5264])
+    DenseVector([0.0, 0.0, 1.0])
 
 Review comment:
   ![image](https://user-images.githubusercontent.com/7322292/75656008-01064e00-5c9e-11ea-945c-2c95daf1de2c.png)
   
   The result in `model.transform(df).show()` is not `DenseVector([0.0, 0.0, 1.0])`, but the result in `model.transform(df).head()` is.
   
   ![image](https://user-images.githubusercontent.com/7322292/75656118-40cd3580-5c9e-11ea-83aa-a7047fae06d6.png)
   



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-590726302
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118900/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586816177
 
 
   **[Test build #118529 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118529/testReport)** for PR 27519 at commit [`7a38546`](https://github.com/apache/spark/commit/7a38546e12f89703bdcb35dbdd00922c97d8069e).



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-591860095
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586895018
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118550/
   Test PASSed.



[GitHub] [spark] zhengruifeng edited a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
zhengruifeng edited a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-591219332
 
 
   The current master implementation and commit [7686e04](https://github.com/apache/spark/commit/7686e04c648384251b98c0c335c084b1f654188e) both need to create two vectors in `logpdf`,
   while the initial commit [bc1586e](https://github.com/apache/spark/pull/27519/commits/bc1586eafa58748b8ae7855184d903c22c1088a4) only needs to create one vector.
   
   All the Scala tests passed in `bc1586e`; however, it fails on the Python side. We can see that the model coefficients are almost the same; the only significant difference is the `logLikelihood`.
   
   The issue with `logLikelihood` is the same as in https://github.com/apache/spark/pull/26735. @huaxingao helped test it and found that if we set `maxIter>25`, all implementations converge to the same cost.
   It looks like a small numeric perturbation (in https://github.com/apache/spark/pull/26735, the way `sumWeights` is accumulated; in `bc1586e`, the way `logpdf` is computed: `A*(x-mean) -> A*x - A*mean`) causes the py test to converge to `26.193922336279954` at iteration=5, so I am wondering if we can update the py test by setting a larger `maxIter`?
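
The perturbation described here comes from floating-point arithmetic not being distributive: `A*(x-mean)` and `A*x - A*mean` are equal algebraically but can round differently. A small scalar sketch in plain Python, with hypothetical function names and randomly drawn inputs (fixed seed), measures how large that difference gets. It says nothing about how the EM iterations amplify it.

```python
import random


def scaled_delta(a, x, mean):
    # Subtract first, then scale: a * (x - mean)
    return a * (x - mean)


def scaled_separately(a, x, mean):
    # Scale each term, then subtract: a * x - a * mean
    return a * x - a * mean


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [
    (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
    for _ in range(10000)
]

diffs = [
    abs(scaled_delta(a, x, m) - scaled_separately(a, x, m))
    for a, x, m in samples
]
mismatches = sum(1 for d in diffs if d != 0.0)
max_diff = max(diffs)

# Number of inputs where the two forms are not bitwise equal,
# and the largest absolute gap between them.
print(mismatches, max_diff)
```

Each individual gap is on the order of machine epsilon for inputs of this size, which fits the observation that the model coefficients stay almost the same while a result taken after only a few iterations can still differ.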



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586863040
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-591894618
 
 
   **[Test build #119020 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119020/testReport)** for PR 27519 at commit [`87472a4`](https://github.com/apache/spark/commit/87472a41aa1f08474e03341f9e5fe09b594cab77).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-584034804
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#discussion_r386236966
 
 

 ##########
 File path: python/pyspark/ml/clustering.py
 ##########
 @@ -270,19 +270,16 @@ class GaussianMixture(JavaEstimator, _GaussianMixtureParams, JavaMLWritable, Jav
     3
     >>> gaussians[0].mean
     DenseVector([0.825, 0.8675])
-    >>> gaussians[0].cov.toArray()
-    array([[ 0.005625  , -0.0050625 ],
-           [-0.0050625 ,  0.00455625]])
+    >>> gaussians[0].cov
+    DenseMatrix(2, 2, [0.0056, -0.0051, -0.0051, 0.0046], 0)
     >>> gaussians[1].mean
-    DenseVector([-0.4777, -0.4096])
-    >>> gaussians[1].cov.toArray()
-    array([[ 0.1679695 ,  0.13181786],
-           [ 0.13181786,  0.10524592]])
+    DenseVector([-0.87, -0.72])
+    >>> gaussians[1].cov
+    DenseMatrix(2, 2, [0.0016, 0.0016, 0.0016, 0.0016], 0)
     >>> gaussians[2].mean
-    DenseVector([-0.4473, -0.3853])
-    >>> gaussians[2].cov.toArray()
-    array([[ 0.16730412,  0.13112435],
-           [ 0.13112435,  0.10469614]])
+    DenseVector([-0.055, -0.075])
+    >>> gaussians[2].cov
+    DenseMatrix(2, 2, [0.002, -0.0011, -0.0011, 0.0006], 0)
 
 Review comment:
   Since the maxIter was changed from 10 to 30, the model coefficients are different.



[GitHub] [spark] zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#discussion_r376937067
 
 

 ##########
 File path: mllib-local/src/main/scala/org/apache/spark/ml/stat/distribution/MultivariateGaussian.scala
 ##########
 @@ -43,46 +43,40 @@ class MultivariateGaussian @Since("2.0.0") (
   require(cov.numCols == cov.numRows, "Covariance matrix must be square")
   require(mean.size == cov.numCols, "Mean vector length must match covariance matrix size")
 
-  /** Private constructor taking Breeze types */
-  private[ml] def this(mean: BDV[Double], cov: BDM[Double]) = {
-    this(Vectors.fromBreeze(mean), Matrices.fromBreeze(cov))
-  }
-
-  @transient private lazy val breezeMu = mean.asBreeze.toDenseVector
-
   /**
    * Compute distribution dependent constants:
    *    rootSigmaInv = D^(-1/2)^ * U.t, where sigma = U * D * U.t
    *    u = log((2*pi)^(-k/2)^ * det(sigma)^(-1/2)^)
    */
-  @transient private lazy val (rootSigmaInv: BDM[Double], u: Double) = calculateCovarianceConstants
+  @transient private lazy val tuple3 = {
 
 Review comment:
   A comment in [LeastSquaresAggregator](https://github.com/apache/spark/blob/12e1bbaddbb2ef304b5880a62df6683fcc94ea54/mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/LeastSquaresAggregator.scala#L188) explains why the tuple is not destructured directly:
   
   > // do not use tuple assignment above because it will circumvent the @transient tag



[GitHub] [spark] srowen commented on a change in pull request #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#discussion_r380227419
 
 

 ##########
 File path: python/pyspark/ml/clustering.py
 ##########
 @@ -271,18 +269,18 @@ class GaussianMixture(JavaEstimator, _GaussianMixtureParams, JavaMLWritable, Jav
     >>> gaussians[0].mean
     DenseVector([0.825, 0.8675])
     >>> gaussians[0].cov.toArray()
-    array([[ 0.005625  , -0.0050625 ],
-           [-0.0050625 ,  0.00455625]])
+    array([[ 0.00562..., -0.00506...],
 
 Review comment:
   How much did the answer change? I'm trying to figure out whether it's more or less accurate. Unless the performance difference is significant, I wouldn't want to lose much accuracy here.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586863051
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23304/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-590679160
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23643/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-593822672
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23953/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-591859670
 
 
   **[Test build #119020 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119020/testReport)** for PR 27519 at commit [`87472a4`](https://github.com/apache/spark/commit/87472a41aa1f08474e03341f9e5fe09b594cab77).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-591860101
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23767/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-584056089
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-593822660
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586827879
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118529/
   Test FAILed.



[GitHub] [spark] SparkQA commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-584055742
 
 
   **[Test build #118149 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118149/testReport)** for PR 27519 at commit [`73080ba`](https://github.com/apache/spark/commit/73080baad44df990bd5806a0de062c669a558506).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-593847681
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] srowen commented on a change in pull request #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#discussion_r383932814
 
 

 ##########
 File path: mllib-local/src/main/scala/org/apache/spark/ml/stat/distribution/MultivariateGaussian.scala
 ##########
 @@ -48,43 +48,37 @@ class MultivariateGaussian @Since("2.0.0") (
     this(Vectors.fromBreeze(mean), Matrices.fromBreeze(cov))
   }
 
-  @transient private lazy val breezeMu = mean.asBreeze.toDenseVector
-
   /**
    * Compute distribution dependent constants:
    *    rootSigmaInv = D^(-1/2)^ * U.t, where sigma = U * D * U.t
    *    u = log((2*pi)^(-k/2)^ * det(sigma)^(-1/2)^)
    */
-  @transient private lazy val tuple = calculateCovarianceConstants
-  @transient private lazy val rootSigmaInv = tuple._1
+  @transient private lazy val tuple = {
+    val (rootSigmaInv, u) = calculateCovarianceConstants
+    val rootSigmaInvMat = Matrices.fromBreeze(rootSigmaInv)
+    (rootSigmaInvMat, u)
+  }
+  @transient private lazy val rootSigmaInvMat = tuple._1
   @transient private lazy val u = tuple._2
+  @transient private lazy val mu = mean.toDense
 
   /**
    * Returns density of this multivariate Gaussian at given point, x
    */
   @Since("2.0.0")
   def pdf(x: Vector): Double = {
-    pdf(x.asBreeze)
+    math.exp(logpdf(x))
 
 Review comment:
   Is this how Breeze does it? Is there no way to compute this except in log space? I'm just wondering whether we lose any accuracy this way.
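
To make the accuracy question concrete, here is a minimal Python sketch (not the Spark or Breeze implementation) that evaluates a 2-D Gaussian density both directly and as `exp(logpdf)`. The function names, the closed-form 2x2 inverse, and the sample mean, covariance, and points are all assumptions chosen for illustration; Spark's `MultivariateGaussian` instead works from the decomposition `sigma = U * D * U.t` described in its doc comment.

```python
import math


def _quad_and_det(x, mean, cov):
    """Return ((x - mu)^T cov^{-1} (x - mu), det(cov)) for a non-singular 2x2 cov."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Closed-form inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]].
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return quad, det


def logpdf_2d(x, mean, cov):
    """Log-density of a 2-D Gaussian (hypothetical helper)."""
    quad, det = _quad_and_det(x, mean, cov)
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def pdf_via_log(x, mean, cov):
    """Density computed in log space and exponentiated at the end."""
    return math.exp(logpdf_2d(x, mean, cov))


def pdf_direct(x, mean, cov):
    """Density computed without going through the log-density."""
    quad, det = _quad_and_det(x, mean, cov)
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


# Illustrative inputs only; they are not taken from the PR.
mean = [0.825, 0.8675]
cov = [[2.0, 0.5], [0.5, 1.0]]
near = [1.0, 0.5]
far = [500.0, -500.0]

print(pdf_via_log(near, mean, cov), pdf_direct(near, mean, cov))
print(logpdf_2d(far, mean, cov), pdf_via_log(far, mean, cov), pdf_direct(far, mean, cov))
```

In this sketch the two routes agree closely for a point near the mean. Far in the tail both densities underflow to `0.0`, while the log-density itself stays finite, which is the usual argument for keeping the computation in log space until the last step.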



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-584025123
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22910/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-584025117
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586806575
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118524/
   Test FAILed.



[GitHub] [spark] SparkQA commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-584024550
 
 
   **[Test build #118148 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118148/testReport)** for PR 27519 at commit [`5552629`](https://github.com/apache/spark/commit/5552629da7824ad07de615aacabec512b8290a5b).



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-593847681
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-590679160
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23643/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-591860095
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586803978
 
 
   **[Test build #118524 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118524/testReport)** for PR 27519 at commit [`eb3b40a`](https://github.com/apache/spark/commit/eb3b40aac9683856f9827a60ee6239a87b202e80).



[GitHub] [spark] SparkQA removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586816177
 
 
   **[Test build #118529 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118529/testReport)** for PR 27519 at commit [`7a38546`](https://github.com/apache/spark/commit/7a38546e12f89703bdcb35dbdd00922c97d8069e).



[GitHub] [spark] zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#discussion_r386236966
 
 

 ##########
 File path: python/pyspark/ml/clustering.py
 ##########
 @@ -270,19 +270,16 @@ class GaussianMixture(JavaEstimator, _GaussianMixtureParams, JavaMLWritable, Jav
     3
     >>> gaussians[0].mean
     DenseVector([0.825, 0.8675])
-    >>> gaussians[0].cov.toArray()
-    array([[ 0.005625  , -0.0050625 ],
-           [-0.0050625 ,  0.00455625]])
+    >>> gaussians[0].cov
+    DenseMatrix(2, 2, [0.0056, -0.0051, -0.0051, 0.0046], 0)
     >>> gaussians[1].mean
-    DenseVector([-0.4777, -0.4096])
-    >>> gaussians[1].cov.toArray()
-    array([[ 0.1679695 ,  0.13181786],
-           [ 0.13181786,  0.10524592]])
+    DenseVector([-0.87, -0.72])
+    >>> gaussians[1].cov
+    DenseMatrix(2, 2, [0.0016, 0.0016, 0.0016, 0.0016], 0)
     >>> gaussians[2].mean
-    DenseVector([-0.4473, -0.3853])
-    >>> gaussians[2].cov.toArray()
-    array([[ 0.16730412,  0.13112435],
-           [ 0.13112435,  0.10469614]])
+    DenseVector([-0.055, -0.075])
+    >>> gaussians[2].cov
+    DenseMatrix(2, 2, [0.002, -0.0011, -0.0011, 0.0006], 0)
 
 Review comment:
   Since maxIter was changed from 10 to 30, the model coefficients are different.
   I checked the model coefficients on both master and this PR with maxIter=30, and they are the same. They differ only from the coefficients obtained with maxIter=10.
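
One rough way to quantify the change between the old and new doctest output is the largest element-wise difference. The sketch below is illustrative only: the helper name `max_abs_diff` is hypothetical, and the numbers are copied from the diff hunk above. The new `DenseMatrix` display is rounded to four decimal places, so for `gaussians[0].cov` this can only bound the change to display precision. The 2x2 matrix is symmetric, so the flattening order does not matter here.

```python
def max_abs_diff(old, new):
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(old) != len(new):
        raise ValueError("sequences must have the same length")
    return max(abs(o - n) for o, n in zip(old, new))


# gaussians[0].cov: old doctest values (flattened) vs. the new, rounded display.
old_cov0 = [0.005625, -0.0050625, -0.0050625, 0.00455625]
new_cov0 = [0.0056, -0.0051, -0.0051, 0.0046]

# gaussians[1].mean: old vs. new doctest values.
old_mean1 = [-0.4777, -0.4096]
new_mean1 = [-0.87, -0.72]

cov0_change = max_abs_diff(old_cov0, new_cov0)
mean1_change = max_abs_diff(old_mean1, new_mean1)
print(cov0_change, mean1_change)
```

With these values the first difference stays below 5e-5, which is consistent with four-decimal rounding alone, while the mean of the second component moves by roughly 0.39. That matches the explanation that the change comes from the larger maxIter rather than from the vector-conversion refactoring.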



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-584030801
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118148/
   Test FAILed.



[GitHub] [spark] SparkQA commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-590695808
 
 
   **[Test build #118894 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118894/testReport)** for PR 27519 at commit [`4c61f20`](https://github.com/apache/spark/commit/4c61f20d07e6b4044d869c72f1a063c29057f368).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-590704860
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23649/
   Test PASSed.



[GitHub] [spark] zhengruifeng commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-594300301
 
 
   Merged to master, thanks @srowen for reviewing!



[GitHub] [spark] zhengruifeng closed pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
zhengruifeng closed pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519
 
 
   



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-584030801
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118148/
   Test FAILed.



[GitHub] [spark] zhengruifeng commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-591219332
 
 
   The current master impl and commit [7686e04](https://github.com/apache/spark/commit/7686e04c648384251b98c0c335c084b1f654188e) both need to create two vectors in `logpdf`,
   while the initial commit [bc1586e](https://github.com/apache/spark/pull/27519/commits/bc1586eafa58748b8ae7855184d903c22c1088a4) only needs to create one vector.
   
   All the Scala tests passed in `bc1586e`; however, it fails on the Python side. We can see that the model coefficients are almost the same; the only significant difference is the `logLikelihood`.
   
   The `logLikelihood` issue is the same as in https://github.com/apache/spark/pull/26735. @huaxingao helped test it and found that if we set `maxIter>25`, all impls converge to the same cost.
   It looks like a small perturbation (in https://github.com/apache/spark/pull/26735, the way the sum of weights is accumulated; in `bc1586e`, the way `logpdf` is computed: `A*(x-mean) -> A*x - A*mean`) causes the py test to converge to `26.193922336279954` at iteration=5, so I am wondering whether we can update the py test by setting a larger `maxIter`?
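
To illustrate the kind of perturbation described above, here is a minimal sketch in plain Python rather than the Spark code itself. The matrix `A`, the point `x`, the vector `mean` and the helper `matvec` are all made-up for illustration. The two forms are algebraically equal, but floating-point rounding can leave them differing in the last bits, and an iterative fit may amplify such differences when it is stopped early.

```python
# Hypothetical 2x2 example; A, x and mean are made-up values, not taken from the PR.
def matvec(a, v):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]

A = [[0.1, 0.2], [0.3, 0.7]]
x = [1.1, 2.3]
mean = [0.9, 2.1]

# Form A*(x - mean): subtract first, then multiply.
v1 = matvec(A, [xi - mi for xi, mi in zip(x, mean)])

# Form A*x - A*mean: multiply both vectors, then subtract.
ax, amean = matvec(A, x), matvec(A, mean)
v2 = [p - q for p, q in zip(ax, amean)]

# Largest element-wise gap between the two forms (at most rounding noise).
max_diff = max(abs(p - q) for p, q in zip(v1, v2))
print(v1, v2, max_diff)
```

Any gap printed here is only rounding noise; whether it changes a fitted model depends on how many iterations are run before stopping.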



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-593822672
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23953/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586827874
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586804200
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586863040
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#discussion_r386853676
 
 

 ##########
 File path: mllib-local/src/test/scala/org/apache/spark/ml/stat/distribution/MultivariateGaussianSuite.scala
 ##########
 @@ -31,11 +31,15 @@ class MultivariateGaussianSuite extends SparkMLFunSuite {
     val mu = Vectors.dense(0.0)
     val sigma1 = Matrices.dense(1, 1, Array(1.0))
     val dist1 = new MultivariateGaussian(mu, sigma1)
+    assert(dist1.logpdf(x1) ~== -0.9189385332046727 absTol 1E-5)
 
 Review comment:
   The numbers here were generated in the REPL with Spark 2.4.5.
   
   ![image](https://user-images.githubusercontent.com/7322292/75755322-5f4c3300-5d69-11ea-85d4-807c6d50f6f4.png)
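
As an independent cross-check of the expected value in the first assertion, the closed-form log-density of a one-dimensional Gaussian can be evaluated directly. This is a sketch in plain Python: `mu = 0.0` and `sigma1 = 1.0` come from the test above, while the evaluation point `x = 0.0` is an assumption, since `x1` is not shown in this hunk. The helper name `gaussian_logpdf_1d` is hypothetical.

```python
import math

def gaussian_logpdf_1d(x, mean, variance):
    """Log-density of a one-dimensional Gaussian, from the closed-form formula."""
    return -0.5 * math.log(2.0 * math.pi * variance) - (x - mean) ** 2 / (2.0 * variance)

# mean and variance follow the test; x = 0.0 is an assumed evaluation point.
value = gaussian_logpdf_1d(0.0, 0.0, 1.0)
print(value)
```

With these inputs the result reduces to `-0.5 * log(2 * pi)`, which agrees with the constant in the assertion within the stated tolerance.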
   



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-591895195
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119020/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586806575
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118524/
   Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-584025123
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22910/
   Test PASSed.



[GitHub] [spark] zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#discussion_r384252384
 
 

 ##########
 File path: mllib-local/src/main/scala/org/apache/spark/ml/stat/distribution/MultivariateGaussian.scala
 ##########
 @@ -48,43 +48,37 @@ class MultivariateGaussian @Since("2.0.0") (
     this(Vectors.fromBreeze(mean), Matrices.fromBreeze(cov))
   }
 
-  @transient private lazy val breezeMu = mean.asBreeze.toDenseVector
-
   /**
    * Compute distribution dependent constants:
    *    rootSigmaInv = D^(-1/2)^ * U.t, where sigma = U * D * U.t
    *    u = log((2*pi)^(-k/2)^ * det(sigma)^(-1/2)^)
    */
-  @transient private lazy val tuple = calculateCovarianceConstants
-  @transient private lazy val rootSigmaInv = tuple._1
+  @transient private lazy val tuple = {
+    val (rootSigmaInv, u) = calculateCovarianceConstants
+    val rootSigmaInvMat = Matrices.fromBreeze(rootSigmaInv)
+    (rootSigmaInvMat, u)
+  }
+  @transient private lazy val rootSigmaInvMat = tuple._1
   @transient private lazy val u = tuple._2
+  @transient private lazy val mu = mean.toDense
 
   /**
    * Returns density of this multivariate Gaussian at given point, x
    */
   @Since("2.0.0")
   def pdf(x: Vector): Double = {
-    pdf(x.asBreeze)
+    math.exp(logpdf(x))
 
 Review comment:
   Yes, that is what the existing impl does:
   ```scala
     /**
      * Returns density of this multivariate Gaussian at given point, x
      */
     @Since("2.0.0")
     def pdf(x: Vector): Double = {
       pdf(x.asBreeze)
     }
   
     /**
      * Returns the log-density of this multivariate Gaussian at given point, x
      */
     @Since("2.0.0")
     def logpdf(x: Vector): Double = {
       logpdf(x.asBreeze)
     }
   
     /** Returns density of this multivariate Gaussian at given point, x */
     private[ml] def pdf(x: BV[Double]): Double = {
       math.exp(logpdf(x))
     }
   
     /** Returns the log-density of this multivariate Gaussian at given point, x */
     private[ml] def logpdf(x: BV[Double]): Double = {
       val delta = x - breezeMu
       val v = rootSigmaInv * delta
       u + v.t * v * -0.5
     }
   
   ```
   
   After this change, we do not need private methods based on BreezeVector.
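
For readers following the formula in the doc comment, the relationship `logpdf = u - 0.5 * v.v` with `v = rootSigmaInv * (x - mean)` can be sketched in plain Python. This is not the Spark implementation: it assumes a diagonal covariance so that `U` is the identity and no eigendecomposition is needed, and the function names `logpdf_diag` and `pdf_diag` and the sample values are hypothetical.

```python
import math

def logpdf_diag(x, mean, variances):
    """logpdf = u - 0.5 * v.v for a diagonal covariance (so U is the identity)."""
    k = len(mean)
    # u = log((2*pi)^(-k/2) * det(sigma)^(-1/2)); det is the product of the variances.
    u = -0.5 * (k * math.log(2.0 * math.pi) + sum(math.log(s) for s in variances))
    # v = rootSigmaInv * (x - mean); with U = I, rootSigmaInv is diag(1/sqrt(variance)).
    v = [(xi - mi) / math.sqrt(s) for xi, mi, s in zip(x, mean, variances)]
    return u - 0.5 * sum(vi * vi for vi in v)

def pdf_diag(x, mean, variances):
    """pdf is defined through logpdf, mirroring math.exp(logpdf(x)) in the diff."""
    return math.exp(logpdf_diag(x, mean, variances))

# Made-up two-dimensional example.
print(logpdf_diag([1.0, 2.0], [0.0, 1.0], [1.0, 4.0]))
print(pdf_diag([1.0, 2.0], [0.0, 1.0], [1.0, 4.0]))
```

Defining `pdf` through `logpdf` keeps a single code path for the density, which is the design the diff above follows.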



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-591895185
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-590704853
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586804200
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#discussion_r386237201
 
 

 ##########
 File path: python/pyspark/ml/clustering.py
 ##########
 @@ -252,7 +252,7 @@ class GaussianMixture(JavaEstimator, _GaussianMixtureParams, JavaMLWritable, Jav
     >>> model.predict(df.head().features)
     2
     >>> model.predictProbability(df.head().features)
-    DenseVector([0.0, 0.4736, 0.5264])
+    DenseVector([0.0, 0.0, 1.0])
 
 Review comment:
   I checked this on master: when maxIter is set to 30, the model also outputs `DenseVector([0.0, 0.0, 1.0])`.



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586804202
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23279/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-593847698
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119212/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-584056097
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118149/
   Test FAILed.



[GitHub] [spark] SparkQA removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586862639
 
 
   **[Test build #118550 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118550/testReport)** for PR 27519 at commit [`941a5a5`](https://github.com/apache/spark/commit/941a5a5e4434e8b08d72a9033dc351e5c03c43cf).



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586806570
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586816453
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23284/
   Test PASSed.



[GitHub] [spark] zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#discussion_r386239057
 
 

 ##########
 File path: python/pyspark/ml/clustering.py
 ##########
 @@ -252,7 +252,7 @@ class GaussianMixture(JavaEstimator, _GaussianMixtureParams, JavaMLWritable, Jav
     >>> model.predict(df.head().features)
     2
     >>> model.predictProbability(df.head().features)
-    DenseVector([0.0, 0.4736, 0.5264])
+    DenseVector([0.0, 0.0, 1.0])
 
 Review comment:
   ![image](https://user-images.githubusercontent.com/7322292/75656008-01064e00-5c9e-11ea-945c-2c95daf1de2c.png)
   
   The result in `model.transform(df).show()` is not `DenseVector([0.0, 0.0, 1.0])`; it is about `[6.74824658670777...`,
   
   but `model.transform(df).head()` shows `DenseVector([0.0, 0.0, 1.0])`.
   
   Is this a kind of rounding?
   
   ![image](https://user-images.githubusercontent.com/7322292/75656118-40cd3580-5c9e-11ea-83aa-a7047fae06d6.png)
   ![image](https://user-images.githubusercontent.com/7322292/75656394-cc46c680-5c9e-11ea-8b0a-a83b4527bf0e.png)
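
One possible explanation, offered as an assumption rather than something confirmed in this thread, is that the Python-side `repr` of a vector rounds each entry to a few decimal places while `show()` prints full-precision values. The sketch below only demonstrates that effect in plain Python: `format_rounded` is a hypothetical helper, not the PySpark implementation, and the probabilities are made-up.

```python
def format_rounded(values, digits=4):
    """Hypothetical helper: render a vector with each entry rounded to `digits` places."""
    body = ", ".join(str(round(float(v), digits)) for v in values)
    return "DenseVector([" + body + "])"

# Made-up probabilities; the first two are tiny but non-zero.
probs = [6.7e-10, 3.3e-12, 0.999999999327]

print(format_rounded(probs))  # rounded view
print(probs)                  # full-precision view
```

Under that assumption, tiny but non-zero probabilities would display as `0.0` in the rounded view while remaining visible in the full-precision one.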
   



[GitHub] [spark] SparkQA commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586894440
 
 
   **[Test build #118550 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118550/testReport)** for PR 27519 at commit [`941a5a5`](https://github.com/apache/spark/commit/941a5a5e4434e8b08d72a9033dc351e5c03c43cf).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-584056089
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] SparkQA commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586803978
 
 
   **[Test build #118524 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118524/testReport)** for PR 27519 at commit [`eb3b40a`](https://github.com/apache/spark/commit/eb3b40aac9683856f9827a60ee6239a87b202e80).



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-593822660
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586806570
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] SparkQA commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-584030709
 
 
   **[Test build #118148 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118148/testReport)** for PR 27519 at commit [`5552629`](https://github.com/apache/spark/commit/5552629da7824ad07de615aacabec512b8290a5b).
    * This patch **fails MiMa tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-584034455
 
 
   **[Test build #118149 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118149/testReport)** for PR 27519 at commit [`73080ba`](https://github.com/apache/spark/commit/73080baad44df990bd5806a0de062c669a558506).



[GitHub] [spark] srowen commented on a change in pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#discussion_r385900804
 
 

 ##########
 File path: python/pyspark/ml/clustering.py
 ##########
 @@ -252,7 +252,7 @@ class GaussianMixture(JavaEstimator, _GaussianMixtureParams, JavaMLWritable, Jav
     >>> model.predict(df.head().features)
     2
     >>> model.predictProbability(df.head().features)
-    DenseVector([0.0, 0.4736, 0.5264])
+    DenseVector([0.0, 0.0, 1.0])
 
 Review comment:
   This seems kind of unlikely for probabilities. Is this valid?
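
One way to see when such saturated probabilities can be legitimate is a minimal one-dimensional sketch of the mixture posterior. The weights, means, and standard deviations below are made-up values, chosen only so that the components are narrow and far apart; `gaussian_pdf` and `responsibilities` are hypothetical helpers, not Spark APIs.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def responsibilities(x, weights, means, stds):
    """Posterior probability of each mixture component given the point x."""
    weighted = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(weighted)
    return [d / total for d in weighted]


# Made-up mixture: three narrow, well-separated components.
weights = [1.0 / 3, 1.0 / 3, 1.0 / 3]
means = [0.8, -0.9, -0.05]
stds = [0.05, 0.05, 0.05]

post = responsibilities(-0.1, weights, means, stds)
print(post)                         # two entries are tiny but nonzero
print([round(p, 4) for p in post])  # displayed to four decimals
```

With components this narrow, a point one standard deviation from its nearest mean is sixteen or more standard deviations from the others, so the posterior is numerically indistinguishable from a hard assignment once rounded.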



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-584030788
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] srowen commented on a change in pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#discussion_r385901134
 
 

 ##########
 File path: python/pyspark/ml/clustering.py
 ##########
 @@ -270,19 +270,16 @@ class GaussianMixture(JavaEstimator, _GaussianMixtureParams, JavaMLWritable, Jav
     3
     >>> gaussians[0].mean
     DenseVector([0.825, 0.8675])
-    >>> gaussians[0].cov.toArray()
-    array([[ 0.005625  , -0.0050625 ],
-           [-0.0050625 ,  0.00455625]])
+    >>> gaussians[0].cov
+    DenseMatrix(2, 2, [0.0056, -0.0051, -0.0051, 0.0046], 0)
     >>> gaussians[1].mean
-    DenseVector([-0.4777, -0.4096])
-    >>> gaussians[1].cov.toArray()
-    array([[ 0.1679695 ,  0.13181786],
-           [ 0.13181786,  0.10524592]])
+    DenseVector([-0.87, -0.72])
+    >>> gaussians[1].cov
+    DenseMatrix(2, 2, [0.0016, 0.0016, 0.0016, 0.0016], 0)
     >>> gaussians[2].mean
-    DenseVector([-0.4473, -0.3853])
-    >>> gaussians[2].cov.toArray()
-    array([[ 0.16730412,  0.13112435],
-           [ 0.13112435,  0.10469614]])
+    DenseVector([-0.055, -0.075])
+    >>> gaussians[2].cov
+    DenseMatrix(2, 2, [0.002, -0.0011, -0.0011, 0.0006], 0)
 
 Review comment:
   Same question: why does the answer change so significantly?



[GitHub] [spark] SparkQA commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586862639
 
 
   **[Test build #118550 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118550/testReport)** for PR 27519 at commit [`941a5a5`](https://github.com/apache/spark/commit/941a5a5e4434e8b08d72a9033dc351e5c03c43cf).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-590726295
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-590704860
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23649/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-584030788
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586804202
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23279/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586827689
 
 
   **[Test build #118529 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118529/testReport)** for PR 27519 at commit [`7a38546`](https://github.com/apache/spark/commit/7a38546e12f89703bdcb35dbdd00922c97d8069e).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-590704853
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-590679152
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-591895195
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119020/
   Test PASSed.



[GitHub] [spark] zhengruifeng commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586858044
 
 
   The pytest issue seen in https://github.com/apache/spark/pull/26735 has recurred: GMM is too sensitive to the dataset used in the pytest.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586816453
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23284/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-584034804
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586816450
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-584034816
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22911/
   Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-584024550
 
 
   **[Test build #118148 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118148/testReport)** for PR 27519 at commit [`5552629`](https://github.com/apache/spark/commit/5552629da7824ad07de615aacabec512b8290a5b).



[GitHub] [spark] zhengruifeng commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-591871256
 
 
   I used the following code to compare the convergence:
   ```python
   from pyspark.ml.linalg import Vectors
   from pyspark.ml.clustering import GaussianMixture
   
   # Assumes a PySpark shell, where `spark` and `sc` are already defined.
   data = [(Vectors.dense([-0.1, -0.05]),), (Vectors.dense([-0.01, -0.1]),),
           (Vectors.dense([0.9, 0.8]),), (Vectors.dense([0.75, 0.935]),),
           (Vectors.dense([-0.83, -0.68]),), (Vectors.dense([-0.91, -0.76]),)]
   df = spark.createDataFrame(sc.parallelize(data, 2), ["features"])
   gm = GaussianMixture(k=3, tol=0.0001, seed=10)
   # Log-likelihood of the fitted model for maxIter = 0, 1, ..., 29.
   curve = [gm.setMaxIter(k).fit(df).summary.logLikelihood for k in range(0, 30)]
   
   ```
   
   ![image](https://user-images.githubusercontent.com/7322292/75430326-1f123c80-5986-11ea-813d-c7af5fee5eca.png)
   
   The blue curve is master, and the orange one is https://github.com/apache/spark/commit/87472a41aa1f08474e03341f9e5fe09b594cab77.
   
   The result is the same as in https://github.com/apache/spark/pull/26735.
   With maxIter=30, both curves converge to `65.02945125241477`.
   I also checked the coefficients of the models, and they are the same.
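
The comparison described above can also be done numerically rather than by eye. The following is a minimal sketch, not the code used in this thread: `first_converged_iter` and `same_limit` are hypothetical helpers, and the two curves are made-up lists that only reuse the final log-likelihood quoted above.

```python
import math


def first_converged_iter(curve, tol=1e-4):
    """Index of the first step whose change from the previous step is below tol."""
    for i in range(1, len(curve)):
        if abs(curve[i] - curve[i - 1]) < tol:
            return i
    return None  # the curve never settled within tol


def same_limit(curve_a, curve_b, rel_tol=1e-9):
    """True if both curves end at (nearly) the same log-likelihood."""
    return math.isclose(curve_a[-1], curve_b[-1], rel_tol=rel_tol)


# Made-up curves; only the shared final value comes from the comment above.
master = [10.0, 30.0, 50.0, 60.0, 65.0, 65.02945125241477, 65.02945125241477]
patched = [10.0, 40.0, 62.0, 65.02945125241477, 65.02945125241477]

print(first_converged_iter(master), first_converged_iter(patched))
print(same_limit(master, patched))
```

Two curves can take different paths and still pass `same_limit`, which is the situation reported here: a different trajectory at small `maxIter`, the same value once converged.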



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-584025117
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-593847698
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119212/
   Test PASSed.



[GitHub] [spark] zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#discussion_r386239057
 
 

 ##########
 File path: python/pyspark/ml/clustering.py
 ##########
 @@ -252,7 +252,7 @@ class GaussianMixture(JavaEstimator, _GaussianMixtureParams, JavaMLWritable, Jav
     >>> model.predict(df.head().features)
     2
     >>> model.predictProbability(df.head().features)
-    DenseVector([0.0, 0.4736, 0.5264])
+    DenseVector([0.0, 0.0, 1.0])
 
 Review comment:
   ![image](https://user-images.githubusercontent.com/7322292/75656008-01064e00-5c9e-11ea-945c-2c95daf1de2c.png)
   
   The result of `model.transform(df).show()` is not `DenseVector([0.0, 0.0, 1.0])`; it is about `[6.74824658670777...`,
   
   but `model.transform(df).head()` shows `DenseVector([0.0, 0.0, 1.0])`.
   
   Is this some kind of rounding?
   
   ![image](https://user-images.githubusercontent.com/7322292/75656394-cc46c680-5c9e-11ea-8b0a-a83b4527bf0e.png)
   



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-590696126
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586895018
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118550/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586827879
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118529/
   Test FAILed.



[GitHub] [spark] SparkQA removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-591859670
 
 
   **[Test build #119020 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119020/testReport)** for PR 27519 at commit [`87472a4`](https://github.com/apache/spark/commit/87472a41aa1f08474e03341f9e5fe09b594cab77).



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-591895185
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-590726302
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118900/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586895010
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-590678930
 
 
   **[Test build #118894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118894/testReport)** for PR 27519 at commit [`4c61f20`](https://github.com/apache/spark/commit/4c61f20d07e6b4044d869c72f1a063c29057f368).



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-590696132
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118894/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-590679152
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-584034455
 
 
   **[Test build #118149 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118149/testReport)** for PR 27519 at commit [`73080ba`](https://github.com/apache/spark/commit/73080baad44df990bd5806a0de062c669a558506).



[GitHub] [spark] SparkQA commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586806555
 
 
   **[Test build #118524 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118524/testReport)** for PR 27519 at commit [`eb3b40a`](https://github.com/apache/spark/commit/eb3b40aac9683856f9827a60ee6239a87b202e80).
    * This patch **fails MiMa tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586895010
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#discussion_r376937067
 
 

 ##########
 File path: mllib-local/src/main/scala/org/apache/spark/ml/stat/distribution/MultivariateGaussian.scala
 ##########
 @@ -43,46 +43,40 @@ class MultivariateGaussian @Since("2.0.0") (
   require(cov.numCols == cov.numRows, "Covariance matrix must be square")
   require(mean.size == cov.numCols, "Mean vector length must match covariance matrix size")
 
-  /** Private constructor taking Breeze types */
-  private[ml] def this(mean: BDV[Double], cov: BDM[Double]) = {
-    this(Vectors.fromBreeze(mean), Matrices.fromBreeze(cov))
-  }
-
-  @transient private lazy val breezeMu = mean.asBreeze.toDenseVector
-
   /**
    * Compute distribution dependent constants:
    *    rootSigmaInv = D^(-1/2)^ * U.t, where sigma = U * D * U.t
    *    u = log((2*pi)^(-k/2)^ * det(sigma)^(-1/2)^)
    */
-  @transient private lazy val (rootSigmaInv: BDM[Double], u: Double) = calculateCovarianceConstants
+  @transient private lazy val tuple3 = {
 
 Review comment:
   A comment in [LeastSquaresAggregator](https://github.com/apache/spark/blob/12e1bbaddbb2ef304b5880a62df6683fcc94ea54/mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/LeastSquaresAggregator.scala#L188) says:
   
   > // do not use tuple assignment above because it will circumvent the @transient tag



[GitHub] [spark] SparkQA commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-590704587
 
 
   **[Test build #118900 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118900/testReport)** for PR 27519 at commit [`7686e04`](https://github.com/apache/spark/commit/7686e04c648384251b98c0c335c084b1f654188e).



[GitHub] [spark] SparkQA removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-590678930
 
 
   **[Test build #118894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118894/testReport)** for PR 27519 at commit [`4c61f20`](https://github.com/apache/spark/commit/4c61f20d07e6b4044d869c72f1a063c29057f368).



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-591860101
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23767/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-586827874
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] zhengruifeng commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-591873551
 
 
   IIRC, Scikit-Learn uses KMeans to initialize the `mean` coefficients of GMM by default, which makes GMM more stable.



[GitHub] [spark] AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-584034816
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22911/
   Test PASSed.



[GitHub] [spark] srowen commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
srowen commented on issue #27519: [SPARK-30770][ML] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-591461201
 
 
   If you can make the computation more stable, great, do that. Otherwise, yes, I'd increase the iterations.



[GitHub] [spark] SparkQA commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27519: [SPARK-30770][ML][WIP] avoid vector conversion in GMM.transform
URL: https://github.com/apache/spark/pull/27519#issuecomment-590725834
 
 
   **[Test build #118900 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118900/testReport)** for PR 27519 at commit [`7686e04`](https://github.com/apache/spark/commit/7686e04c648384251b98c0c335c084b1f654188e).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.
