Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/23 05:38:25 UTC

[GitHub] [spark] zhengruifeng opened a new pull request #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

zhengruifeng opened a new pull request #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982
 
 
   ### What changes were proposed in this pull request?
   When the input dataset is sparse, make `ANOVATest` process only the non-zero values.
   
   
   ### Why are the changes needed?
   For performance: with sparse input, zero entries no longer need to be processed.
   
   
   ### Does this PR introduce any user-facing change?
   No
   
   
   ### How was this patch tested?
   Existing test suites.
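
A rough illustration of the idea, not the Scala code in this PR: the sketch below aggregates per-column statistics from non-zero entries only, takes the per-label counts from the labels alone, and fills in a sum of 0.0 for any label that never appears in a column. The row format (a dict from column index to value) and the names `ieee_div`, `f_value`, `dense_f_values` and `sparse_f_values` are assumptions made for this example. The data is the set of rows used in the `ANOVATestSuite` diff quoted later in this thread, and the p-value is left out because it needs an F-distribution CDF.

```python
import math
from collections import Counter


def ieee_div(a, b):
    """Float division that yields NaN/inf like IEEE 754 instead of raising.

    Simplification: the sign of a zero divisor is ignored.
    """
    if b != 0.0:
        return a / b
    if a == 0.0 or math.isnan(a):
        return math.nan
    return math.copysign(math.inf, a)


def f_value(total, sum_sq, class_sums, class_counts):
    """One-way ANOVA F statistic from per-column aggregates."""
    n = sum(class_counts.values())
    k = len(class_counts)
    correction = total * total / n
    ss_total = sum_sq - correction
    ss_between = sum(s * s / class_counts[c] for c, s in class_sums.items()) - correction
    ss_within = ss_total - ss_between
    return ieee_div(ieee_div(ss_between, k - 1), ieee_div(ss_within, n - k))


def dense_f_values(labels, rows, num_features):
    """Reference pass: visits every (row, column) cell, zeros included."""
    counts = Counter(labels)
    out = []
    for col in range(num_features):
        total, sum_sq, sums = 0.0, 0.0, {}
        for label, row in zip(labels, rows):
            v = row.get(col, 0.0)
            total += v
            sum_sq += v * v
            sums[label] = sums.get(label, 0.0) + v
        out.append(f_value(total, sum_sq, sums, counts))
    return out


def sparse_f_values(labels, rows, num_features):
    """Sparsity-aware pass: visits only the stored (non-zero) entries."""
    counts = Counter(labels)  # per-label sample counts, from the labels alone
    stats = {}                # col -> [total, sum_sq, per-label sums]
    for label, row in zip(labels, rows):
        for col, v in row.items():
            s = stats.setdefault(col, [0.0, 0.0, {}])
            s[0] += v
            s[1] += v * v
            s[2][label] = s[2].get(label, 0.0) + v
    out = []
    for col in range(num_features):
        total, sum_sq, sums = stats.get(col, [0.0, 0.0, {}])
        # A label with no non-zero value in this column has a sum of 0.0.
        full_sums = {c: sums.get(c, 0.0) for c in counts}
        out.append(f_value(total, sum_sq, full_sums, counts))
    return out


labels = [3, 1, 3, 2, 2, 3]
rows = [
    {0: 6.0, 1: 7.0, 3: 7.0, 4: 6.0},
    {1: 9.0, 2: 6.0, 4: 5.0, 5: 9.0},
    {1: 9.0, 2: 3.0, 4: 5.0, 5: 5.0},
    {1: 9.0, 2: 8.0, 3: 5.0, 4: 6.0, 5: 4.0},
    {0: 8.0, 1: 9.0, 2: 6.0, 3: 5.0, 4: 4.0, 5: 4.0},
    {0: 8.0, 1: 9.0, 2: 6.0, 3: 4.0},
]
dense = dense_f_values(labels, rows, 7)
sparse = sparse_f_values(labels, rows, 7)
```

Both passes should agree on every column, including the all-zero seventh column, where the statistic is 0/0 and comes out as NaN. The sparse pass does work proportional to the number of stored entries rather than to rows times columns.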
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-606364469
 
 
   Merged to master. Thanks, all.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605953728
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-603629698
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] zhengruifeng commented on a change in pull request #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#discussion_r399052759
 
 

 ##########
 File path: mllib/src/main/scala/org/apache/spark/ml/stat/ANOVATest.scala
 ##########
 @@ -80,65 +81,141 @@ object ANOVATest {
     SchemaUtils.checkColumnType(dataset.schema, featuresCol, new VectorUDT)
     SchemaUtils.checkNumericType(dataset.schema, labelCol)
 
-    dataset.select(col(labelCol).cast("double"), col(featuresCol))
-      .as[(Double, Vector)]
-      .rdd
-      .flatMap { case (label, features) =>
-        features.iterator.map { case (col, value) => (col, (label, value)) }
-      }.aggregateByKey[(Double, Double, OpenHashMap[Double, Double], OpenHashMap[Double, Long])](
-        (0.0, 0.0, new OpenHashMap[Double, Double], new OpenHashMap[Double, Long]))(
-        seqOp = {
-          case ((sum, sumOfSq, sums, counts), (label, value)) =>
-            // sums: mapOfSumPerClass (key: label, value: sum of features for each label)
-            // counts: mapOfCountPerClass key: label, value: count of features for each label
-            sums.changeValue(label, value, _ + value)
-            counts.changeValue(label, 1L, _ + 1L)
-            (sum + value, sumOfSq + value * value, sums, counts)
-        },
-        combOp = {
-          case ((sum1, sumOfSq1, sums1, counts1), (sum2, sumOfSq2, sums2, counts2)) =>
-            sums2.foreach { case (v, w) => sums1.changeValue(v, w, _ + w) }
-            counts2.foreach { case (v, w) => counts1.changeValue(v, w, _ + w) }
-            (sum1 + sum2, sumOfSq1 + sumOfSq2, sums1, counts1)
-        }
-        ).map { case (col, (sum, sumOfSq, sums, counts)) =>
-          val numSamples = counts.iterator.map(_._2).sum
-          val numClasses = counts.size
-
-          // e.g. features are [3.3, 2.5, 1.0, 3.0, 2.0] and labels are [1, 2, 1, 3, 3]
-          // sum: sum of all the features (3.3+2.5+1.0+3.0+2.0)
-          // sumOfSq: sum of squares of all the features (3.3^2+2.5^2+1.0^2+3.0^2+2.0^2)
+    val points = dataset.select(col(labelCol).cast("double"), col(featuresCol))
+      .as[(Double, Vector)].rdd
+
+    points.first()._2 match {
+      case dv: DenseVector =>
+        testClassificationDenseFeatures(points, dv.size)
+      case sv: SparseVector =>
+        testClassificationSparseFeatures(points, sv.size)
+    }
+  }
+
+  private def testClassificationDenseFeatures(
+      points: RDD[(Double, Vector)],
+      numFeatures: Int): Array[SelectionTestResult] = {
+    points.flatMap { case (label, features) =>
+      require(features.size == numFeatures,
+        s"Number of features must be $numFeatures but got ${features.size}")
+      features.iterator.map { case (col, value) => (col, (label, value)) }
+    }.aggregateByKey[(Double, Double, OpenHashMap[Double, Double], OpenHashMap[Double, Long])](
+      (0.0, 0.0, new OpenHashMap[Double, Double], new OpenHashMap[Double, Long]))(
+      seqOp = {
+        case ((sum, sumOfSq, sums, counts), (label, value)) =>
+          // sums: mapOfSumPerClass (key: label, value: sum of features for each label)
+          // counts: mapOfCountPerClass key: label, value: count of features for each label
+          sums.changeValue(label, value, _ + value)
+          counts.changeValue(label, 1L, _ + 1L)
+          (sum + value, sumOfSq + value * value, sums, counts)
+      },
+      combOp = {
+        case ((sum1, sumOfSq1, sums1, counts1), (sum2, sumOfSq2, sums2, counts2)) =>
+          sums2.foreach { case (v, w) => sums1.changeValue(v, w, _ + w) }
+          counts2.foreach { case (v, w) => counts1.changeValue(v, w, _ + w) }
+          (sum1 + sum2, sumOfSq1 + sumOfSq2, sums1, counts1)
+      }
+    ).mapValues { case (sum, sumOfSq, sums, counts) =>
+      computeANOVA(sum, sumOfSq, sums.toMap, counts.toMap)
+    }.collect().sortBy(_._1).map {
+      case (_, (pValue, degreesOfFreedom, fValue)) =>
+        new ANOVATestResult(pValue, degreesOfFreedom, fValue)
+    }
+  }
+
+  private def testClassificationSparseFeatures(
+      points: RDD[(Double, Vector)],
+      numFeatures: Int): Array[SelectionTestResult] = {
+    val sc = points.sparkContext
+    val counts = points.map(_._1).countByValue().toMap
+    val bcCounts = sc.broadcast(counts)
+
+    val results = points.flatMap { case (label, features) =>
+      require(features.size == numFeatures,
+        s"Number of features must be $numFeatures but got ${features.size}")
+      features.nonZeroIterator.map { case (col, value) => (col, (label, value)) }
+    }.aggregateByKey[(Double, Double, OpenHashMap[Double, Double])](
+      (0.0, 0.0, new OpenHashMap[Double, Double]))(
+      seqOp = {
+        case ((sum, sumOfSq, sums), (label, value)) =>
           // sums: mapOfSumPerClass (key: label, value: sum of features for each label)
-          //                                         ( 1 -> 3.3 + 1.0, 2 -> 2.5, 3 -> 3.0 + 2.0 )
-          // counts: mapOfCountPerClass (key: label, value: count of features for each label)
-          //                                         ( 1 -> 2, 2 -> 2, 3 -> 2 )
-          // sqSum: square of sum of all data ((3.3+2.5+1.0+3.0+2.0)^2)
-          val sqSum = sum * sum
-          val ssTot = sumOfSq - sqSum / numSamples
-
-          // sumOfSqSumPerClass:
-          //     sum( sq_sum_classes[k] / n_samples_per_class[k] for k in range(n_classes))
-          //     e.g. ((3.3+1.0)^2 / 2 + 2.5^2 / 1 + (3.0+2.0)^2 / 2)
-          val sumOfSqSumPerClass = sums.iterator
-            .map { case (label, sum) => sum * sum / counts(label) }.sum
-          // Sums of Squares Between
-          val ssbn = sumOfSqSumPerClass - (sqSum / numSamples)
-          // Sums of Squares Within
-          val sswn = ssTot - ssbn
-          // degrees of freedom between
-          val dfbn = numClasses - 1
-          // degrees of freedom within
-          val dfwn = numSamples - numClasses
-          // mean square between
-          val msb = ssbn / dfbn
-          // mean square within
-          val msw = sswn / dfwn
-          val fValue = msb / msw
-          val pValue = 1 - new FDistribution(dfbn, dfwn).cumulativeProbability(fValue)
-          (col, pValue, dfbn + dfwn, fValue)
-        }.collect().sortBy(_._1).map {
-          case (col, pValue, degreesOfFreedom, fValue) =>
-            new ANOVATestResult(pValue, degreesOfFreedom, fValue)
-        }
+          sums.changeValue(label, value, _ + value)
+          (sum + value, sumOfSq + value * value, sums)
+      },
+      combOp = {
+        case ((sum1, sumOfSq1, sums1), (sum2, sumOfSq2, sums2)) =>
+          sums2.foreach { case (v, w) => sums1.changeValue(v, w, _ + w) }
+          (sum1 + sum2, sumOfSq1 + sumOfSq2, sums1)
+      }
+    ).mapValues { case (sum, sumOfSq, sums) =>
+      val counts = bcCounts.value
+      counts.keysIterator.foreach { label =>
+        // adjust sums if all related feature values are 0 for some label
+        if (!sums.contains(label)) sums.update(label, 0.0)
+      }
+      computeANOVA(sum, sumOfSq, sums.toMap, counts)
+    }.collectAsMap()
+
+    bcCounts.destroy()
+
+    val finalResults = Array.ofDim[SelectionTestResult](numFeatures)
+    results.foreach { case (col, (pValue, degreesOfFreedom, fValue)) =>
+      finalResults(col) = new ANOVATestResult(pValue, degreesOfFreedom, fValue)
+    }
+
+    if (results.size < numFeatures) {
+      // if some column only contains 0 values
+      val (pValue, degreesOfFreedom, fValue) =
+        computeANOVA(0.0, 0.0, counts.mapValues(_ => 0.0), counts)
 
 Review comment:
   Yes. The result includes `degreesOfFreedom`, which depends on the input `counts`, so I prefer to call `computeANOVA` here.
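
To make the point about `degreesOfFreedom` concrete, here is a minimal Python sketch. It is not the Scala `computeANOVA` from this PR: the name `anova_dof_and_f` is made up, the formulas follow the lines removed in the diff above, and the sketch assumes at least two labels and more samples than labels. For an all-zero column the F-value is NaN, yet the degrees of freedom still depend on the per-label counts.

```python
import math


def anova_dof_and_f(total, sum_sq, class_sums, class_counts):
    """Return (degrees of freedom, F-value) from per-column aggregates."""
    n = sum(class_counts.values())
    k = len(class_counts)
    sq_sum = total * total
    ss_total = sum_sq - sq_sum / n
    ss_between = sum(s * s / class_counts[c] for c, s in class_sums.items()) - sq_sum / n
    ss_within = ss_total - ss_between
    df_between = k - 1
    df_within = n - k
    msb = ss_between / df_between
    msw = ss_within / df_within
    # Scala's Double division gives NaN for 0.0 / 0.0 and infinity for x / 0.0;
    # Python raises ZeroDivisionError instead, so emulate the Scala result here.
    if msw != 0.0:
        f = msb / msw
    elif msb == 0.0:
        f = math.nan
    else:
        f = math.copysign(math.inf, msb)
    return df_between + df_within, f


# Label counts taken from the rows in the ANOVATestSuite diff: labels 3, 1, 3, 2, 2, 3.
counts = {3: 3, 1: 1, 2: 2}
zero_sums = {label: 0.0 for label in counts}
dof, f = anova_dof_and_f(0.0, 0.0, zero_sums, counts)
```

With these counts, `dof` is 5 (the number of samples minus one) and `f` is NaN. A different label distribution changes `dof` even though every feature value is zero, which is why a hard-coded result would not be enough.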



[GitHub] [spark] SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-603610999
 
 
   **[Test build #120303 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120303/testReport)** for PR 27982 at commit [`ce4e043`](https://github.com/apache/spark/commit/ce4e0433503149ed90db8b2e0157b8cab0a3a819).



[GitHub] [spark] zhengruifeng commented on a change in pull request #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#discussion_r396222712
 
 

 ##########
 File path: mllib/src/test/scala/org/apache/spark/ml/stat/ANOVATestSuite.scala
 ##########
 @@ -144,22 +144,30 @@ class ANOVATestSuite
   }
 
   test("test DataFrame with sparse vector") {
-    val df = spark.createDataFrame(Seq(
-      (3, Vectors.sparse(6, Array((0, 6.0), (1, 7.0), (3, 7.0), (4, 6.0)))),
-      (1, Vectors.sparse(6, Array((1, 9.0), (2, 6.0), (4, 5.0), (5, 9.0)))),
-      (3, Vectors.sparse(6, Array((1, 9.0), (2, 3.0), (4, 5.0), (5, 5.0)))),
-      (2, Vectors.dense(Array(0.0, 9.0, 8.0, 5.0, 6.0, 4.0))),
-      (2, Vectors.dense(Array(8.0, 9.0, 6.0, 5.0, 4.0, 4.0))),
-      (3, Vectors.dense(Array(8.0, 9.0, 6.0, 4.0, 0.0, 0.0)))
-    )).toDF("label", "features")
+    val data = Seq(
+      (3, Vectors.dense(Array(6.0, 7.0, 0.0, 7.0, 6.0, 0.0, 0.0))),
+      (1, Vectors.dense(Array(0.0, 9.0, 6.0, 0.0, 5.0, 9.0, 0.0))),
+      (3, Vectors.dense(Array(0.0, 9.0, 3.0, 0.0, 5.0, 5.0, 0.0))),
+      (2, Vectors.dense(Array(0.0, 9.0, 8.0, 5.0, 6.0, 4.0, 0.0))),
+      (2, Vectors.dense(Array(8.0, 9.0, 6.0, 5.0, 4.0, 4.0, 0.0))),
+      (3, Vectors.dense(Array(8.0, 9.0, 6.0, 4.0, 0.0, 0.0, 0.0))))
 
-    val anovaResult = ANOVATest.test(df, "features", "label")
-    val (pValues: Vector, fValues: Vector) =
-      anovaResult.select("pValues", "fValues")
-        .as[(Vector, Vector)].head()
-    assert(pValues ~== Vectors.dense(0.71554175, 0.71554175, 0.34278574, 0.45824059, 0.84633632,
-      0.15673368) relTol 1e-6)
-    assert(fValues ~== Vectors.dense(0.375, 0.375, 1.5625, 1.02364865, 0.17647059,
-      3.66) relTol 1e-6)
+    val df1 = spark.createDataFrame(data.map(t => (t._1, t._2.toDense)))
+      .toDF("label", "features")
+    val df2 = spark.createDataFrame(data.map(t => (t._1, t._2.toSparse)))
+      .toDF("label", "features")
+    val df3 = spark.createDataFrame(data.map(t => (t._1, t._2.compressed)))
+      .toDF("label", "features")
+
+    Seq(df1, df2, df3).foreach { df =>
+      val anovaResult = ANOVATest.test(df, "features", "label")
+      val (pValues: Vector, fValues: Vector) =
+        anovaResult.select("pValues", "fValues")
+          .as[(Vector, Vector)].head()
+      assert(pValues ~== Vectors.dense(0.71554175, 0.71554175, 0.34278574, 0.45824059, 0.84633632,
+        0.15673368, Double.NaN) relTol 1e-6)
 
 Review comment:
   For a column containing only zero values, sklearn returns `nan`:
   ```python
   import numpy as np
   from sklearn.feature_selection import f_classif

   X = np.zeros([3, 5])
   y = [1, 2, 3]

   f_classif(X, y)
   # /home/zrf/Applications/anaconda3/lib/python3.7/site-packages/sklearn/feature_selection/_univariate_selection.py:110: RuntimeWarning: invalid value encountered in true_divide
   #   msw = sswn / float(dfwn)
   # Out[24]: (array([nan, nan, nan, nan, nan]), array([nan, nan, nan, nan, nan]))
   ```



[GitHub] [spark] zhengruifeng commented on a change in pull request #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#discussion_r396222712
 
 

 ##########
 File path: mllib/src/test/scala/org/apache/spark/ml/stat/ANOVATestSuite.scala
 ##########
 @@ -144,22 +144,30 @@ class ANOVATestSuite
   }
 
   test("test DataFrame with sparse vector") {
-    val df = spark.createDataFrame(Seq(
-      (3, Vectors.sparse(6, Array((0, 6.0), (1, 7.0), (3, 7.0), (4, 6.0)))),
-      (1, Vectors.sparse(6, Array((1, 9.0), (2, 6.0), (4, 5.0), (5, 9.0)))),
-      (3, Vectors.sparse(6, Array((1, 9.0), (2, 3.0), (4, 5.0), (5, 5.0)))),
-      (2, Vectors.dense(Array(0.0, 9.0, 8.0, 5.0, 6.0, 4.0))),
-      (2, Vectors.dense(Array(8.0, 9.0, 6.0, 5.0, 4.0, 4.0))),
-      (3, Vectors.dense(Array(8.0, 9.0, 6.0, 4.0, 0.0, 0.0)))
-    )).toDF("label", "features")
+    val data = Seq(
+      (3, Vectors.dense(Array(6.0, 7.0, 0.0, 7.0, 6.0, 0.0, 0.0))),
+      (1, Vectors.dense(Array(0.0, 9.0, 6.0, 0.0, 5.0, 9.0, 0.0))),
+      (3, Vectors.dense(Array(0.0, 9.0, 3.0, 0.0, 5.0, 5.0, 0.0))),
+      (2, Vectors.dense(Array(0.0, 9.0, 8.0, 5.0, 6.0, 4.0, 0.0))),
+      (2, Vectors.dense(Array(8.0, 9.0, 6.0, 5.0, 4.0, 4.0, 0.0))),
+      (3, Vectors.dense(Array(8.0, 9.0, 6.0, 4.0, 0.0, 0.0, 0.0))))
 
-    val anovaResult = ANOVATest.test(df, "features", "label")
-    val (pValues: Vector, fValues: Vector) =
-      anovaResult.select("pValues", "fValues")
-        .as[(Vector, Vector)].head()
-    assert(pValues ~== Vectors.dense(0.71554175, 0.71554175, 0.34278574, 0.45824059, 0.84633632,
-      0.15673368) relTol 1e-6)
-    assert(fValues ~== Vectors.dense(0.375, 0.375, 1.5625, 1.02364865, 0.17647059,
-      3.66) relTol 1e-6)
+    val df1 = spark.createDataFrame(data.map(t => (t._1, t._2.toDense)))
+      .toDF("label", "features")
+    val df2 = spark.createDataFrame(data.map(t => (t._1, t._2.toSparse)))
+      .toDF("label", "features")
+    val df3 = spark.createDataFrame(data.map(t => (t._1, t._2.compressed)))
+      .toDF("label", "features")
+
+    Seq(df1, df2, df3).foreach { df =>
+      val anovaResult = ANOVATest.test(df, "features", "label")
+      val (pValues: Vector, fValues: Vector) =
+        anovaResult.select("pValues", "fValues")
+          .as[(Vector, Vector)].head()
+      assert(pValues ~== Vectors.dense(0.71554175, 0.71554175, 0.34278574, 0.45824059, 0.84633632,
+        0.15673368, Double.NaN) relTol 1e-6)
 
 Review comment:
   For a column containing only zero values, sklearn also returns `nan`:
   ```python
   import numpy as np
   from sklearn.feature_selection import f_classif

   X = np.zeros([3, 5])
   y = [1, 2, 3]

   f_classif(X, y)
   # /home/zrf/Applications/anaconda3/lib/python3.7/site-packages/sklearn/feature_selection/_univariate_selection.py:110: RuntimeWarning: invalid value encountered in true_divide
   #   msw = sswn / float(dfwn)
   # Out[24]: (array([nan, nan, nan, nan, nan]), array([nan, nan, nan, nan, nan]))
   ```



[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605820650
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605959264
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120587/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605795870
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25278/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605922494
 
 
   **[Test build #120586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120586/testReport)** for PR 27982 at commit [`cfddf80`](https://github.com/apache/spark/commit/cfddf80c438001ae6bb0627fb48d6340d065365d).



[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605923026
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605925949
 
 
   **[Test build #120587 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120587/testReport)** for PR 27982 at commit [`c561610`](https://github.com/apache/spark/commit/c561610a9d01fc1e03e6e70d70be8284164dcb9e).



[GitHub] [spark] SparkQA removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602396898
 
 
   **[Test build #120177 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120177/testReport)** for PR 27982 at commit [`cd968ff`](https://github.com/apache/spark/commit/cd968ffe90aef52e37acdb37d5fc6261143fb20c).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605959264
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120587/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605820660
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120574/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602420196
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-603611255
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-603629477
 
 
   **[Test build #120303 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120303/testReport)** for PR 27982 at commit [`ce4e043`](https://github.com/apache/spark/commit/ce4e0433503149ed90db8b2e0157b8cab0a3a819).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602397229
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24890/



[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602418223
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605953741
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120586/



[GitHub] [spark] SparkQA removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605925949
 
 
   **[Test build #120587 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120587/testReport)** for PR 27982 at commit [`c561610`](https://github.com/apache/spark/commit/c561610a9d01fc1e03e6e70d70be8284164dcb9e).



[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605926551
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25290/



[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605926546
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605919542
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25287/



[GitHub] [spark] huaxingao commented on a change in pull request #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#discussion_r398921906
 
 

 ##########
 File path: mllib/src/main/scala/org/apache/spark/ml/stat/ANOVATest.scala
 ##########
 @@ -80,65 +81,141 @@ object ANOVATest {
     SchemaUtils.checkColumnType(dataset.schema, featuresCol, new VectorUDT)
     SchemaUtils.checkNumericType(dataset.schema, labelCol)
 
-    dataset.select(col(labelCol).cast("double"), col(featuresCol))
-      .as[(Double, Vector)]
-      .rdd
-      .flatMap { case (label, features) =>
-        features.iterator.map { case (col, value) => (col, (label, value)) }
-      }.aggregateByKey[(Double, Double, OpenHashMap[Double, Double], OpenHashMap[Double, Long])](
-        (0.0, 0.0, new OpenHashMap[Double, Double], new OpenHashMap[Double, Long]))(
-        seqOp = {
-          case ((sum, sumOfSq, sums, counts), (label, value)) =>
-            // sums: mapOfSumPerClass (key: label, value: sum of features for each label)
-            // counts: mapOfCountPerClass key: label, value: count of features for each label
-            sums.changeValue(label, value, _ + value)
-            counts.changeValue(label, 1L, _ + 1L)
-            (sum + value, sumOfSq + value * value, sums, counts)
-        },
-        combOp = {
-          case ((sum1, sumOfSq1, sums1, counts1), (sum2, sumOfSq2, sums2, counts2)) =>
-            sums2.foreach { case (v, w) => sums1.changeValue(v, w, _ + w) }
-            counts2.foreach { case (v, w) => counts1.changeValue(v, w, _ + w) }
-            (sum1 + sum2, sumOfSq1 + sumOfSq2, sums1, counts1)
-        }
-        ).map { case (col, (sum, sumOfSq, sums, counts)) =>
-          val numSamples = counts.iterator.map(_._2).sum
-          val numClasses = counts.size
-
-          // e.g. features are [3.3, 2.5, 1.0, 3.0, 2.0] and labels are [1, 2, 1, 3, 3]
-          // sum: sum of all the features (3.3+2.5+1.0+3.0+2.0)
-          // sumOfSq: sum of squares of all the features (3.3^2+2.5^2+1.0^2+3.0^2+2.0^2)
+    val points = dataset.select(col(labelCol).cast("double"), col(featuresCol))
+      .as[(Double, Vector)].rdd
+
+    points.first()._2 match {
+      case dv: DenseVector =>
+        testClassificationDenseFeatures(points, dv.size)
+      case sv: SparseVector =>
+        testClassificationSparseFeatures(points, sv.size)
+    }
+  }
+
+  private def testClassificationDenseFeatures(
+      points: RDD[(Double, Vector)],
+      numFeatures: Int): Array[SelectionTestResult] = {
+    points.flatMap { case (label, features) =>
+      require(features.size == numFeatures,
+        s"Number of features must be $numFeatures but got ${features.size}")
+      features.iterator.map { case (col, value) => (col, (label, value)) }
+    }.aggregateByKey[(Double, Double, OpenHashMap[Double, Double], OpenHashMap[Double, Long])](
+      (0.0, 0.0, new OpenHashMap[Double, Double], new OpenHashMap[Double, Long]))(
+      seqOp = {
+        case ((sum, sumOfSq, sums, counts), (label, value)) =>
+          // sums: mapOfSumPerClass (key: label, value: sum of features for each label)
+          // counts: mapOfCountPerClass key: label, value: count of features for each label
+          sums.changeValue(label, value, _ + value)
+          counts.changeValue(label, 1L, _ + 1L)
+          (sum + value, sumOfSq + value * value, sums, counts)
+      },
+      combOp = {
+        case ((sum1, sumOfSq1, sums1, counts1), (sum2, sumOfSq2, sums2, counts2)) =>
+          sums2.foreach { case (v, w) => sums1.changeValue(v, w, _ + w) }
+          counts2.foreach { case (v, w) => counts1.changeValue(v, w, _ + w) }
+          (sum1 + sum2, sumOfSq1 + sumOfSq2, sums1, counts1)
+      }
+    ).mapValues { case (sum, sumOfSq, sums, counts) =>
+      computeANOVA(sum, sumOfSq, sums.toMap, counts.toMap)
+    }.collect().sortBy(_._1).map {
+      case (_, (pValue, degreesOfFreedom, fValue)) =>
+        new ANOVATestResult(pValue, degreesOfFreedom, fValue)
+    }
+  }
+
+  private def testClassificationSparseFeatures(
+      points: RDD[(Double, Vector)],
+      numFeatures: Int): Array[SelectionTestResult] = {
+    val sc = points.sparkContext
+    val counts = points.map(_._1).countByValue().toMap
+    val bcCounts = sc.broadcast(counts)
+
+    val results = points.flatMap { case (label, features) =>
+      require(features.size == numFeatures,
+        s"Number of features must be $numFeatures but got ${features.size}")
+      features.nonZeroIterator.map { case (col, value) => (col, (label, value)) }
+    }.aggregateByKey[(Double, Double, OpenHashMap[Double, Double])](
+      (0.0, 0.0, new OpenHashMap[Double, Double]))(
+      seqOp = {
+        case ((sum, sumOfSq, sums), (label, value)) =>
           // sums: mapOfSumPerClass (key: label, value: sum of features for each label)
-          //                                         ( 1 -> 3.3 + 1.0, 2 -> 2.5, 3 -> 3.0 + 2.0 )
-          // counts: mapOfCountPerClass (key: label, value: count of features for each label)
-          //                                         ( 1 -> 2, 2 -> 2, 3 -> 2 )
-          // sqSum: square of sum of all data ((3.3+2.5+1.0+3.0+2.0)^2)
-          val sqSum = sum * sum
-          val ssTot = sumOfSq - sqSum / numSamples
-
-          // sumOfSqSumPerClass:
-          //     sum( sq_sum_classes[k] / n_samples_per_class[k] for k in range(n_classes))
-          //     e.g. ((3.3+1.0)^2 / 2 + 2.5^2 / 1 + (3.0+2.0)^2 / 2)
-          val sumOfSqSumPerClass = sums.iterator
-            .map { case (label, sum) => sum * sum / counts(label) }.sum
-          // Sums of Squares Between
-          val ssbn = sumOfSqSumPerClass - (sqSum / numSamples)
-          // Sums of Squares Within
-          val sswn = ssTot - ssbn
-          // degrees of freedom between
-          val dfbn = numClasses - 1
-          // degrees of freedom within
-          val dfwn = numSamples - numClasses
-          // mean square between
-          val msb = ssbn / dfbn
-          // mean square within
-          val msw = sswn / dfwn
-          val fValue = msb / msw
-          val pValue = 1 - new FDistribution(dfbn, dfwn).cumulativeProbability(fValue)
-          (col, pValue, dfbn + dfwn, fValue)
-        }.collect().sortBy(_._1).map {
-          case (col, pValue, degreesOfFreedom, fValue) =>
-            new ANOVATestResult(pValue, degreesOfFreedom, fValue)
-        }
+          sums.changeValue(label, value, _ + value)
+          (sum + value, sumOfSq + value * value, sums)
+      },
+      combOp = {
+        case ((sum1, sumOfSq1, sums1), (sum2, sumOfSq2, sums2)) =>
+          sums2.foreach { case (v, w) => sums1.changeValue(v, w, _ + w) }
+          (sum1 + sum2, sumOfSq1 + sumOfSq2, sums1)
+      }
+    ).mapValues { case (sum, sumOfSq, sums) =>
+      val counts = bcCounts.value
+      counts.keysIterator.foreach { label =>
+        // adjust sums if all related feature values are 0 for some label
+        if (!sums.contains(label)) sums.update(label, 0.0)
+      }
+      computeANOVA(sum, sumOfSq, sums.toMap, counts)
+    }.collectAsMap()
+
+    bcCounts.destroy()
+
+    val finalResults = Array.ofDim[SelectionTestResult](numFeatures)
+    results.foreach { case (col, (pValue, degreesOfFreedom, fValue)) =>
+      finalResults(col) = new ANOVATestResult(pValue, degreesOfFreedom, fValue)
+    }
+
+    if (results.size < numFeatures) {
+      // if some column only contains 0 values
+      val (pValue, degreesOfFreedom, fValue) =
+        computeANOVA(0.0, 0.0, counts.mapValues(_ => 0.0), counts)
 
 Review comment:
   For a column that contains only zeros, `pValue` and `fValue` are NaN. Maybe just compute the degrees of freedom and skip calling `computeANOVA`?

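   A minimal sketch of why the statistics degenerate for an all-zero column. It follows the arithmetic of the inline block removed in the diff above. `anovaStats` is a hypothetical stand-in for `computeANOVA`, whose body is not shown in this hunk, and the p-value step is left out so that the snippet needs no external library. The label counts are made up for illustration.

```scala
object AllZeroColumnSketch {
  // Hypothetical helper (not the PR's computeANOVA): same arithmetic as the
  // removed inline block, returning (degreesOfFreedom, fValue) only.
  def anovaStats(
      sum: Double,
      sumOfSq: Double,
      sums: Map[Double, Double],
      counts: Map[Double, Long]): (Long, Double) = {
    val numSamples = counts.values.sum
    val numClasses = counts.size
    val sqSum = sum * sum
    // total sum of squares
    val ssTot = sumOfSq - sqSum / numSamples
    val sumOfSqSumPerClass = sums.iterator
      .map { case (label, s) => s * s / counts(label) }.sum
    // sums of squares between and within classes
    val ssbn = sumOfSqSumPerClass - sqSum / numSamples
    val sswn = ssTot - ssbn
    // degrees of freedom between and within classes
    val dfbn = numClasses - 1
    val dfwn = numSamples - numClasses
    val fValue = (ssbn / dfbn) / (sswn / dfwn)
    (dfbn + dfwn, fValue)
  }

  def main(args: Array[String]): Unit = {
    // Illustrative label counts: 5 samples spread over 3 classes.
    val counts = Map(1.0 -> 2L, 2.0 -> 1L, 3.0 -> 2L)
    // An all-zero column: every per-class sum is 0.0.
    val zeroSums = counts.map { case (label, _) => label -> 0.0 }
    val (df, f) = anovaStats(0.0, 0.0, zeroSums, counts)
    println(s"degreesOfFreedom = $df, fValue is NaN: ${f.isNaN}")
  }
}
```

   With every sum equal to zero, both mean squares are 0.0, so `fValue` is 0.0 / 0.0, which is NaN under IEEE 754 arithmetic. Only the degrees of freedom carry information in this case.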


[GitHub] [spark] SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602398991
 
 
   **[Test build #120178 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120178/testReport)** for PR 27982 at commit [`594b830`](https://github.com/apache/spark/commit/594b8304888850a15c67746a47cc37f4baa01354).



[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605954187
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] huaxingao commented on a change in pull request #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#discussion_r399740772
 
 

 ##########
 File path: mllib/src/main/scala/org/apache/spark/ml/stat/ANOVATest.scala
 ##########
 @@ -80,65 +81,141 @@ object ANOVATest {
     SchemaUtils.checkColumnType(dataset.schema, featuresCol, new VectorUDT)
     SchemaUtils.checkNumericType(dataset.schema, labelCol)
 
-    dataset.select(col(labelCol).cast("double"), col(featuresCol))
-      .as[(Double, Vector)]
-      .rdd
-      .flatMap { case (label, features) =>
-        features.iterator.map { case (col, value) => (col, (label, value)) }
-      }.aggregateByKey[(Double, Double, OpenHashMap[Double, Double], OpenHashMap[Double, Long])](
-        (0.0, 0.0, new OpenHashMap[Double, Double], new OpenHashMap[Double, Long]))(
-        seqOp = {
-          case ((sum, sumOfSq, sums, counts), (label, value)) =>
-            // sums: mapOfSumPerClass (key: label, value: sum of features for each label)
-            // counts: mapOfCountPerClass key: label, value: count of features for each label
-            sums.changeValue(label, value, _ + value)
-            counts.changeValue(label, 1L, _ + 1L)
-            (sum + value, sumOfSq + value * value, sums, counts)
-        },
-        combOp = {
-          case ((sum1, sumOfSq1, sums1, counts1), (sum2, sumOfSq2, sums2, counts2)) =>
-            sums2.foreach { case (v, w) => sums1.changeValue(v, w, _ + w) }
-            counts2.foreach { case (v, w) => counts1.changeValue(v, w, _ + w) }
-            (sum1 + sum2, sumOfSq1 + sumOfSq2, sums1, counts1)
-        }
-        ).map { case (col, (sum, sumOfSq, sums, counts)) =>
-          val numSamples = counts.iterator.map(_._2).sum
-          val numClasses = counts.size
-
-          // e.g. features are [3.3, 2.5, 1.0, 3.0, 2.0] and labels are [1, 2, 1, 3, 3]
-          // sum: sum of all the features (3.3+2.5+1.0+3.0+2.0)
-          // sumOfSq: sum of squares of all the features (3.3^2+2.5^2+1.0^2+3.0^2+2.0^2)
+    val points = dataset.select(col(labelCol).cast("double"), col(featuresCol))
+      .as[(Double, Vector)].rdd
+
+    points.first()._2 match {
+      case dv: DenseVector =>
+        testClassificationDenseFeatures(points, dv.size)
+      case sv: SparseVector =>
+        testClassificationSparseFeatures(points, sv.size)
+    }
+  }
+
+  private def testClassificationDenseFeatures(
+      points: RDD[(Double, Vector)],
+      numFeatures: Int): Array[SelectionTestResult] = {
+    points.flatMap { case (label, features) =>
+      require(features.size == numFeatures,
+        s"Number of features must be $numFeatures but got ${features.size}")
+      features.iterator.map { case (col, value) => (col, (label, value)) }
+    }.aggregateByKey[(Double, Double, OpenHashMap[Double, Double], OpenHashMap[Double, Long])](
+      (0.0, 0.0, new OpenHashMap[Double, Double], new OpenHashMap[Double, Long]))(
+      seqOp = {
+        case ((sum, sumOfSq, sums, counts), (label, value)) =>
+          // sums: mapOfSumPerClass (key: label, value: sum of features for each label)
+          // counts: mapOfCountPerClass key: label, value: count of features for each label
+          sums.changeValue(label, value, _ + value)
+          counts.changeValue(label, 1L, _ + 1L)
+          (sum + value, sumOfSq + value * value, sums, counts)
+      },
+      combOp = {
+        case ((sum1, sumOfSq1, sums1, counts1), (sum2, sumOfSq2, sums2, counts2)) =>
+          sums2.foreach { case (v, w) => sums1.changeValue(v, w, _ + w) }
+          counts2.foreach { case (v, w) => counts1.changeValue(v, w, _ + w) }
+          (sum1 + sum2, sumOfSq1 + sumOfSq2, sums1, counts1)
+      }
+    ).mapValues { case (sum, sumOfSq, sums, counts) =>
+      computeANOVA(sum, sumOfSq, sums.toMap, counts.toMap)
+    }.collect().sortBy(_._1).map {
+      case (_, (pValue, degreesOfFreedom, fValue)) =>
+        new ANOVATestResult(pValue, degreesOfFreedom, fValue)
+    }
+  }
+
+  private def testClassificationSparseFeatures(
+      points: RDD[(Double, Vector)],
+      numFeatures: Int): Array[SelectionTestResult] = {
+    val sc = points.sparkContext
+    val counts = points.map(_._1).countByValue().toMap
+    val bcCounts = sc.broadcast(counts)
+
+    val results = points.flatMap { case (label, features) =>
+      require(features.size == numFeatures,
+        s"Number of features must be $numFeatures but got ${features.size}")
+      features.nonZeroIterator.map { case (col, value) => (col, (label, value)) }
+    }.aggregateByKey[(Double, Double, OpenHashMap[Double, Double])](
+      (0.0, 0.0, new OpenHashMap[Double, Double]))(
+      seqOp = {
+        case ((sum, sumOfSq, sums), (label, value)) =>
           // sums: mapOfSumPerClass (key: label, value: sum of features for each label)
-          //                                         ( 1 -> 3.3 + 1.0, 2 -> 2.5, 3 -> 3.0 + 2.0 )
-          // counts: mapOfCountPerClass (key: label, value: count of features for each label)
-          //                                         ( 1 -> 2, 2 -> 2, 3 -> 2 )
-          // sqSum: square of sum of all data ((3.3+2.5+1.0+3.0+2.0)^2)
-          val sqSum = sum * sum
-          val ssTot = sumOfSq - sqSum / numSamples
-
-          // sumOfSqSumPerClass:
-          //     sum( sq_sum_classes[k] / n_samples_per_class[k] for k in range(n_classes))
-          //     e.g. ((3.3+1.0)^2 / 2 + 2.5^2 / 1 + (3.0+2.0)^2 / 2)
-          val sumOfSqSumPerClass = sums.iterator
-            .map { case (label, sum) => sum * sum / counts(label) }.sum
-          // Sums of Squares Between
-          val ssbn = sumOfSqSumPerClass - (sqSum / numSamples)
-          // Sums of Squares Within
-          val sswn = ssTot - ssbn
-          // degrees of freedom between
-          val dfbn = numClasses - 1
-          // degrees of freedom within
-          val dfwn = numSamples - numClasses
-          // mean square between
-          val msb = ssbn / dfbn
-          // mean square within
-          val msw = sswn / dfwn
-          val fValue = msb / msw
-          val pValue = 1 - new FDistribution(dfbn, dfwn).cumulativeProbability(fValue)
-          (col, pValue, dfbn + dfwn, fValue)
-        }.collect().sortBy(_._1).map {
-          case (col, pValue, degreesOfFreedom, fValue) =>
-            new ANOVATestResult(pValue, degreesOfFreedom, fValue)
-        }
+          sums.changeValue(label, value, _ + value)
+          (sum + value, sumOfSq + value * value, sums)
+      },
+      combOp = {
+        case ((sum1, sumOfSq1, sums1), (sum2, sumOfSq2, sums2)) =>
+          sums2.foreach { case (v, w) => sums1.changeValue(v, w, _ + w) }
+          (sum1 + sum2, sumOfSq1 + sumOfSq2, sums1)
+      }
+    ).mapValues { case (sum, sumOfSq, sums) =>
+      val counts = bcCounts.value
+      counts.keysIterator.foreach { label =>
+        // adjust sums if all related feature values are 0 for some label
+        if (!sums.contains(label)) sums.update(label, 0.0)
+      }
+      computeANOVA(sum, sumOfSq, sums.toMap, counts)
+    }.collectAsMap()
+
+    bcCounts.destroy()
+
+    val finalResults = Array.ofDim[SelectionTestResult](numFeatures)
+    results.foreach { case (col, (pValue, degreesOfFreedom, fValue)) =>
+      finalResults(col) = new ANOVATestResult(pValue, degreesOfFreedom, fValue)
+    }
+
+    if (results.size < numFeatures) {
+      // if some column only contains 0 values
+      val (pValue, degreesOfFreedom, fValue) =
+        computeANOVA(0.0, 0.0, counts.mapValues(_ => 0.0), counts)
 
 Review comment:
   `degreesOfFreedom = numSamples - 1`, right? So there is no need to call `computeANOVA`?

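   The identity behind this comment can be checked directly: the removed inline block returns `dfbn + dfwn`, which is `(numClasses - 1) + (numSamples - numClasses)` and simplifies to `numSamples - 1`. The sketch below is only an illustration; `allZeroColumnResult` is a hypothetical helper, the tuple order `(pValue, degreesOfFreedom, fValue)` is taken from the diff, and the use of `Long` for the degrees of freedom is an assumption. Whether the merged code adopted such a shortcut is not shown in this thread.

```scala
object DegreesOfFreedomSketch {
  // Total degrees of freedom as written in the removed inline block.
  def totalDegreesOfFreedom(counts: Map[Double, Long]): Long = {
    val numSamples = counts.values.sum
    val numClasses = counts.size
    val dfbn = numClasses - 1          // degrees of freedom between
    val dfwn = numSamples - numClasses // degrees of freedom within
    dfbn + dfwn
  }

  // Hypothetical shortcut for a column that contains only zeros:
  // (pValue, degreesOfFreedom, fValue) without running the full computation.
  def allZeroColumnResult(counts: Map[Double, Long]): (Double, Long, Double) =
    (Double.NaN, counts.values.sum - 1, Double.NaN)

  def main(args: Array[String]): Unit = {
    // Illustrative label counts, not taken from the PR.
    val counts = Map(0.0 -> 3L, 1.0 -> 4L)
    println(totalDegreesOfFreedom(counts) == counts.values.sum - 1)
    println(allZeroColumnResult(counts)._2)
  }
}
```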


[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602420199
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120178/



[GitHub] [spark] zhengruifeng commented on a change in pull request #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#discussion_r399935112
 
 

 ##########
 File path: mllib/src/main/scala/org/apache/spark/ml/stat/ANOVATest.scala
 ##########
 @@ -80,65 +81,141 @@ object ANOVATest {
     SchemaUtils.checkColumnType(dataset.schema, featuresCol, new VectorUDT)
     SchemaUtils.checkNumericType(dataset.schema, labelCol)
 
-    dataset.select(col(labelCol).cast("double"), col(featuresCol))
-      .as[(Double, Vector)]
-      .rdd
-      .flatMap { case (label, features) =>
-        features.iterator.map { case (col, value) => (col, (label, value)) }
-      }.aggregateByKey[(Double, Double, OpenHashMap[Double, Double], OpenHashMap[Double, Long])](
-        (0.0, 0.0, new OpenHashMap[Double, Double], new OpenHashMap[Double, Long]))(
-        seqOp = {
-          case ((sum, sumOfSq, sums, counts), (label, value)) =>
-            // sums: mapOfSumPerClass (key: label, value: sum of features for each label)
-            // counts: mapOfCountPerClass key: label, value: count of features for each label
-            sums.changeValue(label, value, _ + value)
-            counts.changeValue(label, 1L, _ + 1L)
-            (sum + value, sumOfSq + value * value, sums, counts)
-        },
-        combOp = {
-          case ((sum1, sumOfSq1, sums1, counts1), (sum2, sumOfSq2, sums2, counts2)) =>
-            sums2.foreach { case (v, w) => sums1.changeValue(v, w, _ + w) }
-            counts2.foreach { case (v, w) => counts1.changeValue(v, w, _ + w) }
-            (sum1 + sum2, sumOfSq1 + sumOfSq2, sums1, counts1)
-        }
-        ).map { case (col, (sum, sumOfSq, sums, counts)) =>
-          val numSamples = counts.iterator.map(_._2).sum
-          val numClasses = counts.size
-
-          // e.g. features are [3.3, 2.5, 1.0, 3.0, 2.0] and labels are [1, 2, 1, 3, 3]
-          // sum: sum of all the features (3.3+2.5+1.0+3.0+2.0)
-          // sumOfSq: sum of squares of all the features (3.3^2+2.5^2+1.0^2+3.0^2+2.0^2)
+    val points = dataset.select(col(labelCol).cast("double"), col(featuresCol))
+      .as[(Double, Vector)].rdd
+
+    points.first()._2 match {
+      case dv: DenseVector =>
+        testClassificationDenseFeatures(points, dv.size)
+      case sv: SparseVector =>
+        testClassificationSparseFeatures(points, sv.size)
+    }
+  }
+
+  private def testClassificationDenseFeatures(
+      points: RDD[(Double, Vector)],
+      numFeatures: Int): Array[SelectionTestResult] = {
+    points.flatMap { case (label, features) =>
+      require(features.size == numFeatures,
+        s"Number of features must be $numFeatures but got ${features.size}")
+      features.iterator.map { case (col, value) => (col, (label, value)) }
+    }.aggregateByKey[(Double, Double, OpenHashMap[Double, Double], OpenHashMap[Double, Long])](
+      (0.0, 0.0, new OpenHashMap[Double, Double], new OpenHashMap[Double, Long]))(
+      seqOp = {
+        case ((sum, sumOfSq, sums, counts), (label, value)) =>
+          // sums: mapOfSumPerClass (key: label, value: sum of features for each label)
+          // counts: mapOfCountPerClass key: label, value: count of features for each label
+          sums.changeValue(label, value, _ + value)
+          counts.changeValue(label, 1L, _ + 1L)
+          (sum + value, sumOfSq + value * value, sums, counts)
+      },
+      combOp = {
+        case ((sum1, sumOfSq1, sums1, counts1), (sum2, sumOfSq2, sums2, counts2)) =>
+          sums2.foreach { case (v, w) => sums1.changeValue(v, w, _ + w) }
+          counts2.foreach { case (v, w) => counts1.changeValue(v, w, _ + w) }
+          (sum1 + sum2, sumOfSq1 + sumOfSq2, sums1, counts1)
+      }
+    ).mapValues { case (sum, sumOfSq, sums, counts) =>
+      computeANOVA(sum, sumOfSq, sums.toMap, counts.toMap)
+    }.collect().sortBy(_._1).map {
+      case (_, (pValue, degreesOfFreedom, fValue)) =>
+        new ANOVATestResult(pValue, degreesOfFreedom, fValue)
+    }
+  }
+
+  private def testClassificationSparseFeatures(
+      points: RDD[(Double, Vector)],
+      numFeatures: Int): Array[SelectionTestResult] = {
+    val sc = points.sparkContext
+    val counts = points.map(_._1).countByValue().toMap
+    val bcCounts = sc.broadcast(counts)
+
+    val results = points.flatMap { case (label, features) =>
+      require(features.size == numFeatures,
+        s"Number of features must be $numFeatures but got ${features.size}")
+      features.nonZeroIterator.map { case (col, value) => (col, (label, value)) }
+    }.aggregateByKey[(Double, Double, OpenHashMap[Double, Double])](
+      (0.0, 0.0, new OpenHashMap[Double, Double]))(
+      seqOp = {
+        case ((sum, sumOfSq, sums), (label, value)) =>
           // sums: mapOfSumPerClass (key: label, value: sum of features for each label)
-          //                                         ( 1 -> 3.3 + 1.0, 2 -> 2.5, 3 -> 3.0 + 2.0 )
-          // counts: mapOfCountPerClass (key: label, value: count of features for each label)
-          //                                         ( 1 -> 2, 2 -> 2, 3 -> 2 )
-          // sqSum: square of sum of all data ((3.3+2.5+1.0+3.0+2.0)^2)
-          val sqSum = sum * sum
-          val ssTot = sumOfSq - sqSum / numSamples
-
-          // sumOfSqSumPerClass:
-          //     sum( sq_sum_classes[k] / n_samples_per_class[k] for k in range(n_classes))
-          //     e.g. ((3.3+1.0)^2 / 2 + 2.5^2 / 1 + (3.0+2.0)^2 / 2)
-          val sumOfSqSumPerClass = sums.iterator
-            .map { case (label, sum) => sum * sum / counts(label) }.sum
-          // Sums of Squares Between
-          val ssbn = sumOfSqSumPerClass - (sqSum / numSamples)
-          // Sums of Squares Within
-          val sswn = ssTot - ssbn
-          // degrees of freedom between
-          val dfbn = numClasses - 1
-          // degrees of freedom within
-          val dfwn = numSamples - numClasses
-          // mean square between
-          val msb = ssbn / dfbn
-          // mean square within
-          val msw = sswn / dfwn
-          val fValue = msb / msw
-          val pValue = 1 - new FDistribution(dfbn, dfwn).cumulativeProbability(fValue)
-          (col, pValue, dfbn + dfwn, fValue)
-        }.collect().sortBy(_._1).map {
-          case (col, pValue, degreesOfFreedom, fValue) =>
-            new ANOVATestResult(pValue, degreesOfFreedom, fValue)
-        }
+          sums.changeValue(label, value, _ + value)
+          (sum + value, sumOfSq + value * value, sums)
+      },
+      combOp = {
+        case ((sum1, sumOfSq1, sums1), (sum2, sumOfSq2, sums2)) =>
+          sums2.foreach { case (v, w) => sums1.changeValue(v, w, _ + w) }
+          (sum1 + sum2, sumOfSq1 + sumOfSq2, sums1)
+      }
+    ).mapValues { case (sum, sumOfSq, sums) =>
+      val counts = bcCounts.value
+      counts.keysIterator.foreach { label =>
+        // adjust sums if all related feature values are 0 for some label
+        if (!sums.contains(label)) sums.update(label, 0.0)
+      }
+      computeANOVA(sum, sumOfSq, sums.toMap, counts)
+    }.collectAsMap()
+
+    bcCounts.destroy()
+
+    val finalResults = Array.ofDim[SelectionTestResult](numFeatures)
+    results.foreach { case (col, (pValue, degreesOfFreedom, fValue)) =>
+      finalResults(col) = new ANOVATestResult(pValue, degreesOfFreedom, fValue)
+    }
+
+    if (results.size < numFeatures) {
+      // if some column only contains 0 values
+      val (pValue, degreesOfFreedom, fValue) =
+        computeANOVA(0.0, 0.0, counts.mapValues(_ => 0.0), counts)
 
 Review comment:
   Yes, but I think it may be more consistent to return a result via `computeANOVA`.
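
For illustration, here is a minimal sketch of what routing an all-zero column through the same computation could look like. `anovaFromStats` is a hypothetical stand-in for the PR's private `computeANOVA`, whose body is not shown in this excerpt. The formulas are taken from the removed lines of the diff above, and the p-value step (which uses `FDistribution`) is left out to keep the sketch dependency-free.

```scala
// Hypothetical stand-in for the PR's private `computeANOVA` helper; the name,
// signature and return shape are assumptions made for this sketch.
object AnovaSketch {

  /** Returns (degreesOfFreedom, fValue) from per-feature sufficient statistics. */
  def anovaFromStats(
      sum: Double,                 // sum of all values of the feature
      sumOfSq: Double,             // sum of squares of all values of the feature
      sums: Map[Double, Double],   // label -> sum of the feature's values for that label
      counts: Map[Double, Long]    // label -> number of samples with that label
  ): (Long, Double) = {
    val numSamples = counts.values.sum
    val numClasses = counts.size
    val sqSum = sum * sum
    // total sum of squares
    val ssTot = sumOfSq - sqSum / numSamples
    // sum over classes of (class sum)^2 / (class count)
    val sumOfSqSumPerClass =
      sums.iterator.map { case (label, s) => s * s / counts(label) }.sum
    val ssbn = sumOfSqSumPerClass - sqSum / numSamples // sum of squares between
    val sswn = ssTot - ssbn                            // sum of squares within
    val dfbn = numClasses - 1                          // degrees of freedom between
    val dfwn = numSamples - numClasses                 // degrees of freedom within
    val fValue = (ssbn / dfbn) / (sswn / dfwn)
    (dfbn + dfwn, fValue)
  }

  def main(args: Array[String]): Unit = {
    // Example from the removed comments: features [3.3, 2.5, 1.0, 3.0, 2.0],
    // labels [1, 2, 1, 3, 3].
    val counts = Map(1.0 -> 2L, 2.0 -> 1L, 3.0 -> 2L)
    val sums = Map(1.0 -> 4.3, 2.0 -> 2.5, 3.0 -> 5.0)
    println(anovaFromStats(11.8, 31.14, sums, counts))

    // A column that is zero everywhere: same label counts, all sums are 0.0.
    val zeroSums = counts.map { case (label, _) => label -> 0.0 }
    println(anovaFromStats(0.0, 0.0, zeroSums, counts))
  }
}
```

With all-zero statistics both mean squares are 0.0, so the F-value in this sketch is 0.0 / 0.0, i.e. NaN. An all-zero column therefore gets its result from the same code path as every other column, with no separately hard-coded value.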

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605923038
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25289/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602397225
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-603629698
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605795864
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605926551
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25290/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605919535
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605953732
 
 
   **[Test build #120584 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120584/testReport)** for PR 27982 at commit [`eeb6552`](https://github.com/apache/spark/commit/eeb655257214ade6e487f94fe15c0440e84645fe).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605795602
 
 
   **[Test build #120574 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120574/testReport)** for PR 27982 at commit [`ab95e33`](https://github.com/apache/spark/commit/ab95e3398acfbe50f8f91b8bef45955fdb24b416).



[GitHub] [spark] SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605958840
 
 
   **[Test build #120587 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120587/testReport)** for PR 27982 at commit [`c561610`](https://github.com/apache/spark/commit/c561610a9d01fc1e03e6e70d70be8284164dcb9e).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602399308
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24891/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602418228
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120177/
   Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-603610999
 
 
   **[Test build #120303 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120303/testReport)** for PR 27982 at commit [`ce4e043`](https://github.com/apache/spark/commit/ce4e0433503149ed90db8b2e0157b8cab0a3a819).



[GitHub] [spark] huaxingao commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
huaxingao commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602782545
 
 
   The optimization looks good to me, though I'm not sure there are many use cases with sparse data.



[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605919535
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605820650
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-603611255
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-603629702
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120303/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605918903
 
 
   **[Test build #120584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120584/testReport)** for PR 27982 at commit [`eeb6552`](https://github.com/apache/spark/commit/eeb655257214ade6e487f94fe15c0440e84645fe).



[GitHub] [spark] SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602417833
 
 
   **[Test build #120177 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120177/testReport)** for PR 27982 at commit [`cd968ff`](https://github.com/apache/spark/commit/cd968ffe90aef52e37acdb37d5fc6261143fb20c).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602418223
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602396898
 
 
   **[Test build #120177 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120177/testReport)** for PR 27982 at commit [`cd968ff`](https://github.com/apache/spark/commit/cd968ffe90aef52e37acdb37d5fc6261143fb20c).



[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605953741
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120586/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605795864
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605918903
 
 
   **[Test build #120584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120584/testReport)** for PR 27982 at commit [`eeb6552`](https://github.com/apache/spark/commit/eeb655257214ade6e487f94fe15c0440e84645fe).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-603611259
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25013/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602420196
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-603629702
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120303/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605923038
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25289/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605923026
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605820130
 
 
   **[Test build #120574 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120574/testReport)** for PR 27982 at commit [`ab95e33`](https://github.com/apache/spark/commit/ab95e3398acfbe50f8f91b8bef45955fdb24b416).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602397229
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24890/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605954187
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602397225
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605953295
 
 
   **[Test build #120586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120586/testReport)** for PR 27982 at commit [`cfddf80`](https://github.com/apache/spark/commit/cfddf80c438001ae6bb0627fb48d6340d065365d).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602399307
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605954200
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120584/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605954200
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120584/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605820660
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120574/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-603611259
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25013/
   Test PASSed.


[GitHub] [spark] srowen commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
srowen commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602631796
 
 
   I think it's probably OK; you are only avoiding counting 0s in sum and sumSq, right? Not in count.
   Would ANOVA be applied to sparse data regularly? Not sure, maybe. Just wondering whether the extra complexity is worth it.


[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605919542
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25287/
   Test PASSed.


[GitHub] [spark] zhengruifeng closed pull request #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
zhengruifeng closed pull request #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982
 
 
   


[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602420199
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120178/
   Test PASSed.


[GitHub] [spark] SparkQA removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605795602
 
 
   **[Test build #120574 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120574/testReport)** for PR 27982 at commit [`ab95e33`](https://github.com/apache/spark/commit/ab95e3398acfbe50f8f91b8bef45955fdb24b416).


[GitHub] [spark] zhengruifeng commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-603608126
 
 
   @srowen @huaxingao Thanks for reviewing!
   
   > you are only avoiding counting 0s in sum and sumSq, right? Not in count.
   
   Yes, you are right.
   
   > Would ANOVA be applied to sparse data regularly?
   
   Yes, I think so. It is common to encounter a categorical label together with sparse numerical features, as in the public dataset `KDD12`.
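
As a rough illustration of the point above (zeros are skipped when accumulating sum and sumSq, but every row still contributes to count), here is a minimal Python sketch. It is not the code from this PR: the function names `class_stats` and `f_value` and the dict-based sparse row format are assumptions made for illustration only.

```python
from collections import defaultdict


def class_stats(rows, num_features):
    """Accumulate per-label count, sum and sum of squares.

    rows: iterable of (label, {feature_index: non_zero_value}) pairs
          (a hypothetical sparse row format, not Spark's Vector API).
    """
    counts = defaultdict(int)
    sums = defaultdict(lambda: [0.0] * num_features)
    sum_sqs = defaultdict(lambda: [0.0] * num_features)
    for label, nonzeros in rows:
        # Every row counts, including its implicit zeros.
        counts[label] += 1
        # Only explicit non-zero entries are visited; zeros add nothing
        # to sum or sumSq, so skipping them does not change the result.
        for j, v in nonzeros.items():
            sums[label][j] += v
            sum_sqs[label][j] += v * v
    return counts, sums, sum_sqs


def f_value(counts, sums, sum_sqs, j):
    """One-way ANOVA F statistic for feature j from the accumulated stats."""
    n = sum(counts.values())
    k = len(counts)
    total_sum = sum(sums[c][j] for c in counts)
    total_sq = sum(sum_sqs[c][j] for c in counts)
    correction = total_sum * total_sum / n
    ss_total = total_sq - correction
    ss_between = sum(sums[c][j] ** 2 / counts[c] for c in counts) - correction
    ss_within = ss_total - ss_between
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Feature 0 has values [1, 0] for label 0 and [3, 5] for label 1;
# the 0 is implicit (absent from the sparse row).
rows = [(0, {0: 1.0}), (0, {}), (1, {0: 3.0}), (1, {0: 5.0})]
counts, sums, sum_sqs = class_stats(rows, num_features=1)
f0 = f_value(counts, sums, sum_sqs, 0)
```

The per-row work here is proportional to the number of non-zero entries rather than to the number of features, which is where the performance benefit on sparse input would come from.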
   
   


[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602418228
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120177/
   Test PASSed.


[GitHub] [spark] SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602419834
 
 
   **[Test build #120178 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120178/testReport)** for PR 27982 at commit [`594b830`](https://github.com/apache/spark/commit/594b8304888850a15c67746a47cc37f4baa01354).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] SparkQA removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605922494
 
 
   **[Test build #120586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120586/testReport)** for PR 27982 at commit [`cfddf80`](https://github.com/apache/spark/commit/cfddf80c438001ae6bb0627fb48d6340d065365d).


[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605953728
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605795870
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25278/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605959255
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605926546
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602399307
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602399308
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24891/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-605959255
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] SparkQA removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602398991
 
 
   **[Test build #120178 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120178/testReport)** for PR 27982 at commit [`594b830`](https://github.com/apache/spark/commit/594b8304888850a15c67746a47cc37f4baa01354).
