Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/27 07:12:54 UTC

[GitHub] [spark] zhengruifeng opened a new pull request #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

zhengruifeng opened a new pull request #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045
 
 
   ### What changes were proposed in this pull request?
   Add a common method `computeChiSq` and reuse it in both `chiSquaredDenseFeatures` and `chiSquaredSparseFeatures`.
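
To illustrate the kind of logic such a shared method could hold, here is a minimal sketch of a Pearson chi-square statistic computed from a map of `(label, value)` counts, the key shape used in the diff for this PR. The names `ChiSqSketch` and `chiSqStatistic` are hypothetical, and the real signature of `computeChiSq` is not shown in this thread. The sketch uses plain Scala collections rather than Spark's `OpenHashMap`.

```scala
object ChiSqSketch {
  // Hypothetical stand-in for a shared helper; not the actual Spark method.
  // Keys are (label, value) pairs, values are observed counts (assumed > 0).
  // Returns the Pearson statistic and the degrees of freedom.
  def chiSqStatistic(counts: Map[(Double, Double), Long]): (Double, Int) = {
    require(counts.nonEmpty, "need at least one observation")
    val labels = counts.keys.map(_._1).toArray.distinct.sorted
    val values = counts.keys.map(_._2).toArray.distinct.sorted
    val total = counts.values.sum.toDouble

    // Marginal totals of the contingency table (rows = values, columns = labels).
    val rowSums = values.map(v => labels.map(l => counts.getOrElse((l, v), 0L)).sum.toDouble)
    val colSums = labels.map(l => values.map(v => counts.getOrElse((l, v), 0L)).sum.toDouble)

    var stat = 0.0
    for (i <- values.indices; j <- labels.indices) {
      val expected = rowSums(i) * colSums(j) / total
      val observed = counts.getOrElse((labels(j), values(i)), 0L).toDouble
      val diff = observed - expected
      stat += diff * diff / expected
    }
    (stat, (values.length - 1) * (labels.length - 1))
  }
}
```

With such a helper, a dense path and a sparse path would only differ in how they build the count map for each column, and both could pass that map to the same function.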
   
   ### Why are the changes needed?
   To simplify ChiSq by removing the logic duplicated between the dense and sparse code paths.
   
   
   ### Does this PR introduce any user-facing change?
   No
   
   
   ### How was this patch tested?
   Existing test suites.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#issuecomment-604870530
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#issuecomment-604853447
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25169/
   Test PASSed.



[GitHub] [spark] huaxingao commented on a change in pull request #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#discussion_r399687400
 
 

 ##########
 File path: mllib/src/main/scala/org/apache/spark/mllib/stat/test/ChiSqTest.scala
 ##########
 @@ -224,12 +170,11 @@ private[spark] object ChiSqTest extends Logging {
 
     if (results.size < numFeatures) {
 
 Review comment:
   It seems the test suite doesn't cover this `if (results.size < numFeatures)` path. Is it worth adding a test that exercises it?



[GitHub] [spark] AmplabJenkins commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#issuecomment-604853441
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#issuecomment-604870530
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] SparkQA removed a comment on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#issuecomment-604852895
 
 
   **[Test build #120462 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120462/testReport)** for PR 28045 at commit [`79831c2`](https://github.com/apache/spark/commit/79831c2a6f3e527cebaff70f1546b31876e25e28).



[GitHub] [spark] AmplabJenkins removed a comment on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#issuecomment-604873475
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25178/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#issuecomment-604853441
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#issuecomment-604912243
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120472/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#issuecomment-604872986
 
 
   **[Test build #120472 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120472/testReport)** for PR 28045 at commit [`79831c2`](https://github.com/apache/spark/commit/79831c2a6f3e527cebaff70f1546b31876e25e28).



[GitHub] [spark] zhengruifeng commented on a change in pull request #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#discussion_r399935947
 
 

 ##########
 File path: mllib/src/main/scala/org/apache/spark/mllib/stat/test/ChiSqTest.scala
 ##########
 @@ -224,12 +170,11 @@ private[spark] object ChiSqTest extends Logging {
 
     if (results.size < numFeatures) {
 
 Review comment:
   This is covered in `ChiSquareTestSuite` by the case `test DataFrame of sparse points`.
   The values of the last feature are always zero, which triggers this `if (results.size < numFeatures)` path.
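
A minimal sketch of why an all-zero feature can leave `results` smaller than `numFeatures`, assuming the sparse path only emits entries for explicitly stored (non-zero) values. The `SparseRow` type and both helper functions are hypothetical and use plain Scala collections; they are not Spark's `Vector` API, and what the real branch does with the missing columns is not shown in this thread.

```scala
object SparseCoverageSketch {
  // Hypothetical sparse row: the vector size plus (index, value) pairs
  // for the explicitly stored (non-zero) entries only.
  final case class SparseRow(size: Int, active: Seq[(Int, Double)])

  // Columns that have at least one stored entry across all rows.
  // Only these columns would ever show up as keys in a per-column aggregation.
  def observedColumns(rows: Seq[SparseRow]): Set[Int] =
    rows.flatMap(_.active.map(_._1)).toSet

  // Columns with no stored entry in any row, i.e. features that are always zero.
  def missingColumns(rows: Seq[SparseRow], numFeatures: Int): Seq[Int] = {
    val seen = observedColumns(rows)
    (0 until numFeatures).filterNot(seen.contains)
  }
}
```

For rows of size 3 whose stored entries only ever use indices 0 and 1, `observedColumns` has two elements while `numFeatures` is 3, which is the situation the `if` guards against.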



[GitHub] [spark] huaxingao commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
huaxingao commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#issuecomment-605493038
 
 
   LGTM



[GitHub] [spark] zhengruifeng commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#issuecomment-604871110
 
 
   retest this please



[GitHub] [spark] AmplabJenkins removed a comment on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#issuecomment-604912232
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#issuecomment-604873464
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] srowen commented on a change in pull request #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#discussion_r399821160
 
 

 ##########
 File path: mllib/src/main/scala/org/apache/spark/mllib/stat/test/ChiSqTest.scala
 ##########
 @@ -94,127 +94,73 @@ private[spark] object ChiSqTest extends Logging {
       methodName: String = PEARSON.name): Array[ChiSqTestResult] = {
     data.flatMap { case LabeledPoint(label, features) =>
       require(features.size == numFeatures)
-      features.iterator.map { case (col, value) =>
-        (col, (value, label))
-      }
+      features.iterator.map { case (col, value) => (col, (label, value)) }
     }.aggregateByKey(new OpenHashMap[(Double, Double), Long])(
-      seqOp = { case (count, t) =>
-        count.changeValue(t, 1L, _ + 1L)
-        count
+      seqOp = { case (counts, t) =>
+        counts.changeValue(t, 1L, _ + 1L)
+        counts
       },
-      combOp = { case (count1, count2) =>
-        count2.iterator.foreach { case (t, c) =>
-          count1.changeValue(t, c, _ + c)
-        }
-        count1
-      }
-    ).map { case (col, count) =>
-      val label2Index = count.iterator.map(_._1._2).toArray.distinct.sorted.zipWithIndex.toMap
-      val numLabels = label2Index.size
-      if (numLabels > maxCategories) {
-        throw new SparkException(s"Chi-square test expect factors (categorical values) but "
-          + s"found more than $maxCategories distinct label values.")
-      }
-
-      val value2Index = count.iterator.map(_._1._1).toArray.distinct.sorted.zipWithIndex.toMap
-      val numValues = value2Index.size
-      if (numValues > maxCategories) {
-        throw new SparkException(s"Chi-square test expect factors (categorical values) but "
-          + s"found more than $maxCategories distinct values in column $col.")
+      combOp = { case (counts1, counts2) =>
+        counts2.foreach { case (t, c) => counts1.changeValue(t, c, _ + c) }
 
 Review comment:
   OK, so the iterator isn't helping here if just used with foreach. OK.
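
The count-and-merge pattern in the diff above can be sketched outside Spark as follows. This is only an illustration: it assumes `scala.collection.mutable.HashMap` as a stand-in for Spark's `OpenHashMap`, and the names `MergeCountsSketch`, `add`, and `merge` are hypothetical. `add` plays the role of `seqOp` and `merge` the role of `combOp`; the merge calls `foreach` directly on the map, with no intermediate `iterator`.

```scala
import scala.collection.mutable

object MergeCountsSketch {
  // Stand-in for OpenHashMap[(Double, Double), Long]; keys are (label, value).
  type Counts = mutable.HashMap[(Double, Double), Long]

  // seqOp analogue: record one more observation of `key` in a partial result.
  def add(counts: Counts, key: (Double, Double)): Counts = {
    counts.update(key, counts.getOrElse(key, 0L) + 1L)
    counts
  }

  // combOp analogue: fold the second partial result into the first, in place.
  def merge(counts1: Counts, counts2: Counts): Counts = {
    counts2.foreach { case (t, c) =>
      counts1.update(t, counts1.getOrElse(t, 0L) + c)
    }
    counts1
  }
}
```

Mutating and returning the first argument mirrors the shape of the diff, where both operators return the accumulator they were given.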



[GitHub] [spark] zhengruifeng commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#issuecomment-605793227
 
 
   Merged to master, thanks @srowen @huaxingao for reviewing



[GitHub] [spark] AmplabJenkins removed a comment on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#issuecomment-604873464
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#issuecomment-604870537
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120462/
   Test FAILed.



[GitHub] [spark] SparkQA commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#issuecomment-604870200
 
 
   **[Test build #120462 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120462/testReport)** for PR 28045 at commit [`79831c2`](https://github.com/apache/spark/commit/79831c2a6f3e527cebaff70f1546b31876e25e28).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#issuecomment-604911403
 
 
   **[Test build #120472 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120472/testReport)** for PR 28045 at commit [`79831c2`](https://github.com/apache/spark/commit/79831c2a6f3e527cebaff70f1546b31876e25e28).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#issuecomment-604873475
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25178/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#issuecomment-604912243
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120472/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#issuecomment-604852895
 
 
   **[Test build #120462 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120462/testReport)** for PR 28045 at commit [`79831c2`](https://github.com/apache/spark/commit/79831c2a6f3e527cebaff70f1546b31876e25e28).



[GitHub] [spark] AmplabJenkins commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#issuecomment-604853447
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25169/
   Test PASSed.



[GitHub] [spark] zhengruifeng closed pull request #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
zhengruifeng closed pull request #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045
 
 
   





[GitHub] [spark] SparkQA removed a comment on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#issuecomment-604872986
 
 
   **[Test build #120472 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120472/testReport)** for PR 28045 at commit [`79831c2`](https://github.com/apache/spark/commit/79831c2a6f3e527cebaff70f1546b31876e25e28).



[GitHub] [spark] AmplabJenkins commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#issuecomment-604912232
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] zhengruifeng commented on a change in pull request #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#discussion_r399935947
 
 

 ##########
 File path: mllib/src/main/scala/org/apache/spark/mllib/stat/test/ChiSqTest.scala
 ##########
 @@ -224,12 +170,11 @@ private[spark] object ChiSqTest extends Logging {
 
     if (results.size < numFeatures) {
 
 Review comment:
   This is covered in `ChiSquareTestSuite` by the test case `test DataFrame of sparse points`.
   The values of the last feature are always zero, so it triggers this `if (results.size < numFeatures)` path.
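
A minimal sketch of why an all-zero feature reaches that branch, assuming per-feature results are built only from the non-zero entries of sparse vectors. This is not the code in `ChiSqTest.scala`: `SparseRow`, `featuresSeen`, and `ZeroColumnSketch` are hypothetical names used for illustration, and no Spark classes are involved.

```scala
// Hypothetical stand-in for a sparse vector: its declared size plus
// (featureIndex, value) pairs for the non-zero entries only.
final case class SparseRow(size: Int, nonZeros: Seq[(Int, Double)])

object ZeroColumnSketch {
  // Group non-zero values by feature index. A feature whose values are
  // all zero never appears in any row's nonZeros, so it gets no key here.
  def featuresSeen(rows: Seq[SparseRow]): Map[Int, Seq[Double]] =
    rows.flatMap(_.nonZeros)
      .groupBy(_._1)
      .map { case (col, entries) => col -> entries.map(_._2) }

  def main(args: Array[String]): Unit = {
    val numFeatures = 3
    // Feature 2 (the last one) is zero in every row, as in the sparse test data.
    val rows = Seq(
      SparseRow(numFeatures, Seq(0 -> 1.0, 1 -> 2.0)),
      SparseRow(numFeatures, Seq(1 -> 3.0))
    )
    val results = featuresSeen(rows)

    // Same shape of check as the branch under discussion: fewer results than
    // features means at least one feature was zero everywhere.
    if (results.size < numFeatures) {
      val missing = (0 until numFeatures).filterNot(results.contains)
      println(s"features with only zero values: ${missing.mkString(", ")}")
    }
  }
}
```

In this sketch the map holds entries for features 0 and 1 only, so `results.size` is smaller than `numFeatures` and the zero-only feature has to be handled separately.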



[GitHub] [spark] AmplabJenkins commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28045: [SPARK-31283][ML] Simplify ChiSq by adding a common method
URL: https://github.com/apache/spark/pull/28045#issuecomment-604870537
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120462/
   Test FAILed.
