You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/07/14 21:38:29 UTC

[GitHub] [spark] huaxingao opened a new pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity part 1

huaxingao opened a new pull request #29112:
URL: https://github.com/apache/spark/pull/29112


   ### What changes were proposed in this pull request?
   set params default values in trait ...Params in both Scala and Python.
   I will do this in two PRs. I will change classification, regression, clustering and fpm in this PR. Will change the rest in another PR.
   
   
   ### Why are the changes needed?
   Make ML has the same default param values between estimator and its corresponding transformer, and also between Scala and Python.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Existing tests
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity part 1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29112:
URL: https://github.com/apache/spark/pull/29112#issuecomment-658452246






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] huaxingao commented on a change in pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity in classification, regression, clustering and fpm

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #29112:
URL: https://github.com/apache/spark/pull/29112#discussion_r455923815



##########
File path: mllib/src/main/scala/org/apache/spark/ml/classification/FMClassifier.scala
##########
@@ -85,7 +85,6 @@ class FMClassifier @Since("3.0.0") (
    */
   @Since("3.0.0")
   def setFactorSize(value: Int): this.type = set(factorSize, value)
-  setDefault(factorSize -> 8)

Review comment:
       I move setDefault for these params to FactorizationMachinesParams in FMRegressor when I fixed solver last time




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity in classification, regression, clustering and fpm

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29112:
URL: https://github.com/apache/spark/pull/29112#issuecomment-659535138






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity in classification, regression, clustering and fpm

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29112:
URL: https://github.com/apache/spark/pull/29112#issuecomment-659535138






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity in classification, regression, clustering and fpm

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #29112:
URL: https://github.com/apache/spark/pull/29112#discussion_r455519563



##########
File path: mllib/src/main/scala/org/apache/spark/ml/classification/FMClassifier.scala
##########
@@ -85,7 +85,6 @@ class FMClassifier @Since("3.0.0") (
    */
   @Since("3.0.0")
   def setFactorSize(value: Int): this.type = set(factorSize, value)
-  setDefault(factorSize -> 8)

Review comment:
       Where do the default params of `FMClassifier` move?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity in classification, regression, clustering and fpm

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29112:
URL: https://github.com/apache/spark/pull/29112#issuecomment-659576497






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity in classification, regression, clustering and fpm

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29112:
URL: https://github.com/apache/spark/pull/29112#issuecomment-659576497






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity part 1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29112:
URL: https://github.com/apache/spark/pull/29112#issuecomment-658427540






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity part 1

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29112:
URL: https://github.com/apache/spark/pull/29112#issuecomment-658430071


   **[Test build #125861 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125861/testReport)** for PR 29112 at commit [`72c17e9`](https://github.com/apache/spark/commit/72c17e9a5f882aa7b1c6cead1f72135e56b5b6e3).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity part 1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29112:
URL: https://github.com/apache/spark/pull/29112#issuecomment-658452246






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity in classification, regression, clustering and fpm

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29112:
URL: https://github.com/apache/spark/pull/29112#issuecomment-659576110


   **[Test build #125992 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125992/testReport)** for PR 29112 at commit [`4bc1053`](https://github.com/apache/spark/commit/4bc1053cf56c3d6dca212cfbd75c3941c907806a).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] huaxingao commented on a change in pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity in classification, regression, clustering and fpm

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #29112:
URL: https://github.com/apache/spark/pull/29112#discussion_r455922564



##########
File path: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
##########
@@ -68,6 +68,12 @@ private[clustering] trait BisectingKMeansParams extends Params with HasMaxIter
     "The minimum number of points (if >= 1.0) or the minimum proportion " +
       "of points (if < 1.0) of a divisible cluster.", ParamValidators.gt(0.0))
 
+
+  setDefault(

Review comment:
       Fixed. Thanks!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] huaxingao commented on pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity part 1

Posted by GitBox <gi...@apache.org>.
huaxingao commented on pull request #29112:
URL: https://github.com/apache/spark/pull/29112#issuecomment-658901689


   cc @srowen @viirya @zhengruifeng 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] huaxingao commented on pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity in classification, regression, clustering and fpm

Posted by GitBox <gi...@apache.org>.
huaxingao commented on pull request #29112:
URL: https://github.com/apache/spark/pull/29112#issuecomment-659588209


   Merged to mater. Thank you all for reviewing!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] huaxingao closed pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity in classification, regression, clustering and fpm

Posted by GitBox <gi...@apache.org>.
huaxingao closed pull request #29112:
URL: https://github.com/apache/spark/pull/29112


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity in classification, regression, clustering and fpm

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29112:
URL: https://github.com/apache/spark/pull/29112#issuecomment-659534524


   **[Test build #125992 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125992/testReport)** for PR 29112 at commit [`4bc1053`](https://github.com/apache/spark/commit/4bc1053cf56c3d6dca212cfbd75c3941c907806a).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity in classification, regression, clustering and fpm

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29112:
URL: https://github.com/apache/spark/pull/29112#issuecomment-659534524


   **[Test build #125992 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125992/testReport)** for PR 29112 at commit [`4bc1053`](https://github.com/apache/spark/commit/4bc1053cf56c3d6dca212cfbd75c3941c907806a).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity part 1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29112:
URL: https://github.com/apache/spark/pull/29112#issuecomment-658430071


   **[Test build #125861 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125861/testReport)** for PR 29112 at commit [`72c17e9`](https://github.com/apache/spark/commit/72c17e9a5f882aa7b1c6cead1f72135e56b5b6e3).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity part 1

Posted by GitBox <gi...@apache.org>.
srowen commented on pull request #29112:
URL: https://github.com/apache/spark/pull/29112#issuecomment-658908251


   So in theory this shouldn't change behavior, or if it does, it's fixing an incompatibility that's likely more a bug than anything right?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] huaxingao commented on pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity part 1

Posted by GitBox <gi...@apache.org>.
huaxingao commented on pull request #29112:
URL: https://github.com/apache/spark/pull/29112#issuecomment-658916821


   > So in theory this shouldn't change behavior, or if it does, it's fixing an incompatibility that's likely more a bug than anything right?
   
   Yes


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a change in pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity in classification, regression, clustering and fpm

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #29112:
URL: https://github.com/apache/spark/pull/29112#discussion_r455517498



##########
File path: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
##########
@@ -68,6 +68,12 @@ private[clustering] trait BisectingKMeansParams extends Params with HasMaxIter
     "The minimum number of points (if >= 1.0) or the minimum proportion " +
       "of points (if < 1.0) of a divisible cluster.", ParamValidators.gt(0.0))
 
+
+  setDefault(

Review comment:
       total nit: make these params in single line, like above places




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity part 1

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29112:
URL: https://github.com/apache/spark/pull/29112#issuecomment-658427540






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity part 1

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #29112:
URL: https://github.com/apache/spark/pull/29112#issuecomment-658922003


   "classification, regression, clustering and fpm" instead of "part 1" in the title? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity part 1

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29112:
URL: https://github.com/apache/spark/pull/29112#issuecomment-658452089


   **[Test build #125861 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125861/testReport)** for PR 29112 at commit [`72c17e9`](https://github.com/apache/spark/commit/72c17e9a5f882aa7b1c6cead1f72135e56b5b6e3).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] huaxingao commented on pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity in classification, regression, clustering and fpm

Posted by GitBox <gi...@apache.org>.
huaxingao commented on pull request #29112:
URL: https://github.com/apache/spark/pull/29112#issuecomment-659541013


   @srowen If it's OK with you, I will merge after test passes so I can finish the 2nd part. I will merge to both 3.0.1 and master. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org