Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/04 06:41:47 UTC

[GitHub] [spark] huaxingao opened a new pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

huaxingao opened a new pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785
 
 
   
   
   ### What changes were proposed in this pull request?
   Update ml-guide and ml-migration-guide for 3.0.
   
   
   ### Why are the changes needed?
   This is required for each release.
   
   
   ### Does this PR introduce any user-facing change?
   Yes.
   ![image](https://user-images.githubusercontent.com/13592258/75851710-478fa180-5d9f-11ea-9a1e-47e97c764f2e.png)
   
   ![image](https://user-images.githubusercontent.com/13592258/75851734-524a3680-5d9f-11ea-9eb6-6124a9663e47.png)
   
   ![image](https://user-images.githubusercontent.com/13592258/75851754-5c6c3500-5d9f-11ea-9542-d5feb664d49a.png)
   
   ### How was this patch tested?
   Manually built and checked.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-596124326
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119521/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-596123020
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-596123022
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24250/
   Test PASSed.


[GitHub] [spark] zhengruifeng commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#discussion_r388058801
 
 

 ##########
 File path: docs/ml-guide.md
 ##########
 @@ -87,31 +85,41 @@ To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.4
 [^1]: To learn more about the benefits and background of system optimised natives, you may wish to
     watch Sam Halliday's ScalaX talk on [High Performance Linear Algebra in Scala](http://fommil.github.io/scalax14/#/).
 
-# Highlights in 2.3
+# Highlights in 3.0
 
-The list below highlights some of the new features and enhancements added to MLlib in the `2.3`
+The list below highlights some of the new features and enhancements added to MLlib in the `3.0`
 release of Spark:
 
-* Built-in support for reading images into a `DataFrame` was added
-([SPARK-21866](https://issues.apache.org/jira/browse/SPARK-21866)).
-* [`OneHotEncoderEstimator`](ml-features.html#onehotencoderestimator) was added, and should be
-used instead of the existing `OneHotEncoder` transformer. The new estimator supports
-transforming multiple columns.
-* Multiple column support was also added to `QuantileDiscretizer` and `Bucketizer`
-([SPARK-22397](https://issues.apache.org/jira/browse/SPARK-22397) and
-[SPARK-20542](https://issues.apache.org/jira/browse/SPARK-20542))
-* A new [`FeatureHasher`](ml-features.html#featurehasher) transformer was added
- ([SPARK-13969](https://issues.apache.org/jira/browse/SPARK-13969)).
-* Added support for evaluating multiple models in parallel when performing cross-validation using
-[`TrainValidationSplit` or `CrossValidator`](ml-tuning.html)
-([SPARK-19357](https://issues.apache.org/jira/browse/SPARK-19357)).
-* Improved support for custom pipeline components in Python (see
-[SPARK-21633](https://issues.apache.org/jira/browse/SPARK-21633) and 
-[SPARK-21542](https://issues.apache.org/jira/browse/SPARK-21542)).
-* `DataFrame` functions for descriptive summary statistics over vector columns
-([SPARK-19634](https://issues.apache.org/jira/browse/SPARK-19634)).
-* Robust linear regression with Huber loss
-([SPARK-3181](https://issues.apache.org/jira/browse/SPARK-3181)).
+* Multiple columns support was added to `Binarizer`, `StringIndexer`, `StopWordsRemover` and PySpark `QuantileDiscretizer`
+([SPARK-23578](https://issues.apache.org/jira/browse/SPARK-23578)),
+([SPARK-11215](https://issues.apache.org/jira/browse/SPARK-11215)),
+([SPARK-29808](https://issues.apache.org/jira/browse/SPARK-29808)),
+([SPARK-22796](https://issues.apache.org/jira/browse/SPARK-22796)).
+* Support Tree-Based Feature Transformation was added
+([SPARK-13677](https://issues.apache.org/jira/browse/SPARK-13677)).
+* Two new evaluators `MultilabelClassificationEvaluator` and `RankingEvaluator` were added
+([SPARK-16692](https://issues.apache.org/jira/browse/SPARK-16692)),
+([SPARK-28045](https://issues.apache.org/jira/browse/SPARK-28045)).
+* Sample weights support was added in `DecisionTreeClassifier/Regressor`, `RandomForestClassifier/Regressor`, `BisectingKMeans`, `KMeans` and `GaussianMixture`
+([SPARK-19591](https://issues.apache.org/jira/browse/SPARK-19591)),
+([SPARK-9478](https://issues.apache.org/jira/browse/SPARK-9478)),
+([SPARK-30351](https://issues.apache.org/jira/browse/SPARK-30351)),
+([SPARK-29967](https://issues.apache.org/jira/browse/SPARK-29967)),
+([SPARK-30102](https://issues.apache.org/jira/browse/SPARK-30102)).
+* R API for `PowerIterationClustering` was added
+([SPARK-19827](https://issues.apache.org/jira/browse/SPARK-19827)).
+* Added Spark ML listener for tracking ML pipeline status
+([SPARK-23674](https://issues.apache.org/jira/browse/SPARK-23674)).
+* Fit with validation set was added to Gradient Boosted Trees in Python
+([SPARK-24333](https://issues.apache.org/jira/browse/SPARK-24333)).
+* [`RobustScaler`](ml-features.html#robustscaler) transformer was added
+([SPARK-28399](https://issues.apache.org/jira/browse/SPARK-28399)).
+* [`Factorization Machines`](ml-classification-regression.html#factorization-machines) classifier and regressor were added
+([SPARK-29224](https://issues.apache.org/jira/browse/SPARK-29224)).
+* Complement Naive Bayes Classifier was added
+([SPARK-29942](https://issues.apache.org/jira/browse/SPARK-29942)).
+* ML function parity between Scala and Python
+([SPARK-28958](https://issues.apache.org/jira/browse/SPARK-28958)).
 
 
 Review comment:
   [SPARK-30358] ML algorithms expose `predictRaw` and `predictProbability`


[GitHub] [spark] AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-595073581
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] SparkQA commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-596122903
 
 
   **[Test build #119521 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119521/testReport)** for PR 27785 at commit [`e00a5a7`](https://github.com/apache/spark/commit/e00a5a751bcd769ed92ff26a84c702b176cf1671).


[GitHub] [spark] viirya commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#discussion_r388032600
 
 

 ##########
 File path: docs/ml-guide.md
 ##########
 @@ -87,31 +85,41 @@ To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.4
 [^1]: To learn more about the benefits and background of system optimised natives, you may wish to
     watch Sam Halliday's ScalaX talk on [High Performance Linear Algebra in Scala](http://fommil.github.io/scalax14/#/).
 
-# Highlights in 2.3
 
 Review comment:
   This means we didn't remove it when releasing 2.4?


[GitHub] [spark] AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-595077974
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119370/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-594359072
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] huaxingao commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#discussion_r387474968
 
 

 ##########
 File path: docs/ml-migration-guide.md
 ##########
 @@ -33,16 +33,65 @@ Please refer [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide.
 
 * `OneHotEncoder` which is deprecated in 2.3, is removed in 3.0 and `OneHotEncoderEstimator` is now renamed to `OneHotEncoder`.
 * `org.apache.spark.ml.image.ImageSchema.readImages` which is deprecated in 2.3, is removed in 3.0, use `spark.read.format('image')` instead.
+* `org.apache.spark.mllib.clustering.KMeans.train` with param Int `runs` which is deprecated in 2.1, is removed in 3.0, use `train` method without `runs` instead.
+* `org.apache.spark.mllib.classification.LogisticRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.classification.LogisticRegression` or `spark.mllib.classification.LogisticRegressionWithLBFGS` instead.
+* `org.apache.spark.mllib.feature.ChiSqSelectorModel.isSorted ` which is deprecated in 2.1, is removed in 3.0, is not intended for subclasses to use.
+* `org.apache.spark.mllib.regression.RidgeRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` with `elasticNetParam` = 0.0. Note the default `regParam` is 0.01 for `RidgeRegressionWithSGD`, but is 0.0 for `LinearRegression`.
+* `org.apache.spark.mllib.regression.LassoWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` with `elasticNetParam` = 1.0. Note the default `regParam` is 0.01 for `LassoWithSGD`, but is 0.0 for `LinearRegression`.
+* `org.apache.spark.mllib.regression.LinearRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` or `LBFGS` instead.
+* `org.apache.spark.mllib.clustering.KMeans.getRuns` and `setRuns` which are deprecated in 2.1, is removed in 3.0, have no effect since Spark 2.0.0.
+* `org.apache.spark.ml.LinearSVCModel.setWeightCol` which is deprecated in 2.4, is removed in 3.0, is not intended for users.
+* From 3.0, `org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel` extends `MultilayerPerceptronParams` to expose the training params. As a result, layers in `MultilayerPerceptronClassificationModel` has been changed from Array[Int] to IntArrayParam. Users should use `MultilayerPerceptronClassificationModel.getLayers` instead of `MultilayerPerceptronClassificationModel.layers` to retrieve the size of layers.
+* `org.apache.spark.ml.classification.GBTClassifier.numTrees`  which is deprecated in 2.4.5, is removed in 3.0, use `getNumTrees` instead.
+* `org.apache.spark.ml.clustering.KMeansModel.computeCost` which is deprecated in 2.4, is removed in 3.0, use `ClusteringEvaluator` instead.
+* `org.apache.spark.mllib.evaluation.MulticlassMetrics.precision` which is deprecated in 2.0, is removed in 3.0, use `accuracy` instead.
+* `org.apache.spark.mllib.evaluation.MulticlassMetrics.recall` which is deprecated in 2.0, is removed in 3.0, use `accuracy` instead.
+* `org.apache.spark.mllib.evaluation.MulticlassMetrics.fMeasure` which is deprecated in 2.0, is removed in 3.0, use `accuracy` instead.
+* `org.apache.spark.ml.util.GeneralMLWriter.context` which is deprecated in 2.0, is removed in 3.0, use `session` instead.
+* `org.apache.spark.ml.util.MLWriter.context` which is deprecated in 2.0, is removed in 3.0, use `session` instead.
+* `org.apache.spark.ml.util.MLReader.context` which is deprecated in 2.0, is removed in 3.0, use `session` instead.
+* `abstract class UnaryTransformer[IN, OUT, T <: UnaryTransformer[IN, OUT, T]]` is changed to `abstract class UnaryTransformer[IN: TypeTag, OUT: TypeTag, T <: UnaryTransformer[IN, OUT, T]]` in 3.0.
 
-### Changes of behavior
+### Deprecations and changes of behavior
 {:.no_toc}
 
+**Deprecations**
+
+* [SPARK-11215](https://issues.apache.org/jira/browse/SPARK-11215):
+`labels` in `StringIndexer` is deprecated and will be removed in 3.1.0. Use `labelsArray` instead.
+* [SPARK-25758](https://issues.apache.org/jira/browse/SPARK-25758):
+`computeCost` in `BisectingKMeans` is deprecated and will be removed in future versions. Use `ClusteringEvaluator` instead.
+
+**Changes of behavior**
 
 Review comment:
   Do we need to include newly added features in this section? For example: `[SPARK-29960](https://issues.apache.org/jira/browse/SPARK-29960): hammingLoss support was added to MulticlassClassificationEvaluator`. If we do, there is a lot more to add.
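The removal notes for `RidgeRegressionWithSGD` and `LassoWithSGD` quoted above hinge on how `regParam` and `elasticNetParam` combine in `spark.ml`. The following Python sketch is illustrative only and does not call Spark. It assumes the elastic-net penalty documented for `spark.ml` linear methods, `regParam * (elasticNetParam * ||w||_1 + (1 - elasticNetParam) / 2 * ||w||_2^2)`, and assumes the old SGD classes penalized with `regParam / 2 * ||w||_2^2` (ridge) and `regParam * ||w||_1` (lasso). All function names are hypothetical helpers, not Spark APIs.

```python
from typing import Sequence


def elastic_net_penalty(weights: Sequence[float], reg_param: float,
                        elastic_net_param: float) -> float:
    """Hypothetical helper: elastic-net term as assumed for spark.ml,
    reg_param * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2)."""
    if not 0.0 <= elastic_net_param <= 1.0:
        raise ValueError("elastic_net_param must be in [0, 1]")
    l1 = sum(abs(w) for w in weights)        # ||w||_1
    l2_squared = sum(w * w for w in weights)  # ||w||_2^2
    return reg_param * (elastic_net_param * l1
                        + (1.0 - elastic_net_param) / 2.0 * l2_squared)


def ridge_penalty(weights: Sequence[float], reg_param: float) -> float:
    """Hypothetical helper: assumed L2 penalty of the old SGD ridge model."""
    return reg_param / 2.0 * sum(w * w for w in weights)


def lasso_penalty(weights: Sequence[float], reg_param: float) -> float:
    """Hypothetical helper: assumed L1 penalty of the old SGD lasso model."""
    return reg_param * sum(abs(w) for w in weights)


if __name__ == "__main__":
    w = [0.5, -2.0, 1.0]
    old_default = 0.01  # old SGD default regParam, per the migration note
    # elasticNetParam = 0.0 matches ridge, 1.0 matches lasso, at the same regParam.
    print(elastic_net_penalty(w, old_default, 0.0), ridge_penalty(w, old_default))
    print(elastic_net_penalty(w, old_default, 1.0), lasso_penalty(w, old_default))
    # With LinearRegression's default regParam of 0.0 there is no penalty at all.
    print(elastic_net_penalty(w, 0.0, 0.0))
```

Under these assumptions, `elasticNetParam = 0.0` reproduces the ridge penalty and `1.0` the lasso penalty only if `regParam` is also set to the old default of 0.01, because `LinearRegression` defaults to 0.0 (no penalty). In Spark this would be done with `setRegParam(0.01)` and `setElasticNetParam(0.0)` on `org.apache.spark.ml.regression.LinearRegression`; other differences between the implementations, such as feature standardization and the optimizer, are not modeled here.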


[GitHub] [spark] AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-594979783
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24089/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-595073587
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24107/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-596124324
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-594355496
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-595077962
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] huaxingao commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#discussion_r388093284
 
 

 ##########
 File path: docs/ml-migration-guide.md
 ##########
 @@ -33,16 +33,65 @@ Please refer [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide.
 
 * `OneHotEncoder` which is deprecated in 2.3, is removed in 3.0 and `OneHotEncoderEstimator` is now renamed to `OneHotEncoder`.
 * `org.apache.spark.ml.image.ImageSchema.readImages` which is deprecated in 2.3, is removed in 3.0, use `spark.read.format('image')` instead.
+* `org.apache.spark.mllib.clustering.KMeans.train` with param Int `runs` which is deprecated in 2.1, is removed in 3.0, use `train` method without `runs` instead.
+* `org.apache.spark.mllib.classification.LogisticRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.classification.LogisticRegression` or `spark.mllib.classification.LogisticRegressionWithLBFGS` instead.
+* `org.apache.spark.mllib.feature.ChiSqSelectorModel.isSorted ` which is deprecated in 2.1, is removed in 3.0, is not intended for subclasses to use.
+* `org.apache.spark.mllib.regression.RidgeRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` with `elasticNetParam` = 0.0. Note the default `regParam` is 0.01 for `RidgeRegressionWithSGD`, but is 0.0 for `LinearRegression`.
+* `org.apache.spark.mllib.regression.LassoWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` with `elasticNetParam` = 1.0. Note the default `regParam` is 0.01 for `LassoWithSGD`, but is 0.0 for `LinearRegression`.
+* `org.apache.spark.mllib.regression.LinearRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` or `LBFGS` instead.
+* `org.apache.spark.mllib.clustering.KMeans.getRuns` and `setRuns` which are deprecated in 2.1, is removed in 3.0, have no effect since Spark 2.0.0.
+* `org.apache.spark.ml.LinearSVCModel.setWeightCol` which is deprecated in 2.4, is removed in 3.0, is not intended for users.
+* From 3.0, `org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel` extends `MultilayerPerceptronParams` to expose the training params. As a result, layers in `MultilayerPerceptronClassificationModel` has been changed from Array[Int] to IntArrayParam. Users should use `MultilayerPerceptronClassificationModel.getLayers` instead of `MultilayerPerceptronClassificationModel.layers` to retrieve the size of layers.
+* `org.apache.spark.ml.classification.GBTClassifier.numTrees`  which is deprecated in 2.4.5, is removed in 3.0, use `getNumTrees` instead.
+* `org.apache.spark.ml.clustering.KMeansModel.computeCost` which is deprecated in 2.4, is removed in 3.0, use `ClusteringEvaluator` instead.
+* `org.apache.spark.mllib.evaluation.MulticlassMetrics.precision` which is deprecated in 2.0, is removed in 3.0, use `accuracy` instead.
+* `org.apache.spark.mllib.evaluation.MulticlassMetrics.recall` which is deprecated in 2.0, is removed in 3.0, use `accuracy` instead.
+* `org.apache.spark.mllib.evaluation.MulticlassMetrics.fMeasure` which is deprecated in 2.0, is removed in 3.0, use `accuracy` instead.
 
 Review comment:
   I mean the class member variables. Will change to 
   ```
   The member variable `precision` in `org.apache.spark.mllib.evaluation.MulticlassMetrics` which is deprecated in 2.0, is removed in 3.0. Use `accuracy` instead.
   The member variable `recall` in `org.apache.spark.mllib.evaluation.MulticlassMetrics` which is deprecated in 2.0, is removed in 3.0. Use `accuracy` instead.
   The member variable `fMeasure` in `org.apache.spark.mllib.evaluation.MulticlassMetrics` which is deprecated in 2.0, is removed in 3.0. Use `accuracy` instead.
   ```
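`accuracy` is a safe substitute for the removed members. On single-label multiclass data, the label-free (aggregate) precision, recall and F-measure all reduce to the fraction of correct predictions. Every misclassification adds exactly one false positive, to the predicted class, and one false negative, to the true class. The sketch below shows that arithmetic in plain Python. The helper names are hypothetical and this is not Spark's `MulticlassMetrics` implementation. It assumes each example has exactly one true label and one predicted label.

```python
from collections import Counter


def micro_averaged_metrics(labels, predictions):
    """Aggregate (micro-averaged) precision, recall and F1 for single-label multiclass data."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("labels and predictions must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for y, p in zip(labels, predictions):
        if y == p:
            tp[y] += 1      # correct prediction: true positive for class y
        else:
            fp[p] += 1      # wrong prediction: false positive for the predicted class
            fn[y] += 1      # ... and false negative for the true class
    total_tp = sum(tp.values())
    precision = total_tp / (total_tp + sum(fp.values()))
    recall = total_tp / (total_tp + sum(fn.values()))
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(labels, predictions):
    """Fraction of examples whose prediction equals the label."""
    return sum(y == p for y, p in zip(labels, predictions)) / len(labels)


labels = [0, 1, 2, 2, 1, 0]
predictions = [0, 2, 2, 2, 1, 1]
print(micro_averaged_metrics(labels, predictions), accuracy(labels, predictions))
```

The equivalence only holds for the aggregate values. Per-class precision and recall generally differ from accuracy.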


[GitHub] [spark] AmplabJenkins removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-594982789
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-595073587
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24107/
   Test PASSed.


[GitHub] [spark] viirya commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#discussion_r388035757
 
 

 ##########
 File path: docs/ml-migration-guide.md
 ##########
 @@ -33,16 +33,65 @@ Please refer [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide.
 
 * `OneHotEncoder` which is deprecated in 2.3, is removed in 3.0 and `OneHotEncoderEstimator` is now renamed to `OneHotEncoder`.
 * `org.apache.spark.ml.image.ImageSchema.readImages` which is deprecated in 2.3, is removed in 3.0, use `spark.read.format('image')` instead.
+* `org.apache.spark.mllib.clustering.KMeans.train` with param Int `runs` which is deprecated in 2.1, is removed in 3.0, use `train` method without `runs` instead.
+* `org.apache.spark.mllib.classification.LogisticRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.classification.LogisticRegression` or `spark.mllib.classification.LogisticRegressionWithLBFGS` instead.
+* `org.apache.spark.mllib.feature.ChiSqSelectorModel.isSorted ` which is deprecated in 2.1, is removed in 3.0, is not intended for subclasses to use.
+* `org.apache.spark.mllib.regression.RidgeRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` with `elasticNetParam` = 0.0. Note the default `regParam` is 0.01 for `RidgeRegressionWithSGD`, but is 0.0 for `LinearRegression`.
+* `org.apache.spark.mllib.regression.LassoWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` with `elasticNetParam` = 1.0. Note the default `regParam` is 0.01 for `LassoWithSGD`, but is 0.0 for `LinearRegression`.
+* `org.apache.spark.mllib.regression.LinearRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` or `LBFGS` instead.
+* `org.apache.spark.mllib.clustering.KMeans.getRuns` and `setRuns` which are deprecated in 2.1, is removed in 3.0, have no effect since Spark 2.0.0.
+* `org.apache.spark.ml.LinearSVCModel.setWeightCol` which is deprecated in 2.4, is removed in 3.0, is not intended for users.
+* From 3.0, `org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel` extends `MultilayerPerceptronParams` to expose the training params. As a result, layers in `MultilayerPerceptronClassificationModel` has been changed from Array[Int] to IntArrayParam. Users should use `MultilayerPerceptronClassificationModel.getLayers` instead of `MultilayerPerceptronClassificationModel.layers` to retrieve the size of layers.
 
 Review comment:
   `layers` in `MultilayerPerceptronClassificationModel` has been changed from `Array[Int]` to `IntArrayParam`?


[GitHub] [spark] srowen commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
srowen commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-596150766
 
 
   Merged to master/3.0


[GitHub] [spark] zhengruifeng commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#discussion_r388055023
 
 

 ##########
 File path: docs/ml-guide.md
 ##########
 @@ -87,31 +85,41 @@ To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.4
 [^1]: To learn more about the benefits and background of system optimised natives, you may wish to
     watch Sam Halliday's ScalaX talk on [High Performance Linear Algebra in Scala](http://fommil.github.io/scalax14/#/).
 
-# Highlights in 2.3
+# Highlights in 3.0
 
-The list below highlights some of the new features and enhancements added to MLlib in the `2.3`
+The list below highlights some of the new features and enhancements added to MLlib in the `3.0`
 release of Spark:
 
-* Built-in support for reading images into a `DataFrame` was added
-([SPARK-21866](https://issues.apache.org/jira/browse/SPARK-21866)).
-* [`OneHotEncoderEstimator`](ml-features.html#onehotencoderestimator) was added, and should be
-used instead of the existing `OneHotEncoder` transformer. The new estimator supports
-transforming multiple columns.
-* Multiple column support was also added to `QuantileDiscretizer` and `Bucketizer`
-([SPARK-22397](https://issues.apache.org/jira/browse/SPARK-22397) and
-[SPARK-20542](https://issues.apache.org/jira/browse/SPARK-20542))
-* A new [`FeatureHasher`](ml-features.html#featurehasher) transformer was added
- ([SPARK-13969](https://issues.apache.org/jira/browse/SPARK-13969)).
-* Added support for evaluating multiple models in parallel when performing cross-validation using
-[`TrainValidationSplit` or `CrossValidator`](ml-tuning.html)
-([SPARK-19357](https://issues.apache.org/jira/browse/SPARK-19357)).
-* Improved support for custom pipeline components in Python (see
-[SPARK-21633](https://issues.apache.org/jira/browse/SPARK-21633) and 
-[SPARK-21542](https://issues.apache.org/jira/browse/SPARK-21542)).
-* `DataFrame` functions for descriptive summary statistics over vector columns
-([SPARK-19634](https://issues.apache.org/jira/browse/SPARK-19634)).
-* Robust linear regression with Huber loss
-([SPARK-3181](https://issues.apache.org/jira/browse/SPARK-3181)).
+* Multiple columns support was added to `Binarizer`, `StringIndexer`, `StopWordsRemover` and PySpark `QuantileDiscretizer`
+([SPARK-23578](https://issues.apache.org/jira/browse/SPARK-23578)),
+([SPARK-11215](https://issues.apache.org/jira/browse/SPARK-11215)),
+([SPARK-29808](https://issues.apache.org/jira/browse/SPARK-29808)),
+([SPARK-22796](https://issues.apache.org/jira/browse/SPARK-22796)).
+* Support Tree-Based Feature Transformation was added
+([SPARK-13677](https://issues.apache.org/jira/browse/SPARK-13677)).
+* Two new evaluators `MultilabelClassificationEvaluator` and `RankingEvaluator` were added
+([SPARK-16692](https://issues.apache.org/jira/browse/SPARK-16692)),
+([SPARK-28045](https://issues.apache.org/jira/browse/SPARK-28045)).
+* Sample weights support was added in `DecisionTreeClassifier/Regressor`, `RandomForestClassifier/Regressor`, `BisectingKMeans`, `KMeans` and `GaussianMixture`
+([SPARK-19591](https://issues.apache.org/jira/browse/SPARK-19591)),
+([SPARK-9478](https://issues.apache.org/jira/browse/SPARK-9478)),
+([SPARK-30351](https://issues.apache.org/jira/browse/SPARK-30351)),
+([SPARK-29967](https://issues.apache.org/jira/browse/SPARK-29967)),
+([SPARK-30102](https://issues.apache.org/jira/browse/SPARK-30102)).
+* R API for `PowerIterationClustering` was added
+([SPARK-19827](https://issues.apache.org/jira/browse/SPARK-19827)).
+* Added Spark ML listener for tracking ML pipeline status
+([SPARK-23674](https://issues.apache.org/jira/browse/SPARK-23674)).
+* Fit with validation set was added to Gradient Boosted Trees in Python
+([SPARK-24333](https://issues.apache.org/jira/browse/SPARK-24333)).
+* [`RobustScaler`](ml-features.html#robustscaler) transformer was added
+([SPARK-28399](https://issues.apache.org/jira/browse/SPARK-28399)).
+* [`Factorization Machines`](ml-classification-regression.html#factorization-machines) classifier and regressor were added
+([SPARK-29224](https://issues.apache.org/jira/browse/SPARK-29224)).
+* Complement Naive Bayes Classifier was added
 
 Review comment:
   I'd also add Gaussian Naive Bayes:
   `[SPARK-16872][SPARK-29224] Gaussian Naive Bayes and Complement Naive Bayes were added`


[GitHub] [spark] zhengruifeng commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#discussion_r388054424
 
 

 ##########
 File path: docs/ml-migration-guide.md
 ##########
 @@ -33,16 +33,65 @@ Please refer [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide.
 
 * `OneHotEncoder` which is deprecated in 2.3, is removed in 3.0 and `OneHotEncoderEstimator` is now renamed to `OneHotEncoder`.
 * `org.apache.spark.ml.image.ImageSchema.readImages` which is deprecated in 2.3, is removed in 3.0, use `spark.read.format('image')` instead.
+* `org.apache.spark.mllib.clustering.KMeans.train` with param Int `runs` which is deprecated in 2.1, is removed in 3.0, use `train` method without `runs` instead.
+* `org.apache.spark.mllib.classification.LogisticRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.classification.LogisticRegression` or `spark.mllib.classification.LogisticRegressionWithLBFGS` instead.
+* `org.apache.spark.mllib.feature.ChiSqSelectorModel.isSorted ` which is deprecated in 2.1, is removed in 3.0, is not intended for subclasses to use.
+* `org.apache.spark.mllib.regression.RidgeRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` with `elasticNetParam` = 0.0. Note the default `regParam` is 0.01 for `RidgeRegressionWithSGD`, but is 0.0 for `LinearRegression`.
+* `org.apache.spark.mllib.regression.LassoWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` with `elasticNetParam` = 1.0. Note the default `regParam` is 0.01 for `LassoWithSGD`, but is 0.0 for `LinearRegression`.
+* `org.apache.spark.mllib.regression.LinearRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` or `LBFGS` instead.
+* `org.apache.spark.mllib.clustering.KMeans.getRuns` and `setRuns` which are deprecated in 2.1, is removed in 3.0, have no effect since Spark 2.0.0.
+* `org.apache.spark.ml.LinearSVCModel.setWeightCol` which is deprecated in 2.4, is removed in 3.0, is not intended for users.
+* From 3.0, `org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel` extends `MultilayerPerceptronParams` to expose the training params. As a result, layers in `MultilayerPerceptronClassificationModel` has been changed from Array[Int] to IntArrayParam. Users should use `MultilayerPerceptronClassificationModel.getLayers` instead of `MultilayerPerceptronClassificationModel.layers` to retrieve the size of layers.
 
 Review comment:
   Yes, it was a `layers` field in `MultilayerPerceptronClassificationModel` before 3.0.
   In 3.0, we made it extend `MultilayerPerceptronParams`, so it now has the param `layers`.
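The `RidgeRegressionWithSGD` and `LassoWithSGD` entries in this hunk map onto `LinearRegression` through its elastic-net penalty. Spark's ML documentation writes that penalty as λ(α‖w‖₁ + (1−α)/2‖w‖₂²), with `regParam` = λ and `elasticNetParam` = α.

The plain-Python sketch below evaluates only that penalty term. It uses a hypothetical helper and is not Spark code. It shows three things:

* α = 0.0 gives a pure L2 (ridge) penalty.
* α = 1.0 gives a pure L1 (lasso) penalty.
* Keeping the old regularization strength means setting `regParam` = 0.01 explicitly, because `LinearRegression` defaults to 0.0.

The data loss, its scaling and the optimizer still differ between the SGD classes and `LinearRegression`, so fitted coefficients need not match exactly.

```python
def elastic_net_penalty(weights, reg_param, elastic_net_param):
    """Return reg_param * (a * ||w||_1 + (1 - a) / 2 * ||w||_2^2), where a = elastic_net_param."""
    if reg_param < 0.0:
        raise ValueError("reg_param must be non-negative")
    if not 0.0 <= elastic_net_param <= 1.0:
        raise ValueError("elastic_net_param must be in [0, 1]")
    l1 = sum(abs(w) for w in weights)
    l2_squared = sum(w * w for w in weights)
    return reg_param * (elastic_net_param * l1 + (1.0 - elastic_net_param) / 2.0 * l2_squared)


weights = [3.0, -4.0]
# Ridge-style settings: pure L2 penalty, keeping the old SGD default regParam of 0.01.
ridge = elastic_net_penalty(weights, reg_param=0.01, elastic_net_param=0.0)
# Lasso-style settings: pure L1 penalty, same regParam.
lasso = elastic_net_penalty(weights, reg_param=0.01, elastic_net_param=1.0)
# LinearRegression's default regParam of 0.0 switches the penalty off entirely.
unregularized = elastic_net_penalty(weights, reg_param=0.0, elastic_net_param=1.0)
print(ridge, lasso, unregularized)
```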


[GitHub] [spark] zhengruifeng commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#discussion_r388053897
 
 

 ##########
 File path: docs/ml-migration-guide.md
 ##########
 @@ -33,16 +33,65 @@ Please refer [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide.
 
 * `OneHotEncoder` which is deprecated in 2.3, is removed in 3.0 and `OneHotEncoderEstimator` is now renamed to `OneHotEncoder`.
 * `org.apache.spark.ml.image.ImageSchema.readImages` which is deprecated in 2.3, is removed in 3.0, use `spark.read.format('image')` instead.
+* `org.apache.spark.mllib.clustering.KMeans.train` with param Int `runs` which is deprecated in 2.1, is removed in 3.0, use `train` method without `runs` instead.
+* `org.apache.spark.mllib.classification.LogisticRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.classification.LogisticRegression` or `spark.mllib.classification.LogisticRegressionWithLBFGS` instead.
+* `org.apache.spark.mllib.feature.ChiSqSelectorModel.isSorted ` which is deprecated in 2.1, is removed in 3.0, is not intended for subclasses to use.
+* `org.apache.spark.mllib.regression.RidgeRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` with `elasticNetParam` = 0.0. Note the default `regParam` is 0.01 for `RidgeRegressionWithSGD`, but is 0.0 for `LinearRegression`.
+* `org.apache.spark.mllib.regression.LassoWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` with `elasticNetParam` = 1.0. Note the default `regParam` is 0.01 for `LassoWithSGD`, but is 0.0 for `LinearRegression`.
+* `org.apache.spark.mllib.regression.LinearRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` or `LBFGS` instead.
+* `org.apache.spark.mllib.clustering.KMeans.getRuns` and `setRuns` which are deprecated in 2.1, is removed in 3.0, have no effect since Spark 2.0.0.
+* `org.apache.spark.ml.LinearSVCModel.setWeightCol` which is deprecated in 2.4, is removed in 3.0, is not intended for users.
+* From 3.0, `org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel` extends `MultilayerPerceptronParams` to expose the training params. As a result, layers in `MultilayerPerceptronClassificationModel` has been changed from Array[Int] to IntArrayParam. Users should use `MultilayerPerceptronClassificationModel.getLayers` instead of `MultilayerPerceptronClassificationModel.layers` to retrieve the size of layers.
+* `org.apache.spark.ml.classification.GBTClassifier.numTrees`  which is deprecated in 2.4.5, is removed in 3.0, use `getNumTrees` instead.
+* `org.apache.spark.ml.clustering.KMeansModel.computeCost` which is deprecated in 2.4, is removed in 3.0, use `ClusteringEvaluator` instead.
+* `org.apache.spark.mllib.evaluation.MulticlassMetrics.precision` which is deprecated in 2.0, is removed in 3.0, use `accuracy` instead.
+* `org.apache.spark.mllib.evaluation.MulticlassMetrics.recall` which is deprecated in 2.0, is removed in 3.0, use `accuracy` instead.
+* `org.apache.spark.mllib.evaluation.MulticlassMetrics.fMeasure` which is deprecated in 2.0, is removed in 3.0, use `accuracy` instead.
+* `org.apache.spark.ml.util.GeneralMLWriter.context` which is deprecated in 2.0, is removed in 3.0, use `session` instead.
+* `org.apache.spark.ml.util.MLWriter.context` which is deprecated in 2.0, is removed in 3.0, use `session` instead.
+* `org.apache.spark.ml.util.MLReader.context` which is deprecated in 2.0, is removed in 3.0, use `session` instead.
+* `abstract class UnaryTransformer[IN, OUT, T <: UnaryTransformer[IN, OUT, T]]` is changed to `abstract class UnaryTransformer[IN: TypeTag, OUT: TypeTag, T <: UnaryTransformer[IN, OUT, T]]` in 3.0.
 
-### Changes of behavior
+### Deprecations and changes of behavior
 {:.no_toc}
 
+**Deprecations**
+
+* [SPARK-11215](https://issues.apache.org/jira/browse/SPARK-11215):
+`labels` in `StringIndexer` is deprecated and will be removed in 3.1.0. Use `labelsArray` instead.
+* [SPARK-25758](https://issues.apache.org/jira/browse/SPARK-25758):
+`computeCost` in `BisectingKMeans` is deprecated and will be removed in future versions. Use `ClusteringEvaluator` instead.
+
+**Changes of behavior**
 
 Review comment:
   I am neutral on it, since it is not a big change.


[GitHub] [spark] AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-594355503
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24022/
   Test PASSed.


[GitHub] [spark] SparkQA removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-594979489
 
 
   **[Test build #119351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119351/testReport)** for PR 27785 at commit [`24e6c8f`](https://github.com/apache/spark/commit/24e6c8fde8947fe13b78f126bc5053c61d3a1e2b).


[GitHub] [spark] AmplabJenkins removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-594979776
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] SparkQA commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-595077827
 
 
   **[Test build #119370 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119370/testReport)** for PR 27785 at commit [`1a19da7`](https://github.com/apache/spark/commit/1a19da714c4a607454d3dd36d827893454190552).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] viirya commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#discussion_r388039615
 
 

 ##########
 File path: docs/ml-migration-guide.md
 ##########
 @@ -33,16 +33,65 @@ Please refer [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide.
 
 * `OneHotEncoder` which is deprecated in 2.3, is removed in 3.0 and `OneHotEncoderEstimator` is now renamed to `OneHotEncoder`.
 * `org.apache.spark.ml.image.ImageSchema.readImages` which is deprecated in 2.3, is removed in 3.0, use `spark.read.format('image')` instead.
+* `org.apache.spark.mllib.clustering.KMeans.train` with param Int `runs` which is deprecated in 2.1, is removed in 3.0, use `train` method without `runs` instead.
+* `org.apache.spark.mllib.classification.LogisticRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.classification.LogisticRegression` or `spark.mllib.classification.LogisticRegressionWithLBFGS` instead.
+* `org.apache.spark.mllib.feature.ChiSqSelectorModel.isSorted ` which is deprecated in 2.1, is removed in 3.0, is not intended for subclasses to use.
+* `org.apache.spark.mllib.regression.RidgeRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` with `elasticNetParam` = 0.0. Note the default `regParam` is 0.01 for `RidgeRegressionWithSGD`, but is 0.0 for `LinearRegression`.
+* `org.apache.spark.mllib.regression.LassoWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` with `elasticNetParam` = 1.0. Note the default `regParam` is 0.01 for `LassoWithSGD`, but is 0.0 for `LinearRegression`.
+* `org.apache.spark.mllib.regression.LinearRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` or `LBFGS` instead.
+* `org.apache.spark.mllib.clustering.KMeans.getRuns` and `setRuns` which are deprecated in 2.1, is removed in 3.0, have no effect since Spark 2.0.0.
+* `org.apache.spark.ml.LinearSVCModel.setWeightCol` which is deprecated in 2.4, is removed in 3.0, is not intended for users.
+* From 3.0, `org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel` extends `MultilayerPerceptronParams` to expose the training params. As a result, layers in `MultilayerPerceptronClassificationModel` has been changed from Array[Int] to IntArrayParam. Users should use `MultilayerPerceptronClassificationModel.getLayers` instead of `MultilayerPerceptronClassificationModel.layers` to retrieve the size of layers.
+* `org.apache.spark.ml.classification.GBTClassifier.numTrees`  which is deprecated in 2.4.5, is removed in 3.0, use `getNumTrees` instead.
+* `org.apache.spark.ml.clustering.KMeansModel.computeCost` which is deprecated in 2.4, is removed in 3.0, use `ClusteringEvaluator` instead.
+* `org.apache.spark.mllib.evaluation.MulticlassMetrics.precision` which is deprecated in 2.0, is removed in 3.0, use `accuracy` instead.
+* `org.apache.spark.mllib.evaluation.MulticlassMetrics.recall` which is deprecated in 2.0, is removed in 3.0, use `accuracy` instead.
+* `org.apache.spark.mllib.evaluation.MulticlassMetrics.fMeasure` which is deprecated in 2.0, is removed in 3.0, use `accuracy` instead.
+* `org.apache.spark.ml.util.GeneralMLWriter.context` which is deprecated in 2.0, is removed in 3.0, use `session` instead.
+* `org.apache.spark.ml.util.MLWriter.context` which is deprecated in 2.0, is removed in 3.0, use `session` instead.
+* `org.apache.spark.ml.util.MLReader.context` which is deprecated in 2.0, is removed in 3.0, use `session` instead.
+* `abstract class UnaryTransformer[IN, OUT, T <: UnaryTransformer[IN, OUT, T]]` is changed to `abstract class UnaryTransformer[IN: TypeTag, OUT: TypeTag, T <: UnaryTransformer[IN, OUT, T]]` in 3.0.
 
-### Changes of behavior
+### Deprecations and changes of behavior
 {:.no_toc}
 
+**Deprecations**
+
+* [SPARK-11215](https://issues.apache.org/jira/browse/SPARK-11215):
+`labels` in `StringIndexerModel` is deprecated and will be removed in 3.1.0. Use `labelsArray` instead.
+* [SPARK-25758](https://issues.apache.org/jira/browse/SPARK-25758):
+`computeCost` in `BisectingKMeansModel` is deprecated and will be removed in future versions. Use `ClusteringEvaluator` instead.
+
+**Changes of behavior**
+
 * [SPARK-11215](https://issues.apache.org/jira/browse/SPARK-11215):
  In Spark 2.4 and previous versions, when specifying `frequencyDesc` or `frequencyAsc` as
  `stringOrderType` param in `StringIndexer`, in case of equal frequency, the order of
  strings is undefined. Since Spark 3.0, the strings with equal frequency are further
  sorted by alphabet. And since Spark 3.0, `StringIndexer` supports encoding multiple
  columns.
+ * [SPARK-20604](https://issues.apache.org/jira/browse/SPARK-20604):
+ In prior to 3.0 releases, `Imputer` requires input column to be Double or Float. In 3.0, this
+ restriction is lifted so `Imputer` can handle all numeric types.
+* [SPARK-23469](https://issues.apache.org/jira/browse/SPARK-23469):
+In Spark 3.0, the `HashingTF` Transformer uses a corrected implementation of the murmur3 hash
+function to hash elements to vectors. `HashingTF` fits with Spark 3.0 will map elements to
 
 Review comment:
   Does "`HashingTF` fits" mean `HashingTF.fits` here?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
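The SPARK-11215 note in the `ml-migration-guide.md` diff above says that, since Spark 3.0, strings with equal frequency are further sorted alphabetically when `stringOrderType` is `frequencyDesc` or `frequencyAsc`. The following is a minimal pure-Python sketch of that ordering rule. `string_indexer_labels` and `index_column` are hypothetical helpers for illustration only, not Spark's implementation or API.

```python
from collections import Counter


def string_indexer_labels(values, string_order_type="frequencyDesc"):
    """Return labels in index order: the label at position i gets index i.

    Hypothetical helper mimicking the Spark 3.0 rule described in the
    migration note: order by frequency, then break ties between equally
    frequent strings alphabetically.
    """
    counts = Counter(values)
    if string_order_type == "frequencyDesc":
        # Most frequent first; equal counts fall back to alphabetical order.
        ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    elif string_order_type == "frequencyAsc":
        # Least frequent first; equal counts fall back to alphabetical order.
        ordered = sorted(counts.items(), key=lambda item: (item[1], item[0]))
    else:
        raise ValueError(f"unsupported stringOrderType: {string_order_type}")
    return [label for label, _ in ordered]


def index_column(values, string_order_type="frequencyDesc"):
    """Map each input string to its index as a float (StringIndexer emits doubles)."""
    labels = string_indexer_labels(values, string_order_type)
    lookup = {label: float(i) for i, label in enumerate(labels)}
    return [lookup[v] for v in values]


if __name__ == "__main__":
    data = ["b", "a", "c", "a", "b", "d"]
    print(string_indexer_labels(data, "frequencyDesc"))  # ['a', 'b', 'c', 'd']
    print(string_indexer_labels(data, "frequencyAsc"))   # ['c', 'd', 'a', 'b']
    print(index_column(data))  # [1.0, 0.0, 2.0, 0.0, 1.0, 3.0]
```

In this sample, "a" and "b" both occur twice, and "c" and "d" both occur once. Under the 2.4 behavior described in the note, the relative order within each tied pair was undefined. Under the 3.0 rule it is deterministic.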

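The `RidgeRegressionWithSGD` and `LassoWithSGD` entries in the diff above map onto `spark.ml`'s `LinearRegression` through `elasticNetParam`. The following is a rough sketch of the elastic-net objective as described in the Spark ML documentation, where λ is `regParam` and α is `elasticNetParam`:

$$
\min_{w}\; \frac{1}{2n}\sum_{i=1}^{n}\left(w^{T}x_i - y_i\right)^2
+ \lambda\left(\alpha\,\lVert w\rVert_1 + \frac{1-\alpha}{2}\,\lVert w\rVert_2^2\right),
\qquad \alpha \in [0, 1]
$$

With α = 0 (`elasticNetParam` = 0.0), only the L2 (ridge) term remains. With α = 1 (`elasticNetParam` = 1.0), only the L1 (lasso) term remains. Because λ defaults to 0.0 in `LinearRegression`, `regParam` has to be set explicitly, for example to 0.01, to get any regularization at all. Even then, coefficients may not match the old SGD-based classes exactly, since `spark.ml` standardizes features by default and uses a different optimizer.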

[GitHub] [spark] SparkQA commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-594358944
 
 
   **[Test build #119282 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119282/testReport)** for PR 27785 at commit [`2ef974e`](https://github.com/apache/spark/commit/2ef974e276db711d623a61e40f558d09789f9bc8).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] huaxingao commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#discussion_r388093510
 
 

 ##########
 File path: docs/ml-guide.md
 ##########
 @@ -87,31 +85,41 @@ To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.4
 [^1]: To learn more about the benefits and background of system optimised natives, you may wish to
     watch Sam Halliday's ScalaX talk on [High Performance Linear Algebra in Scala](http://fommil.github.io/scalax14/#/).
 
-# Highlights in 2.3
+# Highlights in 3.0
 
-The list below highlights some of the new features and enhancements added to MLlib in the `2.3`
+The list below highlights some of the new features and enhancements added to MLlib in the `3.0`
 release of Spark:
 
-* Built-in support for reading images into a `DataFrame` was added
-([SPARK-21866](https://issues.apache.org/jira/browse/SPARK-21866)).
-* [`OneHotEncoderEstimator`](ml-features.html#onehotencoderestimator) was added, and should be
-used instead of the existing `OneHotEncoder` transformer. The new estimator supports
-transforming multiple columns.
-* Multiple column support was also added to `QuantileDiscretizer` and `Bucketizer`
-([SPARK-22397](https://issues.apache.org/jira/browse/SPARK-22397) and
-[SPARK-20542](https://issues.apache.org/jira/browse/SPARK-20542))
-* A new [`FeatureHasher`](ml-features.html#featurehasher) transformer was added
- ([SPARK-13969](https://issues.apache.org/jira/browse/SPARK-13969)).
-* Added support for evaluating multiple models in parallel when performing cross-validation using
-[`TrainValidationSplit` or `CrossValidator`](ml-tuning.html)
-([SPARK-19357](https://issues.apache.org/jira/browse/SPARK-19357)).
-* Improved support for custom pipeline components in Python (see
-[SPARK-21633](https://issues.apache.org/jira/browse/SPARK-21633) and 
-[SPARK-21542](https://issues.apache.org/jira/browse/SPARK-21542)).
-* `DataFrame` functions for descriptive summary statistics over vector columns
-([SPARK-19634](https://issues.apache.org/jira/browse/SPARK-19634)).
-* Robust linear regression with Huber loss
-([SPARK-3181](https://issues.apache.org/jira/browse/SPARK-3181)).
+* Multiple columns support was added to `Binarizer`, `StringIndexer`, `StopWordsRemover` and PySpark `QuantileDiscretizer`
+([SPARK-23578](https://issues.apache.org/jira/browse/SPARK-23578)),
+([SPARK-11215](https://issues.apache.org/jira/browse/SPARK-11215)),
+([SPARK-29808](https://issues.apache.org/jira/browse/SPARK-29808)),
+([SPARK-22796](https://issues.apache.org/jira/browse/SPARK-22796)).
+* Support Tree-Based Feature Transformation was added
+([SPARK-13677](https://issues.apache.org/jira/browse/SPARK-13677)).
+* Two new evaluators `MultilabelClassificationEvaluator` and `RankingEvaluator` were added
+([SPARK-16692](https://issues.apache.org/jira/browse/SPARK-16692)),
+([SPARK-28045](https://issues.apache.org/jira/browse/SPARK-28045)).
+* Sample weights support was added in `DecisionTreeClassifier/Regressor`, `RandomForestClassifier/Regressor`, `BisectingKMeans`, `KMeans` and `GaussianMixture`
+([SPARK-19591](https://issues.apache.org/jira/browse/SPARK-19591)),
+([SPARK-9478](https://issues.apache.org/jira/browse/SPARK-9478)),
+([SPARK-30351](https://issues.apache.org/jira/browse/SPARK-30351)),
+([SPARK-29967](https://issues.apache.org/jira/browse/SPARK-29967)),
+([SPARK-30102](https://issues.apache.org/jira/browse/SPARK-30102)).
+* R API for `PowerIterationClustering` was added
+([SPARK-19827](https://issues.apache.org/jira/browse/SPARK-19827)).
+* Added Spark ML listener for tracking ML pipeline status
+([SPARK-23674](https://issues.apache.org/jira/browse/SPARK-23674)).
+* Fit with validation set was added to Gradient Boosted Trees in Python
+([SPARK-24333](https://issues.apache.org/jira/browse/SPARK-24333)).
+* [`RobustScaler`](ml-features.html#robustscaler) transformer was added
+([SPARK-28399](https://issues.apache.org/jira/browse/SPARK-28399)).
+* [`Factorization Machines`](ml-classification-regression.html#factorization-machines) classifier and regressor were added
+([SPARK-29224](https://issues.apache.org/jira/browse/SPARK-29224)).
+* Complement Naive Bayes Classifier was added
+([SPARK-29942](https://issues.apache.org/jira/browse/SPARK-29942)).
+* ML function parity between Scala and Python
+([SPARK-28958](https://issues.apache.org/jira/browse/SPARK-28958)).
 
 
 Review comment:
   Will add all these you have mentioned. 



[GitHub] [spark] srowen closed pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
srowen closed pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785
 
 
   



[GitHub] [spark] viirya commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#discussion_r388037736
 
 

 ##########
 File path: docs/ml-migration-guide.md
 ##########
 @@ -33,16 +33,65 @@ Please refer [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide.
 
 * `OneHotEncoder` which is deprecated in 2.3, is removed in 3.0 and `OneHotEncoderEstimator` is now renamed to `OneHotEncoder`.
 * `org.apache.spark.ml.image.ImageSchema.readImages` which is deprecated in 2.3, is removed in 3.0, use `spark.read.format('image')` instead.
+* `org.apache.spark.mllib.clustering.KMeans.train` with param Int `runs` which is deprecated in 2.1, is removed in 3.0, use `train` method without `runs` instead.
+* `org.apache.spark.mllib.classification.LogisticRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.classification.LogisticRegression` or `spark.mllib.classification.LogisticRegressionWithLBFGS` instead.
+* `org.apache.spark.mllib.feature.ChiSqSelectorModel.isSorted ` which is deprecated in 2.1, is removed in 3.0, is not intended for subclasses to use.
+* `org.apache.spark.mllib.regression.RidgeRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` with `elasticNetParam` = 0.0. Note the default `regParam` is 0.01 for `RidgeRegressionWithSGD`, but is 0.0 for `LinearRegression`.
+* `org.apache.spark.mllib.regression.LassoWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` with `elasticNetParam` = 1.0. Note the default `regParam` is 0.01 for `LassoWithSGD`, but is 0.0 for `LinearRegression`.
+* `org.apache.spark.mllib.regression.LinearRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` or `LBFGS` instead.
+* `org.apache.spark.mllib.clustering.KMeans.getRuns` and `setRuns` which are deprecated in 2.1, is removed in 3.0, have no effect since Spark 2.0.0.
+* `org.apache.spark.ml.LinearSVCModel.setWeightCol` which is deprecated in 2.4, is removed in 3.0, is not intended for users.
+* From 3.0, `org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel` extends `MultilayerPerceptronParams` to expose the training params. As a result, layers in `MultilayerPerceptronClassificationModel` has been changed from Array[Int] to IntArrayParam. Users should use `MultilayerPerceptronClassificationModel.getLayers` instead of `MultilayerPerceptronClassificationModel.layers` to retrieve the size of layers.
+* `org.apache.spark.ml.classification.GBTClassifier.numTrees`  which is deprecated in 2.4.5, is removed in 3.0, use `getNumTrees` instead.
+* `org.apache.spark.ml.clustering.KMeansModel.computeCost` which is deprecated in 2.4, is removed in 3.0, use `ClusteringEvaluator` instead.
+* `org.apache.spark.mllib.evaluation.MulticlassMetrics.precision` which is deprecated in 2.0, is removed in 3.0, use `accuracy` instead.
+* `org.apache.spark.mllib.evaluation.MulticlassMetrics.recall` which is deprecated in 2.0, is removed in 3.0, use `accuracy` instead.
+* `org.apache.spark.mllib.evaluation.MulticlassMetrics.fMeasure` which is deprecated in 2.0, is removed in 3.0, use `accuracy` instead.
 
 Review comment:
   I still see `precision`, `recall` and `fMeasure` in `MulticlassMetrics`. Were they actually removed?



[GitHub] [spark] AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-594979776
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-594359077
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119282/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-594359072
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] zhengruifeng commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#discussion_r388075689
 
 

 ##########
 File path: docs/ml-guide.md
 ##########
 @@ -87,31 +85,41 @@ To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.4
 [^1]: To learn more about the benefits and background of system optimised natives, you may wish to
     watch Sam Halliday's ScalaX talk on [High Performance Linear Algebra in Scala](http://fommil.github.io/scalax14/#/).
 
-# Highlights in 2.3
+# Highlights in 3.0
 
-The list below highlights some of the new features and enhancements added to MLlib in the `2.3`
+The list below highlights some of the new features and enhancements added to MLlib in the `3.0`
 release of Spark:
 
-* Built-in support for reading images into a `DataFrame` was added
-([SPARK-21866](https://issues.apache.org/jira/browse/SPARK-21866)).
-* [`OneHotEncoderEstimator`](ml-features.html#onehotencoderestimator) was added, and should be
-used instead of the existing `OneHotEncoder` transformer. The new estimator supports
-transforming multiple columns.
-* Multiple column support was also added to `QuantileDiscretizer` and `Bucketizer`
-([SPARK-22397](https://issues.apache.org/jira/browse/SPARK-22397) and
-[SPARK-20542](https://issues.apache.org/jira/browse/SPARK-20542))
-* A new [`FeatureHasher`](ml-features.html#featurehasher) transformer was added
- ([SPARK-13969](https://issues.apache.org/jira/browse/SPARK-13969)).
-* Added support for evaluating multiple models in parallel when performing cross-validation using
-[`TrainValidationSplit` or `CrossValidator`](ml-tuning.html)
-([SPARK-19357](https://issues.apache.org/jira/browse/SPARK-19357)).
-* Improved support for custom pipeline components in Python (see
-[SPARK-21633](https://issues.apache.org/jira/browse/SPARK-21633) and 
-[SPARK-21542](https://issues.apache.org/jira/browse/SPARK-21542)).
-* `DataFrame` functions for descriptive summary statistics over vector columns
-([SPARK-19634](https://issues.apache.org/jira/browse/SPARK-19634)).
-* Robust linear regression with Huber loss
-([SPARK-3181](https://issues.apache.org/jira/browse/SPARK-3181)).
+* Multiple columns support was added to `Binarizer`, `StringIndexer`, `StopWordsRemover` and PySpark `QuantileDiscretizer`
+([SPARK-23578](https://issues.apache.org/jira/browse/SPARK-23578)),
+([SPARK-11215](https://issues.apache.org/jira/browse/SPARK-11215)),
+([SPARK-29808](https://issues.apache.org/jira/browse/SPARK-29808)),
+([SPARK-22796](https://issues.apache.org/jira/browse/SPARK-22796)).
+* Support Tree-Based Feature Transformation was added
+([SPARK-13677](https://issues.apache.org/jira/browse/SPARK-13677)).
+* Two new evaluators `MultilabelClassificationEvaluator` and `RankingEvaluator` were added
+([SPARK-16692](https://issues.apache.org/jira/browse/SPARK-16692)),
+([SPARK-28045](https://issues.apache.org/jira/browse/SPARK-28045)).
+* Sample weights support was added in `DecisionTreeClassifier/Regressor`, `RandomForestClassifier/Regressor`, `BisectingKMeans`, `KMeans` and `GaussianMixture`
 
 Review comment:
   SPARK-24102 and SPARK-24103: `RegressionEvaluator` and `BinaryClassificationEvaluator` also support weighting.



[GitHub] [spark] huaxingao commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
huaxingao commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-596167299
 
 
   Thank you so much! @srowen @viirya @zhengruifeng 



[GitHub] [spark] viirya commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#discussion_r388035757
 
 

 ##########
 File path: docs/ml-migration-guide.md
 ##########
 @@ -33,16 +33,65 @@ Please refer [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide.
 
 * `OneHotEncoder` which is deprecated in 2.3, is removed in 3.0 and `OneHotEncoderEstimator` is now renamed to `OneHotEncoder`.
 * `org.apache.spark.ml.image.ImageSchema.readImages` which is deprecated in 2.3, is removed in 3.0, use `spark.read.format('image')` instead.
+* `org.apache.spark.mllib.clustering.KMeans.train` with param Int `runs` which is deprecated in 2.1, is removed in 3.0, use `train` method without `runs` instead.
+* `org.apache.spark.mllib.classification.LogisticRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.classification.LogisticRegression` or `spark.mllib.classification.LogisticRegressionWithLBFGS` instead.
+* `org.apache.spark.mllib.feature.ChiSqSelectorModel.isSorted ` which is deprecated in 2.1, is removed in 3.0, is not intended for subclasses to use.
+* `org.apache.spark.mllib.regression.RidgeRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` with `elasticNetParam` = 0.0. Note the default `regParam` is 0.01 for `RidgeRegressionWithSGD`, but is 0.0 for `LinearRegression`.
+* `org.apache.spark.mllib.regression.LassoWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` with `elasticNetParam` = 1.0. Note the default `regParam` is 0.01 for `LassoWithSGD`, but is 0.0 for `LinearRegression`.
+* `org.apache.spark.mllib.regression.LinearRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` or `LBFGS` instead.
+* `org.apache.spark.mllib.clustering.KMeans.getRuns` and `setRuns` which are deprecated in 2.1, is removed in 3.0, have no effect since Spark 2.0.0.
+* `org.apache.spark.ml.LinearSVCModel.setWeightCol` which is deprecated in 2.4, is removed in 3.0, is not intended for users.
+* From 3.0, `org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel` extends `MultilayerPerceptronParams` to expose the training params. As a result, layers in `MultilayerPerceptronClassificationModel` has been changed from Array[Int] to IntArrayParam. Users should use `MultilayerPerceptronClassificationModel.getLayers` instead of `MultilayerPerceptronClassificationModel.layers` to retrieve the size of layers.
 
 Review comment:
   `layers` in `MultilayerPerceptronClassificationModel` has been changed from `Array[Int]` to `IntArrayParam`?



[GitHub] [spark] viirya commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#discussion_r388039730
 
 

 ##########
 File path: docs/ml-migration-guide.md
 ##########
 @@ -33,16 +33,65 @@ Please refer [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide.
 
 * `OneHotEncoder` which is deprecated in 2.3, is removed in 3.0 and `OneHotEncoderEstimator` is now renamed to `OneHotEncoder`.
 * `org.apache.spark.ml.image.ImageSchema.readImages` which is deprecated in 2.3, is removed in 3.0, use `spark.read.format('image')` instead.
+* `org.apache.spark.mllib.clustering.KMeans.train` with param Int `runs` which is deprecated in 2.1, is removed in 3.0, use `train` method without `runs` instead.
+* `org.apache.spark.mllib.classification.LogisticRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.classification.LogisticRegression` or `spark.mllib.classification.LogisticRegressionWithLBFGS` instead.
+* `org.apache.spark.mllib.feature.ChiSqSelectorModel.isSorted ` which is deprecated in 2.1, is removed in 3.0, is not intended for subclasses to use.
+* `org.apache.spark.mllib.regression.RidgeRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` with `elasticNetParam` = 0.0. Note the default `regParam` is 0.01 for `RidgeRegressionWithSGD`, but is 0.0 for `LinearRegression`.
+* `org.apache.spark.mllib.regression.LassoWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` with `elasticNetParam` = 1.0. Note the default `regParam` is 0.01 for `LassoWithSGD`, but is 0.0 for `LinearRegression`.
+* `org.apache.spark.mllib.regression.LinearRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` or `LBFGS` instead.
+* `org.apache.spark.mllib.clustering.KMeans.getRuns` and `setRuns` which are deprecated in 2.1, is removed in 3.0, have no effect since Spark 2.0.0.
+* `org.apache.spark.ml.LinearSVCModel.setWeightCol` which is deprecated in 2.4, is removed in 3.0, is not intended for users.
+* From 3.0, `org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel` extends `MultilayerPerceptronParams` to expose the training params. As a result, layers in `MultilayerPerceptronClassificationModel` has been changed from Array[Int] to IntArrayParam. Users should use `MultilayerPerceptronClassificationModel.getLayers` instead of `MultilayerPerceptronClassificationModel.layers` to retrieve the size of layers.
+* `org.apache.spark.ml.classification.GBTClassifier.numTrees`  which is deprecated in 2.4.5, is removed in 3.0, use `getNumTrees` instead.
+* `org.apache.spark.ml.clustering.KMeansModel.computeCost` which is deprecated in 2.4, is removed in 3.0, use `ClusteringEvaluator` instead.
+* `org.apache.spark.mllib.evaluation.MulticlassMetrics.precision` which is deprecated in 2.0, is removed in 3.0, use `accuracy` instead.
+* `org.apache.spark.mllib.evaluation.MulticlassMetrics.recall` which is deprecated in 2.0, is removed in 3.0, use `accuracy` instead.
+* `org.apache.spark.mllib.evaluation.MulticlassMetrics.fMeasure` which is deprecated in 2.0, is removed in 3.0, use `accuracy` instead.
+* `org.apache.spark.ml.util.GeneralMLWriter.context` which is deprecated in 2.0, is removed in 3.0, use `session` instead.
+* `org.apache.spark.ml.util.MLWriter.context` which is deprecated in 2.0, is removed in 3.0, use `session` instead.
+* `org.apache.spark.ml.util.MLReader.context` which is deprecated in 2.0, is removed in 3.0, use `session` instead.
+* `abstract class UnaryTransformer[IN, OUT, T <: UnaryTransformer[IN, OUT, T]]` is changed to `abstract class UnaryTransformer[IN: TypeTag, OUT: TypeTag, T <: UnaryTransformer[IN, OUT, T]]` in 3.0.
 
-### Changes of behavior
+### Deprecations and changes of behavior
 {:.no_toc}
 
+**Deprecations**
+
+* [SPARK-11215](https://issues.apache.org/jira/browse/SPARK-11215):
+`labels` in `StringIndexerModel` is deprecated and will be removed in 3.1.0. Use `labelsArray` instead.
+* [SPARK-25758](https://issues.apache.org/jira/browse/SPARK-25758):
+`computeCost` in `BisectingKMeansModel` is deprecated and will be removed in future versions. Use `ClusteringEvaluator` instead.
+
+**Changes of behavior**
+
 * [SPARK-11215](https://issues.apache.org/jira/browse/SPARK-11215):
  In Spark 2.4 and previous versions, when specifying `frequencyDesc` or `frequencyAsc` as
  `stringOrderType` param in `StringIndexer`, in case of equal frequency, the order of
  strings is undefined. Since Spark 3.0, the strings with equal frequency are further
  sorted by alphabet. And since Spark 3.0, `StringIndexer` supports encoding multiple
  columns.
+* [SPARK-20604](https://issues.apache.org/jira/browse/SPARK-20604):
+Prior to 3.0, `Imputer` required input columns to be Double or Float. In 3.0, this
+restriction is lifted, so `Imputer` can handle all numeric types.
+* [SPARK-23469](https://issues.apache.org/jira/browse/SPARK-23469):
+In Spark 3.0, the `HashingTF` Transformer uses a corrected implementation of the murmur3 hash
+function to hash elements to vectors. `HashingTF` in Spark 3.0 will map elements to
+different positions in vectors than in Spark 2. However, `HashingTF` created with Spark 2.x
+and loaded with Spark 3.0 will still use the previous hash function and will not change behavior.
+* [SPARK-28969](https://issues.apache.org/jira/browse/SPARK-28969):
+The `setClassifier` method in Pyspark's `OneVsRestModel` has been removed in 3.0 for parity with
 
 Review comment:
   Pyspark -> PySpark
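
For reference, the `StringIndexer` ordering rule in the hunk above (order labels by frequency, then break ties alphabetically) can be sketched in plain Python. This is a minimal illustration, not Spark's implementation: it assumes ties are broken in ascending alphabetical order for both `frequencyDesc` and `frequencyAsc`, and the helper name `order_labels` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def order_labels(values: Iterable[str], order: str = "frequencyDesc") -> List[str]:
    """Return labels in index order: the label at position i gets index i.

    Hypothetical helper sketching the 3.0 tie-breaking rule; not Spark code.
    """
    counts = Counter(values)
    if order == "frequencyDesc":
        # Most frequent first; equal counts fall back to alphabetical order.
        def key(item):
            return (-item[1], item[0])
    elif order == "frequencyAsc":
        # Least frequent first; equal counts fall back to alphabetical order.
        def key(item):
            return (item[1], item[0])
    else:
        raise ValueError(f"unsupported order: {order}")
    return [label for label, _ in sorted(counts.items(), key=key)]


values = ["b", "a", "c", "a", "b", "d"]
print(order_labels(values))                  # ['a', 'b', 'c', 'd']
print(order_labels(values, "frequencyAsc"))  # ['c', 'd', 'a', 'b']
```

With only the frequency as the sort key, `a` and `b` (both seen twice) could land at index 0 or 1 in either order; the secondary alphabetical key makes the assignment deterministic.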



[GitHub] [spark] AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-596124324
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-596124286
 
 
   **[Test build #119521 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119521/testReport)** for PR 27785 at commit [`e00a5a7`](https://github.com/apache/spark/commit/e00a5a751bcd769ed92ff26a84c702b176cf1671).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-594355503
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24022/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-594355182
 
 
   **[Test build #119282 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119282/testReport)** for PR 27785 at commit [`2ef974e`](https://github.com/apache/spark/commit/2ef974e276db711d623a61e40f558d09789f9bc8).



[GitHub] [spark] AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-596123022
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24250/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-594982686
 
 
   **[Test build #119351 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119351/testReport)** for PR 27785 at commit [`24e6c8f`](https://github.com/apache/spark/commit/24e6c8fde8947fe13b78f126bc5053c61d3a1e2b).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-594355182
 
 
   **[Test build #119282 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119282/testReport)** for PR 27785 at commit [`2ef974e`](https://github.com/apache/spark/commit/2ef974e276db711d623a61e40f558d09789f9bc8).



[GitHub] [spark] zhengruifeng commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#discussion_r388055551
 
 

 ##########
 File path: docs/ml-guide.md
 ##########
 @@ -87,31 +85,41 @@ To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.4
 [^1]: To learn more about the benefits and background of system optimised natives, you may wish to
     watch Sam Halliday's ScalaX talk on [High Performance Linear Algebra in Scala](http://fommil.github.io/scalax14/#/).
 
-# Highlights in 2.3
+# Highlights in 3.0
 
-The list below highlights some of the new features and enhancements added to MLlib in the `2.3`
+The list below highlights some of the new features and enhancements added to MLlib in the `3.0`
 release of Spark:
 
-* Built-in support for reading images into a `DataFrame` was added
-([SPARK-21866](https://issues.apache.org/jira/browse/SPARK-21866)).
-* [`OneHotEncoderEstimator`](ml-features.html#onehotencoderestimator) was added, and should be
-used instead of the existing `OneHotEncoder` transformer. The new estimator supports
-transforming multiple columns.
-* Multiple column support was also added to `QuantileDiscretizer` and `Bucketizer`
-([SPARK-22397](https://issues.apache.org/jira/browse/SPARK-22397) and
-[SPARK-20542](https://issues.apache.org/jira/browse/SPARK-20542))
-* A new [`FeatureHasher`](ml-features.html#featurehasher) transformer was added
- ([SPARK-13969](https://issues.apache.org/jira/browse/SPARK-13969)).
-* Added support for evaluating multiple models in parallel when performing cross-validation using
-[`TrainValidationSplit` or `CrossValidator`](ml-tuning.html)
-([SPARK-19357](https://issues.apache.org/jira/browse/SPARK-19357)).
-* Improved support for custom pipeline components in Python (see
-[SPARK-21633](https://issues.apache.org/jira/browse/SPARK-21633) and 
-[SPARK-21542](https://issues.apache.org/jira/browse/SPARK-21542)).
-* `DataFrame` functions for descriptive summary statistics over vector columns
-([SPARK-19634](https://issues.apache.org/jira/browse/SPARK-19634)).
-* Robust linear regression with Huber loss
-([SPARK-3181](https://issues.apache.org/jira/browse/SPARK-3181)).
+* Multiple-column support was added to `Binarizer`, `StringIndexer`, `StopWordsRemover` and PySpark `QuantileDiscretizer`
+([SPARK-23578](https://issues.apache.org/jira/browse/SPARK-23578)),
+([SPARK-11215](https://issues.apache.org/jira/browse/SPARK-11215)),
+([SPARK-29808](https://issues.apache.org/jira/browse/SPARK-29808)),
+([SPARK-22796](https://issues.apache.org/jira/browse/SPARK-22796)).
+* Support for tree-based feature transformation was added
+([SPARK-13677](https://issues.apache.org/jira/browse/SPARK-13677)).
+* Two new evaluators `MultilabelClassificationEvaluator` and `RankingEvaluator` were added
+([SPARK-16692](https://issues.apache.org/jira/browse/SPARK-16692)),
+([SPARK-28045](https://issues.apache.org/jira/browse/SPARK-28045)).
+* Sample weights support was added in `DecisionTreeClassifier/Regressor`, `RandomForestClassifier/Regressor`, `BisectingKMeans`, `KMeans` and `GaussianMixture`
 
 Review comment:
   GBTClassifier/Regressor also support sample weighting in [SPARK-9612]



[GitHub] [spark] huaxingao commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#discussion_r388093414
 
 

 ##########
 File path: docs/ml-migration-guide.md
 ##########
 @@ -33,16 +33,65 @@ Please refer [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide.
 
 * `OneHotEncoder` which is deprecated in 2.3, is removed in 3.0 and `OneHotEncoderEstimator` is now renamed to `OneHotEncoder`.
 * `org.apache.spark.ml.image.ImageSchema.readImages` which is deprecated in 2.3, is removed in 3.0, use `spark.read.format('image')` instead.
+* `org.apache.spark.mllib.clustering.KMeans.train` with param Int `runs` which is deprecated in 2.1, is removed in 3.0, use `train` method without `runs` instead.
+* `org.apache.spark.mllib.classification.LogisticRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.classification.LogisticRegression` or `spark.mllib.classification.LogisticRegressionWithLBFGS` instead.
+* `org.apache.spark.mllib.feature.ChiSqSelectorModel.isSorted` which is deprecated in 2.1, is removed in 3.0, is not intended for subclasses to use.
+* `org.apache.spark.mllib.regression.RidgeRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` with `elasticNetParam` = 0.0. Note the default `regParam` is 0.01 for `RidgeRegressionWithSGD`, but is 0.0 for `LinearRegression`.
+* `org.apache.spark.mllib.regression.LassoWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` with `elasticNetParam` = 1.0. Note the default `regParam` is 0.01 for `LassoWithSGD`, but is 0.0 for `LinearRegression`.
+* `org.apache.spark.mllib.regression.LinearRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` or `LBFGS` instead.
+* `org.apache.spark.mllib.clustering.KMeans.getRuns` and `setRuns` which are deprecated in 2.1, are removed in 3.0. They have had no effect since Spark 2.0.0.
+* `org.apache.spark.ml.LinearSVCModel.setWeightCol` which is deprecated in 2.4, is removed in 3.0, is not intended for users.
+* From 3.0, `org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel` extends `MultilayerPerceptronParams` to expose the training params. As a result, `layers` in `MultilayerPerceptronClassificationModel` has been changed from `Array[Int]` to `IntArrayParam`. Users should use `MultilayerPerceptronClassificationModel.getLayers` instead of `MultilayerPerceptronClassificationModel.layers` to retrieve the size of layers.
+* `org.apache.spark.ml.classification.GBTClassifier.numTrees` which is deprecated in 2.4.5, is removed in 3.0, use `getNumTrees` instead.
+* `org.apache.spark.ml.clustering.KMeansModel.computeCost` which is deprecated in 2.4, is removed in 3.0, use `ClusteringEvaluator` instead.
+* `org.apache.spark.mllib.evaluation.MulticlassMetrics.precision` which is deprecated in 2.0, is removed in 3.0, use `accuracy` instead.
+* `org.apache.spark.mllib.evaluation.MulticlassMetrics.recall` which is deprecated in 2.0, is removed in 3.0, use `accuracy` instead.
+* `org.apache.spark.mllib.evaluation.MulticlassMetrics.fMeasure` which is deprecated in 2.0, is removed in 3.0, use `accuracy` instead.
+* `org.apache.spark.ml.util.GeneralMLWriter.context` which is deprecated in 2.0, is removed in 3.0, use `session` instead.
+* `org.apache.spark.ml.util.MLWriter.context` which is deprecated in 2.0, is removed in 3.0, use `session` instead.
+* `org.apache.spark.ml.util.MLReader.context` which is deprecated in 2.0, is removed in 3.0, use `session` instead.
+* `abstract class UnaryTransformer[IN, OUT, T <: UnaryTransformer[IN, OUT, T]]` is changed to `abstract class UnaryTransformer[IN: TypeTag, OUT: TypeTag, T <: UnaryTransformer[IN, OUT, T]]` in 3.0.
 
-### Changes of behavior
+### Deprecations and changes of behavior
 {:.no_toc}
 
+**Deprecations**
+
+* [SPARK-11215](https://issues.apache.org/jira/browse/SPARK-11215):
+`labels` in `StringIndexerModel` is deprecated and will be removed in 3.1.0. Use `labelsArray` instead.
+* [SPARK-25758](https://issues.apache.org/jira/browse/SPARK-25758):
+`computeCost` in `BisectingKMeansModel` is deprecated and will be removed in future versions. Use `ClusteringEvaluator` instead.
+
+**Changes of behavior**
+
 * [SPARK-11215](https://issues.apache.org/jira/browse/SPARK-11215):
  In Spark 2.4 and previous versions, when specifying `frequencyDesc` or `frequencyAsc` as
  `stringOrderType` param in `StringIndexer`, in case of equal frequency, the order of
  strings is undefined. Since Spark 3.0, the strings with equal frequency are further
  sorted alphabetically. In addition, since Spark 3.0, `StringIndexer` supports encoding multiple
  columns.
+* [SPARK-20604](https://issues.apache.org/jira/browse/SPARK-20604):
+Prior to 3.0, `Imputer` required input columns to be Double or Float. In 3.0, this
+restriction is lifted, so `Imputer` can handle all numeric types.
+* [SPARK-23469](https://issues.apache.org/jira/browse/SPARK-23469):
+In Spark 3.0, the `HashingTF` Transformer uses a corrected implementation of the murmur3 hash
+function to hash elements to vectors. `HashingTF` in Spark 3.0 will map elements to
+different positions in vectors than in Spark 2. However, `HashingTF` created with Spark 2.x
+and loaded with Spark 3.0 will still use the previous hash function and will not change behavior.
+* [SPARK-28969](https://issues.apache.org/jira/browse/SPARK-28969):
+The `setClassifier` method in Pyspark's `OneVsRestModel` has been removed in 3.0 for parity with
 
 Review comment:
   Will update
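
For reference, the `HashingTF` entry in the hunk above describes mapping terms to vector positions with a murmur3 hash. As a rough illustration of that hashing trick (not Spark's code), the Python sketch below hashes each UTF-8 encoded term with the standard 32-bit MurmurHash3 (x86_32) algorithm, reads the result as a signed 32-bit integer, and reduces it modulo the vector size. The helper names `murmur3_32` and `term_frequencies`, the seed of 42, and the default size of 2^18 are assumptions made for illustration, so the indices it produces should not be expected to match any particular Spark release.

```python
from collections import Counter
from typing import Dict, Iterable

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    # 32-bit rotate left.
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Standard MurmurHash3 x86_32; returns an unsigned 32-bit value."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n_full = len(data) // 4 * 4
    # Body: mix each full little-endian 4-byte block into the state.
    for i in range(0, n_full, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
        h = (_rotl32(h, 13) * 5 + 0xE6546B64) & MASK32
    # Tail: the remaining 0-3 bytes are combined into one value and mixed once.
    tail = data[n_full:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
    # Finalization (fmix32) spreads every input bit across the output.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def term_frequencies(terms: Iterable[str], num_features: int = 1 << 18,
                     seed: int = 42) -> Dict[int, int]:
    """Hypothetical helper: map each term to a vector index and count it."""
    tf: Counter = Counter()
    for term in terms:
        h = murmur3_32(term.encode("utf-8"), seed)
        signed = h - (1 << 32) if h >= (1 << 31) else h
        # Python's % is already non-negative for a positive modulus.
        tf[signed % num_features] += 1
    return dict(tf)


print(term_frequencies(["a", "b", "a", "c"], num_features=16))
```

Because a term's position depends only on the hash output, changing the hash implementation moves terms to different indices even when `num_features` is unchanged; this is consistent with the note that a `HashingTF` saved with Spark 2.x keeps its previous hash function when loaded in 3.0.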



[GitHub] [spark] AmplabJenkins removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-595073581
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-595073049
 
 
   **[Test build #119370 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119370/testReport)** for PR 27785 at commit [`1a19da7`](https://github.com/apache/spark/commit/1a19da714c4a607454d3dd36d827893454190552).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-595077974
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119370/
   Test PASSed.



[GitHub] [spark] huaxingao commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#discussion_r389300249
 
 

 ##########
 File path: docs/sql-ref-syntax-ddl-alter-table.md
 ##########
 @@ -23,7 +23,7 @@ license: |
 `ALTER TABLE` statement changes the schema or properties of a table.
 
 ### RENAME 
-`ALTER TABLE RENAME` statement changes the table name of an existing table in the database.
+`ALTER TABLE RENAME TO` statement changes the table name of an existing table in the database.
 
 Review comment:
   Sorry, I didn't realize this went in. Will remove now.



[GitHub] [spark] SparkQA removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-595073049
 
 
   **[Test build #119370 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119370/testReport)** for PR 27785 at commit [`1a19da7`](https://github.com/apache/spark/commit/1a19da714c4a607454d3dd36d827893454190552).



[GitHub] [spark] SparkQA commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-594979489
 
 
   **[Test build #119351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119351/testReport)** for PR 27785 at commit [`24e6c8f`](https://github.com/apache/spark/commit/24e6c8fde8947fe13b78f126bc5053c61d3a1e2b).



[GitHub] [spark] viirya commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#discussion_r388038788
 
 

 ##########
 File path: docs/ml-migration-guide.md
 ##########
 @@ -33,16 +33,65 @@ Please refer [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide.
 
 * `OneHotEncoder` which is deprecated in 2.3, is removed in 3.0 and `OneHotEncoderEstimator` is now renamed to `OneHotEncoder`.
 * `org.apache.spark.ml.image.ImageSchema.readImages` which is deprecated in 2.3, is removed in 3.0, use `spark.read.format('image')` instead.
+* `org.apache.spark.mllib.clustering.KMeans.train` with param Int `runs` which is deprecated in 2.1, is removed in 3.0, use `train` method without `runs` instead.
 
 Review comment:
   Is it better to replace `... which is deprecated in 2.3, is removed in 3.0, use ...` with `... which is deprecated in 2.3, is removed in 3.0. Use ...`?



[GitHub] [spark] huaxingao commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
huaxingao commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-594684927
 
 
   Thanks @srowen for reviewing! Also cc @WeichenXu123 @zhengruifeng 



[GitHub] [spark] AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-594982789
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] huaxingao commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#discussion_r388093328
 
 

 ##########
 File path: docs/ml-migration-guide.md
 ##########
 @@ -33,16 +33,65 @@ Please refer [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide.
 
 * `OneHotEncoder` which is deprecated in 2.3, is removed in 3.0 and `OneHotEncoderEstimator` is now renamed to `OneHotEncoder`.
 * `org.apache.spark.ml.image.ImageSchema.readImages` which is deprecated in 2.3, is removed in 3.0, use `spark.read.format('image')` instead.
+* `org.apache.spark.mllib.clustering.KMeans.train` with param Int `runs` which is deprecated in 2.1, is removed in 3.0, use `train` method without `runs` instead.
 
 Review comment:
   Will change to 
   ```
   `org.apache.spark.mllib.clustering.KMeans.train` with param Int `runs` which is deprecated in 2.1, is removed in 3.0. Use `train` method without `runs` instead.
   ```
   Is there a better way to say this? @srowen 



[GitHub] [spark] AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-594982797
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119351/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-596123020
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] huaxingao commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#discussion_r388093376
 
 

 ##########
 File path: docs/ml-migration-guide.md
 ##########
 @@ -33,16 +33,65 @@ Please refer [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide.
 
 * `OneHotEncoder` which is deprecated in 2.3, is removed in 3.0 and `OneHotEncoderEstimator` is now renamed to `OneHotEncoder`.
 * `org.apache.spark.ml.image.ImageSchema.readImages` which is deprecated in 2.3, is removed in 3.0, use `spark.read.format('image')` instead.
+* `org.apache.spark.mllib.clustering.KMeans.train` with param Int `runs` which is deprecated in 2.1, is removed in 3.0, use `train` method without `runs` instead.
+* `org.apache.spark.mllib.classification.LogisticRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.classification.LogisticRegression` or `spark.mllib.classification.LogisticRegressionWithLBFGS` instead.
+* `org.apache.spark.mllib.feature.ChiSqSelectorModel.isSorted ` which is deprecated in 2.1, is removed in 3.0, is not intended for subclasses to use.
+* `org.apache.spark.mllib.regression.RidgeRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` with `elasticNetParam` = 0.0. Note the default `regParam` is 0.01 for `RidgeRegressionWithSGD`, but is 0.0 for `LinearRegression`.
+* `org.apache.spark.mllib.regression.LassoWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` with `elasticNetParam` = 1.0. Note the default `regParam` is 0.01 for `LassoWithSGD`, but is 0.0 for `LinearRegression`.
+* `org.apache.spark.mllib.regression.LinearRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` or `LBFGS` instead.
+* `org.apache.spark.mllib.clustering.KMeans.getRuns` and `setRuns` which are deprecated in 2.1, is removed in 3.0, have no effect since Spark 2.0.0.
+* `org.apache.spark.ml.LinearSVCModel.setWeightCol` which is deprecated in 2.4, is removed in 3.0, is not intended for users.
+* From 3.0, `org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel` extends `MultilayerPerceptronParams` to expose the training params. As a result, layers in `MultilayerPerceptronClassificationModel` has been changed from Array[Int] to IntArrayParam. Users should use `MultilayerPerceptronClassificationModel.getLayers` instead of `MultilayerPerceptronClassificationModel.layers` to retrieve the size of layers.
+* `org.apache.spark.ml.classification.GBTClassifier.numTrees`  which is deprecated in 2.4.5, is removed in 3.0, use `getNumTrees` instead.
+* `org.apache.spark.ml.clustering.KMeansModel.computeCost` which is deprecated in 2.4, is removed in 3.0, use `ClusteringEvaluator` instead.
+* `org.apache.spark.mllib.evaluation.MulticlassMetrics.precision` which is deprecated in 2.0, is removed in 3.0, use `accuracy` instead.
+* `org.apache.spark.mllib.evaluation.MulticlassMetrics.recall` which is deprecated in 2.0, is removed in 3.0, use `accuracy` instead.
+* `org.apache.spark.mllib.evaluation.MulticlassMetrics.fMeasure` which is deprecated in 2.0, is removed in 3.0, use `accuracy` instead.
+* `org.apache.spark.ml.util.GeneralMLWriter.context` which is deprecated in 2.0, is removed in 3.0, use `session` instead.
+* `org.apache.spark.ml.util.MLWriter.context` which is deprecated in 2.0, is removed in 3.0, use `session` instead.
+* `org.apache.spark.ml.util.MLReader.context` which is deprecated in 2.0, is removed in 3.0, use `session` instead.
+* `abstract class UnaryTransformer[IN, OUT, T <: UnaryTransformer[IN, OUT, T]]` is changed to `abstract class UnaryTransformer[IN: TypeTag, OUT: TypeTag, T <: UnaryTransformer[IN, OUT, T]]` in 3.0.
 
-### Changes of behavior
+### Deprecations and changes of behavior
 {:.no_toc}
 
+**Deprecations**
+
+* [SPARK-11215](https://issues.apache.org/jira/browse/SPARK-11215):
+`labels` in `StringIndexerModel` is deprecated and will be removed in 3.1.0. Use `labelsArray` instead.
+* [SPARK-25758](https://issues.apache.org/jira/browse/SPARK-25758):
+`computeCost` in `BisectingKMeansModel` is deprecated and will be removed in future versions. Use `ClusteringEvaluator` instead.
+
+**Changes of behavior**
+
 * [SPARK-11215](https://issues.apache.org/jira/browse/SPARK-11215):
  In Spark 2.4 and previous versions, when specifying `frequencyDesc` or `frequencyAsc` as
  `stringOrderType` param in `StringIndexer`, in case of equal frequency, the order of
  strings is undefined. Since Spark 3.0, the strings with equal frequency are further
  sorted by alphabet. And since Spark 3.0, `StringIndexer` supports encoding multiple
  columns.
+ * [SPARK-20604](https://issues.apache.org/jira/browse/SPARK-20604):
+ In prior to 3.0 releases, `Imputer` requires input column to be Double or Float. In 3.0, this
+ restriction is lifted so `Imputer` can handle all numeric types.
+* [SPARK-23469](https://issues.apache.org/jira/browse/SPARK-23469):
+In Spark 3.0, the `HashingTF` Transformer uses a corrected implementation of the murmur3 hash
+function to hash elements to vectors. `HashingTF` fits with Spark 3.0 will map elements to
 
 Review comment:
   I will change to 
   ```  `HashingTF` in Spark 3.0 ```
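For illustration, here is a minimal pure-Python sketch of the ordering rule in the SPARK-11215 entry of the quoted hunk. Since Spark 3.0, `frequencyDesc` and `frequencyAsc` rank labels by frequency and break ties alphabetically. This sketch only models that documented rule and is not Spark's `StringIndexer` code. `string_indexer_labels` and `string_index` are hypothetical helper names, and the tie-break assumes plain code-point string comparison.

```python
from collections import Counter


def string_indexer_labels(values, order="frequencyDesc"):
    """Return the distinct labels in index order (label i gets index i).

    Hypothetical helper modelling the documented Spark 3.0 rule: rank by
    frequency, then break ties alphabetically (code-point order assumed).
    """
    if order not in ("frequencyDesc", "frequencyAsc"):
        raise ValueError(f"unsupported stringOrderType: {order}")
    counts = Counter(values)
    sign = -1 if order == "frequencyDesc" else 1  # flip count for descending
    # Primary key: (signed) frequency; secondary key: the label itself.
    return sorted(counts, key=lambda s: (sign * counts[s], s))


def string_index(values, order="frequencyDesc"):
    """Map each input value to its index as a float (StringIndexer emits doubles)."""
    labels = string_indexer_labels(values, order)
    index = {label: float(i) for i, label in enumerate(labels)}
    return [index[v] for v in values]


data = ["b", "a", "c", "a", "b", "d"]  # a and b appear twice, c and d once
print(string_indexer_labels(data))                  # ['a', 'b', 'c', 'd']
print(string_indexer_labels(data, "frequencyAsc"))  # ['c', 'd', 'a', 'b']
print(string_index(data))                           # [1.0, 0.0, 2.0, 0.0, 1.0, 3.0]
```

In both orders the tie between `a` and `b` (and between `c` and `d`) resolves the same way regardless of input order. In Spark 2.4 and earlier that order was undefined.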



[GitHub] [spark] viirya commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#discussion_r388084507
 
 

 ##########
 File path: docs/ml-migration-guide.md
 ##########
 @@ -33,16 +33,65 @@ Please refer [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide.
 
 * `OneHotEncoder` which is deprecated in 2.3, is removed in 3.0 and `OneHotEncoderEstimator` is now renamed to `OneHotEncoder`.
 * `org.apache.spark.ml.image.ImageSchema.readImages` which is deprecated in 2.3, is removed in 3.0, use `spark.read.format('image')` instead.
+* `org.apache.spark.mllib.clustering.KMeans.train` with param Int `runs` which is deprecated in 2.1, is removed in 3.0, use `train` method without `runs` instead.
+* `org.apache.spark.mllib.classification.LogisticRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.classification.LogisticRegression` or `spark.mllib.classification.LogisticRegressionWithLBFGS` instead.
+* `org.apache.spark.mllib.feature.ChiSqSelectorModel.isSorted ` which is deprecated in 2.1, is removed in 3.0, is not intended for subclasses to use.
+* `org.apache.spark.mllib.regression.RidgeRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` with `elasticNetParam` = 0.0. Note the default `regParam` is 0.01 for `RidgeRegressionWithSGD`, but is 0.0 for `LinearRegression`.
+* `org.apache.spark.mllib.regression.LassoWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` with `elasticNetParam` = 1.0. Note the default `regParam` is 0.01 for `LassoWithSGD`, but is 0.0 for `LinearRegression`.
+* `org.apache.spark.mllib.regression.LinearRegressionWithSGD` which is deprecated in 2.0, is removed in 3.0, use `org.apache.spark.ml.regression.LinearRegression` or `LBFGS` instead.
+* `org.apache.spark.mllib.clustering.KMeans.getRuns` and `setRuns` which are deprecated in 2.1, is removed in 3.0, have no effect since Spark 2.0.0.
+* `org.apache.spark.ml.LinearSVCModel.setWeightCol` which is deprecated in 2.4, is removed in 3.0, is not intended for users.
+* From 3.0, `org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel` extends `MultilayerPerceptronParams` to expose the training params. As a result, layers in `MultilayerPerceptronClassificationModel` has been changed from Array[Int] to IntArrayParam. Users should use `MultilayerPerceptronClassificationModel.getLayers` instead of `MultilayerPerceptronClassificationModel.layers` to retrieve the size of layers.
 
 Review comment:
   yea, I just mean to add quotes to layers, Array[Int] and IntArrayParam.
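The quoted `RidgeRegressionWithSGD` and `LassoWithSGD` entries point users to `LinearRegression` with `elasticNetParam` set to 0.0 or 1.0, and they warn that the default `regParam` drops from 0.01 to 0.0. A minimal sketch of why both settings matter follows. It evaluates only the regularization term of the elastic-net objective as documented for spark.ml, `regParam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2)` with `alpha = elasticNetParam`. It does not fit a model. Coefficients from `LinearRegression` may still differ from the removed SGD-based classes, for example because of the different optimizer and feature standardization. `elastic_net_penalty` is a hypothetical helper name.

```python
def elastic_net_penalty(weights, reg_param, elastic_net_param):
    """Regularization term of the spark.ml elastic-net objective (as documented):
    reg_param * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2), alpha = elastic_net_param.
    Hypothetical helper for illustration only.
    """
    if not 0.0 <= elastic_net_param <= 1.0:
        raise ValueError("elastic_net_param must be in [0, 1]")
    l1 = sum(abs(w) for w in weights)        # ||w||_1
    l2_squared = sum(w * w for w in weights)  # ||w||_2^2
    return reg_param * (elastic_net_param * l1
                        + (1.0 - elastic_net_param) / 2.0 * l2_squared)


w = [1.0, -2.0]  # ||w||_1 = 3, ||w||_2^2 = 5
# Ridge-style replacement: elasticNetParam = 0.0, regParam set explicitly to the old default.
print(elastic_net_penalty(w, reg_param=0.01, elastic_net_param=0.0))  # about 0.025
# Lasso-style replacement: elasticNetParam = 1.0, same explicit regParam.
print(elastic_net_penalty(w, reg_param=0.01, elastic_net_param=1.0))  # about 0.03
# With LinearRegression's default regParam of 0.0 the penalty vanishes entirely.
print(elastic_net_penalty(w, reg_param=0.0, elastic_net_param=0.0))   # 0.0
```

The last call shows the pitfall the entries warn about. Leaving `regParam` at its new default turns the model into unregularized least squares. In PySpark, the ridge-style setting would be passed as `LinearRegression(regParam=0.01, elasticNetParam=0.0)`.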



[GitHub] [spark] AmplabJenkins removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-594359077
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119282/
   Test PASSed.



[GitHub] [spark] srowen commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#discussion_r389299761
 
 

 ##########
 File path: docs/sql-ref-syntax-ddl-alter-table.md
 ##########
 @@ -23,7 +23,7 @@ license: |
 `ALTER TABLE` statement changes the schema or properties of a table.
 
 ### RENAME 
-`ALTER TABLE RENAME` statement changes the table name of an existing table in the database.
+`ALTER TABLE RENAME TO` statement changes the table name of an existing table in the database.
 
 Review comment:
   Are these changes meant to be in this PR? looks somewhat unrelated



[GitHub] [spark] AmplabJenkins removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#issuecomment-594982797
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119351/
   Test PASSed.
