Posted to issues@spark.apache.org by "Nick Pentreath (JIRA)" <ji...@apache.org> on 2016/07/04 13:37:11 UTC

[jira] [Commented] (SPARK-13448) Document MLlib behavior changes in Spark 2.0

    [ https://issues.apache.org/jira/browse/SPARK-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15361326#comment-15361326 ] 

Nick Pentreath commented on SPARK-13448:
----------------------------------------

[~yanboliang] is this done now with the PRs for SPARK-15643 merged? Is there anything remaining here?

> Document MLlib behavior changes in Spark 2.0
> --------------------------------------------
>
>                 Key: SPARK-13448
>                 URL: https://issues.apache.org/jira/browse/SPARK-13448
>             Project: Spark
>          Issue Type: Documentation
>          Components: ML, MLlib
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>            Priority: Blocker
>
> This JIRA keeps a list of MLlib behavior changes in Spark 2.0 so that we can remember to add them to the migration guide / release notes.
> * SPARK-13429: change convergenceTol in LogisticRegressionWithLBFGS from 1e-4 to 1e-6.
> * SPARK-7780: The intercept will not be regularized if users train a binary classification model with an L1/L2 Updater using LogisticRegressionWithLBFGS, because it calls the ML LogisticRegression implementation. In addition, if users train without regularization, training with or without feature scaling will return the same solution at the same convergence rate (because both cases run the same code path); this behavior differs from the old API.
> * SPARK-12363: Bug fix for PowerIterationClustering, which will likely change results.
> * SPARK-13048: LDA using the EM optimizer will keep the last checkpoint by default, if checkpointing is being used.
> * SPARK-12153: Word2Vec now respects sentence boundaries.  Previously, it did not handle them correctly.
> * SPARK-10574: HashingTF uses MurmurHash3 by default in both spark.ml and spark.mllib
> * SPARK-14768: Remove expectedType arg for PySpark Param
> * SPARK-14931: Mismatched default Param values between pipelines in Spark and PySpark
> * SPARK-13600: QuantileDiscretizer now uses approxQuantile from DataFrame stats (previously used custom sampling logic). Buckets will differ for same input data and params. 
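
To make the SPARK-12153 item concrete, here is a minimal sketch of what "respecting sentence boundaries" can mean for the (center, context) word pairs a Word2Vec-style model trains on. It is plain Python and not Spark's implementation. The helper names (context_pairs, pairs_respecting_boundaries, pairs_ignoring_boundaries) and the toy sentences are hypothetical, and the "ignoring" variant only illustrates one way boundaries can be lost; the issue text does not describe the exact previous behavior.

```python
from typing import List, Tuple


def context_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for tokens at most `window` positions apart."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def pairs_respecting_boundaries(sentences: List[List[str]], window: int) -> List[Tuple[str, str]]:
    """Build pairs sentence by sentence, so no pair spans two sentences."""
    pairs = []
    for sentence in sentences:
        pairs.extend(context_pairs(sentence, window))
    return pairs


def pairs_ignoring_boundaries(sentences: List[List[str]], window: int) -> List[Tuple[str, str]]:
    """Illustrative assumption: flatten all sentences into one token stream first."""
    flat = [token for sentence in sentences for token in sentence]
    return context_pairs(flat, window)


# Toy input (hypothetical): two unrelated sentences.
sentences = [["spark", "is", "fast"], ["cats", "sleep"]]

bounded = pairs_respecting_boundaries(sentences, window=1)
unbounded = pairs_ignoring_boundaries(sentences, window=1)

# Pairs that exist only when the sentence boundary is ignored.
crossing = sorted(set(unbounded) - set(bounded))
print(crossing)  # [('cats', 'fast'), ('fast', 'cats')]
```

In this sketch the only difference between the two variants is the pair linking "fast" and "cats", which straddles the sentence break. Extra training pairs of that kind are one reason a model trained on the same corpus may produce different vectors once boundaries are respected.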



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
