Posted to commits@spark.apache.org by db...@apache.org on 2016/02/29 09:55:50 UTC

spark git commit: [SPARK-13545][MLLIB][PYSPARK] Make MLlib LogisticRegressionWithLBFGS's default parameters consistent in Scala and Python

Repository: spark
Updated Branches:
  refs/heads/master dd3b5455c -> d81a71357


[SPARK-13545][MLLIB][PYSPARK] Make MLlib LogisticRegressionWithLBFGS's default parameters consistent in Scala and Python

## What changes were proposed in this pull request?
* The default value of ```regParam``` in PySpark MLlib ```LogisticRegressionWithLBFGS``` should be ```0.0```, consistent with Scala. (This is also consistent with ML ```LogisticRegression```.) A short usage sketch follows this list.
* Also, if a known updater (L1 or L2) is used for binary classification, ```LogisticRegressionWithLBFGS``` calls the ML implementation, so the API doc should clarify that ```numCorrections``` has no effect on that route.
* Made a pass over all parameters of ```LogisticRegressionWithLBFGS```; the others are already set properly.
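
For illustration, a minimal PySpark sketch of what the new default means for callers (the toy data and app name below are made up for this example, not part of the patch):

```python
from pyspark import SparkContext
from pyspark.mllib.classification import LogisticRegressionWithLBFGS
from pyspark.mllib.regression import LabeledPoint

sc = SparkContext(appName="regparam-default-sketch")  # hypothetical app name

# Tiny toy binary-classification dataset, for illustration only.
points = sc.parallelize([
    LabeledPoint(0.0, [0.0, 1.0]),
    LabeledPoint(1.0, [1.0, 0.0]),
])

# With this change, omitting regParam trains with regParam=0.0
# (no regularization), matching the Scala and ML defaults.
model_default = LogisticRegressionWithLBFGS.train(points, iterations=10)

# The previous Python default can still be requested explicitly.
model_l2 = LogisticRegressionWithLBFGS.train(points, iterations=10,
                                             regParam=0.01, regType="l2")
```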

cc mengxr dbtsai

## How was this patch tested?
No new tests; the change should pass all existing tests.

Author: Yanbo Liang <yb...@gmail.com>

Closes #11424 from yanboliang/spark-13545.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d81a7135
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d81a7135
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d81a7135

Branch: refs/heads/master
Commit: d81a71357e24160244b6eeff028b0d9a4863becf
Parents: dd3b545
Author: Yanbo Liang <yb...@gmail.com>
Authored: Mon Feb 29 00:55:51 2016 -0800
Committer: DB Tsai <db...@netflix.com>
Committed: Mon Feb 29 00:55:51 2016 -0800

----------------------------------------------------------------------
 .../spark/mllib/classification/LogisticRegression.scala      | 4 ++++
 python/pyspark/mllib/classification.py                       | 8 +++++---
 2 files changed, 9 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/d81a7135/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala b/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
index c388260..f807b56 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
@@ -408,6 +408,10 @@ class LogisticRegressionWithLBFGS
    * defaults to the mllib implementation. If more than two classes
    * or feature scaling is disabled, always uses mllib implementation.
    * Uses user provided weights.
+   *
+   * In the ml LogisticRegression implementation, the number of corrections
+   * used in the LBFGS update can not be configured. So `optimizer.setNumCorrections()`
+   * will have no effect if we fall into that route.
    */
   override def run(input: RDD[LabeledPoint], initialWeights: Vector): LogisticRegressionModel = {
     run(input, initialWeights, userSuppliedWeights = true)

http://git-wip-us.apache.org/repos/asf/spark/blob/d81a7135/python/pyspark/mllib/classification.py
----------------------------------------------------------------------
diff --git a/python/pyspark/mllib/classification.py b/python/pyspark/mllib/classification.py
index b4d54ef..53a0df2 100644
--- a/python/pyspark/mllib/classification.py
+++ b/python/pyspark/mllib/classification.py
@@ -326,7 +326,7 @@ class LogisticRegressionWithLBFGS(object):
     """
     @classmethod
     @since('1.2.0')
-    def train(cls, data, iterations=100, initialWeights=None, regParam=0.01, regType="l2",
+    def train(cls, data, iterations=100, initialWeights=None, regParam=0.0, regType="l2",
               intercept=False, corrections=10, tolerance=1e-6, validateData=True, numClasses=2):
         """
         Train a logistic regression model on the given data.
@@ -341,7 +341,7 @@ class LogisticRegressionWithLBFGS(object):
           (default: None)
         :param regParam:
           The regularizer parameter.
-          (default: 0.01)
+          (default: 0.0)
         :param regType:
           The type of regularizer used for training our model.
           Allowed values:
@@ -356,7 +356,9 @@ class LogisticRegressionWithLBFGS(object):
           (default: False)
         :param corrections:
           The number of corrections used in the LBFGS update.
-          (default: 10)
+          If a known updater is used for binary classification,
+          it calls the ml implementation and this parameter will
+          have no effect. (default: 10)
         :param tolerance:
           The convergence tolerance of iterations for L-BFGS.
           (default: 1e-6)

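Following the docstring update above, a hedged sketch of the route on which ```corrections``` is ignored (the toy data and values are illustrative, not from the patch):

```python
from pyspark import SparkContext
from pyspark.mllib.classification import LogisticRegressionWithLBFGS
from pyspark.mllib.regression import LabeledPoint

sc = SparkContext.getOrCreate()

# Toy binary-classification data, for illustration only.
points = sc.parallelize([LabeledPoint(0.0, [0.0, 1.0]),
                         LabeledPoint(1.0, [1.0, 0.0])])

# With a known updater ("l1" or "l2") and numClasses=2, training is delegated
# to the ml implementation, where the number of LBFGS corrections cannot be
# configured; corrections=20 therefore has no effect on this code path.
model = LogisticRegressionWithLBFGS.train(points, iterations=10,
                                          regType="l2", numClasses=2,
                                          corrections=20)
```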
