Posted to commits@spark.apache.org by me...@apache.org on 2014/12/18 22:55:53 UTC

spark git commit: [SPARK-4887][MLlib] Fix a bad unittest in LogisticRegressionSuite

Repository: spark
Updated Branches:
  refs/heads/master 3720057b8 -> 59a49db59


[SPARK-4887][MLlib] Fix a bad unittest in LogisticRegressionSuite

The original test doesn't make sense: if you step through it, the lossSum is already NaN
and the coefficients are diverging, because the step size is too large for SGD to
converge.
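
As a rough illustration of why an oversized step makes the iterates blow up, here is
a minimal sketch of plain gradient descent on a toy quadratic. It is not MLlib's
mini-batch SGD; the object name, the `descend` helper, and the curvature parameter
are all hypothetical and exist only for this example.

```scala
object StepSizeSketch {
  // Plain gradient descent on f(w) = 0.5 * curvature * w * w.
  // The gradient is curvature * w, so each update multiplies w by
  // (1 - stepSize * curvature). The iterates shrink only when that
  // factor has magnitude below 1.
  def descend(stepSize: Double, curvature: Double, w0: Double, iters: Int): Double =
    (1 to iters).foldLeft(w0) { (w, _) => w - stepSize * curvature * w }

  def main(args: Array[String]): Unit = {
    // Factor 0.9 per step: the iterate moves toward the minimum at 0.
    println(descend(stepSize = 0.1, curvature = 1.0, w0 = 1.0, iters = 10))
    // Factor -2 per step: the iterate oscillates and grows in magnitude.
    println(descend(stepSize = 3.0, curvature = 1.0, w0 = 1.0, iters = 10))
  }
}
```

The same mechanism, in a much less tidy form, is what a too-large `setStepSize` value
can trigger in an iterative optimizer.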

The correct behavior is that you get smaller coefficients than you would
without regularization. Comparing the values with a relative error of 20000.0 doesn't
make sense either.
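
To see why such a tolerance is nearly vacuous, here is a minimal sketch of a
relative-tolerance check. It assumes the comparison accepts two values when their
absolute difference is below the tolerance times the smaller of the two magnitudes;
that is an assumption about how `~== ... relTol` behaves, and `approxEqual` is a
hypothetical stand-in, not the test utility itself.

```scala
object RelTolSketch {
  // Hypothetical relative comparison: accept when |x - y| is smaller than
  // relTol times the smaller magnitude of the two values.
  def approxEqual(x: Double, y: Double, relTol: Double): Boolean = {
    val diff = math.abs(x - y)
    x == y || diff < relTol * math.min(math.abs(x), math.abs(y))
  }

  def main(args: Array[String]): Unit = {
    // A tolerance of 20000.0 accepts a value thousands of times smaller
    // than the expected one.
    println(approxEqual(-100.0, -430000.0, 20000.0))
    // A tolerance of 0.02 rejects a value that is clearly off target...
    println(approxEqual(-0.2, -0.14, 0.02))
    // ...and accepts one within about 2% of it.
    println(approxEqual(-0.141, -0.14, 0.02))
  }
}
```

Under that assumption, the old assertions would pass for a very wide range of
diverged weights, while the new ones only pass near the expected values.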

Author: DB Tsai <db...@alpinenow.com>

Closes #3735 from dbtsai/mlortestfix and squashes the following commits:

b1a3c42 [DB Tsai] first commit


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/59a49db5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/59a49db5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/59a49db5

Branch: refs/heads/master
Commit: 59a49db5982ecc487187fcd92399e08b4b4bea64
Parents: 3720057
Author: DB Tsai <db...@alpinenow.com>
Authored: Thu Dec 18 13:55:49 2014 -0800
Committer: Xiangrui Meng <me...@databricks.com>
Committed: Thu Dec 18 13:55:49 2014 -0800

----------------------------------------------------------------------
 .../spark/mllib/classification/LogisticRegressionSuite.scala  | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/59a49db5/mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala
----------------------------------------------------------------------
diff --git a/mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala b/mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala
index 4e81299..94b0e00 100644
--- a/mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala
@@ -178,15 +178,16 @@ class LogisticRegressionSuite extends FunSuite with MLlibTestSparkContext with M
     // Use half as many iterations as the previous test.
     val lr = new LogisticRegressionWithSGD().setIntercept(true)
     lr.optimizer.
-      setStepSize(10.0).
+      setStepSize(1.0).
       setNumIterations(10).
       setRegParam(1.0)
 
     val model = lr.run(testRDD, initialWeights)
 
     // Test the weights
-    assert(model.weights(0) ~== -430000.0 relTol 20000.0)
-    assert(model.intercept ~== 370000.0 relTol 20000.0)
+    // With regularization, the resulting weights will be smaller.
+    assert(model.weights(0) ~== -0.14 relTol 0.02)
+    assert(model.intercept ~== 0.25 relTol 0.02)
 
     val validationData = LogisticRegressionSuite.generateLogisticInput(A, B, nPoints, 17)
     val validationRDD = sc.parallelize(validationData, 2)

