Posted to commits@spark.apache.org by me...@apache.org on 2015/01/07 06:23:36 UTC

spark git commit: [SPARK-5099][Mllib] Simplify logistic loss function

Repository: spark
Updated Branches:
  refs/heads/master bb38ebb1a -> e21acc197


[SPARK-5099][Mllib] Simplify logistic loss function

This is a minor PR: I think we can simply negate `margin` instead of subtracting `margin`.

Mathematically, the two forms are equal, but the modified equation is the common form of the logistic loss function and is therefore more readable. It also computes a more accurate value, as some quick tests show.
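
As a short worked step for the equality claimed above (a sketch, assuming m stands for `margin` and the old branch computed log(1 + e^m) - m):

```latex
\log\left(1 + e^{m}\right) - m
  = \log\left(1 + e^{m}\right) - \log\left(e^{m}\right)
  = \log\left(\frac{1 + e^{m}}{e^{m}}\right)
  = \log\left(1 + e^{-m}\right)
```

So subtracting `margin` after the logarithm and negating `margin` inside the exponential give the same value analytically.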

Author: Liang-Chi Hsieh <vi...@gmail.com>

Closes #3899 from viirya/logit_func and squashes the following commits:

91a3860 [Liang-Chi Hsieh] Modified for comment.
0aa51e4 [Liang-Chi Hsieh] Further simplified.
72a295e [Liang-Chi Hsieh] Revert LogLoss back and add more considerations in Logistic Loss.
a3f83ca [Liang-Chi Hsieh] Fix a bug.
2bc5712 [Liang-Chi Hsieh] Simplify loss function.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e21acc19
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e21acc19
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e21acc19

Branch: refs/heads/master
Commit: e21acc1978a6f4a57ef2e08490692b0ffe05fa9e
Parents: bb38ebb
Author: Liang-Chi Hsieh <vi...@gmail.com>
Authored: Tue Jan 6 21:23:31 2015 -0800
Committer: Xiangrui Meng <me...@databricks.com>
Committed: Tue Jan 6 21:23:31 2015 -0800

----------------------------------------------------------------------
 .../org/apache/spark/mllib/optimization/Gradient.scala  | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/e21acc19/mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala b/mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala
index 5a419d1..aaacf3a 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala
@@ -64,11 +64,17 @@ class LogisticGradient extends Gradient {
     val gradientMultiplier = (1.0 / (1.0 + math.exp(margin))) - label
     val gradient = data.copy
     scal(gradientMultiplier, gradient)
+    val minusYP = if (label > 0) margin else -margin
+
+    // log1p is log(1+p) but more accurate for small p
+    // Following two equations are the same analytically but not numerically, e.g.,
+    // math.log1p(math.exp(1000)) == Infinity
+    // 1000 + math.log1p(math.exp(-1000)) == 1000.0
     val loss =
-      if (label > 0) {
-        math.log1p(math.exp(margin)) // log1p is log(1+p) but more accurate for small p
+      if (minusYP < 0) {
+        math.log1p(math.exp(minusYP))
       } else {
-        math.log1p(math.exp(margin)) - margin
+        math.log1p(math.exp(-minusYP)) + minusYP
       }
 
     (gradient, loss)
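
The numerical point made in the code comment above can be checked with a minimal standalone sketch. It is not part of the commit: the names `LogisticLossSketch`, `naiveLoss`, and `stableLoss` are hypothetical, and `z` plays the role of `minusYP` in the diff.

```scala
object LogisticLossSketch {
  // Naive form: log(1 + exp(z)). math.exp(z) overflows to Infinity for large z.
  def naiveLoss(z: Double): Double = math.log1p(math.exp(z))

  // Branching form mirroring the patched code: exp is only ever applied
  // to a non-positive argument, so it cannot overflow.
  def stableLoss(z: Double): Double =
    if (z < 0) math.log1p(math.exp(z))
    else math.log1p(math.exp(-z)) + z

  def main(args: Array[String]): Unit = {
    // Compare both forms over small and extreme inputs.
    for (z <- Seq(-1000.0, -1.0, 0.0, 1.0, 1000.0)) {
      println(s"z = $z  naive = ${naiveLoss(z)}  stable = ${stableLoss(z)}")
    }
  }
}
```

For moderate `z` the two functions should agree to within rounding; they diverge only where `math.exp(z)` overflows, which is the case the branch on the sign of `minusYP` guards against.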

