You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by dbtsai <gi...@git.apache.org> on 2014/08/29 23:28:04 UTC

[GitHub] spark pull request: [SPARK-3317][MLlib] The loss of regularization...

GitHub user dbtsai opened a pull request:

    https://github.com/apache/spark/pull/2207

    [SPARK-3317][MLlib] The loss of regularization in Updater should use the oldWeights

    The current loss of the regularization is computed from the newWeights which is not correct. The loss, R(w) = 1/2 ||w||^2 should be computed with the oldWeights.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/AlpineNow/spark dbtsai-updater

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2207.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2207
    
----
commit 1447c234092339f67d1887bfc75731665264b770
Author: DB Tsai <db...@alpinenow.com>
Date:   2014-08-29T21:13:11Z

    Fixed updater bug

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3317][MLlib] The loss of regularization...

Posted by dbtsai <gi...@git.apache.org>.

GitHub user dbtsai reopened a pull request:

    https://github.com/apache/spark/pull/2207

    [SPARK-3317][MLlib] The loss of regularization in Updater should use the oldWeights

    The current loss of the regularization is computed from the newWeights which is not correct. The loss, R(w) = 1/2 ||w||^2 should be computed with the oldWeights.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/AlpineNow/spark dbtsai-updater

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2207.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2207
    
----
commit 1447c234092339f67d1887bfc75731665264b770
Author: DB Tsai <db...@alpinenow.com>
Date:   2014-08-29T21:13:11Z

    Fixed updater bug

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3317][MLlib] The loss of regularization...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/2207#issuecomment-54002847
  
    @dbtsai are you logged in? That's explained 100% of the many times Apache JIRA stumped me with "why can't I modify my own issue"?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3317][MLlib] The loss of regularization...

Posted by dbtsai <gi...@git.apache.org>.

Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/2207#issuecomment-53933078
  
    LBFGS needs correct loss to find next weights while SGD doesn't.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3317][MLlib] The loss of regularization...

Posted by dbtsai <gi...@git.apache.org>.

Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/2207#issuecomment-54002680

@srowen @mengxr

I was working on OWLQN for L1 in my company, and I didn't follow the LBFGS code so I was confused. The current code in MLlib actually gives the correct result.

The Updater api is a little confusing, and after I read my note when I implemented LBFGS, I actually use the existing Updater api to get the current loss of regularization correctly with a trick by setting the gradient as zero vector, stepSize as zero, and iteration as one.

For SGD, we computed the loss of regularization after weights are updated, and we keep this value, and add it into total loss in next iteration. I now remembered that I fixed a bug because of the updater design couple months ago - the first iteration of the loss of regularization was not properly computed.

Hope the whole design issue can be addressed by #1518 [SPARK-2505][MLlib] Weighted Regularizer for Generalized Linear Model once this is finished.

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3317][MLlib] The loss of regularization...

Posted by dbtsai <gi...@git.apache.org>.

Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/2207#issuecomment-54002773
  
    PS, it seems that I can not close https://issues.apache.org/jira/browse/SPARK-3317 myself. Can any of you close for me? Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3317][MLlib] The loss of regularization...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/2207#issuecomment-53938249
  
    @dbtsai The scaladoc for `Updater` says the return value is:
    
    ```
    A tuple of 2 elements. The first element is a column matrix containing updated weights,
    and the second element is the regularization value computed using updated weights.
    ```
    
    Looking at `GradientDescent` that seems like how it is intended to work, since the loss is computed from the weights and regularization term from the previous iteration. Wouldn't this put it out of sync by 1 iteration? You would have theta_i weights but a regularization term computed over theta_{i-1}.
    
    I might be overlooking something and you know the L-BFGS implementation better but are you sure this should change for both?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3317][MLlib] The loss of regularization...

Posted by dbtsai <gi...@git.apache.org>.

Github user dbtsai closed the pull request at:

    https://github.com/apache/spark/pull/2207


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3317][MLlib] The loss of regularization...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2207#issuecomment-53937177
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19493/consoleFull) for   PR 2207 at commit [`1447c23`](https://github.com/apache/spark/commit/1447c234092339f67d1887bfc75731665264b770).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3317][MLlib] The loss of regularization...

Posted by dbtsai <gi...@git.apache.org>.

Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/2207#issuecomment-54002970
  
    You are right. Using my desktop without login session. Thanks


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3317][MLlib] The loss of regularization...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2207#issuecomment-53932470
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19493/consoleFull) for   PR 2207 at commit [`1447c23`](https://github.com/apache/spark/commit/1447c234092339f67d1887bfc75731665264b770).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3317][MLlib] The loss of regularization...

Posted by dbtsai <gi...@git.apache.org>.

Github user dbtsai closed the pull request at:

    https://github.com/apache/spark/pull/2207


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org