You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by dbtsai <gi...@git.apache.org> on 2014/04/28 23:58:18 UTC

[GitHub] spark pull request: [SPARK-1157][MLlib] Bug fix: lossHistory shoul...

GitHub user dbtsai opened a pull request:

    https://github.com/apache/spark/pull/582

    [SPARK-1157][MLlib] Bug fix: lossHistory should be monotonically decresing

    Instead of recording the loss in the costFun for each time that optimizer calls costFun, we get the loss from the api provided in breeze to avoid recording the rejection steps which will cause the loss bumping up.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dbtsai/spark dbtsai-lbfgs-bug

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/582.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #582
    
----
commit d72c67908abc8546b9713c41101aa6c685ce31eb
Author: DB Tsai <db...@alpinenow.com>
Date:   2014-04-28T20:36:13Z

    Using Breeze's states to get the loss.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Re: [GitHub] spark pull request: [SPARK-1157][MLlib] Bug fix: lossHistory shoul...

Posted by Debasish Das <de...@gmail.com>.
Please remove it....there is no stochastic bfgs....we will put an admm
wrapper over bfgs which has better optimization properties than sgd....
 On Apr 28, 2014 10:34 PM, "mengxr" <gi...@git.apache.org> wrote:

> Github user mengxr commented on the pull request:
>
>     https://github.com/apache/spark/pull/582#issuecomment-41643015
>
>     I think it is good to remove `miniBatchFraction` from `LBFGS`'s params
> in this PR, unless someone has a good understanding of the behavior of
> "stochastic" L-BFGS.
>
>
> ---
> If your project is set up for it, you can reply to this email and have your
> reply appear on GitHub as well. If your project does not have this feature
> enabled and wishes so, or if the feature is enabled but not working, please
> contact infrastructure at infrastructure@apache.org or file a JIRA ticket
> with INFRA.
> ---
>

[GitHub] spark pull request: [SPARK-1157][MLlib] Bug fix: lossHistory shoul...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/582#issuecomment-41643015
  
    I think it is good to remove `miniBatchFraction` from `LBFGS`'s params in this PR, unless someone has a good understanding of the behavior of "stochastic" L-BFGS.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1157][MLlib] Bug fix: lossHistory shoul...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/582#issuecomment-41751464
  
    Make sense from the inverse of hessian point of view. Just remove it!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1157][MLlib] Bug fix: lossHistory shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/582#issuecomment-41753419
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14577/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1157][MLlib] Bug fix: lossHistory shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/582#issuecomment-41619512
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1157][MLlib] Bug fix: lossHistory shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/582#issuecomment-41623307
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14541/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1157][MLlib] Bug fix: lossHistory shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/582#issuecomment-41623306
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1157][MLlib] Bug fix: lossHistory shoul...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/582#issuecomment-42257123
  
    @dbtsai Could you please update the `mllib-optimization.md` and include an example of L-BFGS?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1157][MLlib] Bug fix: lossHistory shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/582#issuecomment-41751530
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1157][MLlib] Bug fix: lossHistory shoul...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/582#issuecomment-42623759
  
    Thanks, merged.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1157][MLlib] Bug fix: lossHistory shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/582#issuecomment-41619500
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1157][MLlib] Bug fix: lossHistory shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/582#issuecomment-41753418
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1157][MLlib] Bug fix: lossHistory shoul...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/582#issuecomment-41740842
  
    @mengxr  Just did some hack on trying to implement the right "stochastic" L-BFGS, and it kind of works as long as we don't change the objective function. But there is no good way to know which LBFGS step it is to keep the objective function the same in line search step, so I need to do some injection as David suggest. See https://github.com/dbtsai/spark/commit/0c699f259af7c1bd630d033a3ce960771efaf66c
    
    What do you think now? Just remove the miniBatchFraction since the RDD sample is even not efficient?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1157][MLlib] Bug fix: lossHistory shoul...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/582


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1157][MLlib] Bug fix: lossHistory shoul...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/582#issuecomment-41749238
  
    I prefer removing the miniBatchFraction. Those quasi-Newton methods approximate the inverse of Hessian. It doesn't make sense if the gradients are computed from a varying objective.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1157][MLlib] Bug fix: lossHistory shoul...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/582#issuecomment-42623415
  
    LGTM. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1157][MLlib] Bug fix: lossHistory shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/582#issuecomment-41751535
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---