Posted to dev@madlib.apache.org by iyerr3 <gi...@git.apache.org> on 2018/08/17 11:13:14 UTC
[GitHub] madlib pull request #313: MLP: Simplify momentum and Nesterov updates
GitHub user iyerr3 opened a pull request:
https://github.com/apache/madlib/pull/313
MLP: Simplify momentum and Nesterov updates
Momentum updates are complicated because Nesterov momentum requires an initial (lookahead) update before the gradient is computed. There is, however, an equivalent form of the Nesterov update that can be cleanly applied after the regular update, simplifying the code.
Note: the `gradientInPlace` update for zero momentum is still retained without the velocity vectors since it avoids copying the gradient vectors.
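The reformulation described above can be sketched as follows. This is a hedged Python illustration, not MADlib's actual C++ code: scalars stand in for the per-layer matrices, and the function name `sgd_step` is hypothetical. The idea is that the Nesterov step can be rewritten so the gradient is taken at the current parameters and the lookahead correction is folded into the position update afterwards; the zero-momentum path applies the gradient directly, with no velocity bookkeeping.

```python
def sgd_step(theta, velocity, grad, stepsize, momentum, nesterov):
    """One parameter update (floats here for simplicity; layer
    matrices in the real code)."""
    update = -stepsize * grad          # plain gradient-descent step
    if momentum > 0:
        velocity = momentum * velocity + update
        if nesterov:
            # Equivalent (up to a change of variables) to the classical
            # lookahead form v = mu*v - eps*grad(theta + mu*v); theta += v,
            # but using the gradient at theta itself, applied post hoc.
            theta += momentum * velocity + update
        else:
            theta += velocity
    else:
        # zero momentum: gradient applied in place, velocity untouched
        theta += update
    return theta, velocity
```

With `momentum=0` this reduces to `theta -= stepsize * grad`, which is why the in-place path in the patch can skip the velocity vectors entirely.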
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/madlib/madlib feature/simplify_nesterov
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/madlib/pull/313.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #313
----
commit d18f69f461d1ca5ab692c15c748ab6df2bc8959e
Author: Rahul Iyer <ri...@...>
Date: 2018-08-17T08:42:53Z
MLP: Simplify momentum and Nesterov updates
----
---
[GitHub] madlib issue #313: MLP: Simplify momentum and Nesterov updates
Posted by fmcquillan99 <gi...@git.apache.org>.
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/313
Is this ready to merge?
---
[GitHub] madlib pull request #313: MLP: Simplify momentum and Nesterov updates
Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:
https://github.com/apache/madlib/pull/313#discussion_r211725199
--- Diff: src/modules/convex/task/mlp.hpp ---
@@ -192,18 +189,23 @@ MLP<Model, Tuple>::getLossAndUpdateModel(
// 1. normalize to per row update
// 2. discount by stepsize
// 3. add regularization
- // 4. make negative
+ // 4. make negative for descent
for (Index k=0; k < model.num_layers; k++){
Matrix regularization = MLP<Model, Tuple>::lambda * model.u[k];
regularization.row(0).setZero(); // Do not update bias
- total_gradient_per_layer[k] = -stepsize * (total_gradient_per_layer[k] / static_cast<double>(num_rows_in_batch) +
- regularization);
- model.updateVelocity(total_gradient_per_layer[k], k);
- model.updatePosition(total_gradient_per_layer[k], k);
+ total_gradient_per_layer[k] = -stepsize *
+ (total_gradient_per_layer[k] / static_cast<double>(num_rows_in_batch) +
+ regularization);
+
+ // total_gradient_per_layer is now the update vector
+ model.velocity[k] = model.momentum * model.velocity[k] + total_gradient_per_layer[k];
--- End diff --
We should only update the velocity vector if momentum > 0.
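For context on why the post-update form in the diff is a legitimate Nesterov step: it tracks the classical algorithm's lookahead point theta + mu*v. The following is a hedged numerical check (not from the PR), run on the toy objective f(theta) = theta**2 / 2, showing the two forms coincide under that change of variables.

```python
def grad(theta):
    return theta  # gradient of f(theta) = theta**2 / 2

mu, eps = 0.9, 0.1

# classical Nesterov: gradient evaluated at the lookahead point
theta, v = 1.0, 0.0
# reformulated: gradient at the current point, update applied afterwards;
# it tracks theta_hat = theta + mu*v
theta_hat, v_hat = theta + mu * v, 0.0

for _ in range(20):
    v = mu * v - eps * grad(theta + mu * v)
    theta += v

    u = -eps * grad(theta_hat)        # regular (negated) gradient step
    v_hat = mu * v_hat + u            # velocity update, as in the diff
    theta_hat += mu * v_hat + u       # Nesterov correction, post hoc

# both trajectories agree up to the variable transform (and rounding)
assert abs(theta_hat - (theta + mu * v)) < 1e-9
```

This is why the patch can drop the separate pre-gradient Nesterov step: the lookahead is absorbed into the position update performed after the gradient is computed.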
---
[GitHub] madlib pull request #313: MLP: Simplify momentum and Nesterov updates
Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:
https://github.com/apache/madlib/pull/313#discussion_r211725456
--- Diff: src/modules/convex/task/mlp.hpp ---
@@ -192,18 +189,23 @@ MLP<Model, Tuple>::getLossAndUpdateModel(
// 1. normalize to per row update
// 2. discount by stepsize
// 3. add regularization
- // 4. make negative
+ // 4. make negative for descent
for (Index k=0; k < model.num_layers; k++){
Matrix regularization = MLP<Model, Tuple>::lambda * model.u[k];
regularization.row(0).setZero(); // Do not update bias
- total_gradient_per_layer[k] = -stepsize * (total_gradient_per_layer[k] / static_cast<double>(num_rows_in_batch) +
- regularization);
- model.updateVelocity(total_gradient_per_layer[k], k);
- model.updatePosition(total_gradient_per_layer[k], k);
+ total_gradient_per_layer[k] = -stepsize *
+ (total_gradient_per_layer[k] / static_cast<double>(num_rows_in_batch) +
+ regularization);
+
+ // total_gradient_per_layer is now the update vector
+ model.velocity[k] = model.momentum * model.velocity[k] + total_gradient_per_layer[k];
--- End diff --
Can we also add a comment explaining why we duplicate the code instead of creating a function for these updates?
---
[GitHub] madlib pull request #313: MLP: Simplify momentum and Nesterov updates
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/madlib/pull/313
---
[GitHub] madlib issue #313: MLP: Simplify momentum and Nesterov updates
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:
https://github.com/apache/madlib/pull/313
Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/660/
---
[GitHub] madlib issue #313: MLP: Simplify momentum and Nesterov updates
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:
https://github.com/apache/madlib/pull/313
Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/656/
---
[GitHub] madlib issue #313: MLP: Simplify momentum and Nesterov updates
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:
https://github.com/apache/madlib/pull/313
Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/662/
---