Posted to dev@madlib.apache.org by iyerr3 <gi...@git.apache.org> on 2018/08/17 11:13:14 UTC
[GitHub] madlib pull request #313: MLP: Simplify momentum and Nesterov updates
GitHub user iyerr3 opened a pull request:
https://github.com/apache/madlib/pull/313
MLP: Simplify momentum and Nesterov updates
Momentum updates are complicated because Nesterov momentum requires an initial (lookahead) update before the gradient is computed. There is, however, an equivalent form of the Nesterov update that can be cleanly applied after the regular update, simplifying the code.
Note: the `gradientInPlace` update for zero momentum is still retained without the velocity vectors since it avoids copying the gradient vectors.
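The reformulation described above can be sketched as follows. This is a hedged Python illustration, not MADlib's actual C++ code: scalars stand in for the per-layer matrices, and the function name `sgd_step` is hypothetical. The idea is that the Nesterov step can be rewritten so the gradient is taken at the current parameters and the lookahead correction is folded into the position update afterwards; the zero-momentum path applies the gradient directly, with no velocity bookkeeping.

```python
def sgd_step(theta, velocity, grad, stepsize, momentum, nesterov):
    """One parameter update (floats here for simplicity; layer
    matrices in the real code)."""
    update = -stepsize * grad          # plain gradient-descent step
    if momentum > 0:
        velocity = momentum * velocity + update
        if nesterov:
            # Equivalent (up to a change of variables) to the classical
            # lookahead form v = mu*v - eps*grad(theta + mu*v); theta += v,
            # but using the gradient at theta itself, applied post hoc.
            theta += momentum * velocity + update
        else:
            theta += velocity
    else:
        # zero momentum: gradient applied in place, velocity untouched
        theta += update
    return theta, velocity
```

With `momentum=0` this reduces to `theta -= stepsize * grad`, which is why the in-place path in the patch can skip the velocity vectors entirely.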
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/madlib/madlib feature/simplify_nesterov
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/madlib/pull/313.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #313
----
commit d18f69f461d1ca5ab692c15c748ab6df2bc8959e
Author: Rahul Iyer <ri...@...>
Date: 2018-08-17T08:42:53Z
MLP: Simplify momentum and Nesterov updates
----
---
[GitHub] madlib issue #313: MLP: Simplify momentum and Nesterov updates
Posted by fmcquillan99 <gi...@git.apache.org>.
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/313
Is this ready to merge?
---
[GitHub] madlib pull request #313: MLP: Simplify momentum and Nesterov updates
Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:
https://github.com/apache/madlib/pull/313#discussion_r211725199
--- Diff: src/modules/convex/task/mlp.hpp ---
@@ -192,18 +189,23 @@ MLP<Model, Tuple>::getLossAndUpdateModel(
// 1. normalize to per row update
// 2. discount by stepsize
// 3. add regularization
- // 4. make negative
+ // 4. make negative for descent
for (Index k=0; k < model.num_layers; k++){
Matrix regularization = MLP<Model, Tuple>::lambda * model.u[k];
regularization.row(0).setZero(); // Do not update bias
- total_gradient_per_layer[k] = -stepsize * (total_gradient_per_layer[k] / static_cast<double>(num_rows_in_batch) +
- regularization);
- model.updateVelocity(total_gradient_per_layer[k], k);
- model.updatePosition(total_gradient_per_layer[k], k);
+ total_gradient_per_layer[k] = -stepsize *
+ (total_gradient_per_layer[k] / static_cast<double>(num_rows_in_batch) +
+ regularization);
+
+ // total_gradient_per_layer is now the update vector
+ model.velocity[k] = model.momentum * model.velocity[k] + total_gradient_per_layer[k];
--- End diff --
We should only update the velocity vector if momentum > 0.
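For context on why the post-update form in the diff is a legitimate Nesterov step: it tracks the classical algorithm's lookahead point theta + mu*v. The following is a hedged numerical check (not from the PR), run on the toy objective f(theta) = theta**2 / 2, showing the two forms coincide under that change of variables.

```python
def grad(theta):
    return theta  # gradient of f(theta) = theta**2 / 2

mu, eps = 0.9, 0.1

# classical Nesterov: gradient evaluated at the lookahead point
theta, v = 1.0, 0.0
# reformulated: gradient at the current point, update applied afterwards;
# it tracks theta_hat = theta + mu*v
theta_hat, v_hat = theta + mu * v, 0.0

for _ in range(20):
    v = mu * v - eps * grad(theta + mu * v)
    theta += v

    u = -eps * grad(theta_hat)        # regular (negated) gradient step
    v_hat = mu * v_hat + u            # velocity update, as in the diff
    theta_hat += mu * v_hat + u       # Nesterov correction, post hoc

# both trajectories agree up to the variable transform (and rounding)
assert abs(theta_hat - (theta + mu * v)) < 1e-9
```

This is why the patch can drop the separate pre-gradient Nesterov step: the lookahead is absorbed into the position update performed after the gradient is computed.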
---
[GitHub] madlib pull request #313: MLP: Simplify momentum and Nesterov updates
Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:
https://github.com/apache/madlib/pull/313#discussion_r211725456
--- Diff: src/modules/convex/task/mlp.hpp ---
@@ -192,18 +189,23 @@ MLP<Model, Tuple>::getLossAndUpdateModel(
// 1. normalize to per row update
// 2. discount by stepsize
// 3. add regularization
- // 4. make negative
+ // 4. make negative for descent
for (Index k=0; k < model.num_layers; k++){
Matrix regularization = MLP<Model, Tuple>::lambda * model.u[k];
regularization.row(0).setZero(); // Do not update bias
- total_gradient_per_layer[k] = -stepsize * (total_gradient_per_layer[k] / static_cast<double>(num_rows_in_batch) +
- regularization);
- model.updateVelocity(total_gradient_per_layer[k], k);
- model.updatePosition(total_gradient_per_layer[k], k);
+ total_gradient_per_layer[k] = -stepsize *
+ (total_gradient_per_layer[k] / static_cast<double>(num_rows_in_batch) +
+ regularization);
+
+ // total_gradient_per_layer is now the update vector
+ model.velocity[k] = model.momentum * model.velocity[k] + total_gradient_per_layer[k];
--- End diff --
Can we also add a comment explaining why we duplicate the code instead of creating a function for these updates?
---
[GitHub] madlib pull request #313: MLP: Simplify momentum and Nesterov updates
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/madlib/pull/313
---
[GitHub] madlib issue #313: MLP: Simplify momentum and Nesterov updates
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:
https://github.com/apache/madlib/pull/313
Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/660/
---
[GitHub] madlib issue #313: MLP: Simplify momentum and Nesterov updates
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:
https://github.com/apache/madlib/pull/313
Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/656/
---
[GitHub] madlib issue #313: MLP: Simplify momentum and Nesterov updates
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:
https://github.com/apache/madlib/pull/313
Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/662/
---