Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/12/26 22:25:23 UTC

[GitHub] eric-haibin-lin commented on a change in pull request #13728: AdamW optimizer (Fixing Weight Decay Regularization in Adam)

URL: https://github.com/apache/incubator-mxnet/pull/13728#discussion_r244056230
 
 

 ##########
 File path: python/mxnet/optimizer/optimizer.py
 ##########
 @@ -1018,6 +1018,70 @@ class ccSGD(SGD):
     def __init__(self, *args, **kwargs):
         super(ccSGD, self).__init__(*args, **kwargs)
 
+@register
+class AdamW(Optimizer):
+    """The Adam optimizer with fixed weight decay regularization.
+
+    This class implements the optimizer described in *Fixing Weight Decay
+    Regularization in Adam*, available at https://arxiv.org/abs/1711.05101.
+
+    Note that this is different from the original Adam optimizer, which adds L2
+    regularization on the weights to the loss: it regularizes weights with large
+    gradients more than L2 regularization would, which was shown to yield better
+    training loss and generalization error in the paper above.
+
+    Updates are applied by::
+
+        rescaled_grad = clip(grad * rescale_grad, clip_gradient)
+        m = beta1 * m + (1 - beta1) * rescaled_grad
+        v = beta2 * v + (1 - beta2) * (rescaled_grad**2)
+        w = w - learning_rate * (m / (sqrt(v) + epsilon) + wd * w)
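
(Editor's illustration, not part of the PR diff: a minimal NumPy sketch of the
single update step described by the docstring pseudocode above, without bias
correction; the standalone function `adamw_step` and its signature are
hypothetical, not the MXNet implementation.)

    import numpy as np

    def adamw_step(w, grad, m, v, learning_rate, wd, beta1=0.9,
                   beta2=0.999, epsilon=1e-8, rescale_grad=1.0,
                   clip_gradient=None):
        # rescaled_grad = clip(grad * rescale_grad, clip_gradient)
        g = grad * rescale_grad
        if clip_gradient is not None:
            g = np.clip(g, -clip_gradient, clip_gradient)
        # Exponential moving averages of the gradient and squared gradient.
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        # Decoupled weight decay: wd * w is applied outside the adaptive
        # term rather than being added to the gradient (as L2 would be).
        w = w - learning_rate * (m / (np.sqrt(v) + epsilon) + wd * w)
        return w, m, v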
 
 Review comment:
  Good point. The issue is that the learning rate and the schedule multiplier are not decoupled in MXNet. Here `learning_rate` is effectively the paper's `eta_t * alpha`, and `wd` needs to be set to the paper's weight decay `w` divided by `alpha`, i.e. `wd = w / alpha`. In other words, `wd` can be rescaled so that the update does exactly what the paper describes. Would this be acceptable?
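
  (Editor's sketch of the rescaling argument above; the variable names and
  numbers are arbitrary. With `learning_rate = eta_t * alpha` and `wd` set to
  the paper's weight decay divided by `alpha`, we get
  `learning_rate * wd = eta_t * w_paper`, so the MXNet-style update matches
  the paper's decoupled update exactly.)

      import numpy as np

      # Arbitrary paper-side hyperparameters: eta_t is the schedule
      # multiplier, alpha the step size, w_paper the paper's weight decay.
      eta_t, alpha, w_paper = 0.5, 0.001, 0.01
      adaptive = np.array([0.3, -0.2])   # m / (sqrt(v) + epsilon), precomputed
      w = np.array([1.0, 2.0])

      # Paper's update: w <- w - eta_t * (alpha * adaptive + w_paper * w)
      w_after_paper = w - eta_t * (alpha * adaptive + w_paper * w)

      # MXNet's formula with rescaled hyperparameters.
      learning_rate = eta_t * alpha
      wd = w_paper / alpha
      w_after_mxnet = w - learning_rate * (adaptive + wd * w)

      assert np.allclose(w_after_paper, w_after_mxnet)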

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services