Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/12/27 04:04:00 UTC

[GitHub] sxjscience commented on a change in pull request #13728: AdamW optimizer (Fixing Weight Decay Regularization in Adam)

URL: https://github.com/apache/incubator-mxnet/pull/13728#discussion_r244079276
 
 

 ##########
 File path: python/mxnet/optimizer/optimizer.py
 ##########
 @@ -1018,6 +1018,70 @@ class ccSGD(SGD):
     def __init__(self, *args, **kwargs):
         super(ccSGD, self).__init__(*args, **kwargs)
 
+@register
+class AdamW(Optimizer):
+    """The Adam optimizer with fixed weight decay regularization.
+
+    This class implements the optimizer described in *Fixing Weight Decay
+    Regularization in Adam*, available at https://arxiv.org/abs/1711.05101.
+
+    Note that this is different from the original Adam optimizer which adds L2
+    regularization on the weights to the loss: it regularizes weights with large
+    gradients more than L2 regularization would, which was shown to yield better
+    training loss and generalization error in the paper above.
+
+    Updates are applied by::
+
+        rescaled_grad = clip(grad * rescale_grad, clip_gradient)
+        m = beta1 * m + (1 - beta1) * rescaled_grad
+        v = beta2 * v + (1 - beta2) * (rescaled_grad**2)
+        w = w - learning_rate * (m / (sqrt(v) + epsilon) + wd * w)
 
 Review comment:
   I think it's acceptable as long as the `wd` is set correctly.
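
   For reference, below is a minimal NumPy sketch of one such update step, transcribed from the pseudocode above. The function name and default values are illustrative only, not the PR's actual implementation, and bias correction is omitted to match the snippet::

       import numpy as np

       def adamw_step(w, m, v, grad, learning_rate=0.001, beta1=0.9, beta2=0.999,
                      epsilon=1e-8, wd=0.01, rescale_grad=1.0, clip_gradient=None):
           # Rescale and optionally clip the raw gradient.
           rescaled_grad = grad * rescale_grad
           if clip_gradient is not None:
               rescaled_grad = np.clip(rescaled_grad, -clip_gradient, clip_gradient)
           # Standard Adam first- and second-moment estimates.
           m = beta1 * m + (1 - beta1) * rescaled_grad
           v = beta2 * v + (1 - beta2) * rescaled_grad ** 2
           # Decoupled weight decay: wd * w is subtracted from the weights directly
           # instead of being folded into the gradient as plain Adam's L2 term is.
           w = w - learning_rate * (m / (np.sqrt(v) + epsilon) + wd * w)
           return w, m, v

   Note that `wd * w` is not divided by `sqrt(v) + epsilon`, so the decay applies uniformly regardless of gradient magnitude, which is why the effective regularization hinges on choosing `wd` appropriately.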

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services