Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/05/18 22:08:49 UTC

[GitHub] astonzhang commented on issue #9881: Inconsistent weight decay logics in multiple optimizers

URL: https://github.com/apache/incubator-mxnet/issues/9881#issuecomment-390345581
 
 
   Thanks to Haibin for raising this issue.
   
   In addition, weight decay should apply only to weights, not biases [1][2]. Users therefore typically write
   
   ```
   from mxnet import gluon

   trainer = gluon.Trainer(net.collect_params(), 'sgd',
                           {'learning_rate': learning_rate, 'wd': weight_decay})
   ```
   
   on the assumption that weight decay is applied only to the weights. However, the current implementation applies weight decay to all model parameters, including the biases.
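   
   For reference, here is a minimal sketch of a per-parameter workaround under the current behavior, assuming the standard Gluon `wd_mult` parameter attribute and a toy `net` built purely for illustration. The SGD optimizer effectively updates each parameter as `param <- param - lr * (grad + wd * wd_mult * param)`, so zeroing `wd_mult` on the bias parameters excludes them from weight decay:
   
   ```
   from mxnet import gluon
   from mxnet.gluon import nn
   
   # Toy network, purely for illustration.
   net = nn.Dense(10)
   net.initialize()
   
   # Zero the weight decay multiplier on every bias parameter, so that the
   # 'wd' passed to the Trainer effectively applies only to the weights:
   #   param <- param - lr * (grad + wd * wd_mult * param)
   for name, param in net.collect_params().items():
       if name.endswith('bias'):
           param.wd_mult = 0.0
   
   trainer = gluon.Trainer(net.collect_params(), 'sgd',
                           {'learning_rate': 0.1, 'wd': 1e-4})
   ```
   
   If the regex-select form of `collect_params` is available in your version, `net.collect_params('.*bias').setattr('wd_mult', 0.0)` does the same in one line. This is only a workaround; ideally the default behavior would match the convention in [1][2].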
   
   
   Reference:
   
   [1] LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
   
   [2] Franklin, J. (2005). The elements of statistical learning: data mining, inference and prediction. The Mathematical Intelligencer, 27(2), 83–85.
   
   
