Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2017/12/31 09:48:26 UTC
[GitHub] szhengac commented on issue #9262: [WIP] FTML optimizer implementation
URL: https://github.com/apache/incubator-mxnet/pull/9262#issuecomment-354594768
For weight decay, it may be correct, but for an L2 regularizer we can either incorporate the gradient of the L2 term into the complete gradient or use the following formula:
<img width="214" alt="screen shot 2017-12-31 at 4 43 19 pm" src="https://user-images.githubusercontent.com/3960020/34460728-c5f26020-ee49-11e7-8b46-0ad6451c0629.png">
where $\lambda_2$ is the L2 regularization parameter. If an elastic net penalty is considered, the following formula can be used:
<img width="419" alt="screen shot 2017-12-31 at 9 46 48 am" src="https://user-images.githubusercontent.com/3960020/34460724-a934d968-ee49-11e7-95f9-5371bb540055.png">
where $\lambda_1$ is the regularization parameter for the $\ell_1$ part.
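To illustrate the distinction, here is a minimal sketch (not the actual PR code) contrasting the two ways of handling L2 regularization in an optimizer step; the function names and the use of plain SGD are illustrative assumptions, since FTML's adaptive update is more involved:

```python
import numpy as np

def sgd_step_l2_in_grad(w, grad, lr, lambda2):
    """Fold the L2 regularizer's gradient (lambda2 * w) into the
    complete gradient before taking the step."""
    return w - lr * (grad + lambda2 * w)

def sgd_step_decoupled_wd(w, grad, lr, lambda2):
    """Decoupled weight decay: shrink the weights directly,
    independently of the gradient-based step."""
    return w * (1.0 - lr * lambda2) - lr * grad

w = np.array([1.0, -2.0])
g = np.array([0.1, 0.1])
# For plain SGD the two coincide; for adaptive methods such as FTML
# they generally differ, which is the point being made above.
print(sgd_step_l2_in_grad(w, g, lr=0.1, lambda2=0.01))
print(sgd_step_decoupled_wd(w, g, lr=0.1, lambda2=0.01))
```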
Also, I think it is more efficient to update the powers of beta1 and beta2 iteratively (one multiplication per step) rather than recomputing them with pow at every iteration.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services