Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2017/12/31 09:48:26 UTC

[GitHub] szhengac commented on issue #9262: [WIP] FTML optimizer implementation

szhengac commented on issue #9262: [WIP] FTML optimizer implementation
URL: https://github.com/apache/incubator-mxnet/pull/9262#issuecomment-354594768
 
 
   For weight decay it may be correct, but for the l2 regularizer we can either incorporate the gradient w.r.t. the l2 regularizer into the complete gradient, or use the following formula:
   <img width="214" alt="screen shot 2017-12-31 at 4 43 19 pm" src="https://user-images.githubusercontent.com/3960020/34460728-c5f26020-ee49-11e7-8b46-0ad6451c0629.png">
   where $\lambda_2$ is the regularization parameter. If an elastic net is considered, the following one can be used:
   <img width="419" alt="screen shot 2017-12-31 at 9 46 48 am" src="https://user-images.githubusercontent.com/3960020/34460724-a934d968-ee49-11e7-95f9-5371bb540055.png">
   where $\lambda_1$ is the regularization parameter for the $\ell_1$ part.
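The first option above (folding the regularizer's gradient into the complete gradient before the optimizer update) can be sketched as follows. This is a minimal NumPy sketch, not the PR's actual code; the function name `apply_l2_and_l1` and its parameters are hypothetical, with `lambda2` and `lambda1` playing the roles of $\lambda_2$ and $\lambda_1$:

```python
import numpy as np

def apply_l2_and_l1(grad, weight, lambda2, lambda1=0.0):
    """Fold regularizer gradients into the raw gradient before the
    optimizer update (hypothetical helper, not MXNet API).

    lambda2 scales the l2 term: grad of (lambda2 / 2) * ||w||^2 is lambda2 * w.
    lambda1 (optional, elastic net) adds the l1 subgradient lambda1 * sign(w).
    """
    grad = grad + lambda2 * weight
    if lambda1 > 0.0:
        grad = grad + lambda1 * np.sign(weight)
    return grad
```

For example, with `grad = [1.0]`, `weight = [2.0]`, and `lambda2 = 0.1`, the combined gradient is `[1.2]`; adding `lambda1 = 0.5` gives `[1.7]`.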
   
   Also, I think it is more efficient to update the powers of $\beta_1$ and $\beta_2$ iteratively.
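The iterative-powers idea can be sketched like this: instead of recomputing `beta1 ** t` and `beta2 ** t` from scratch at every step, carry the running powers forward with one multiply per step. This is an illustrative sketch only; the function name `run_updates` and the default beta values are assumptions, not part of the PR:

```python
def run_updates(num_steps, beta1=0.9, beta2=0.999):
    """Maintain beta1**t and beta2**t with one multiplication per
    iteration, rather than calling pow() each step (hypothetical sketch).
    Returns the list of (beta1**t, beta2**t) for t = 1..num_steps."""
    beta1_pow, beta2_pow = 1.0, 1.0
    powers = []
    for t in range(1, num_steps + 1):
        beta1_pow *= beta1  # now equals beta1 ** t
        beta2_pow *= beta2  # now equals beta2 ** t
        powers.append((beta1_pow, beta2_pow))
    return powers
```

After three steps the running values match the direct powers, e.g. `run_updates(3)[-1][0]` equals `0.9 ** 3` up to floating-point rounding.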
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services