Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/09/27 19:00:14 UTC

[GitHub] FoConrad commented on issue #10563: Suboptimal performance implementing PPO with Adam Optimizer

FoConrad commented on issue #10563: Suboptimal performance implementing PPO with Adam Optimizer
URL: https://github.com/apache/incubator-mxnet/issues/10563#issuecomment-425206192
 
 
   It turns out the primary cause of our performance problem was resolved shortly after making this post, but I was still stuck tracking down the cause of the weight divergence. Digging into both implementations of Adam, it seemed that, at least algebraically, they compute the same thing (and, in the example above, all hyper-parameters were set to the same values).
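   
   For reference, here is roughly the update both libraries should be computing. This is just a minimal NumPy sketch of the textbook rule (the names and defaults here are mine, not taken from either library); the actual kernels may fuse steps or fold the bias correction into the learning rate, which can change where rounding happens:
   
   ```python
   import numpy as np
   
   def adam_step(w, grad, m, v, t, lr=3e-4, beta1=0.9, beta2=0.999, eps=1e-8):
       # One textbook Adam update (Kingma & Ba, 2014); both MXNet and TF
       # compute something algebraically equivalent to this.
       m = beta1 * m + (1 - beta1) * grad         # biased first-moment estimate
       v = beta2 * v + (1 - beta2) * grad ** 2    # biased second-moment estimate
       m_hat = m / (1 - beta1 ** t)               # bias correction, t starts at 1
       v_hat = v / (1 - beta2 ** t)
       w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
       return w, m, v
   ```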
   
   My best guess is that the weight divergence comes down to the order of operations: the two frameworks evaluate arithmetically equivalent updates in different orders, so they round differently. Once the weights differ by even a tiny amount (and they differ between MXNet and TF after a single tanh), the gradients computed in each framework differ as well, so the weights keep drifting further apart.
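   
   To make the order-of-operations point concrete, here is a small float32 sketch (plain NumPy, names are mine) of how two algebraically identical expressions can round differently, and why a one-ulp difference in a weight does not wash out afterwards:
   
   ```python
   import numpy as np
   
   # Floating-point arithmetic is not associative, so two Adam kernels that
   # are algebraically identical can still round differently depending on the
   # order in which terms are combined.
   a, b, c = np.float32(1e-8), np.float32(1.0), np.float32(-1.0)
   print((a + b) + c)   # 0.0   -- the 1e-8 is absorbed into 1.0 first
   print(a + (b + c))   # 1e-08 -- the 1e-8 survives
   
   # Once two copies of a weight differ by even one ulp, a nonlinearity such
   # as tanh can already give slightly different activations, hence slightly
   # different gradients, and each subsequent update compounds the gap.
   w = np.float32(0.1)
   w_ulp = np.nextafter(w, np.float32(1.0))   # smallest representable step up
   print(np.tanh(w), np.tanh(w_ulp))          # can already differ in the last bit
   ```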
   
   This is not a very satisfying answer, but it seems to be the case.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services