Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2017/12/22 05:28:29 UTC

[GitHub] eric-haibin-lin opened a new issue #9177: Support standard optimizer with sparse gradient

URL: https://github.com/apache/incubator-mxnet/issues/9177
 
 
   Per @mg0880gm's request:
   
   Operators such as `dot` and `sparse_embedding` generate row_sparse gradients, and one can use SGD with momentum or Adam as the optimizer. The problem with these optimizers is that only [lazy update](https://www.tensorflow.org/api_docs/python/tf/contrib/opt/LazyAdamOptimizer) is supported: the states (momentum in SGD, m & v in Adam) are updated only for rows whose indices appear in the gradient of the current batch, whereas the standard optimizer updates all rows of the states.
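   For concreteness, here is a minimal NumPy sketch (not MXNet code) contrasting the two rules for SGD with momentum; the row indices and values stand in for a row_sparse gradient, and the function names are illustrative only:
   
   ```python
   import numpy as np
   
   def standard_momentum_update(weight, mom, grad_rows, grad_vals, lr=0.1, momentum=0.9):
       # Standard rule: the momentum of *every* row decays each step,
       # including rows whose gradient is zero in this batch.
       grad = np.zeros_like(weight)
       grad[grad_rows] = grad_vals
       mom[:] = momentum * mom - lr * grad
       weight += mom
   
   def lazy_momentum_update(weight, mom, grad_rows, grad_vals, lr=0.1, momentum=0.9):
       # Lazy rule: only rows that appear in the row_sparse gradient are
       # touched; momentum for all other rows stays stale.
       mom[grad_rows] = momentum * mom[grad_rows] - lr * grad_vals
       weight[grad_rows] += mom[grad_rows]
   
   w1, m1 = np.ones((4, 2)), np.full((4, 2), 0.5)
   w2, m2 = np.ones((4, 2)), np.full((4, 2), 0.5)
   rows, vals = np.array([1]), np.array([[0.3, 0.3]])
   standard_momentum_update(w1, m1, rows, vals)
   lazy_momentum_update(w2, m2, rows, vals)
   print(w1[0], w2[0])  # row 0 differs: moved by decayed momentum vs. untouched
   ```
   
   The two rules diverge as soon as a row has nonzero momentum but does not appear in the current batch's gradient.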
   
   Therefore, **a user cannot use sparse gradients to perform the standard update** in MXNet right now, which makes it harder to adopt sparse operators with existing models because the update rule differs.
   
   To support the standard use case, we can add a `lazy_update` parameter to the optimizer and updater operators, and perform the lazy update only if `lazy_update=True`, `weight.stype=row_sparse` and `grad.stype=row_sparse`. If `lazy_update=False`, or if either the weight or the gradient is dense, the standard update is applied. A sketch of the resulting user-facing API is given below.
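   A hypothetical sketch of how the flag might be used from Python; the `lazy_update` argument follows the proposal in this issue and is not part of the released MXNet API:
   
   ```python
   import mxnet as mx
   
   # Standard update, even though the gradient from sparse_embedding/dot
   # may be row_sparse: all momentum rows decay every step.
   opt_std = mx.optimizer.SGD(learning_rate=0.1, momentum=0.9, lazy_update=False)
   
   # Lazy update: momentum rows are refreshed only when their indices
   # appear in the current batch's row_sparse gradient.
   opt_lazy = mx.optimizer.SGD(learning_rate=0.1, momentum=0.9, lazy_update=True)
   ```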
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services