Posted to dev@mxnet.apache.org by mk-61 <no...@github.com> on 2020/08/10 21:17:26 UTC

[apache/incubator-mxnet] [RFC] Moving MXNet-AMP to core (#18896)

MXNet already has experimental AMP (Automatic Mixed Precision) support, exposed in the mxnet.contrib package. It automatically casts models to either float16 or bfloat16. This RFC covers moving it into core, making it a first-class feature, and its further development.
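Conceptually, AMP classifies each operator into one of a few casting lists and picks a compute dtype per op from that classification. A minimal pure-Python sketch of that decision logic (the op names and list contents below are illustrative, not MXNet's actual lists):

```python
# Illustrative AMP-style op classification; not MXNet's real lists or API.
FP16_SAFE_OPS = {"convolution", "fully_connected"}   # fast and safe in float16
FP32_ONLY_OPS = {"softmax", "norm"}                  # numerically sensitive, keep float32
WIDEST_TYPE_OPS = {"add", "concat"}                  # follow the widest input dtype

def amp_dtype(op_name, input_dtypes):
    """Pick the dtype an op should run in under mixed precision."""
    if op_name in FP16_SAFE_OPS:
        return "float16"
    if op_name in FP32_ONLY_OPS:
        return "float32"
    if op_name in WIDEST_TYPE_OPS:
        # promote to the widest dtype among the inputs
        return "float32" if "float32" in input_dtypes else "float16"
    # an unclassified op is exactly the failure mode the numpy-op task targets
    raise KeyError(f"{op_name} is not classified in any AMP list")
```

The first task in the breakdown below amounts to guaranteeing that this lookup never hits the `KeyError` branch for any numpy op.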

Here's a rough task breakdown for the initial move:

* Ensure AMP works with numpy ops - i.e., every op is covered by one of the AMP casting lists
* API change: make the loss scale public (https://github.com/apache/incubator-mxnet/issues/17507)
* A number of issues have to be resolved to improve the user experience:
  1. Cannot load a trainer with AMP (https://github.com/apache/incubator-mxnet/issues/16858)
  2. There's a CUDA crash (IMA - illegal memory access) in amp_multicast that happens on some models (e.g., Yolo3)
* The actual work of moving the code around and updating import paths
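On the loss-scale API item: dynamic loss scaling keeps a running scale that shrinks on gradient overflow and grows after a run of clean steps, and issue #17507 asks for that scale to be publicly readable (e.g., for checkpointing). A hedged sketch of such a scaler with a public property - the class name, defaults, and update rule here are illustrative, not MXNet's actual implementation:

```python
# Illustrative dynamic loss scaler with a public `loss_scale` property,
# in the spirit of issue #17507; not MXNet's actual API.
class DynamicLossScaler:
    def __init__(self, init_scale=2.0 ** 16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self._scale = init_scale
        self._growth_factor = growth_factor
        self._backoff_factor = backoff_factor
        self._growth_interval = growth_interval
        self._unskipped = 0  # consecutive steps without overflow

    @property
    def loss_scale(self):
        # public read-only access, e.g. for saving/loading trainer state
        return self._scale

    def update(self, overflow):
        """Shrink the scale on overflow; grow it after a run of clean steps."""
        if overflow:
            self._scale *= self._backoff_factor
            self._unskipped = 0
        else:
            self._unskipped += 1
            if self._unskipped >= self._growth_interval:
                self._scale *= self._growth_factor
                self._unskipped = 0
```

A public property like this would also make the "cannot load trainer with AMP" issue easier to address, since the scale becomes ordinary serializable state.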

Post move:

1. Layout optimization - upstreaming a feature that already exists in the NVIDIA NGC container. It improves convolution performance by automatically converting between NCHW and NHWC layouts.
2. Explore alternatives to monkey-patching of front-end ops (https://github.com/apache/incubator-mxnet/issues/18697)
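For context on item 2, the monkey-patching approach means replacing a front-end op's attribute with a wrapper that inserts casts before dispatching to the original function. A minimal pure-Python sketch of the pattern (the `Ops` namespace and `dot` op are stand-ins, not MXNet internals):

```python
# Illustrative monkey-patching of a front-end op; not MXNet's real internals.
class Ops:
    """Stand-in for a front-end namespace such as mx.nd or mx.np."""
    @staticmethod
    def dot(a, b):
        return ("dot", a, b)  # pretend op: just records its inputs

def wrap_with_cast(fn, target_dtype):
    """Return a wrapper that 'casts' every input before calling fn."""
    def wrapped(*args):
        cast_args = [(a, target_dtype) for a in args]  # pretend-cast each input
        return fn(*cast_args)
    return wrapped

# Monkey-patching replaces the attribute in place, so every later call
# that goes through Ops.dot silently picks up the casting behaviour.
Ops.dot = wrap_with_cast(Ops.dot, "float16")
```

The fragility is visible even in this toy: any code that captured a reference to the original `dot` before patching bypasses AMP entirely, which is part of why the linked issue explores alternatives.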


-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18896

Re: [apache/incubator-mxnet] [RFC] Moving MXNet-AMP to core (#18896)

Posted by Xingjian Shi <no...@github.com>.
@mk-61 If you'd like to see some test cases of the new numpy API, you can also try the numpy version of GluonNLP: https://github.com/dmlc/gluon-nlp/tree/numpy . Shall we connect via Slack?

https://github.com/apache/incubator-mxnet/issues/18896#issuecomment-673143785

Re: [apache/incubator-mxnet] [RFC] Moving MXNet-AMP to core (#18896)

Posted by "github-actions[bot]" <no...@github.com>.
Welcome to Apache MXNet (incubating)! We are on a mission to democratize AI, and we are glad that you are contributing to it by opening this issue.
Please make sure to include all the relevant context, and one of the @apache/mxnet-committers will be here shortly.
If you are interested in contributing to our project, let us know! Also, be sure to check out our guide on [contributing to MXNet](https://mxnet.apache.org/community/contribute) and our [development guides wiki](https://cwiki.apache.org/confluence/display/MXNET/Developments).

https://github.com/apache/incubator-mxnet/issues/18896#issuecomment-671594508

Re: [apache/incubator-mxnet] [RFC] Moving MXNet-AMP to core (#18896)

Posted by Sheng Zha <no...@github.com>.
cc @sxjscience @leezu 

https://github.com/apache/incubator-mxnet/issues/18896#issuecomment-673009714