Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/05/19 21:45:24 UTC

[GitHub] [incubator-mxnet] ziyuang commented on issue #14159: [Feature Request] Support fp16 for c++ api

ziyuang commented on issue #14159: [Feature Request] Support fp16 for c++ api
URL: https://github.com/apache/incubator-mxnet/issues/14159#issuecomment-493794406
 
 
   > I can give a quick summary of my experience with fp16 and C++ so far:
   > 
   > I believe the best way to do this is the same as in other front-ends such as Python. As @anirudh2290 mentions, add casts around numerically sensitive and insensitive sections (batchnorms and softmaxes, for example, should be fp32; convs and FC layers should be fp16), and then make sure your inputs are in fp16. You should then be able to run inference as normal (and you should see that fp16 operations are actually being executed).
   > 
   > One caveat is that depending on your graph the time spent casting inputs may be more than the time you save using fp16. That's where AMP and TensorRT integration can help. They'll fuse many operators that are numerically sensitive, removing them from the computation graph, which means you'll get larger sections of a graph that you can run in fp16 mode. They'll also fuse casting operations into numerical operations which saves you from doing two full memory copies on your tensors when casting. These methods should be a much more practical way of running fp16 for inference (with C++).
   
   But TensorRT doesn't support dynamic/variable input shapes, right?
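   For concreteness, the manual casting pattern described in the quoted summary looks roughly like the sketch below, written against the Python Symbol API (the C++ op bindings expose the same Cast/Convolution/BatchNorm operators). The toy network, shapes, and initializer are placeholders only, and a GPU context is assumed so that fp16 kernels are available.

```python
import mxnet as mx

# Toy network illustrating the pattern: Convolution and FullyConnected run in
# fp16, while BatchNorm and softmax are kept in fp32 via explicit casts.
data = mx.sym.Variable('data')  # fed as fp16 below
conv = mx.sym.Convolution(data, num_filter=64, kernel=(3, 3), pad=(1, 1), name='conv0')

# Numerically sensitive section: cast up to fp32 around BatchNorm, then back down.
bn = mx.sym.BatchNorm(mx.sym.Cast(conv, dtype='float32'), name='bn0')
fc = mx.sym.FullyConnected(mx.sym.Flatten(mx.sym.Cast(bn, dtype='float16')),
                           num_hidden=10, name='fc0')

# Softmax is also numerically sensitive: compute it in fp32.
out = mx.sym.softmax(mx.sym.Cast(fc, dtype='float32'), name='softmax')

# Bind for inference with an fp16 input descriptor so the fp16 sections (and
# their weights) are actually inferred as float16.
mod = mx.mod.Module(out, data_names=['data'], label_names=None, context=mx.gpu(0))
mod.bind(data_shapes=[mx.io.DataDesc('data', (1, 3, 32, 32), dtype='float16')],
         for_training=False)
mod.init_params(mx.init.Xavier())
batch = mx.io.DataBatch([mx.nd.ones((1, 3, 32, 32), dtype='float16', ctx=mx.gpu(0))])
mod.forward(batch, is_train=False)
print(mod.get_outputs()[0].dtype)  # float32, since the final softmax runs in fp32
```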
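   The AMP route mentioned in the quote can also be driven from the Python front-end and then consumed from C++, since the converted symbol and parameters are an ordinary checkpoint. A rough sketch, assuming an MXNet build recent enough to ship the conversion helpers in mxnet.contrib.amp; the checkpoint prefix 'model' is a placeholder:

```python
import mxnet as mx
from mxnet.contrib import amp  # only present in builds that ship the AMP module

# Load an existing fp32 checkpoint ('model-symbol.json' / 'model-0000.params').
sym, arg_params, aux_params = mx.model.load_checkpoint('model', 0)

# Let AMP rewrite the graph: fp16 for numerically safe ops, fp32 for sensitive
# ones, inserting the needed amp_cast nodes automatically instead of by hand.
sym_fp16, args_fp16, aux_fp16 = amp.convert_model(
    sym, arg_params, aux_params, target_dtype='float16')

# Save the converted model, keeping the inserted amp_cast nodes so the saved
# graph stays mixed-precision (remove_amp_cast exists in AMP-enabled builds).
# The result can be loaded from the C predict / C++ API like any other
# checkpoint, with fp16 input data.
mx.model.save_checkpoint('model-fp16', 0, sym_fp16, args_fp16, aux_fp16,
                         remove_amp_cast=False)
```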

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services