You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2021/07/12 02:52:39 UTC

[GitHub] [incubator-mxnet] DickJC123 opened a new pull request #20443: [FEATURE] Add backend MXGetMaxSupportedArch() and frontend get_rtc_compile_opts() for CUDA enhanced compatibility

DickJC123 opened a new pull request #20443:
URL: https://github.com/apache/incubator-mxnet/pull/20443


   ## Description ##
   This PR makes RTC (as invoked by our Python unittests and other model scripts) work with CUDA enhanced compatibility. 
    As such, it is an extension of PR https://github.com/apache/incubator-mxnet/pull/19364, which brought that functionality to the C++ backend.  This PR keeps test_operator_gpu.py::test_cuda_rtc from failing on systems that rely on CUDA enhanced compatibility, though those systems may not be part of upstream CI at present.
    
   The changes of this PR are:
   - break off the calculation of the max supported arch into a separate function GetMaxSupportedArch(), and enhance it to use nvrtcGetSupportedArchs() if CUDA_VERSION >= 11.2
   - wrap GetMaxSupportedArch() as MXGetMaxSupportedArch() and add it to the C api
   - use MXGetMaxSupportedArch() in a newly created Python utility function get_rtc_compile_opts(ctx)
   - enhance test_cuda_rtc to use this new function
   
   Our current approach to RTC in Python code, which might fail under CUDA enhanced compatibility:
   ```
   module = mx.rtc.CudaModule(source)
   ```
   With this PR, the new approach that succeeds under CUDA enhanced compatibility:
   ```
   ctx = < some GPU context, e.g. mx.gpu(0) >
   module = mx.rtc.CudaModule(source, options=get_rtc_compile_opts(ctx))
   ```
   get_rtc_compile_opts() will return a list of options that is most appropriate for the system and the gpu context.  Currently this is a single option of the form `--gpu-architecture=compute_NN` or `--gpu-architecture=sm_NN` as needed.
   ## Background ##
   Starting with CUDA 11.1, a user can accept minor release upgrades of the CUDA toolkit (potentially picking up support for a newer GPU arch) without upgrading the driver (per https://docs.nvidia.com/deploy/cuda-compatibility/index.html).  In such cases, the toolkit nvrtc compile toolchain should not only compile CUDA code to PTX, but also further translate the PTX to SASS, since the driver would be unable to JIT-compile to SASS for the newer GPU arch.  This is controlled by the nvrtc compiler option used: for example, to compile to SASS for the Ampere A100 the option is `--gpu-architecture=sm_80`.  To compile only to PTX, and so rely on the driver's ability to JIT-compile to SASS, the option is `--gpu-architecture=compute_80`.
   
   
   ## Checklist ##
   ### Essentials ###
   - [X] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
   - [X] Changes are complete (i.e. I finished coding on this PR)
   - [~] All changes have test coverage [Verified privately, but ideally upstream's CI would have systems that stress this PR]
   - [X] Code is well-documented
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] DickJC123 commented on a change in pull request #20443: [FEATURE] Add backend MXGetMaxSupportedArch() and frontend get_rtc_compile_opts() for CUDA enhanced compatibility

Posted by GitBox <gi...@apache.org>.

DickJC123 commented on a change in pull request #20443:
URL: https://github.com/apache/incubator-mxnet/pull/20443#discussion_r669079602



##########
File path: src/common/rtc.cc
##########
@@ -44,8 +44,8 @@ CudaModule::Chunk::Chunk(
       << "For lower version of CUDA, please prepend your kernel defintiions "
       << "with extern \"C\" instead.";
 #endif
-  std::vector<const char*> c_options(options.size());
-  for (const auto& i : options) c_options.emplace_back(i.c_str());
+  std::vector<const char*> c_options;
+  for (const auto& i : options) c_options.push_back(i.c_str());

Review comment:
       This is a bug fix.  The original code path creates a vector twice the desired size.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #20443: [FEATURE] Add backend MXGetMaxSupportedArch() and frontend get_rtc_compile_opts() for CUDA enhanced compatibility

Posted by GitBox <gi...@apache.org>.

mxnet-bot commented on pull request #20443:
URL: https://github.com/apache/incubator-mxnet/pull/20443#issuecomment-877929978


   Hey @DickJC123 , Thanks for submitting the PR 
   All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands: 
   - To trigger all jobs: @mxnet-bot run ci [all] 
   - To trigger specific jobs: @mxnet-bot run ci [job1, job2] 
   *** 
   **CI supported jobs**: [clang, website, miscellaneous, unix-cpu, centos-cpu, windows-cpu, unix-gpu, sanity, edge, windows-gpu, centos-gpu]
   *** 
   _Note_: 
    Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin. 
   All CI tests must pass before the PR can be merged. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] ptrendx commented on a change in pull request #20443: [FEATURE] Add backend MXGetMaxSupportedArch() and frontend get_rtc_compile_opts() for CUDA enhanced compatibility

Posted by GitBox <gi...@apache.org>.

ptrendx commented on a change in pull request #20443:
URL: https://github.com/apache/incubator-mxnet/pull/20443#discussion_r669112869



##########
File path: src/common/rtc.cc
##########
@@ -44,8 +44,8 @@ CudaModule::Chunk::Chunk(
       << "For lower version of CUDA, please prepend your kernel defintiions "
       << "with extern \"C\" instead.";
 #endif
-  std::vector<const char*> c_options(options.size());
-  for (const auto& i : options) c_options.emplace_back(i.c_str());
+  std::vector<const char*> c_options;
+  for (const auto& i : options) c_options.push_back(i.c_str());

Review comment:
       Oh, I see. Why not then just create the `c_options` with the right size and assign instead of `emplace_back` or `push_back`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] ptrendx commented on a change in pull request #20443: [FEATURE] Add backend MXGetMaxSupportedArch() and frontend get_rtc_compile_opts() for CUDA enhanced compatibility

Posted by GitBox <gi...@apache.org>.

ptrendx commented on a change in pull request #20443:
URL: https://github.com/apache/incubator-mxnet/pull/20443#discussion_r668865610



##########
File path: src/common/rtc.cc
##########
@@ -44,8 +44,8 @@ CudaModule::Chunk::Chunk(
       << "For lower version of CUDA, please prepend your kernel defintiions "
       << "with extern \"C\" instead.";
 #endif
-  std::vector<const char*> c_options(options.size());
-  for (const auto& i : options) c_options.emplace_back(i.c_str());
+  std::vector<const char*> c_options;
+  for (const auto& i : options) c_options.push_back(i.c_str());

Review comment:
       Why?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] ptrendx merged pull request #20443: [FEATURE] Add backend MXGetMaxSupportedArch() and frontend get_rtc_compile_opts() for CUDA enhanced compatibility

Posted by GitBox <gi...@apache.org>.

ptrendx merged pull request #20443:
URL: https://github.com/apache/incubator-mxnet/pull/20443


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] DickJC123 commented on a change in pull request #20443: [FEATURE] Add backend MXGetMaxSupportedArch() and frontend get_rtc_compile_opts() for CUDA enhanced compatibility

Posted by GitBox <gi...@apache.org>.

DickJC123 commented on a change in pull request #20443:
URL: https://github.com/apache/incubator-mxnet/pull/20443#discussion_r669136650



##########
File path: src/common/rtc.cc
##########
@@ -44,8 +44,8 @@ CudaModule::Chunk::Chunk(
       << "For lower version of CUDA, please prepend your kernel defintiions "
       << "with extern \"C\" instead.";
 #endif
-  std::vector<const char*> c_options(options.size());
-  for (const auto& i : options) c_options.emplace_back(i.c_str());
+  std::vector<const char*> c_options;
+  for (const auto& i : options) c_options.push_back(i.c_str());

Review comment:
       As we discussed offline, this reinstates the prior working code.  It's simple and good enough for adding a small number of pointers to the vector.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org