Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/09/30 20:01:40 UTC

[GitHub] [incubator-mxnet] ptrendx commented on issue #16294: Add CMake flag `CMAKE_BUILD_TYPE=Release`

ptrendx commented on issue #16294: Add CMake flag `CMAKE_BUILD_TYPE=Release`
URL: https://github.com/apache/incubator-mxnet/pull/16294#issuecomment-536727538
 
 
   The `too many resources requested for launch` error most often means that the kernel requires more registers than are available. The register file on the GPU has a fixed capacity that is shared by all threads on an SM (streaming multiprocessor), so the more registers each thread uses, the fewer threads can actually be launched. The problem comes from the fact that the number of threads launched is known only at runtime, not at compile time, so the compiler cannot do the analysis needed to limit the number of registers used or spill some of them to local memory. A debug build uses more registers than a release build, which is why you hit the error in that particular kernel. This can be solved by telling the compiler the maximum number of threads per block (and, optionally, the minimum number of blocks per SM) that the kernel will be launched with, via [launch bounds](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#launch-bounds). Inserting proper launch bounds should be a very easy change; if you have any problem applying it, just tell us exactly which kernel is giving this error and we can make a PR for it as well.
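
   For illustration, here is a minimal sketch of what adding launch bounds could look like. It is not taken from the MXNet code base: `scale_kernel`, `run_scale_demo`, and the block size of 256 are hypothetical and stand in for whichever kernel and launch configuration actually trigger the error. It also queries the per-thread register count with `cudaFuncGetAttributes`, which can help confirm that registers are the limiting resource.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical block size for this sketch. A real kernel should use the
// largest block size that its launch sites actually pass.
constexpr int kMaxThreadsPerBlock = 256;

// __launch_bounds__ promises the compiler that this kernel is never launched
// with more than kMaxThreadsPerBlock threads per block, so it can limit
// register usage (spilling if necessary) to make such a launch possible.
__global__ void __launch_bounds__(kMaxThreadsPerBlock)
scale_kernel(float* data, float factor, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;
}

// Launches the kernel once and returns true if no CUDA error was reported.
bool run_scale_demo() {
  const int n = 1 << 20;
  float* d_data = nullptr;
  if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return false;
  cudaMemset(d_data, 0, n * sizeof(float));

  // Report how many registers each thread of the compiled kernel uses and
  // the largest block size the runtime will accept for it.
  cudaFuncAttributes attr;
  if (cudaFuncGetAttributes(&attr, scale_kernel) == cudaSuccess) {
    printf("registers per thread: %d, max threads per block: %d\n",
           attr.numRegs, attr.maxThreadsPerBlock);
  }

  int blocks = (n + kMaxThreadsPerBlock - 1) / kMaxThreadsPerBlock;
  scale_kernel<<<blocks, kMaxThreadsPerBlock>>>(d_data, 2.0f, n);

  // A launch that needs more resources than the device can provide is
  // reported here as cudaErrorLaunchOutOfResources.
  cudaError_t launch_err = cudaGetLastError();
  cudaError_t sync_err = cudaDeviceSynchronize();
  cudaFree(d_data);

  if (launch_err != cudaSuccess) {
    fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(launch_err));
    return false;
  }
  return sync_err == cudaSuccess;
}

int main() { return run_scale_demo() ? 0 : 1; }
```

   The bound has to be at least as large as the block size used at every launch site of the kernel, since launching with more threads per block than the declared maximum fails. Compiling with `nvcc -Xptxas -v` prints register and spill statistics per kernel, which is one way to compare a debug build against a release build.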

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services