Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/12/09 22:44:25 UTC

[GitHub] [incubator-mxnet] ptrendx opened a new pull request #17028: Workaround problem with fusion in CUDA 9

ptrendx opened a new pull request #17028: Workaround problem with fusion in CUDA 9
URL: https://github.com/apache/incubator-mxnet/pull/17028
 
 
   ## Description ##
   Fixes #17020 
   
   The problem comes from a bug in how NVRTC in CUDA 9 handles the `default-device` flag. That flag is supposed to mark every function in the file that has no execution-space decoration as a `__device__` function, while leaving functions decorated differently (such as kernels decorated with `__global__`) alone. This is the behavior in CUDA 10+. In CUDA 9, however, the `__device__` attribute is applied to every function, including kernels, which is incompatible with the `__launch_bounds__()` attribute that we use for kernels.
   
   This PR removes the use of the `default-device` flag for NVRTC compilation and instead manually decorates all the required functions as `__device__`.
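
A minimal sketch of the decoration pattern described above, not the actual code changed by this PR. The names `add_one`, `add_one_kernel`, and `run_add_one_example`, and the block size of 256, are hypothetical and chosen only for illustration. The sketch assumes compilation with `nvcc` as a standalone program rather than through NVRTC, so it shows only the source-level decoration: the helper carries an explicit `__device__`, and the kernel keeps `__global__` together with `__launch_bounds__()`, so neither depends on a compiler flag to pick its execution space.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Helper explicitly marked __device__, so no compiler flag is needed to
// place it in the device execution space.
__device__ inline float add_one(float x) {
  return x + 1.0f;
}

// The kernel keeps its own __global__ decoration together with
// __launch_bounds__; 256 is an illustrative block size.
__global__ void __launch_bounds__(256)
add_one_kernel(const float* in, float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    out[i] = add_one(in[i]);
  }
}

// Runs the kernel on n elements (n > 0) and returns the number of wrong
// results, or -1 if any CUDA call fails.
int run_add_one_example(int n) {
  std::vector<float> host_in(n), host_out(n, 0.0f);
  for (int i = 0; i < n; ++i) {
    host_in[i] = static_cast<float>(i);
  }

  const std::size_t bytes = static_cast<std::size_t>(n) * sizeof(float);
  float* dev_in = nullptr;
  float* dev_out = nullptr;
  if (cudaMalloc(&dev_in, bytes) != cudaSuccess) {
    return -1;
  }
  if (cudaMalloc(&dev_out, bytes) != cudaSuccess) {
    cudaFree(dev_in);
    return -1;
  }

  cudaError_t err =
      cudaMemcpy(dev_in, host_in.data(), bytes, cudaMemcpyHostToDevice);
  if (err == cudaSuccess) {
    const int threads = 256;  // must not exceed the __launch_bounds__ value
    const int blocks = (n + threads - 1) / threads;
    add_one_kernel<<<blocks, threads>>>(dev_in, dev_out, n);
    err = cudaGetLastError();  // reports launch configuration errors
  }
  if (err == cudaSuccess) {
    // This copy also waits for the kernel to finish.
    err = cudaMemcpy(host_out.data(), dev_out, bytes, cudaMemcpyDeviceToHost);
  }
  cudaFree(dev_in);
  cudaFree(dev_out);
  if (err != cudaSuccess) {
    return -1;
  }

  int mismatches = 0;
  for (int i = 0; i < n; ++i) {
    if (host_out[i] != host_in[i] + 1.0f) {
      ++mismatches;
    }
  }
  return mismatches;
}
```

Calling `run_add_one_example(1000)` from a `main` function on a machine with a CUDA-capable GPU should return 0 when every element was incremented correctly. Because each function states its own execution space, the same source should not depend on how a particular NVRTC version interprets a default-execution-space flag.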

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services