Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2021/05/19 21:54:31 UTC

[GitHub] [incubator-mxnet] ptrendx commented on pull request #19426: [FEATURE] Use RTC for reduction ops

ptrendx commented on pull request #19426:
URL: https://github.com/apache/incubator-mxnet/pull/19426#issuecomment-844501183


   > Do you have any data on the overheads involved in RTC launch vs. compiled kernel launch, e.g. on the first iteration and thereafter (perhaps for both hybridized and unhybridized models)?
   
   There is a 10-100 ms overhead on the first launch of a given kernel, since it needs to be compiled before use. After compilation the kernel is stored in a cache and any subsequent call is fast - I measured ~2 us of overhead for constructing the kernel code and doing the cache lookup, which is comparable to the cost of cudaLaunchKernel itself. There is not really any difference between hybridized and non-hybridized models, since the functionality works irrespective of hybridization.
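
   To illustrate the compile-once-then-cache pattern described above, here is a minimal sketch (not the actual MXNet implementation; the function and cache names are made up, and a real version would also key the cache per device/context):

```cpp
#include <cuda.h>
#include <nvrtc.h>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: cache keyed by kernel name + generated source, so the
// 10-100 ms NVRTC compilation is paid only on the first launch of a kernel.
CUfunction GetCachedKernel(const std::string& src, const std::string& name) {
  static std::unordered_map<std::string, CUfunction> cache;
  const std::string key = name + "\n" + src;
  auto it = cache.find(key);
  if (it != cache.end()) return it->second;  // fast path: cache lookup only

  // Slow path: compile the source with NVRTC. No external headers are passed,
  // because all constants are embedded directly in `src`.
  nvrtcProgram prog;
  nvrtcCreateProgram(&prog, src.c_str(), (name + ".cu").c_str(), 0, nullptr, nullptr);
  const char* opts[] = {"--std=c++14"};
  if (nvrtcCompileProgram(prog, 1, opts) != NVRTC_SUCCESS) {
    size_t log_size;
    nvrtcGetProgramLogSize(prog, &log_size);
    std::string log(log_size, '\0');
    nvrtcGetProgramLog(prog, &log[0]);
    throw std::runtime_error("NVRTC compilation failed:\n" + log);
  }
  size_t ptx_size;
  nvrtcGetPTXSize(prog, &ptx_size);
  std::vector<char> ptx(ptx_size);
  nvrtcGetPTX(prog, ptx.data());
  nvrtcDestroyProgram(&prog);

  // Load the PTX with the driver API and keep the kernel handle around.
  CUmodule module;
  CUfunction func;
  cuModuleLoadData(&module, ptx.data());
  cuModuleGetFunction(&func, module, name.c_str());
  cache.emplace(key, func);
  return func;
}
```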
   
   > 
   > I'm sorry to see all those floating point constants in the MXNet RTC code. Are there no compiler-defined constants that can be used, or is there a motivation for avoiding them?
   
   None of the floating point constants are compiler-defined - they all come from header files (e.g. <climits>). The motivation for avoiding external headers is to sidestep the potential issues of locating those headers and the fact that NVRTC cannot include any header that contains host-only code.
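
   As a hypothetical illustration of what that looks like (the constant and kernel names here are made up, not the ones used in this PR), the needed values are simply spelled out in the source string handed to NVRTC instead of being pulled in via an include:

```cpp
// Purely illustrative: the numeric values mirror FLT_MAX / DBL_MAX from <cfloat>,
// but are written directly into the RTC source so NVRTC never has to resolve an include.
const char* rtc_limits_src = R"code(
constexpr float  kFloatMax  = 3.402823466e+38f;
constexpr double kDoubleMax = 1.7976931348623157e+308;

__global__ void fill_max(float* out, const int n) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = kFloatMax;
}
)code";
```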
   
   > 
   > Having worked on these reduce functions quite a bit, you probably have a good sense of the level of testing. Do you feel it's adequate? Can RTC-based reduction invoke any new regions of the operator parameter space?
   
   I think the level of testing is generally adequate, and the change to RTC does not introduce any additional parameters to be tested. It actually consolidates the functionality and so improves the test coverage: previously some functions used customized versions of the kernel (e.g. from `src/operator/numpy/linalg/broadcast_reduce_customized-inl.cuh`), whereas now all the use cases are handled by the same kernel code.

