You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/02/18 23:37:14 UTC

[GitHub] DickJC123 commented on a change in pull request #14006: Dual stream cudnn Convolution backward() with MXNET_GPU_WORKER_NSTREAMS=2.

DickJC123 commented on a change in pull request #14006: Dual stream cudnn Convolution backward() with MXNET_GPU_WORKER_NSTREAMS=2.
URL: https://github.com/apache/incubator-mxnet/pull/14006#discussion_r257851551

##########
File path: docs/faq/env_var.md
##########
@@ -174,6 +174,12 @@ When USE_PROFILER is enabled in Makefile or CMake, the following environments ca

## Other Environment Variables

+* MXNET_GPU_WORKER_NSTREAMS

Review comment:
The short answer is that 'yes', an operator with 3 inputs might make use of 3 streams in Backward(), so I did not want to propose an environment variable name like MXNET_GPU_WORKER_USE_DUAL_STREAM=0/1 that might soon become obsolete. On the other hand, Convolution only needs 2 streams, and I did not want to burden this enhancement with more complexity than is needed at this time. I propose that when we have a use-case for 3 or more streams, then we can expand the implementation and employ the use-case in our testing of it.

At the end of every kernel execution, there is a fall-off in GPU utilization leading up to the completion of the last grid block. When two streams are being used, these utilization gaps can be filled by work from the second stream. I would guess that having 3 streams would not enhance this effect. On the other hand, let's say you had 3 small independent kernels that each would occupy a third of the GPU. You could see how having 3 streams would be a win in this case over 2 streams.

So it's good that you ask, how might we expand this to 3 or more streams? The MXNET_GPU_WORKER_NSTREAMS environment variable would remain unchanged, though the documentation would indicate that the framework supports a value greater than 2. Legacy env-var uses would be preserved so I think this could happen as part of a minor release. At the RunContext level, a GPUAuxStream* would be replaced by a std::vector<GPUAuxStream*>. The RunContext method get_gpu_aux_stream() might then be changed to RunContex::get_gpu_aux_stream(int aux_stream_id = 0), which would not break operator code that started using the simpler aux_stream API proposed by this PR.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

With regards,
Apache Git Services