Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/08/18 00:27:58 UTC

[GitHub] [incubator-mxnet] KellenSunderland commented on issue #18951: [RFC] GPU performance improvements in MXNet engine

KellenSunderland commented on issue #18951:
URL: https://github.com/apache/incubator-mxnet/issues/18951#issuecomment-675180856


   I really like this proposal, thanks for the great write-up, Przemyslaw.
   
   I haven't totally thought through the pros/cons, but would it be possible to record a CUDA event (cudaEventRecord) by default after every block of operators is launched, and have any dependent block of ops wait on it with cudaStreamWaitEvent? Would this unblock our GPU worker threads, since we would no longer be calling cudaStreamSynchronize?
   
   If I'm understanding correctly, that would be equivalent to what you're proposing in your second scenario (when we have two CUDA streams)? Would it add a lot of overhead in scenario 1, where we use the same stream?
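
   To make the question concrete, here is a minimal sketch of the event-based pattern using only the CUDA runtime API. It is not MXNet engine code: the `producer` and `consumer` kernels and the `run_two_stream_demo` function are hypothetical stand-ins for two dependent blocks of operators, and the buffer size is arbitrary.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the first block of operators.
__global__ void producer(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 1.0f;
}

// Hypothetical stand-in for a block of operators that depends on the first.
__global__ void consumer(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Returns 0 on success, 1 on a CUDA error or wrong result, -1 if no GPU is usable.
int run_two_stream_demo() {
    int device_count = 0;
    if (cudaGetDeviceCount(&device_count) != cudaSuccess || device_count == 0) {
        return -1;
    }

    const int n = 1 << 20;  // arbitrary size chosen for illustration
    float* d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return 1;

    cudaStream_t s1, s2;
    cudaEvent_t done;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);
    // Timing is not needed for pure dependency tracking.
    cudaEventCreateWithFlags(&done, cudaEventDisableTiming);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Launch the first block of work and record an event right behind it.
    producer<<<blocks, threads, 0, s1>>>(d_data, n);
    cudaEventRecord(done, s1);

    // The dependency is enforced on the device: s2 waits for the event,
    // while the host thread returns immediately and can keep enqueuing work.
    cudaStreamWaitEvent(s2, done, 0);
    consumer<<<blocks, threads, 0, s2>>>(d_data, n);

    // The host blocks only here, when it actually needs the result.
    std::vector<float> h_data(n);
    cudaMemcpyAsync(h_data.data(), d_data, n * sizeof(float),
                    cudaMemcpyDeviceToHost, s2);
    cudaError_t err = cudaStreamSynchronize(s2);

    bool ok = (err == cudaSuccess);
    for (int i = 0; ok && i < n; ++i) ok = (h_data[i] == 2.0f);

    cudaEventDestroy(done);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(d_data);
    return ok ? 0 : 1;
}
```

   Calling `run_two_stream_demo()` from a `main` built with nvcc exercises the two-stream case. In the single-stream case, kernels launched into the same stream already execute in issue order, so the event is not required for correctness there; whether recording it anyway costs enough to matter is exactly the open question above.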

