Posted to commits@tvm.apache.org by "MasterJH5574 (via GitHub)" <gi...@apache.org> on 2023/07/21 05:50:21 UTC

[GitHub] [tvm] MasterJH5574 opened a new pull request, #15373: [TIR] Allreduce broadcast result to each thread in multi-warp case

MasterJH5574 opened a new pull request, #15373:
URL: https://github.com/apache/tvm/pull/15373

   PR #15327 introduced warp-level primitive support for multi-warp allreduce. However, due to the particular two-stage shuffle-down reduction used to implement allreduce in multi-warp scenarios, PR #15327 did not broadcast the allreduce result to every reduction thread. This behavior does not align with the semantics of allreduce and is not ideal for many use cases. Therefore, this PR completes the implementation by inserting a stage that writes the reduction result to shared memory, so that every reduction thread across all the reduction warps can access it.
   
   This shared memory write-back stage is only inserted for multi-warp allreduce. In single-warp allreduce, a `shfl_sync` is used to broadcast the reduction result across the reduction threads. Since in multi-warp settings we cannot leverage warp-level primitives to broadcast the value across warps, we have to make use of shared memory.
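   For illustration, here is a minimal CUDA sketch of the overall pattern: a two-stage shuffle-down reduction followed by the shared-memory write-back that broadcasts the result to every thread. The kernel name, the buffer name, and the fixed 32-slot buffer are assumptions for the example, not the code TVM actually generates.
   
   ```cuda
   // Minimal sketch of a block-level sum allreduce across multiple warps,
   // with the final result broadcast to every thread through shared memory.
   // Assumes blockDim.x is a multiple of warpSize; names are illustrative.
   __global__ void block_allreduce_sum(const float* in, float* out) {
     constexpr unsigned kFullMask = 0xffffffffu;
     const int tid = threadIdx.x;
     const int lane = tid % warpSize;
     const int warp = tid / warpSize;
     const int num_warps = blockDim.x / warpSize;
   
     float val = in[blockIdx.x * blockDim.x + tid];
   
     // Stage 1: shuffle-down reduction within each warp; lane 0 holds the
     // per-warp partial sum afterwards.
     for (int offset = warpSize / 2; offset > 0; offset /= 2)
       val += __shfl_down_sync(kFullMask, val, offset);
   
     __shared__ float warp_partial[32];  // one slot per warp (<= 32 on CUDA)
     if (lane == 0) warp_partial[warp] = val;
     __syncthreads();
   
     // Stage 2: the first warp reduces the per-warp partials.
     if (warp == 0) {
       val = (lane < num_warps) ? warp_partial[lane] : 0.0f;
       for (int offset = warpSize / 2; offset > 0; offset /= 2)
         val += __shfl_down_sync(kFullMask, val, offset);
       // Write-back stage added by this PR: publish the final result so that
       // every thread of every warp can read it, not just lane 0 of warp 0.
       if (lane == 0) warp_partial[0] = val;
     }
     __syncthreads();
   
     // Every reduction thread now observes the same allreduce result.
     out[blockIdx.x * blockDim.x + tid] = warp_partial[0];
   }
   ```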
   
   Numerical correctness has been verified locally.




[GitHub] [tvm] tqchen merged pull request #15373: [TIR] Allreduce broadcast result to each thread in multi-warp case

Posted by "tqchen (via GitHub)" <gi...@apache.org>.
tqchen merged PR #15373:
URL: https://github.com/apache/tvm/pull/15373




[GitHub] [tvm] tvm-bot commented on pull request #15373: [TIR] Allreduce broadcast result to each thread in multi-warp case

Posted by "tvm-bot (via GitHub)" <gi...@apache.org>.
tvm-bot commented on PR #15373:
URL: https://github.com/apache/tvm/pull/15373#issuecomment-1645017749

   
   Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from [Reviewers](https://github.com/apache/incubator-tvm/blob/master/CONTRIBUTORS.md#reviewers) by @-ing them in a comment.
   
    * cc @Hzfengsy, @junrushao, @quic-sanirudh, @shingjan (see [#10317](https://github.com/apache/tvm/issues/10317) for details)
   
   Generated by [tvm-bot](https://github.com/apache/tvm/blob/main/ci/README.md#github-actions)




[GitHub] [tvm] MasterJH5574 commented on pull request #15373: [TIR] Allreduce broadcast result to each thread in multi-warp case

Posted by "MasterJH5574 (via GitHub)" <gi...@apache.org>.
MasterJH5574 commented on PR #15373:
URL: https://github.com/apache/tvm/pull/15373#issuecomment-1645021499

   cc @yzh119 @tqchen 




[GitHub] [tvm] MasterJH5574 commented on pull request #15373: [TIR] Allreduce broadcast result to each thread in multi-warp case

Posted by "MasterJH5574 (via GitHub)" <gi...@apache.org>.
MasterJH5574 commented on PR #15373:
URL: https://github.com/apache/tvm/pull/15373#issuecomment-1645027161

   > LGTM! I was curious if there's any performance implication of this change?
   
   @junrushao I didn’t measure. For a platform like CUDA with warp size 32, the additional shared memory holds at most 16 elements, and I assume this overhead is negligible. Nevertheless, in multi-warp reduction settings, the current implementation that leverages warp-level primitives will be no slower than the implementation before #15327, which allocated a large shared memory buffer and used a naive cross-thread reduction over shared memory.
   
   On the other hand, to fulfill the semantics of allreduce, we have to accept this shared memory compromise.
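   
   For a rough sense of scale (the 512-thread block size below is an assumption for illustration, not a measurement from this PR): a 512-thread block with warp size 32 has 512 / 32 = 16 warps, so the write-back buffer holds 16 fp32 partials, i.e. 64 bytes of shared memory.
   
   ```cuda
   // Illustrative footprint only; the 512-thread block size is an assumption.
   __global__ void writeback_footprint_example() {
     __shared__ float write_back_buf[512 / 32];  // 16 elements = 64 bytes
     (void)write_back_buf;                       // silence unused warning
   }
   ```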

