You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/07/26 10:37:04 UTC

[GitHub] [tvm] masahi commented on a diff in pull request #12171: [TIR] Asynchrounos stage in software pipeline

masahi commented on code in PR #12171:
URL: https://github.com/apache/tvm/pull/12171#discussion_r929804656


##########
src/tir/transforms/thread_storage_sync.cc:
##########
@@ -384,6 +426,9 @@ class ThreadSyncInserter : public StmtExprMutator {
 
 Stmt ThreadSync(Stmt stmt, std::string storage_scope) {
   StorageScope sync_scope = StorageScope::Create(storage_scope);
+  if (sync_scope.rank == StorageRank::kShared && sync_scope.tag == "") {

Review Comment:
   This is only for making sure that this code path is hit only once. `ThreadSyncAfterWaitQueueInserter` just looks for `async_wait_queue_scope` and inserts `syncthreads` after it. So assuming that all shared memory, including dynamic ones, are protected by `async_wait_queue_scope` (which should be the case by `InjectSoftwarePipeline`), all necessary `syncthreads` will be inserted.  
   
    Since `ThreadSync` is called twice, for `shared` and `shared.dyn`, https://github.com/apache/tvm/blob/7ef6811000a3752e843b333fd5743ea9ef6c049e/src/driver/driver_api.cc#L530-L531, we get two `syncthreads` without this check.
   
   Thinking about it more now, this assumes that `async_wait_queue_scope` on GPU is always associated with shared memory. This should be fine as long as the only async operation is copying into shared memory. I have to admit this is a bit hacky, but something like this is needed for correctness.  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org