Posted to commits@tvm.apache.org by "yzh119 (via GitHub)" <gi...@apache.org> on 2023/03/12 15:03:51 UTC

[GitHub] [tvm] yzh119 opened a new pull request, #14280: [TensorIR][Transform] Enable warp shuffling for `LowerWarpMemory`

yzh119 opened a new pull request, #14280:
URL: https://github.com/apache/tvm/pull/14280

   # Motivation
   The `LowerWarpMemory` pass cannot emit `shfl_sync` instructions because of an [internal check](https://github.com/apache/tvm/blob/e3c8f2bc2d9ab3f5ba1c0db0755e455bf59a39b8/src/tir/transforms/lower_warp_memory.cc#L334-L337) introduced in #9727 . In fact, when we load a value from another lane in the warp, the `local_index` inevitably carries the warp index, so this case is always rejected by the check.
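   As a plain-Python sketch (no TVM required, not part of the PR) of why a warp shuffle is needed here: the unit test below uses the lane mapping `t -> (t % 4) * 8 + t // 4`, which is a permutation of the 32 lanes in which almost every lane reads from a lane other than itself, so the lowered code must fetch the value across lanes rather than from a purely local register index.

```python
# Pure-Python model of the cross-lane access pattern from this PR's unit
# test. Lane `t` reads its input from source lane (t % 4) * 8 + t // 4;
# since the source lane usually differs from `t`, lowering needs a warp
# shuffle (shfl_sync) rather than a local register read.
WARP_SIZE = 32

def src_lane(t: int) -> int:
    # Source lane for thread `t`, as in B[vi] = A[(vi % 4) * 8 + vi // 4].
    return (t % 4) * 8 + t // 4

def simulate_warp_shuffle(A):
    # Each lane holds A[t] in a register; B[t] is computed from the value
    # held by another lane, plus 1, mirroring the test kernel.
    assert len(A) == WARP_SIZE
    return [A[src_lane(t)] + 1.0 for t in range(WARP_SIZE)]

# The mapping is a bijection on the 32 lanes, so every read resolves to
# exactly one source register ...
assert sorted(src_lane(t) for t in range(WARP_SIZE)) == list(range(WARP_SIZE))
# ... and almost every lane reads cross-lane (only lanes 0 and 31 are fixed).
assert sum(src_lane(t) != t for t in range(WARP_SIZE)) == 30
```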
   
   This PR fixes the issue by disabling the check, and adds a unit test for warp shuffling.
   
   The PR depends on #14279 ; I'll rebase onto upstream/main once that PR is merged.
   
   @Lunderberg @masahi @tqchen 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] junrushao commented on pull request #14280: [TensorIR][Transform] Enable warp shuffling for `LowerWarpMemory`

Posted by "junrushao (via GitHub)" <gi...@apache.org>.
junrushao commented on PR #14280:
URL: https://github.com/apache/tvm/pull/14280#issuecomment-1465274504

   It's fine to keep it open as a draft PR :-)




[GitHub] [tvm] yzh119 closed pull request #14280: [TensorIR][Transform] Enable warp shuffling for `LowerWarpMemory`

Posted by "yzh119 (via GitHub)" <gi...@apache.org>.
yzh119 closed pull request #14280: [TensorIR][Transform] Enable warp shuffling for `LowerWarpMemory`
URL: https://github.com/apache/tvm/pull/14280




[GitHub] [tvm] Lunderberg commented on a diff in pull request #14280: [TensorIR][Transform] Enable warp shuffling for `LowerWarpMemory`

Posted by "Lunderberg (via GitHub)" <gi...@apache.org>.
Lunderberg commented on code in PR #14280:
URL: https://github.com/apache/tvm/pull/14280#discussion_r1133951480


##########
tests/python/unittest/test_tir_transform_lower_warp_memory.py:
##########
@@ -347,5 +348,105 @@ def test_lower_warp_memory_divide_by_factor():
         tvm.tir.transform.LowerWarpMemory()(mod)["f_kernel0"]
 
 
+@T.prim_func
+def func(a: T.handle, b: T.handle) -> None:
+    A = T.match_buffer(a, [32], "float32")
+    B = T.match_buffer(b, [32], "float32")
+    for i in range(32):
+        with T.block("warp_shuffle"):
+            vi = T.axis.spatial(32, i)
+            B[vi] = A[(vi % 4) * 8 + vi // 4] + T.float32(1)
+
+
+def test_warp_shuffle_transform():
+    @tvm.script.ir_module
+    class Before:
+        @T.prim_func
+        def main(A: T.handle("float32", "global"), B: T.handle("float32", "global")):
+            blockIdx_x = T.env_thread("blockIdx.x")
+            threadIdx_x = T.env_thread("threadIdx.x")
+            T.func_attr(

Review Comment:
   It looks like the test case only requires the `"target"` attribute, and only requires `"kind"` and `"thread_warp_size"` within that.  Can we remove the extra attributes from the unit test?
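   As a concrete illustration of this suggestion (a sketch, not code from the PR), the `"target"` attribute in the test could presumably be pared down to just the keys the reviewer names:

```python
# Hypothetical pruning of the test's "target" attribute down to the two
# keys the reviewer says the test actually requires.
full_target = {
    "host": {"keys": ["cpu"], "kind": "llvm", "tag": ""},
    "keys": ["cuda", "gpu"],
    "kind": "cuda",
    "max_num_threads": 1024,
    "tag": "",
    "thread_warp_size": 32,
}
required_keys = {"kind", "thread_warp_size"}
minimal_target = {k: v for k, v in full_target.items() if k in required_keys}
```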



##########
tests/python/unittest/test_tir_transform_lower_warp_memory.py:
##########
@@ -347,5 +348,105 @@ def test_lower_warp_memory_divide_by_factor():
         tvm.tir.transform.LowerWarpMemory()(mod)["f_kernel0"]
 
 
+@T.prim_func
+def func(a: T.handle, b: T.handle) -> None:
+    A = T.match_buffer(a, [32], "float32")
+    B = T.match_buffer(b, [32], "float32")
+    for i in range(32):
+        with T.block("warp_shuffle"):
+            vi = T.axis.spatial(32, i)
+            B[vi] = A[(vi % 4) * 8 + vi // 4] + T.float32(1)
+
+
+def test_warp_shuffle_transform():
+    @tvm.script.ir_module
+    class Before:
+        @T.prim_func
+        def main(A: T.handle("float32", "global"), B: T.handle("float32", "global")):
+            blockIdx_x = T.env_thread("blockIdx.x")
+            threadIdx_x = T.env_thread("threadIdx.x")
+            T.func_attr(
+                {
+                    "calling_conv": 2,
+                    "global_symbol": "main",
+                    "target": T.target(
+                        {
+                            "host": {"keys": ["cpu"], "kind": "llvm", "tag": ""},
+                            "keys": ["cuda", "gpu"],
+                            "kind": "cuda",
+                            "max_num_threads": 1024,
+                            "tag": "",
+                            "thread_warp_size": 32,
+                        }
+                    ),
+                    "tir.device_thread_axis": [
+                        T.iter_var(blockIdx_x, [0, 1], "ThreadIndex", "blockIdx.x"),
+                        T.iter_var(threadIdx_x, [0, 32], "ThreadIndex", "threadIdx.x"),
+                    ],
+                    "tir.is_global_func": 1,
+                    "tir.noalias": 1,
+                }
+            )
+            T.launch_thread(blockIdx_x, 1)
+            A_warp = T.allocate([32], "float32", "warp")
+            B_warp = T.allocate([32], "float32", "warp")
+            T.launch_thread(threadIdx_x, 32)
+            A_warp_1 = T.Buffer((32,), data=A_warp, scope="warp")
+            A_1 = T.Buffer((32,), data=A)

Review Comment:
   Instead of having a separate `A: T.handle` and `A_1: T.Buffer`, the buffer could be declared as a parameter `A_1: T.Buffer(32)`.  It does result in slightly different TIR, as it follows the style from before `MakePackedAPI` is applied, but for a unit test it would help to emphasize the change being tested.



##########
tests/python/unittest/test_tir_transform_lower_warp_memory.py:
##########
@@ -347,5 +348,105 @@ def test_lower_warp_memory_divide_by_factor():
         tvm.tir.transform.LowerWarpMemory()(mod)["f_kernel0"]
 
 
+@T.prim_func
+def func(a: T.handle, b: T.handle) -> None:
+    A = T.match_buffer(a, [32], "float32")
+    B = T.match_buffer(b, [32], "float32")
+    for i in range(32):
+        with T.block("warp_shuffle"):
+            vi = T.axis.spatial(32, i)
+            B[vi] = A[(vi % 4) * 8 + vi // 4] + T.float32(1)
+
+
+def test_warp_shuffle_transform():
+    @tvm.script.ir_module
+    class Before:
+        @T.prim_func
+        def main(A: T.handle("float32", "global"), B: T.handle("float32", "global")):
+            blockIdx_x = T.env_thread("blockIdx.x")
+            threadIdx_x = T.env_thread("threadIdx.x")
+            T.func_attr(
+                {
+                    "calling_conv": 2,
+                    "global_symbol": "main",
+                    "target": T.target(
+                        {
+                            "host": {"keys": ["cpu"], "kind": "llvm", "tag": ""},
+                            "keys": ["cuda", "gpu"],
+                            "kind": "cuda",
+                            "max_num_threads": 1024,
+                            "tag": "",
+                            "thread_warp_size": 32,
+                        }
+                    ),
+                    "tir.device_thread_axis": [
+                        T.iter_var(blockIdx_x, [0, 1], "ThreadIndex", "blockIdx.x"),
+                        T.iter_var(threadIdx_x, [0, 32], "ThreadIndex", "threadIdx.x"),
+                    ],
+                    "tir.is_global_func": 1,
+                    "tir.noalias": 1,
+                }
+            )
+            T.launch_thread(blockIdx_x, 1)
+            A_warp = T.allocate([32], "float32", "warp")
+            B_warp = T.allocate([32], "float32", "warp")
+            T.launch_thread(threadIdx_x, 32)
+            A_warp_1 = T.Buffer((32,), data=A_warp, scope="warp")
+            A_1 = T.Buffer((32,), data=A)
+            A_warp_1[threadIdx_x] = A_1[threadIdx_x]
+            B_warp_1 = T.Buffer((32,), data=B_warp, scope="warp")
+            T.tvm_storage_sync("warp")
+            B_warp_1[threadIdx_x] = A_warp_1[threadIdx_x % 4 * 8 + threadIdx_x // 4] + T.float32(1)

Review Comment:
   Could we add a comment here, indicating that this is the line that should be updated correctly by the transform?



##########
tests/python/unittest/test_tir_transform_lower_warp_memory.py:
##########
@@ -347,5 +348,105 @@ def test_lower_warp_memory_divide_by_factor():
         tvm.tir.transform.LowerWarpMemory()(mod)["f_kernel0"]
 
 
+@T.prim_func
+def func(a: T.handle, b: T.handle) -> None:
+    A = T.match_buffer(a, [32], "float32")
+    B = T.match_buffer(b, [32], "float32")
+    for i in range(32):
+        with T.block("warp_shuffle"):
+            vi = T.axis.spatial(32, i)
+            B[vi] = A[(vi % 4) * 8 + vi // 4] + T.float32(1)
+
+
+def test_warp_shuffle_transform():

Review Comment:
   The test looks reasonable as-is, though there's also a `tvm.testing.CompareBeforeAfter` that you could use to further reduce the boilerplate.
   
   ```python
   class TestWarpShuffleTransform(tvm.testing.CompareBeforeAfter):
       transform = tvm.tir.transform.LowerWarpMemory()
   
       def before(A: T.handle("float32", "global"), B: T.handle("float32", "global")):
           ...
   
       def expected(A: T.handle("float32", "global"), B: T.handle("float32", "global")):
           ...
   ```





[GitHub] [tvm] Lunderberg commented on pull request #14280: [TensorIR][Transform] Enable warp shuffling for `LowerWarpMemory`

Posted by "Lunderberg (via GitHub)" <gi...@apache.org>.
Lunderberg commented on PR #14280:
URL: https://github.com/apache/tvm/pull/14280#issuecomment-1466171798

   (Also, it looks like the initial check dates back to [PR#1050](https://github.com/apache/tvm/pull/1050/files#diff-2798beec4533fbfc67324a90dc70ea51b263dbeb56e48e295da335fe512656d8R206), just with different refactorings that touched that line along the way.)




[GitHub] [tvm] tvm-bot commented on pull request #14280: [TensorIR][Transform] Enable warp shuffling for `LowerWarpMemory`

Posted by "tvm-bot (via GitHub)" <gi...@apache.org>.
tvm-bot commented on PR #14280:
URL: https://github.com/apache/tvm/pull/14280#issuecomment-1465222323

   <!---bot-comment-->
   
   Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from [Reviewers](https://github.com/apache/incubator-tvm/blob/master/CONTRIBUTORS.md#reviewers) by @-ing them in a comment.
   
   <!--bot-comment-ccs-start-->
    * No users to tag found in teams: `tensorir`, `transform` <sub>See [#10317](https://github.com/apache/tvm/issues/10317) for details</sub><!--bot-comment-ccs-end-->
   
   <sub>Generated by [tvm-bot](https://github.com/apache/tvm/blob/main/ci/README.md#github-actions)</sub>




[GitHub] [tvm] yzh119 commented on pull request #14280: [TensorIR][Transform] Enable warp shuffling for `LowerWarpMemory`

Posted by "yzh119 (via GitHub)" <gi...@apache.org>.
yzh119 commented on PR #14280:
URL: https://github.com/apache/tvm/pull/14280#issuecomment-1465234402

   It seems the unit test still passes even if I add the `ICHECK` back, so I'll close the PR for now.

