Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2023/01/13 19:39:47 UTC

[GitHub] [tvm] adstraw commented on a diff in pull request #13719: [Hexagon] Add hexagon user DMA intrins for tensorization

adstraw commented on code in PR #13719:
URL: https://github.com/apache/tvm/pull/13719#discussion_r1069928049


##########
python/tvm/tir/tensor_intrin/hexagon.py:
##########
@@ -20,6 +20,47 @@
 from .. import TensorIntrin
 
 
+def generate_dma_load_intrin(
+    size: int,
+    dtype: str,
+):
+    """Generator of dma_load intrins"""
+
+    @T.prim_func
+    def dma_load_desc(a: T.handle, c: T.handle) -> None:
+        A = T.match_buffer(a, (size), dtype, offset_factor=1, scope="global")
+        C = T.match_buffer(c, (size), dtype, offset_factor=1, scope="global.vtcm")
+        with T.block("root"):
+            T.reads(A[0:size])
+            T.writes(C[0:size])
+            for i in T.serial(size):
+                with T.block("load"):
+                    vii = T.axis.remap("S", [i])
+                    C[vii] = A[vii]
+
+    @T.prim_func
+    def dma_load_impl(a: T.handle, c: T.handle) -> None:
+        A = T.match_buffer(a, (size), dtype, offset_factor=1, scope="global")
+        C = T.match_buffer(c, (size), dtype, offset_factor=1, scope="global.vtcm")
+        with T.block("root"):
+            T.reads(A[0:size])
+            T.writes(C[0:size])
+            T.evaluate(
+                T.tvm_call_packed(
+                    "device_api.hexagon.dma_copy",
+                    0,

Review Comment:
   Synchronous DMA uses queue ID -1.  See [here](https://github.com/apache/tvm/blob/60c723ec267ee5095a35add5f8259e650b8ddd7b/src/runtime/hexagon/hexagon_user_dma.h#L37).  This keeps it from interfering with the async DMA flow, which uses queue IDs starting at 0.  Please use queue -1 and add some comments here.
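   A minimal sketch of what I mean, assuming the rest of the packed call stays as written:
   ```python
   T.evaluate(
       T.tvm_call_packed(
           "device_api.hexagon.dma_copy",
           -1,  # queue ID -1 selects the synchronous DMA queue; async queues start at 0
           T.address_of(C[0], dtype="handle"),
           T.address_of(A[0], dtype="handle"),
           size,
           0,  # cache-bypass flag (see the separate comment on this argument)
           dtype="int32",
       )
   )
   ```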



##########
python/tvm/tir/tensor_intrin/hexagon.py:
##########
@@ -20,6 +20,47 @@
 from .. import TensorIntrin
 
 
+def generate_dma_load_intrin(
+    size: int,
+    dtype: str,
+):
+    """Generator of dma_load intrins"""
+
+    @T.prim_func
+    def dma_load_desc(a: T.handle, c: T.handle) -> None:
+        A = T.match_buffer(a, (size), dtype, offset_factor=1, scope="global")
+        C = T.match_buffer(c, (size), dtype, offset_factor=1, scope="global.vtcm")
+        with T.block("root"):
+            T.reads(A[0:size])
+            T.writes(C[0:size])
+            for i in T.serial(size):
+                with T.block("load"):
+                    vii = T.axis.remap("S", [i])
+                    C[vii] = A[vii]
+
+    @T.prim_func
+    def dma_load_impl(a: T.handle, c: T.handle) -> None:

Review Comment:
   Please rename to `sync_dma_load_impl`, matching the synchronous (copy and immediate wait) flow.



##########
tests/python/contrib/test_hexagon/test_vtcm_bandwidth.py:
##########
@@ -123,15 +125,18 @@ class TestMatMulVec:
 
     # Removed most of these to speedup CI.
     size = tvm.testing.parameter(
-        # 10 * KB,
+        128,
+        256,
+        1024,
+        10 * KB,
         # 20 * KB,
         # 40 * KB,
         # 80 * KB,
         # 160 * KB,
         # 320 * KB,
         640 * KB,
         # MB,
-        # 2 * MB,

Review Comment:
   Did you mean to uncomment this?  It makes the test run longer in CI.



##########
tests/python/contrib/test_hexagon/test_vtcm_bandwidth.py:
##########
@@ -104,8 +106,8 @@ def evaluate(hexagon_session, sch, size):
     )
 
     # These are reduced for CI but number=100 and repeat=10 does a good job of removing noise.
-    number = 1
-    repeat = 1
+    number = 10
+    repeat = 10

Review Comment:
   Did you mean to change this?  It makes the test run longer in CI.
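   For context on the cost, a rough sketch of how these parameters feed the measurement (the function name here is an assumption, not necessarily what this test passes):
   ```python
   # number = kernel runs averaged into one measurement;
   # repeat = how many such measurements are taken.
   # Total kernel executions = number * repeat, so 10 x 10 = 100 here
   # versus 1 x 1 = 1 before -- hence the CI time concern.
   timer = module.time_evaluator("main", hexagon_session.device, number=number, repeat=repeat)
   ```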



##########
python/tvm/tir/tensor_intrin/hexagon.py:
##########
@@ -20,6 +20,47 @@
 from .. import TensorIntrin
 
 
+def generate_dma_load_intrin(
+    size: int,
+    dtype: str,
+):
+    """Generator of dma_load intrins"""
+
+    @T.prim_func
+    def dma_load_desc(a: T.handle, c: T.handle) -> None:
+        A = T.match_buffer(a, (size), dtype, offset_factor=1, scope="global")
+        C = T.match_buffer(c, (size), dtype, offset_factor=1, scope="global.vtcm")
+        with T.block("root"):
+            T.reads(A[0:size])
+            T.writes(C[0:size])
+            for i in T.serial(size):
+                with T.block("load"):
+                    vii = T.axis.remap("S", [i])
+                    C[vii] = A[vii]
+
+    @T.prim_func
+    def dma_load_impl(a: T.handle, c: T.handle) -> None:
+        A = T.match_buffer(a, (size), dtype, offset_factor=1, scope="global")
+        C = T.match_buffer(c, (size), dtype, offset_factor=1, scope="global.vtcm")
+        with T.block("root"):
+            T.reads(A[0:size])
+            T.writes(C[0:size])
+            T.evaluate(
+                T.tvm_call_packed(
+                    "device_api.hexagon.dma_copy",
+                    0,
+                    T.address_of(C[0], dtype="handle"),
+                    T.address_of(A[0], dtype="handle"),
+                    size,
+                    0,
+                    dtype="int32",
+                )
+            )
+            T.evaluate(T.tvm_call_packed("device_api.hexagon.dma_wait", 0, 0, dtype="int32"))

Review Comment:
   Queue = -1 here as well.  Please add comments noting that Wait(queue, 0) waits for the queue to drain, i.e. for every previously issued dma_copy on that queue to complete.
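   Roughly (a sketch, mirroring the sync queue convention from the other comment):
   ```python
   # Wait for queue -1 to drain to 0 in-flight descriptors, i.e. for the
   # dma_copy issued above (and anything before it) to complete.
   T.evaluate(T.tvm_call_packed("device_api.hexagon.dma_wait", -1, 0, dtype="int32"))
   ```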



##########
python/tvm/tir/tensor_intrin/hexagon.py:
##########
@@ -20,6 +20,47 @@
 from .. import TensorIntrin
 
 
+def generate_dma_load_intrin(
+    size: int,
+    dtype: str,
+):
+    """Generator of dma_load intrins"""
+
+    @T.prim_func
+    def dma_load_desc(a: T.handle, c: T.handle) -> None:

Review Comment:
   I would like this to be called `sync_dma_load_desc`, with some comments to distinguish between the async flow and the sync (copy and immediate wait) flow.
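   Something like the following (a sketch; only the name changes and a comment is added):
   ```python
   @T.prim_func
   def sync_dma_load_desc(a: T.handle, c: T.handle) -> None:
       # Sync flow: the matching impl issues a dma_copy and immediately
       # waits for it, unlike the async flow where the copy and the wait
       # are scheduled as separate steps on queue IDs >= 0.
       A = T.match_buffer(a, (size), dtype, offset_factor=1, scope="global")
       C = T.match_buffer(c, (size), dtype, offset_factor=1, scope="global.vtcm")
       with T.block("root"):
           T.reads(A[0:size])
           T.writes(C[0:size])
           for i in T.serial(size):
               with T.block("load"):
                   vii = T.axis.remap("S", [i])
                   C[vii] = A[vii]
   ```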



##########
python/tvm/tir/tensor_intrin/hexagon.py:
##########
@@ -20,6 +20,47 @@
 from .. import TensorIntrin
 
 
+def generate_dma_load_intrin(
+    size: int,
+    dtype: str,
+):
+    """Generator of dma_load intrins"""
+
+    @T.prim_func
+    def dma_load_desc(a: T.handle, c: T.handle) -> None:
+        A = T.match_buffer(a, (size), dtype, offset_factor=1, scope="global")
+        C = T.match_buffer(c, (size), dtype, offset_factor=1, scope="global.vtcm")
+        with T.block("root"):
+            T.reads(A[0:size])
+            T.writes(C[0:size])
+            for i in T.serial(size):
+                with T.block("load"):
+                    vii = T.axis.remap("S", [i])
+                    C[vii] = A[vii]
+
+    @T.prim_func
+    def dma_load_impl(a: T.handle, c: T.handle) -> None:
+        A = T.match_buffer(a, (size), dtype, offset_factor=1, scope="global")
+        C = T.match_buffer(c, (size), dtype, offset_factor=1, scope="global.vtcm")
+        with T.block("root"):
+            T.reads(A[0:size])
+            T.writes(C[0:size])
+            T.evaluate(
+                T.tvm_call_packed(
+                    "device_api.hexagon.dma_copy",
+                    0,
+                    T.address_of(C[0], dtype="handle"),
+                    T.address_of(A[0], dtype="handle"),
+                    size,
+                    0,

Review Comment:
   This needs comments, at least to indicate that this argument is the cache-bypass flag.  Better would be to tie the setting of this bit to the `tir.experimental_dma_bypass_cache` annotation.
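   One possible shape for that (a sketch; `bypass_cache` is a name I am making up, and wiring it to the `tir.experimental_dma_bypass_cache` annotation would be the fuller fix):
   ```python
   def generate_dma_load_intrin(size: int, dtype: str, bypass_cache: bool = False):
       """Generator of sync dma_load intrins."""
       # Evaluated once at trace time and captured as a constant in the
       # prim_func body, the same way `size` and `dtype` are captured.
       bypass = 1 if bypass_cache else 0
       ...
   ```
   and then pass `bypass` (with a comment) in place of the literal `0` in the dma_copy call.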



##########
python/tvm/tir/tensor_intrin/hexagon.py:
##########
@@ -163,3 +204,27 @@ def dot_product_32x2_i16i16i32_vdmpy(a: T.handle, b: T.handle, c: T.handle) -> N
 
 VRMPY_u8i8i32_VTCM_INTRIN = "dot_32x4_u8i8i32_vtcm_vrmpy"
 TensorIntrin.register(VRMPY_u8i8i32_VTCM_INTRIN, *generate_dot_product_32x4_u8i8i32("global.vtcm"))
+
+DMA_READ_1_u8 = "dma_read_1_u8"

Review Comment:
   I don't see users for most of these.  It seems like it might be better to delete them and let users create what they need based on the test case or schedule?
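   For example, a schedule or test can register just the variant it tensorizes, following the pattern used for the vrmpy intrin above (the size and name here are made up):
   ```python
   from tvm.tir import TensorIntrin
   from tvm.tir.tensor_intrin.hexagon import generate_dma_load_intrin

   # Register only the shape/dtype this schedule actually needs.
   DMA_READ_128_u8 = "dma_read_128_u8"
   TensorIntrin.register(DMA_READ_128_u8, *generate_dma_load_intrin(128, "uint8"))
   ```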



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org