Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/09/12 17:23:08 UTC

[GitHub] [tvm] wrongtest-intellif opened a new pull request, #12757: [TIR] Fix plan buffer allocation location for loop carried dependencies

wrongtest-intellif opened a new pull request, #12757:
URL: https://github.com/apache/tvm/pull/12757

   The pass `PlanAndUpdateBufferAllocationLocation` appears to mishandle buffers whose access indices carry a loop-carried dependency. For example:
   ```python
   from tvm.script import tir as T

   @T.prim_func
   def test(A: T.Buffer[(8, 8), "int32"], B: T.Buffer[(8, 8), "int32"]):
       C = T.alloc_buffer([8, 8], "int32")
       for i in range(8):
           for j in range(8):
               with T.block("b0"):
                   vi = T.axis.spatial(8, i)
                   vj = T.axis.spatial(8, j)
                   C[vi, vj] = A[vi, vj] + vi
           for j in range(8):
               with T.block("b1"):
                   vi = T.axis.opaque(8, i)
                   vj = T.axis.spatial(8, j)
                   B[vi, vj] = C[vi, vj] + T.if_then_else(vi > 0, C[vi - 1, vj], vi, dtype="int32")
   ```
   
   On iteration `i`, block `b1` reads the intermediate buffer `C` at rows `i` and `i-1`, so it depends on the writes `b0` made to `C` on both iterations. The allocation of `C` therefore must not be placed under loop `i`, which is the LCA position chosen by the current planning strategy.
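
The effect of a misplaced allocation can be reproduced without TVM. The following is a minimal plain-Python sketch of the example, not code from the PR: the helper `run` and its flag `alloc_inside_loop_i` are hypothetical names, the contents of `A` are arbitrary, and a "fresh" allocation is modelled as a zero-filled buffer (an assumption; real storage would simply be uninitialized).

```python
# Plain-Python model of the example above (illustrative only, no TVM required).
N = 8


def run(alloc_inside_loop_i):
    """Compute B, allocating C either once outside loop i or afresh inside it."""
    A = [[r * N + c for c in range(N)] for r in range(N)]  # arbitrary input data
    B = [[0] * N for _ in range(N)]
    C = None if alloc_inside_loop_i else [[0] * N for _ in range(N)]
    for i in range(N):
        if alloc_inside_loop_i:
            # Assumption: a new allocation is modelled as zero-filled storage,
            # so rows written by earlier iterations of i are lost.
            C = [[0] * N for _ in range(N)]
        for j in range(N):  # block b0
            C[i][j] = A[i][j] + i
        for j in range(N):  # block b1, reads rows i and i - 1 of C
            prev = C[i - 1][j] if i > 0 else i
            B[i][j] = C[i][j] + prev
    return B


expected = run(alloc_inside_loop_i=False)
broken = run(alloc_inside_loop_i=True)
print("results match:", expected == broken)
```

Row 0 agrees in both runs because it never reads a previous row; from row 1 onward the run that reallocates `C` inside loop `i` no longer sees the values written on iteration `i-1`.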
   
   To fix the issue, we make `DetectBufferLCA` aware of opaque block iters (loop-carried dependencies and other more complex behaviors are categorized as `opaque` in the iter type annotation).
   
   It enforces that every legal "ancestor" of a buffer's accesses dominates all loops related to the opaque block iters used in the buffer indices. E.g., since `vi` is opaque, loop `i` must be under the planned allocation point.
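
The rule can be illustrated with a toy model. This is a hedged sketch, not the actual `DetectBufferLCA` implementation: the loop nest is represented as paths of hypothetical node names from the root, `common_prefix` and `plan_alloc_site` are made-up helpers, and the root is assumed to be present in every path and never bound to an opaque iter.

```python
# Toy model of allocation-site planning (illustrative; not TVM code).

def common_prefix(paths):
    """Return the longest shared prefix of several root-to-node paths."""
    prefix = []
    for nodes in zip(*paths):
        if any(n != nodes[0] for n in nodes):
            break
        prefix.append(nodes[0])
    return prefix


def plan_alloc_site(accesses):
    """accesses: list of (path, opaque_loops) pairs, one per buffer access.

    Each path is clipped just above the first loop bound to an opaque iter
    used in the buffer indices, so the chosen site dominates that loop.
    """
    clipped = []
    for path, opaque_loops in accesses:
        cut = len(path)
        for idx, node in enumerate(path):
            if node in opaque_loops:
                cut = idx
                break
        clipped.append(path[:cut])
    return common_prefix(clipped)[-1]


# Accesses to C in the example: b0 uses only spatial iters, b1 uses opaque vi.
accesses = [
    (["root", "i", "j0", "b0"], set()),
    (["root", "i", "j1", "b1"], {"i"}),
]
naive_site = common_prefix([path for path, _ in accesses])[-1]
aware_site = plan_alloc_site(accesses)
print(naive_site, aware_site)
```

In this model the plain LCA lands on loop `i`, while the opaque-aware variant moves the allocation site up to the root, above loop `i`.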
   
   For an interesting workload involving loop-carried dependencies, see https://discuss.tvm.apache.org/t/rfc-introducing-a-rolling-buffer-scheduling-primitive/9836, where the intermediate results of the previous iteration are reused as much as possible.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] wrongtest-intellif merged pull request #12757: [TIR] Fix plan buffer allocation location for loop carried dependencies

Posted by GitBox <gi...@apache.org>.

wrongtest-intellif merged PR #12757:
URL: https://github.com/apache/tvm/pull/12757



[GitHub] [tvm] wrongtest-intellif commented on a diff in pull request #12757: [TIR] Fix plan buffer allocation location for loop carried dependencies

Posted by GitBox <gi...@apache.org>.

wrongtest-intellif commented on code in PR #12757:
URL: https://github.com/apache/tvm/pull/12757#discussion_r974191510


##########
tests/python/unittest/test_tir_transform_plan_update_buffer_allocation_location.py:
##########
@@ -242,9 +243,107 @@ def test_lower_te():
     )  # PlanAndUpdateBufferAllocationLocation should do nothing on TE
 
 
+def test_loop_carried_dependency():
+    """The buffer allocation should be above opaque iter var's loop scopes
+    such that buffer accesses with loop carried dependencies are covered."""
+
+    @T.prim_func
+    def before(A: T.Buffer[(8, 8, 8), "int32"], B: T.Buffer[(8, 8, 8), "int32"]):
+        C = T.alloc_buffer([8, 8, 8], dtype="int32")
+        for i in T.serial(8):
+            for j in T.serial(8):
+                for k in T.serial(8):
+                    with T.block("b0"):
+                        vi, vj, vk = T.axis.remap("SSS", [i, j, k])
+                        C[vi, vj, vk] = A[vi, vj, vk] + 1
+                for k in T.serial(8):
+                    with T.block("b1"):
+                        vi, vk = T.axis.remap("SS", [i, k])
+                        vj = T.axis.opaque(8, j)
+                        B[vi, vj, vk] = C[vi, vj, vk] + T.if_then_else(
+                            0 < vj, C[vi, vj - j, vk], 0, dtype="int32"

Review Comment:
   Typo: `vj - j` should be `vj - 1`.


