You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/04/01 21:34:20 UTC

[GitHub] [tvm] tkonolige opened a new issue #10877: [Bug][TIR] Invalid loop vectorization and buffer compaction

tkonolige opened a new issue #10877:
URL: https://github.com/apache/tvm/issues/10877


   When using an `alloc_buffer` that is not initialized or written out, CompactBufferAllocation and LoopVectorize passes perform invalid rewrites. Technically these could be considered correct because the changes have no effect on output, but they are very confusing when inspecting the lowered code.
   
   Example script:
   ```
   import tvm
   from tvm.script import tir as T
   
   @T.prim_func
   def blis_gemm_microkernel_template(c: T.handle):
       A_pack = T.alloc_buffer((8,), "float32", scope="local")
       B_pack = T.alloc_buffer((8,), "float32", scope="local")
       C = T.match_buffer(c, (8,8))
   
       for loop in range(1000):
           for rii in T.unroll(8):
               for rjj in T.vectorized(8):
                   C[rii, rjj] += A_pack[rii] * B_pack[rjj]
   
   if __name__ == "__main__":
       with tvm.transform.PassContext(opt_level=3):
           print(tvm.lower(blis_gemm_microkernel_template, "llvm -mcpu=znver3"))
   
       with tvm.transform.PassContext(opt_level=3, disabled_pass=["tir.CompactBufferAllocation"]):
           print(tvm.lower(blis_gemm_microkernel_template, "llvm -mcpu=znver3"))
   ```
   
   Output:
   ```
   @main = primfn(c: handle) -> ()
     attr = {"global_symbol": "main", "tir.noalias": True}
     buffers = {C: Buffer(C_1: Pointer(global float32), float32, [64], [])}
     buffer_map = {c: C} {
     allocate(A_pack: Pointer(local float32x8), float32x8, [1]), storage_scope = local;
     allocate(B_pack: Pointer(local float32x8), float32x8, [1]), storage_scope = local;
     for (loop: int32, 0, 1000) {
       C[ramp(0, 1, 8)] = (C[ramp(0, 1, 8)] + (A_pack_1: Buffer(A_pack, float32x8, [1], [], scope="local")[0]*B_pack_1: Buffer(B_pack, float32x8, [1], [], scope="local")[0]))
       C[ramp(8, 1, 8)] = (C[ramp(8, 1, 8)] + (A_pack_1[0]*B_pack_1[0]))
       C[ramp(16, 1, 8)] = (C[ramp(16, 1, 8)] + (A_pack_1[0]*B_pack_1[0]))
       C[ramp(24, 1, 8)] = (C[ramp(24, 1, 8)] + (A_pack_1[0]*B_pack_1[0]))
       C[ramp(32, 1, 8)] = (C[ramp(32, 1, 8)] + (A_pack_1[0]*B_pack_1[0]))
       C[ramp(40, 1, 8)] = (C[ramp(40, 1, 8)] + (A_pack_1[0]*B_pack_1[0]))
       C[ramp(48, 1, 8)] = (C[ramp(48, 1, 8)] + (A_pack_1[0]*B_pack_1[0]))
       C[ramp(56, 1, 8)] = (C[ramp(56, 1, 8)] + (A_pack_1[0]*B_pack_1[0]))
     }
   }
   
   
   @main = primfn(c: handle) -> ()
     attr = {"global_symbol": "main", "tir.noalias": True}
     buffers = {C: Buffer(C_1: Pointer(global float32), float32, [64], [])}
     buffer_map = {c: C} {
     allocate(A_pack: Pointer(local float32x8), float32x8, [8]), storage_scope = local;
     allocate(B_pack: Pointer(local float32), float32, [64]), storage_scope = local;
     for (loop: int32, 0, 1000) {
       C[ramp(0, 1, 8)] = (C[ramp(0, 1, 8)] + (A_pack_1: Buffer(A_pack, float32x8, [8], [], scope="local")[0]*B_pack_1: Buffer(B_pack, float32, [64], [], scope="local")[ramp(0, 9, 8)]))
       C[ramp(8, 1, 8)] = (C[ramp(8, 1, 8)] + (A_pack_1[1]*B_pack_1[ramp(0, 9, 8)]))
       C[ramp(16, 1, 8)] = (C[ramp(16, 1, 8)] + (A_pack_1[2]*B_pack_1[ramp(0, 9, 8)]))
       C[ramp(24, 1, 8)] = (C[ramp(24, 1, 8)] + (A_pack_1[3]*B_pack_1[ramp(0, 9, 8)]))
       C[ramp(32, 1, 8)] = (C[ramp(32, 1, 8)] + (A_pack_1[4]*B_pack_1[ramp(0, 9, 8)]))
       C[ramp(40, 1, 8)] = (C[ramp(40, 1, 8)] + (A_pack_1[5]*B_pack_1[ramp(0, 9, 8)]))
       C[ramp(48, 1, 8)] = (C[ramp(48, 1, 8)] + (A_pack_1[6]*B_pack_1[ramp(0, 9, 8)]))
       C[ramp(56, 1, 8)] = (C[ramp(56, 1, 8)] + (A_pack_1[7]*B_pack_1[ramp(0, 9, 8)]))
     }
   }
   ```
   
   The first output incorrectly has `A_pack_1` accesses as all being 0 when they should be 0-7. The second output has an incorrect ramp with a stride of 9.
   
   @vinx13 @junrushao1994 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] vinx13 commented on issue #10877: [Bug][TIR] Invalid loop vectorization and buffer compaction

Posted by GitBox <gi...@apache.org>.

vinx13 commented on issue #10877:
URL: https://github.com/apache/tvm/issues/10877#issuecomment-1086372190


   `CompactBufferAllocation` requires intermediate buffer inside kernels to have both writers and readers in order to calculate the actual shape needed for allocation. If you have already specified the compacted shape, you can disable both `tir.CompactBufferAllocation` and `tir.PlanAndUpdateBufferAllocationLocation`, to keep buffer allocation and its shape as specified in the original TIR script.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] junrushao1994 closed issue #10877: [Bug][TIR] Invalid loop vectorization and buffer compaction

Posted by GitBox <gi...@apache.org>.

junrushao1994 closed issue #10877:
URL: https://github.com/apache/tvm/issues/10877


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org