Posted to commits@tvm.apache.org by "cbalint13 (via GitHub)" <gi...@apache.org> on 2023/08/11 09:42:08 UTC
[GitHub] [tvm] cbalint13 commented on issue #15522: [Bug] [MetaScheduler] [TensorIR] Stride values are not inferred within _impl() block.

cbalint13 commented on issue #15522:
URL: https://github.com/apache/tvm/issues/15522#issuecomment-1674470830

   I've tracked down the issue.
   
   1. LLVM lowered code, before [this->Optimize()](https://github.com/apache/tvm/blob/ed8b82c5d6ea91e70d374988b16b47f64cd83f63/src/target/llvm/codegen_llvm.cc#L367), shows that TIR inferred the stride as i64:
   ```
     %6 = call i32 @VEC_MACC(ptr %4, ptr %5, ptr @fused_constant, i64 16)
     %9 = call i32 @VEC_MACC(ptr %7, ptr %8, ptr @fused_constant, i64 16)
   ```
   2. But after [this->Optimize()](https://github.com/apache/tvm/blob/ed8b82c5d6ea91e70d374988b16b47f64cd83f63/src/target/llvm/codegen_llvm.cc#L367), dso_local **VEC_MACC** is marked undefined due to the **i32** vs **i64** mismatch:
   ```
   %2 = tail call i32 @VEC_MACC(ptr %0, ptr %1, ptr nonnull @fused_constant, i64 16)
   declare dso_local i32 @VEC_MACC(ptr noundef, ptr noundef, ptr noundef, i32 noundef) local_unnamed_addr #4
   ```
   
   3. Explicit **T.int32()** or **T.int32(B.strides[0])** casts are ineffective; the lowered LLVM type remains **i64**.
   
   4. In contrast to MS-TIR, using the TOPI way of tensorization, ```strides=[te.var("ldw"), 1]``` lowers to **i32**.
   
   5. Anyway, changing **VEC_MACC** declaration to use **i64** solves the whole problem:
   ```
   --- tvm-ms-testcase.py.old	2023-08-11 12:29:56.055634954 +0300
   +++ tvm-ms-testcase.py	2023-08-11 12:30:13.215511711 +0300
   @@ -37,7 +37,7 @@
    int32_t VEC_MACC(int32_t *output,
                      const uint8_t *data,
                      const int8_t *kernel,
   -                  const int32_t stride) {{
   +                  const int64_t stride) {{
      printf("data: \\n");
      for (int j = 0; j < {INT8_MACS}; ++j) {{
          printf(" %i", data[j]);
   @@ -92,7 +92,7 @@
    @T.prim_func
    def vec_u8_i8_s32_impl(
        A: T.Buffer((INT8_MACS,), "uint8", offset_factor=1, align=INT8_MACS, scope="global"),
   -    B: T.Buffer((INT32_LANES, INT8_MACS), "int8", offset_factor=1, strides=[T.int32(), T.int32()], align=INT8_MACS, scope="global"),
   +    B: T.Buffer((INT32_LANES, INT8_MACS), "int8", offset_factor=1, strides=[T.int64(), T.int64()], align=INT8_MACS, scope="global"),
        C: T.Buffer((INT32_LANES,), "int32", offset_factor=1, align=INT32_LANES, scope="global"),
    ) -> None:
        with T.block("root"):
   
   ```
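The i32 vs i64 mismatch from steps 1 and 2 can also be spotted mechanically by comparing the argument types at the call site with those in the declaration. The following is only a minimal sketch: the helper names `arg_types` and `mismatches` are hypothetical, the two IR lines are copied from the dumps above, and the parsing is deliberately naive (it assumes simple scalar and pointer arguments with no commas or parentheses inside a type).

```python
import re

# IR lines copied verbatim from the post-Optimize() dump quoted above.
CALL = "%2 = tail call i32 @VEC_MACC(ptr %0, ptr %1, ptr nonnull @fused_constant, i64 16)"
DECL = ("declare dso_local i32 @VEC_MACC(ptr noundef, ptr noundef, "
        "ptr noundef, i32 noundef) local_unnamed_addr #4")


def arg_types(ir_line, name):
    """Return the leading type token of each argument of @name in one IR line.

    Hypothetical helper: handles only simple types such as ptr, i32, i64.
    """
    match = re.search(r"@" + re.escape(name) + r"\(([^)]*)\)", ir_line)
    if match is None:
        raise ValueError(f"@{name} not found in line")
    args = [a.strip() for a in match.group(1).split(",") if a.strip()]
    # The first whitespace-separated token of each argument is its type.
    return [a.split()[0] for a in args]


def mismatches(call_line, decl_line, name):
    """List (index, call_type, decl_type) for every argument whose types differ."""
    call_t = arg_types(call_line, name)
    decl_t = arg_types(decl_line, name)
    if len(call_t) != len(decl_t):
        raise ValueError("argument count differs between call and declaration")
    return [(i, c, d) for i, (c, d) in enumerate(zip(call_t, decl_t)) if c != d]


print(mismatches(CALL, DECL, "VEC_MACC"))  # prints [(3, 'i64', 'i32')]
```

Only the fourth argument (index 3, the stride) differs, which matches the observation that changing the C declaration to `int64_t` makes the two sides agree.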
   
   Issue resolved.
   
   ---
   
   But there are still some points of confusion:
   
   * The most confusing part (still) is that the explicit ```T.int32()``` casts were not taken into account in any way.
   * Also, in this specific application, because a C function is invoked, an **exact type match is required**.
   
   
   

