Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/12/15 11:02:58 UTC

[GitHub] [tvm] vvchernov opened a new pull request, #13621: WIP: [TIR][TOPI][CI] Support skylake avx512

vvchernov opened a new pull request, #13621:
URL: https://github.com/apache/tvm/pull/13621

   It looks like, although some avx512 intrinsics are already supported (see topi/x86 and tir), they are not used when tuning a model for the skylake-avx512 target.
   The aim is end-to-end support of the Skylake-X architecture on the TVM side for dense and conv ops (a small illustration follows the details below).
   
   # Details
   1. CI tests were added
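
   For illustration only, a minimal sketch of the target this PR is about (the target string is standard; `tvm.target.Target.mcpu` is an existing property):
   ```
   import tvm

   # Skylake-X reports "skylake-avx512" as mcpu: it has AVX512 but not VNNI
   # (VNNI arrives with Cascade Lake).
   target = tvm.target.Target("llvm -mcpu=skylake-avx512")
   assert target.mcpu == "skylake-avx512"
   ```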


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] cbalint13 commented on a diff in pull request #13621: WIP: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
cbalint13 commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1056822605


##########
python/tvm/tir/tensor_intrin/x86.py:
##########
@@ -67,8 +68,50 @@ def dot_product_16x4_u8i8i32_vnni(
         )
 
 
+mem_scope="global"
+@T.prim_func
+def dot_product_16x4_u8i8i32_avx512(a: T.handle, b: T.handle, c: T.handle,) -> None:
+    A = T.match_buffer(a, (4,), "uint8", offset_factor=1, scope=mem_scope)
+    B = T.match_buffer(b, (16, 4), "int8", offset_factor=1, scope=mem_scope)
+    C = T.match_buffer(c, (16,), "int32", offset_factor=1, scope=mem_scope)

Review Comment:
   * [cosmetic] Can it be similar to the ```vnni``` counterpart, as the original below ? https://github.com/apache/tvm/blob/209845fb910aac76f1388d868bb4fe4a46d9170f/python/tvm/tir/tensor_intrin/x86.py#L44-L49
   
   * Also, scope ```global``` is default anyway.
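
   For reference, a hedged sketch of the suggested reformulation: it simply drops the module-level `mem_scope` variable, since ```global``` is the default scope of `T.match_buffer`:
   ```
   @T.prim_func
   def dot_product_16x4_u8i8i32_avx512(a: T.handle, b: T.handle, c: T.handle) -> None:
       A = T.match_buffer(a, (4,), "uint8", offset_factor=1)
       B = T.match_buffer(b, (16, 4), "int8", offset_factor=1)
       C = T.match_buffer(c, (16,), "int32", offset_factor=1)
   ```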





[GitHub] [tvm] vvchernov commented on a diff in pull request #13621: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
vvchernov commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1072093328


##########
tests/python/unittest/test_meta_schedule_trace_apply.py:
##########
@@ -25,6 +25,9 @@
 from tvm.target import Target

Review Comment:
   done





[GitHub] [tvm] vvchernov commented on a diff in pull request #13621: WIP: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
vvchernov commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1064168344


##########
tests/python/unittest/test_meta_schedule_trace_apply.py:
##########
@@ -2502,6 +2503,7 @@ def apply_trace(sch):
     verify(Conv2dInt8, apply_trace, Conv2dInt8_target, "cuda", Conv2dInt8_tensorcore_scheduled)
 
 
+# TODO(vvchernov): test int8 conv2d for avx512 without VNNI

Review Comment:
   done





[GitHub] [tvm] vvchernov commented on a diff in pull request #13621: WIP: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
vvchernov commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1064168444


##########
tests/python/unittest/test_meta_schedule_trace_apply.py:
##########
@@ -1131,6 +1131,7 @@ def main(p0: T.Buffer[(1, 32, 7, 7, 16), "uint8"], p1: T.Buffer[(128, 32, 1, 1,
                 T_cast[ax0, ax1, ax2, ax3, ax4] = T.cast(compute_2[ax0, ax1, ax2, ax3, ax4], "int32")
 
 
+# TODO(vvchernov): construct avx512 reference module (without vnni)

Review Comment:
   done





[GitHub] [tvm] vvchernov commented on a diff in pull request #13621: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
vvchernov commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1071781942


##########
src/meta_schedule/space_generator/space_generator.cc:
##########
@@ -93,6 +103,10 @@ void SpaceGeneratorNode::InitializeWithTuneContext(const TuneContext& context) {
       default_sch_rules = ScheduleRule::DefaultVNNI();
       default_postprocs = Postproc::DefaultVNNI();
       default_mutator_probs = Mutator::DefaultVNNI();
+    } else if (kind == "avx512") {
+      default_sch_rules = ScheduleRule::DefaultAVX512();
+      default_postprocs = Postproc::DefaultVNNI();
+      default_mutator_probs = Mutator::DefaultVNNI();

Review Comment:
   done





[GitHub] [tvm] elvin-n commented on a diff in pull request #13621: WIP: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
elvin-n commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1057947174


##########
python/tvm/relay/op/strategy/x86.py:
##########
@@ -627,16 +627,16 @@ def batch_matmul_strategy_cpu(attrs, inputs, out_type, target):
     if (
         not attrs.transpose_a
         and attrs.transpose_b
-        and target_has_vnni(mcpu)
+        and target_has_avx512(mcpu)
         and inputs[0].dtype == "uint8"
         and inputs[1].dtype == "int8"
         and inputs[1].shape[-2] % 16 == 0
         and inputs[1].shape[-1] % 4 == 0
     ):
         strategy.add_implementation(
-            wrap_compute_batch_matmul(topi.x86.batch_matmul_vnni_compute, need_out_dtype=True),
-            wrap_topi_schedule(topi.x86.schedule_batch_matmul_vnni),
-            name="batch_matmul_vnni.x86",

Review Comment:
   > different clocking, timing & implementation on ASIC
   
   What kind of ASIC do you mean?
   
   > (auto)tensorization opportunities differ as inner loops match differently
   
   Under `tensorization opportunities differ`, do you mean a different number of lanes for different instruction sets, which can be reflected in potentially different blocking sizes? Or something else?





[GitHub] [tvm] junrushao commented on pull request #13621: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
junrushao commented on PR #13621:
URL: https://github.com/apache/tvm/pull/13621#issuecomment-1381425192

   Happy to take a look tomorrow :-)




[GitHub] [tvm] vvchernov commented on a diff in pull request #13621: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
vvchernov commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1071800577


##########
src/meta_schedule/space_generator/space_generator.cc:
##########
@@ -73,6 +81,8 @@ void SpaceGeneratorNode::InitializeWithTuneContext(const TuneContext& context) {
     Array<ScheduleRule> default_sch_rules;
     Array<Postproc> default_postprocs;
     Map<Mutator, FloatImm> default_mutator_probs;
+    // TODO(vvchernov): check if need separated ScheduleRule, Postproc, Mutator

Review Comment:
   I've removed this TODO; I think the check was done by the tests.





[GitHub] [tvm] cbalint13 commented on a diff in pull request #13621: WIP: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
cbalint13 commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1057198498


##########
python/tvm/relay/op/strategy/x86.py:
##########
@@ -627,16 +627,16 @@ def batch_matmul_strategy_cpu(attrs, inputs, out_type, target):
     if (
         not attrs.transpose_a
         and attrs.transpose_b
-        and target_has_vnni(mcpu)
+        and target_has_avx512(mcpu)
         and inputs[0].dtype == "uint8"
         and inputs[1].dtype == "int8"
         and inputs[1].shape[-2] % 16 == 0
         and inputs[1].shape[-1] % 4 == 0
     ):
         strategy.add_implementation(
-            wrap_compute_batch_matmul(topi.x86.batch_matmul_vnni_compute, need_out_dtype=True),
-            wrap_topi_schedule(topi.x86.schedule_batch_matmul_vnni),
-            name="batch_matmul_vnni.x86",

Review Comment:
   > I know VNNI is a part of AVX512 architectures. The fork is here:
   > https://github.com/apache/tvm/blob/main/python/tvm/topi/x86/tensor_intrin.py#:~:text=def%20dot_16x1x16_uint8_int8_int32()%3A,return%20dot_16x1x16_uint8_int8_int32_skylake()
   > As you correctly remarked, avx2 and ssse3 are also processed here, but they are not accessible due to the high-level check target_has_avx512.
   
   I can't see the original **llvm.x86.vnni** intrinsic in the above.
   This switch is right in the provided **tensor_intrin.py** fork:
   ```
       if target_has_vnni(mcpu):
           # VNNI capable platform
           return dot_16x1x16_uint8_int8_int32_cascadelake()
       # vpmaddubsw/vpmaddwd fallback
       return dot_16x1x16_uint8_int8_int32_skylake()
   ```
   
   As more +SIMD keeps coming, wouldn't it be better to stay with the upcoming [strategy/x86.py](https://github.com/apache/tvm/blob/dd1eb2498a4e2c867a16d7d9ccea010c396e10ae/python/tvm/relay/op/strategy/x86.py#L594-L621) ```if/elif``` + preferential ```plevel``` ?
    * The **tensor_intrin.py** would remain only an enumeration of SIMD variants (no decisions); triage would stay in **strategy**, etc.
    * The user may control falling into the right strategy by narrowing ```"llvm +mattr={avx512bw,avxvnni,...}"```, per the [llvm flags](https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Basic/BuiltinsX86.def).
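
   A hypothetical illustration of that narrowing (TVM spells the flag `-mattr=+feature,...`; the feature names come from the LLVM list linked above):
   ```
   import tvm

   # AVX512 without VNNI vs. a VNNI-capable narrowing of the same family.
   avx512_only = tvm.target.Target("llvm -mcpu=skylake-avx512 -mattr=+avx512bw")
   with_vnni = tvm.target.Target("llvm -mcpu=cascadelake -mattr=+avx512vnni")
   ```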
   
   
   





[GitHub] [tvm] elvin-n commented on a diff in pull request #13621: WIP: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
elvin-n commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1058100234


##########
python/tvm/relay/op/strategy/x86.py:
##########
@@ -627,16 +627,16 @@ def batch_matmul_strategy_cpu(attrs, inputs, out_type, target):
     if (
         not attrs.transpose_a
         and attrs.transpose_b
-        and target_has_vnni(mcpu)
+        and target_has_avx512(mcpu)
         and inputs[0].dtype == "uint8"
         and inputs[1].dtype == "int8"
         and inputs[1].shape[-2] % 16 == 0
         and inputs[1].shape[-1] % 4 == 0
     ):
         strategy.add_implementation(
-            wrap_compute_batch_matmul(topi.x86.batch_matmul_vnni_compute, need_out_dtype=True),
-            wrap_topi_schedule(topi.x86.schedule_batch_matmul_vnni),
-            name="batch_matmul_vnni.x86",

Review Comment:
   I would consider amx vs. vnni/avx512/avx2/sse3 (btw, there is no sse2 for int8; the required instructions appeared, if I am not mistaken, in sse3.x), because the first is matrix multiplication while the others are vector instructions. For now I propose to go from local to generic, and when we see a need to differentiate vector sets, we will do this. For now the pattern looks similar for all of the vector instructions; the aspect of blocking should be added separately if it is not done yet. The aspect of lanes in the TVM intrinsic should be covered in this PR.
   
   > match inner loops to these varying sizes.
   
   The inner loop is the same for all these instructions. For each output lane i it will be
   ```
                   for (int k = 0; k < 4; k++) {
                       output[i] += data[k] * kernel[i][k];
                   }
   ```
   
   > TVM is a compiler after all, to my knowledge the only one capable of auto-tensorization with arbitrary intrinsic.
   
   I agree; at the same time I propose to move from local to generic patterns. We do not limit anything for now.





[GitHub] [tvm] vvchernov commented on a diff in pull request #13621: WIP: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
vvchernov commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1057081342


##########
python/tvm/tir/tensor_intrin/x86.py:
##########
@@ -67,8 +68,50 @@ def dot_product_16x4_u8i8i32_vnni(
         )
 
 
+mem_scope="global"
+@T.prim_func
+def dot_product_16x4_u8i8i32_avx512(a: T.handle, b: T.handle, c: T.handle,) -> None:
+    A = T.match_buffer(a, (4,), "uint8", offset_factor=1, scope=mem_scope)
+    B = T.match_buffer(b, (16, 4), "int8", offset_factor=1, scope=mem_scope)
+    C = T.match_buffer(c, (16,), "int32", offset_factor=1, scope=mem_scope)
+    with T.block("root"):
+        T.reads(C[0:16], A[0:4], B[0:16, 0:4])
+        T.writes(C[0:16])
+
+        A_u8x4 = A.vload([0], "uint8x4")
+        A_i32 = T.reinterpret(A_u8x4, dtype="int32")
+        A_brdcst = T.broadcast(A_i32, 16)
+        A_u8x64 = T.reinterpret(A_brdcst, dtype="uint8x64")
+
+        B_i8x64 = B.vload([0, 0], dtype="int8x64")
+
+        Red = T.call_llvm_pure_intrin(
+            T.llvm_lookup_intrinsic_id("llvm.x86.avx512.pmaddubs.w.512"),
+            T.uint32(2),
+            A_u8x64,
+            B_i8x64,
+            dtype="int16x32",
+        )
+
+        One_int16x32 = tvm.tir.const(1, "int16x32")

Review Comment:
   done





[GitHub] [tvm] cbalint13 commented on a diff in pull request #13621: WIP: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
cbalint13 commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1056870085


##########
python/tvm/relay/op/strategy/x86.py:
##########
@@ -627,16 +627,16 @@ def batch_matmul_strategy_cpu(attrs, inputs, out_type, target):
     if (
         not attrs.transpose_a
         and attrs.transpose_b
-        and target_has_vnni(mcpu)
+        and target_has_avx512(mcpu)
         and inputs[0].dtype == "uint8"
         and inputs[1].dtype == "int8"
         and inputs[1].shape[-2] % 16 == 0
         and inputs[1].shape[-1] % 4 == 0
     ):
         strategy.add_implementation(
-            wrap_compute_batch_matmul(topi.x86.batch_matmul_vnni_compute, need_out_dtype=True),
-            wrap_topi_schedule(topi.x86.schedule_batch_matmul_vnni),
-            name="batch_matmul_vnni.x86",

Review Comment:
   * Don't remove the ```vnni``` one completely
   * Instead, maybe a descending strategy would be better (a rough sketch follows below):
            - if ```has_vnni``` // llvm -mcpu=cascadelake
            - elif ```has_avx512``` // llvm -mcpu=skylake
            - elif ```has_avx2``` (in a future PR) // llvm -mcpu=haswell
            - elif ```has_ssse3``` (in a future PR) // llvm -mcpu=sandybridge
   * See [strategy/x86.py](https://github.com/apache/tvm/blob/dd1eb2498a4e2c867a16d7d9ccea010c396e10ae/python/tvm/relay/op/strategy/x86.py#L594-L621), also with descending ```plevel```, in the upcoming [PR#13642](https://github.com/apache/tvm/pull/13642)
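
   A rough sketch of that descending dispatch, with hypothetical implementation names and illustrative `plevel` values (higher `plevel` wins when several implementations are registered; not the PR's actual code):
   ```
   if target_has_vnni(mcpu):
       strategy.add_implementation(
           wrap_compute_batch_matmul(topi.x86.batch_matmul_int8_compute, need_out_dtype=True),
           wrap_topi_schedule(topi.x86.schedule_batch_matmul_int8),
           name="batch_matmul_vnni.x86",
           plevel=14,  # prefer the single-step VNNI path when available
       )
   elif target_has_avx512(mcpu):
       strategy.add_implementation(
           wrap_compute_batch_matmul(topi.x86.batch_matmul_int8_compute, need_out_dtype=True),
           wrap_topi_schedule(topi.x86.schedule_batch_matmul_int8),
           name="batch_matmul_avx512.x86",
           plevel=13,  # two-step pmaddubs/pmaddw fallback
       )
   # elif target_has_avx2(mcpu): ...   (a future PR)
   # elif target_has_ssse3(mcpu): ...  (a future PR)
   ```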





[GitHub] [tvm] vvchernov commented on pull request #13621: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
vvchernov commented on PR #13621:
URL: https://github.com/apache/tvm/pull/13621#issuecomment-1383908313

   Hello @masahi! Could you take a look at this PR?




[GitHub] [tvm] masahi merged pull request #13621: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
masahi merged PR #13621:
URL: https://github.com/apache/tvm/pull/13621




[GitHub] [tvm] cbalint13 commented on a diff in pull request #13621: WIP: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
cbalint13 commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1056822933


##########
python/tvm/tir/tensor_intrin/x86.py:
##########
@@ -67,8 +68,50 @@ def dot_product_16x4_u8i8i32_vnni(
         )
 
 
+mem_scope="global"
+@T.prim_func
+def dot_product_16x4_u8i8i32_avx512(a: T.handle, b: T.handle, c: T.handle,) -> None:
+    A = T.match_buffer(a, (4,), "uint8", offset_factor=1, scope=mem_scope)
+    B = T.match_buffer(b, (16, 4), "int8", offset_factor=1, scope=mem_scope)
+    C = T.match_buffer(c, (16,), "int32", offset_factor=1, scope=mem_scope)
+    with T.block("root"):
+        T.reads(C[0:16], A[0:4], B[0:16, 0:4])
+        T.writes(C[0:16])
+
+        A_u8x4 = A.vload([0], "uint8x4")
+        A_i32 = T.reinterpret(A_u8x4, dtype="int32")
+        A_brdcst = T.broadcast(A_i32, 16)
+        A_u8x64 = T.reinterpret(A_brdcst, dtype="uint8x64")
+
+        B_i8x64 = B.vload([0, 0], dtype="int8x64")
+
+        Red = T.call_llvm_pure_intrin(
+            T.llvm_lookup_intrinsic_id("llvm.x86.avx512.pmaddubs.w.512"),
+            T.uint32(2),
+            A_u8x64,
+            B_i8x64,
+            dtype="int16x32",
+        )
+
+        One_int16x32 = tvm.tir.const(1, "int16x32")

Review Comment:
   * ~One_int16x32 =~ T.int16x32(1) ?
   * Would remove the ```import tvm``` line from the header.
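
   For context, a hedged sketch of how the two-step reduction quoted above typically completes after the ```pmaddubs.w.512``` step (`llvm.x86.avx512.pmaddw.d.512` is a real LLVM x86 intrinsic; the PR's actual epilogue may differ):
   ```
   # Pairwise-add the int16x32 partial products into int32x16 by multiplying
   # with a vector of ones, then accumulate into C.
   Red_i32x16 = T.call_llvm_pure_intrin(
       T.llvm_lookup_intrinsic_id("llvm.x86.avx512.pmaddw.d.512"),
       T.uint32(2),
       Red,
       One_int16x32,
       dtype="int32x16",
   )
   C[T.ramp(T.int32(0), 1, 16)] = C[T.ramp(T.int32(0), 1, 16)] + Red_i32x16
   ```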





[GitHub] [tvm] cbalint13 commented on a diff in pull request #13621: WIP: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
cbalint13 commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1057955355


##########
python/tvm/relay/op/strategy/x86.py:
##########
@@ -627,16 +627,16 @@ def batch_matmul_strategy_cpu(attrs, inputs, out_type, target):
     if (
         not attrs.transpose_a
         and attrs.transpose_b
-        and target_has_vnni(mcpu)
+        and target_has_avx512(mcpu)
         and inputs[0].dtype == "uint8"
         and inputs[1].dtype == "int8"
         and inputs[1].shape[-2] % 16 == 0
         and inputs[1].shape[-1] % 4 == 0
     ):
         strategy.add_implementation(
-            wrap_compute_batch_matmul(topi.x86.batch_matmul_vnni_compute, need_out_dtype=True),
-            wrap_topi_schedule(topi.x86.schedule_batch_matmul_vnni),
-            name="batch_matmul_vnni.x86",

Review Comment:
   > > different clocking, timing & implementation on ASIC
   > 
   > What kind of ASIC do you mean?
   
   * CPUs of the x86 family, across generations, with varying extended ISA layouts: amx, avx512, vnni, avx2, ssse3, sse2
   
   > > (auto)tensorization opportunities differ as inner loops match differently
   > Under `tensorization opportunities differ` do yo mean different number of lanes for different instruction set which can be reflected in potential different blocking size?
   
   * Yes, both input widths and output lanes yield different outcomes and varying performance.
   * E.g. the autotensorizer will opportunistically search to permute & match inner loops to these varying sizes.
   
   > Or something else?
   
   * TVM is a compiler after all; to my knowledge, the only one capable of auto-tensorization with an arbitrary intrinsic.
   
   





[GitHub] [tvm] vvchernov commented on a diff in pull request #13621: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
vvchernov commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1071791346


##########
tests/python/unittest/test_meta_schedule_16x4_integration.py:
##########
@@ -28,6 +28,7 @@
 from tvm.tir.schedule import BlockRV, Schedule

Review Comment:
   done. But I'm not sure it is the clearest name, since "cpu" covers more than just Intel architectures. Nevertheless, there is no other similar test for anybody to confuse it with.





[GitHub] [tvm] cbalint13 commented on a diff in pull request #13621: WIP: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
cbalint13 commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1058171563


##########
tests/python/integration/test_auto_tensorize.py:
##########
@@ -278,6 +283,17 @@ def test_vnni_dense():
     )
 
 
+# TODO(vvchernov): need schedule rules and postprocs for avx512

Review Comment:
   * Can we reuse/add them here, on par with the VNNI one ?
   * Would be great to check the auto-tensorizer along with the TOPI template.



##########
tests/python/unittest/test_meta_schedule_trace_apply.py:
##########
@@ -2502,6 +2503,7 @@ def apply_trace(sch):
     verify(Conv2dInt8, apply_trace, Conv2dInt8_target, "cuda", Conv2dInt8_tensorcore_scheduled)
 
 
+# TODO(vvchernov): test int8 conv2d for avx512 without VNNI

Review Comment:
   * Can we add the mentioned avx512 test case ?



##########
tests/python/unittest/test_meta_schedule_trace_apply.py:
##########
@@ -1131,6 +1131,7 @@ def main(p0: T.Buffer[(1, 32, 7, 7, 16), "uint8"], p1: T.Buffer[(128, 32, 1, 1,
                 T_cast[ax0, ax1, ax2, ax3, ax4] = T.cast(compute_2[ax0, ax1, ax2, ax3, ax4], "int32")
 
 
+# TODO(vvchernov): construct avx512 reference module (without vnni)

Review Comment:
   * Can we add the mentioned reference module ?





[GitHub] [tvm] tvm-bot commented on pull request #13621: WIP: [TIR][TOPI][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
tvm-bot commented on PR #13621:
URL: https://github.com/apache/tvm/pull/13621#issuecomment-1352898295

   <!---bot-comment-->
   
   Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from [Reviewers](https://github.com/apache/incubator-tvm/blob/master/CONTRIBUTORS.md#reviewers) by @-ing them in a comment.
   
   <!--bot-comment-ccs-start-->
    * cc @Hzfengsy, @Mousius, @areusch, @driazati, @gigiblender, @junrushao, @leandron <sub>See [#10317](https://github.com/apache/tvm/issues/10317) for details</sub><!--bot-comment-ccs-end-->
   
   <sub>Generated by [tvm-bot](https://github.com/apache/tvm/blob/main/ci/README.md#github-actions)</sub>




[GitHub] [tvm] vvchernov commented on a diff in pull request #13621: WIP: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
vvchernov commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1057312493


##########
python/tvm/relay/op/strategy/x86.py:
##########
@@ -627,16 +627,16 @@ def batch_matmul_strategy_cpu(attrs, inputs, out_type, target):
     if (
         not attrs.transpose_a
         and attrs.transpose_b
-        and target_has_vnni(mcpu)
+        and target_has_avx512(mcpu)
         and inputs[0].dtype == "uint8"
         and inputs[1].dtype == "int8"
         and inputs[1].shape[-2] % 16 == 0
         and inputs[1].shape[-1] % 4 == 0
     ):
         strategy.add_implementation(
-            wrap_compute_batch_matmul(topi.x86.batch_matmul_vnni_compute, need_out_dtype=True),
-            wrap_topi_schedule(topi.x86.schedule_batch_matmul_vnni),
-            name="batch_matmul_vnni.x86",

Review Comment:
   Hello @cbalint13! Your view looks reasonable, and there is no problem reimplementing it on my side. But I did not implement the `dot_16x1x16_uint8_int8_int32` method with conditions on the tensor_intrin.py side, as I thought of it as a brick for building a broader concept. @elvin-n and @jwfromm, what do you think about Balint's view?





[GitHub] [tvm] cbalint13 commented on a diff in pull request #13621: WIP: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
cbalint13 commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1057926988


##########
python/tvm/relay/op/strategy/x86.py:
##########
@@ -627,16 +627,16 @@ def batch_matmul_strategy_cpu(attrs, inputs, out_type, target):
     if (
         not attrs.transpose_a
         and attrs.transpose_b
-        and target_has_vnni(mcpu)
+        and target_has_avx512(mcpu)
         and inputs[0].dtype == "uint8"
         and inputs[1].dtype == "int8"
         and inputs[1].shape[-2] % 16 == 0
         and inputs[1].shape[-1] % 4 == 0
     ):
         strategy.add_implementation(
-            wrap_compute_batch_matmul(topi.x86.batch_matmul_vnni_compute, need_out_dtype=True),
-            wrap_topi_schedule(topi.x86.schedule_batch_matmul_vnni),
-            name="batch_matmul_vnni.x86",

Review Comment:
   > The instruction set for SSE/AVX2/AVX512 for int8 is absolutely the same, the only difference is the number of lanes.
   
      Yes, it is the same class doing integer dot products on immediate registers, but note:
   
      - *different* clocking, timing & implementation on the ASIC
      - (auto)tensorization opportunities differ, as inner loops match *differently*
   
   > Additionally, the pattern of how these int8 instructions (VPMADDUBSW/VPMADDWD/VPADDD) are used is the same as for the only VNNI instruction (VPDPBUSD).
   
     Right.
   
      - The VNNI insn. accumulates into int32 lanes in a single step: vpdpbusd
      - AVX512 (incl. the AVX2, SSSE3 ones) does the same in two steps, e.g.: pmaddubs + pmadd
   
   > I.e. it is reasonable to have the only tvm intrinsic, it is reasonable to remove VNNI from the name of the function, and it is reasonable to extend these intrinsic function to SSE and AVX2 that is not done yet in this PR
   
      - Indeed the proposed intrinsic merger is perfectly fine.
      - It was possible to question it with reasonable arguments.
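
   A small NumPy sketch of the one-step vs. two-step equivalence (it ignores pmaddubsw's int16 saturation, so this is the idealized semantics, not an exact HW model):
   ```
   import numpy as np

   A = np.random.randint(0, 256, (4,)).astype(np.int32)        # uint8 data
   B = np.random.randint(-128, 128, (16, 4)).astype(np.int32)  # int8 kernel

   # VNNI (vpdpbusd): four products accumulated into each int32 lane in one step.
   one_step = (A[None, :] * B).sum(axis=1)

   # AVX512 fallback: pmaddubsw pairs elements (0,1) and (2,3) into int16,
   # then pmaddwd(..., ones) pairs the int16 results into int32.
   t16 = A[None, ::2] * B[:, ::2] + A[None, 1::2] * B[:, 1::2]
   two_step = t16.sum(axis=1)

   assert (one_step == two_step).all()
   ```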
   
   
   





[GitHub] [tvm] vvchernov commented on a diff in pull request #13621: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
vvchernov commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1071829255


##########
src/meta_schedule/schedule_rule/schedule_rule.cc:
##########
@@ -130,6 +130,51 @@ Array<ScheduleRule> ScheduleRule::DefaultVNNI() {
   };
 }
 
+Array<ScheduleRule> ScheduleRule::DefaultAVX512() {

Review Comment:
   done





[GitHub] [tvm] vvchernov commented on a diff in pull request #13621: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
vvchernov commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1071794140


##########
src/meta_schedule/space_generator/space_generator.cc:
##########
@@ -93,6 +103,10 @@ void SpaceGeneratorNode::InitializeWithTuneContext(const TuneContext& context) {
       default_sch_rules = ScheduleRule::DefaultVNNI();
       default_postprocs = Postproc::DefaultVNNI();
       default_mutator_probs = Mutator::DefaultVNNI();
+    } else if (kind == "avx512") {
+      default_sch_rules = ScheduleRule::DefaultAVX512();
+      default_postprocs = Postproc::DefaultVNNI();

Review Comment:
   done





[GitHub] [tvm] masahi commented on a diff in pull request #13621: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
masahi commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1071618299


##########
src/meta_schedule/space_generator/space_generator.cc:
##########
@@ -93,6 +103,10 @@ void SpaceGeneratorNode::InitializeWithTuneContext(const TuneContext& context) {
       default_sch_rules = ScheduleRule::DefaultVNNI();
       default_postprocs = Postproc::DefaultVNNI();
       default_mutator_probs = Mutator::DefaultVNNI();
+    } else if (kind == "avx512") {
+      default_sch_rules = ScheduleRule::DefaultAVX512();
+      default_postprocs = Postproc::DefaultVNNI();

Review Comment:
   Rename `DefaultVNNI` to `DefaultCPUTensorization`



##########
src/meta_schedule/schedule_rule/schedule_rule.cc:
##########
@@ -130,6 +130,51 @@ Array<ScheduleRule> ScheduleRule::DefaultVNNI() {
   };
 }
 
+Array<ScheduleRule> ScheduleRule::DefaultAVX512() {

Review Comment:
   Unify `DefaultVNNI` and this function.



##########
tests/python/unittest/test_meta_schedule_trace_apply.py:
##########
@@ -25,6 +25,9 @@
 from tvm.target import Target

Review Comment:
   Please remove the change from this file, since the VNNI vs AVX512 difference doesn't matter for this unittest.



##########
tests/python/unittest/test_tir_schedule_analysis.py:
##########
@@ -41,6 +41,71 @@
 from tvm.te import create_prim_func

Review Comment:
   Also remove the change from this file



##########
src/meta_schedule/space_generator/space_generator.cc:
##########
@@ -93,6 +103,10 @@ void SpaceGeneratorNode::InitializeWithTuneContext(const TuneContext& context) {
       default_sch_rules = ScheduleRule::DefaultVNNI();
       default_postprocs = Postproc::DefaultVNNI();
       default_mutator_probs = Mutator::DefaultVNNI();
+    } else if (kind == "avx512") {
+      default_sch_rules = ScheduleRule::DefaultAVX512();
+      default_postprocs = Postproc::DefaultVNNI();
+      default_mutator_probs = Mutator::DefaultVNNI();

Review Comment:
   Remove `DefaultVNNI` and use `DefaultLLVM` (they are the same) for both VNNI and AVX512 



##########
src/meta_schedule/space_generator/space_generator.cc:
##########
@@ -73,6 +81,8 @@ void SpaceGeneratorNode::InitializeWithTuneContext(const TuneContext& context) {
     Array<ScheduleRule> default_sch_rules;
     Array<Postproc> default_postprocs;
     Map<Mutator, FloatImm> default_mutator_probs;
+    // TODO(vvchernov): check if need separated ScheduleRule, Postproc, Mutator

Review Comment:
   separated -> dedicated



##########
tests/python/unittest/test_meta_schedule_16x4_integration.py:
##########
@@ -28,6 +28,7 @@
 from tvm.tir.schedule import BlockRV, Schedule

Review Comment:
   Let's rename it to `test_meta_schedule_cpu_dot_product.py`





[GitHub] [tvm] vvchernov commented on pull request #13621: WIP: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
vvchernov commented on PR #13621:
URL: https://github.com/apache/tvm/pull/13621#issuecomment-1364905860

   > Thank you @vvchernov for this excellent addition on x86 coverage !
   > 
   > * I've suggested few nits in the code, mostly cosmetic.
   > * We could add (next PR?)  4x4 (ssse3/m128), 8x4 (avx2/m256) `skylake` counterparts:
   >   https://github.com/apache/tvm/blob/58f924da5ca61f25ca7236e0ff4e8d78e9de1f4d/python/tvm/topi/x86/tensor_intrin.py#L90-L107
   
   Thanks! I agree, and I have also thought about how it can be done in a reasonable way. And yes, I think it should be done in a separate PR.




[GitHub] [tvm] vvchernov commented on a diff in pull request #13621: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
vvchernov commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1068445499


##########
tests/python/integration/test_auto_tensorize.py:
##########
@@ -278,6 +283,17 @@ def test_vnni_dense():
     )
 
 
+# TODO(vvchernov): need schedule rules and postprocs for avx512

Review Comment:
   I've constructed tests for avx512 as a pair to the VNNI ones





[GitHub] [tvm] cbalint13 commented on a diff in pull request #13621: WIP: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
cbalint13 commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1057322567


##########
python/tvm/relay/op/strategy/x86.py:
##########
@@ -627,16 +627,16 @@ def batch_matmul_strategy_cpu(attrs, inputs, out_type, target):
     if (
         not attrs.transpose_a
         and attrs.transpose_b
-        and target_has_vnni(mcpu)
+        and target_has_avx512(mcpu)
         and inputs[0].dtype == "uint8"
         and inputs[1].dtype == "int8"
         and inputs[1].shape[-2] % 16 == 0
         and inputs[1].shape[-1] % 4 == 0
     ):
         strategy.add_implementation(
-            wrap_compute_batch_matmul(topi.x86.batch_matmul_vnni_compute, need_out_dtype=True),
-            wrap_topi_schedule(topi.x86.schedule_batch_matmul_vnni),
-            name="batch_matmul_vnni.x86",

Review Comment:
   > Hello @cbalint13! Your view looks reasonable and there are no problems with reimplementing it from my side. But I did not implement the method `dot_16x1x16_uint8_int8_int32` with conditions on the tensor_intrin.py side, and thought that it is a brick to build some concept on. @elvin-n and @jwfromm what do you think about Balint's view?
   
   @vvchernov ,
   
   Thanks for the clarifications, I see your point, it is a perfectly fine way too.
   I think that once all CI tests pass green, more reviewers will come.
   
   ---
   
   Let me sum up the pinned PR change to **strategy/x86.py**, visible at the top of this thread:
   ```
   -from tvm.topi.x86.utils import target_has_vnni
   +from tvm.topi.x86.utils import target_has_avx512
   
   -        and target_has_vnni(mcpu)
   +        and target_has_avx512(mcpu)
   
   -        wrap_compute_dense(topi.x86.dense_vnni),
   -        wrap_topi_schedule(topi.x86.schedule_dense_vnni),
   +        wrap_compute_dense(topi.x86.dense_int8),
   +        wrap_topi_schedule(topi.x86.schedule_dense_int8),
   ```
   * This merges vnni into avx512 (under the new **dense_int8** umbrella), arguing that VNNI is a subset of the AVX512 group.
   * VNNI is indeed a subset of the AVX512 group; however, there are CPUs that have AVX512 but **no VNNI** [1].
   
   [1] https://en.wikipedia.org/wiki/AVX-512#VNNI
   
   My view was to keep avx512 & vnni separate (as they were) in strategy/x86.py (not to merge vnni->avx512).
   My argument was to triage every SIMD variant right in **strategy/x86.py**, as the upcoming AMX does [here](https://github.com/apache/tvm/blob/dd1eb2498a4e2c867a16d7d9ccea010c396e10ae/python/tvm/relay/op/strategy/x86.py#L594-L621), plus plevel control.
   I see VNNI and AVX512 (and AVX2, SSSE3) as potentially independent things, moreover selectable via "llvm +mattr=..."; a sketch of that user-side control follows below.
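   A small runnable sketch of that user-side narrowing (attribute access per the Target API; whether a given `+mattr` flag is honored depends on the LLVM build TVM links against):
   ```
   import tvm
   
   # Same -mcpu, but with explicitly narrowed SIMD features; a strategy layer
   # could honor exactly these flags instead of inferring everything from -mcpu.
   tgt = tvm.target.Target("llvm -mcpu=skylake-avx512 -mattr=+avx512bw,+avx512vl")
   print(tgt.mcpu)                  # skylake-avx512
   print(list(tgt.attrs["mattr"]))  # ['+avx512bw', '+avx512vl']
   ```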
   
   
   





[GitHub] [tvm] elvin-n commented on a diff in pull request #13621: WIP: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
elvin-n commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1057896792


##########
python/tvm/relay/op/strategy/x86.py:
##########
@@ -627,16 +627,16 @@ def batch_matmul_strategy_cpu(attrs, inputs, out_type, target):
     if (
         not attrs.transpose_a
         and attrs.transpose_b
-        and target_has_vnni(mcpu)
+        and target_has_avx512(mcpu)
         and inputs[0].dtype == "uint8"
         and inputs[1].dtype == "int8"
         and inputs[1].shape[-2] % 16 == 0
         and inputs[1].shape[-1] % 4 == 0
     ):
         strategy.add_implementation(
-            wrap_compute_batch_matmul(topi.x86.batch_matmul_vnni_compute, need_out_dtype=True),
-            wrap_topi_schedule(topi.x86.schedule_batch_matmul_vnni),
-            name="batch_matmul_vnni.x86",

Review Comment:
   The int8 instruction set for SSE/AVX2/AVX512 is essentially the same; the only difference is the number of lanes. Additionally, the pattern in which these int8 instructions (VPMADDUBSW/VPMADDWD/VPADDD) are used is the same as for the single VNNI instruction (VPDPBUSD). I.e. it is reasonable to have a single TVM intrinsic, it is reasonable to remove VNNI from the name of the function, and it is reasonable to extend this intrinsic function to SSE and AVX2, which is not done yet in this PR. A sketch of the shared arithmetic follows below.
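   For illustration, a runnable numpy sketch of the arithmetic both lowerings compute for one int32 output lane (a 4-element u8 x i8 dot product; the values are arbitrary):
   ```
   import numpy as np
   
   a = np.array([200, 3, 7, 250], dtype=np.uint8)   # activations (u8)
   b = np.array([-5, 9, -1, 4], dtype=np.int8)      # weights (i8)
   
   # VNNI path: one VPDPBUSD does multiply + pairwise adds + accumulate.
   vnni = int((a.astype(np.int32) * b.astype(np.int32)).sum())
   
   # SSE/AVX2/AVX512 path: VPMADDUBSW (u8*i8 -> saturating pairwise i16 sums),
   # then VPMADDWD against a vector of ones (i16 -> pairwise i32 sums), VPADDD.
   prod = a.astype(np.int32) * b.astype(np.int32)           # widened products
   pair = np.clip(prod[0::2] + prod[1::2], -32768, 32767)   # pmaddubsw saturates
   fallback = int(pair.sum())                               # pmaddwd(ones) + vpaddd
   
   print(vnni, fallback)  # 20 20 -- equal whenever the i16 step does not clip
   ```
   Note the one real semantic difference: VPMADDUBSW saturates the intermediate i16 sums, while VPDPBUSD accumulates in i32, so the two paths can diverge for extreme inputs.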





[GitHub] [tvm] cbalint13 commented on a diff in pull request #13621: WIP: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
cbalint13 commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1056822933


##########
python/tvm/tir/tensor_intrin/x86.py:
##########
@@ -67,8 +68,50 @@ def dot_product_16x4_u8i8i32_vnni(
         )
 
 
+mem_scope="global"
+@T.prim_func
+def dot_product_16x4_u8i8i32_avx512(a: T.handle, b: T.handle, c: T.handle,) -> None:
+    A = T.match_buffer(a, (4,), "uint8", offset_factor=1, scope=mem_scope)
+    B = T.match_buffer(b, (16, 4), "int8", offset_factor=1, scope=mem_scope)
+    C = T.match_buffer(c, (16,), "int32", offset_factor=1, scope=mem_scope)
+    with T.block("root"):
+        T.reads(C[0:16], A[0:4], B[0:16, 0:4])
+        T.writes(C[0:16])
+
+        A_u8x4 = A.vload([0], "uint8x4")
+        A_i32 = T.reinterpret(A_u8x4, dtype="int32")
+        A_brdcst = T.broadcast(A_i32, 16)
+        A_u8x64 = T.reinterpret(A_brdcst, dtype="uint8x64")
+
+        B_i8x64 = B.vload([0, 0], dtype="int8x64")
+
+        Red = T.call_llvm_pure_intrin(
+            T.llvm_lookup_intrinsic_id("llvm.x86.avx512.pmaddubs.w.512"),
+            T.uint32(2),
+            A_u8x64,
+            B_i8x64,
+            dtype="int16x32",
+        )
+
+        One_int16x32 = tvm.tir.const(1, "int16x32")

Review Comment:
   * ```One_int16x32 = T.int16x32(1)``` ?
   * Would remove the ```import tvm``` line from the header.
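   For context, a sketch of the reduction step this constant feeds, continuing the prim_func above (`Red` is the int16x32 result of the pmaddubs call; the exact store form is illustrative):
   ```
   # Widen the int16x32 partial sums pairwise to int32x16 by multiplying
   # with a vector of ones (VPMADDWD), then accumulate into the 16 output lanes.
   One_int16x32 = T.int16x32(1)
   C[T.ramp(T.int32(0), 1, 16)] += T.call_llvm_pure_intrin(
       T.llvm_lookup_intrinsic_id("llvm.x86.avx512.pmaddw.d.512"),
       T.uint32(2),
       Red,
       One_int16x32,
       dtype="int32x16",
   )
   ```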



##########
python/tvm/tir/tensor_intrin/x86.py:
##########
@@ -67,8 +68,50 @@ def dot_product_16x4_u8i8i32_vnni(
         )
 
 
+mem_scope="global"
+@T.prim_func
+def dot_product_16x4_u8i8i32_avx512(a: T.handle, b: T.handle, c: T.handle,) -> None:
+    A = T.match_buffer(a, (4,), "uint8", offset_factor=1, scope=mem_scope)
+    B = T.match_buffer(b, (16, 4), "int8", offset_factor=1, scope=mem_scope)
+    C = T.match_buffer(c, (16,), "int32", offset_factor=1, scope=mem_scope)

Review Comment:
   [cosmetic] Can be similar to the ```vnni``` part, as below ? https://github.com/apache/tvm/blob/209845fb910aac76f1388d868bb4fe4a46d9170f/python/tvm/tir/tensor_intrin/x86.py#L44-L49
   
   Also, scope ```global``` is default anyway.





[GitHub] [tvm] vvchernov commented on a diff in pull request #13621: WIP: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
vvchernov commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1057084844


##########
python/tvm/relay/op/strategy/x86.py:
##########
@@ -627,16 +627,16 @@ def batch_matmul_strategy_cpu(attrs, inputs, out_type, target):
     if (
         not attrs.transpose_a
         and attrs.transpose_b
-        and target_has_vnni(mcpu)
+        and target_has_avx512(mcpu)
         and inputs[0].dtype == "uint8"
         and inputs[1].dtype == "int8"
         and inputs[1].shape[-2] % 16 == 0
         and inputs[1].shape[-1] % 4 == 0
     ):
         strategy.add_implementation(
-            wrap_compute_batch_matmul(topi.x86.batch_matmul_vnni_compute, need_out_dtype=True),
-            wrap_topi_schedule(topi.x86.schedule_batch_matmul_vnni),
-            name="batch_matmul_vnni.x86",

Review Comment:
   Hello @cbalint13! Thank you for your nits and remarks! In this case VNNI was not removed but extended; as you know, VNNI is a part of the AVX512 architectures. The fork is here:
   https://github.com/apache/tvm/blob/main/python/tvm/topi/x86/tensor_intrin.py#:~:text=def%20dot_16x1x16_uint8_int8_int32()%3A,return%20dot_16x1x16_uint8_int8_int32_skylake()
   As you correctly remarked, avx2 and ssse3 are also processed here, but they are not accessible due to the high-level target_has_avx512 check. Possibly your suggestion is a good way to resolve this further. For now I have only extended the existing approach for avx512.





[GitHub] [tvm] vvchernov commented on pull request #13621: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
vvchernov commented on PR #13621:
URL: https://github.com/apache/tvm/pull/13621#issuecomment-1381423804

   Hello @areusch, @driazati, @junrushao! Could you take a look at this PR?




[GitHub] [tvm] vvchernov commented on pull request #13621: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
vvchernov commented on PR #13621:
URL: https://github.com/apache/tvm/pull/13621#issuecomment-1385308033

   Hello @masahi! A few words about checking the VNNI functionality after these changes: 1. Unfortunately I do not have a machine with VNNI to check it locally, so I rely on the CI test for VNNI. 2. This PR is devoted to support of avx512, which was implemented in parallel to the VNNI functionality; the changes touching VNNI are limited to renaming and unifying common test code. 3. I plan to open a new PR with fixes for some small VNNI issues observed during this work, if those fixes turn out to be correct.




[GitHub] [tvm] cbalint13 commented on a diff in pull request #13621: WIP: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
cbalint13 commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1056822605


##########
python/tvm/tir/tensor_intrin/x86.py:
##########
@@ -67,8 +68,50 @@ def dot_product_16x4_u8i8i32_vnni(
         )
 
 
+mem_scope="global"
+@T.prim_func
+def dot_product_16x4_u8i8i32_avx512(a: T.handle, b: T.handle, c: T.handle,) -> None:
+    A = T.match_buffer(a, (4,), "uint8", offset_factor=1, scope=mem_scope)
+    B = T.match_buffer(b, (16, 4), "int8", offset_factor=1, scope=mem_scope)
+    C = T.match_buffer(c, (16,), "int32", offset_factor=1, scope=mem_scope)

Review Comment:
   * Can it be similar to the ```vnni``` counterpart, as the original below? https://github.com/apache/tvm/blob/209845fb910aac76f1388d868bb4fe4a46d9170f/python/tvm/tir/tensor_intrin/x86.py#L44-L49
   
   * Also, scope ```global``` is the default anyway; a sketch of the simplification follows below.
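   Concretely, the suggested simplification would read (a sketch; `scope="global"` is the default, so the `mem_scope` indirection can simply be dropped):
   ```
   A = T.match_buffer(a, (4,), "uint8", offset_factor=1)
   B = T.match_buffer(b, (16, 4), "int8", offset_factor=1)
   C = T.match_buffer(c, (16,), "int32", offset_factor=1)
   ```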





[GitHub] [tvm] cbalint13 commented on a diff in pull request #13621: WIP: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
cbalint13 commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1056870085


##########
python/tvm/relay/op/strategy/x86.py:
##########
@@ -627,16 +627,16 @@ def batch_matmul_strategy_cpu(attrs, inputs, out_type, target):
     if (
         not attrs.transpose_a
         and attrs.transpose_b
-        and target_has_vnni(mcpu)
+        and target_has_avx512(mcpu)
         and inputs[0].dtype == "uint8"
         and inputs[1].dtype == "int8"
         and inputs[1].shape[-2] % 16 == 0
         and inputs[1].shape[-1] % 4 == 0
     ):
         strategy.add_implementation(
-            wrap_compute_batch_matmul(topi.x86.batch_matmul_vnni_compute, need_out_dtype=True),
-            wrap_topi_schedule(topi.x86.schedule_batch_matmul_vnni),
-            name="batch_matmul_vnni.x86",

Review Comment:
   * Don't remove the ```vnni``` one completely
   * Instead, maybe a descending strategy would be better (a runnable sketch follows below):
            - if ```has_vnni```
            - elif ```has_avx512```
            - elif ```has_avx2``` (in a future PR)
            - elif ```has_ssse3``` (in a future PR)
   * See [strategy/x86.py](https://github.com/apache/tvm/blob/dd1eb2498a4e2c867a16d7d9ccea010c396e10ae/python/tvm/relay/op/strategy/x86.py#L594-L621), also descending ```plevel```, in the upcoming [PR#13642](https://github.com/apache/tvm/pull/13642)
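   A runnable sketch of that triage (the `target_has_*` helpers exist in `tvm.topi.x86.utils`; the implementation names returned here are illustrative):
   ```
   from tvm.topi.x86.utils import target_has_avx512, target_has_vnni
   
   def select_int8_dense_impl(mcpu: str) -> str:
       """Pick the preferred int8 dense implementation for a given -mcpu."""
       if target_has_vnni(mcpu):      # e.g. cascadelake: single VPDPBUSD
           return "dense_vnni.x86"
       if target_has_avx512(mcpu):    # e.g. skylake-avx512: pmaddubsw/pmaddwd
           return "dense_int8.x86"
       return "dense_nopack.x86"      # generic fallback (avx2/ssse3 in future PRs)
   
   print(select_int8_dense_impl("cascadelake"))     # dense_vnni.x86
   print(select_int8_dense_impl("skylake-avx512"))  # dense_int8.x86
   ```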





[GitHub] [tvm] vvchernov commented on a diff in pull request #13621: WIP: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
vvchernov commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1057075879


##########
python/tvm/tir/tensor_intrin/x86.py:
##########
@@ -67,8 +68,50 @@ def dot_product_16x4_u8i8i32_vnni(
         )
 
 
+mem_scope="global"
+@T.prim_func
+def dot_product_16x4_u8i8i32_avx512(a: T.handle, b: T.handle, c: T.handle,) -> None:
+    A = T.match_buffer(a, (4,), "uint8", offset_factor=1, scope=mem_scope)
+    B = T.match_buffer(b, (16, 4), "int8", offset_factor=1, scope=mem_scope)
+    C = T.match_buffer(c, (16,), "int32", offset_factor=1, scope=mem_scope)

Review Comment:
   Ok. I had some problems in this part of the code, and this was an attempt to resolve them (I used the hexagon example). The problems have since been resolved, so it can be returned to the original form. I believe it will be ok.



##########
python/tvm/tir/tensor_intrin/x86.py:
##########
@@ -67,8 +68,50 @@ def dot_product_16x4_u8i8i32_vnni(
         )
 
 
+mem_scope="global"
+@T.prim_func
+def dot_product_16x4_u8i8i32_avx512(a: T.handle, b: T.handle, c: T.handle,) -> None:
+    A = T.match_buffer(a, (4,), "uint8", offset_factor=1, scope=mem_scope)
+    B = T.match_buffer(b, (16, 4), "int8", offset_factor=1, scope=mem_scope)
+    C = T.match_buffer(c, (16,), "int32", offset_factor=1, scope=mem_scope)

Review Comment:
   done





[GitHub] [tvm] cbalint13 commented on a diff in pull request #13621: WIP: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
cbalint13 commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1057955355


##########
python/tvm/relay/op/strategy/x86.py:
##########
@@ -627,16 +627,16 @@ def batch_matmul_strategy_cpu(attrs, inputs, out_type, target):
     if (
         not attrs.transpose_a
         and attrs.transpose_b
-        and target_has_vnni(mcpu)
+        and target_has_avx512(mcpu)
         and inputs[0].dtype == "uint8"
         and inputs[1].dtype == "int8"
         and inputs[1].shape[-2] % 16 == 0
         and inputs[1].shape[-1] % 4 == 0
     ):
         strategy.add_implementation(
-            wrap_compute_batch_matmul(topi.x86.batch_matmul_vnni_compute, need_out_dtype=True),
-            wrap_topi_schedule(topi.x86.schedule_batch_matmul_vnni),
-            name="batch_matmul_vnni.x86",

Review Comment:
   > > different clocking, timing & implementation on ASIC
   > 
   > What kind of ASIC do you mean?
   
   * CPUs of the x86 family, across generations, with varying extended ISA layouts: amx, avx512, vnni, avx2, ssse3, sse2.
   
   > > (auto)tensorization opportunities differ as inner loops match differently
   > By `tensorization opportunities differ` do you mean a different number of lanes for different instruction sets, which can be reflected in potentially different blocking sizes?
   
   * Yes, both input widths and output lanes yield different outcomes and varying performance.
   * E.g. the auto-tensorizer will opportunistically search to permute & match inner loops against these varying sizes (a sketch follows below).
   
   > Or something else?
   
   * TVM is a compiler after all; to my knowledge the only one capable of auto-tensorization with arbitrary intrinsics.
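   To make the lane point concrete, a runnable numpy sketch (register widths per ISA are standard; the weight-tile shape is illustrative):
   ```
   import numpy as np
   
   # One u8*i8 dot-product sequence reduces k=4 inputs per output lane; the
   # number of int32 output lanes per call grows with register width, which
   # is exactly what changes the inner-loop shape the tensorizer must match.
   k = 4
   for name, lanes in {"sse/ssse3": 4, "avx2": 8, "avx512/vnni": 16}.items():
       a = np.ones((k,), dtype=np.uint8)        # broadcast activations
       b = np.ones((lanes, k), dtype=np.int8)   # packed weight tile
       c = (b.astype(np.int32) * a.astype(np.int32)).sum(axis=1)
       print(f"{name:12s}: {lanes:2d} int32 lanes per call -> C tile of {c.shape}")
   ```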
   
   





[GitHub] [tvm] vvchernov commented on a diff in pull request #13621: [TIR][TOPI][x86][CI] Support skylake avx512

Posted by GitBox <gi...@apache.org>.
vvchernov commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1071850972


##########
tests/python/unittest/test_tir_schedule_analysis.py:
##########
@@ -41,6 +41,71 @@
 from tvm.te import create_prim_func

Review Comment:
   There are no big changes. I tried to unify the tests using the same classes, but the attempt failed and I reverted it (in fact the code was only moved within the file). I've rolled back the transferred code. What remains now is a pylint fix and renaming for the sake of clarity (not only VNNI is checked).


