
Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/09/01 11:08:14 UTC

[GitHub] [tvm] elvin-n opened a new pull request #8897: Add sse4/avx2 support for fast x86 int8 (vpmaddubsw/vpmaddwd/vpaddd)

elvin-n opened a new pull request #8897:
URL: https://github.com/apache/tvm/pull/8897


   - Extend the list of supported targets for x86 TOPI
   - Extend the conv2d x86 int8 tests to cover x86 platforms with fast int8 support
   
   In theory, this change can give up to a 2x speedup for int8 models over fp32 models; currently the speedup is slightly less.
   
   | Throughput (FPS) | Core i7-1185G7 sse4 | Core i7-1185G7 avx2 | Core i7-1185G7 avx512 | Core i7-1185G7 VNNI | Core i7-8700B | Core i5-9400T |
   | -- | -- | -- | -- | -- | -- | -- |
   | TVM FP32 |   | 53 | 53 | 53 | 54 | 48 |
   | TVM int32 |   | 12 |   |   | 16 |   |
   | TVM int8 default | 34 | 61 | 92 | 142 | 78 | 62 |
   | TVM int8 atvm |   | 70 |   | 134 | 95 | 79 |
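
To put the "up to 2x" remark next to the measurements, the int8-over-fp32 ratios can be computed from the FPS figures in the table. This is only a rough sketch: the dictionary keys and the `speedup_over_fp32` helper are names made up for this example, the numbers are copied from the table, and columns with a blank FP32 or int8 cell are left out.

```python
# FPS figures copied from the table in the PR description.
# Only columns that have both an FP32 value and both int8 values are included.
fps = {
    "Core i7-1185G7 avx2": {"fp32": 53, "int8_default": 61, "int8_atvm": 70},
    "Core i7-1185G7 VNNI": {"fp32": 53, "int8_default": 142, "int8_atvm": 134},
    "Core i7-8700B": {"fp32": 54, "int8_default": 78, "int8_atvm": 95},
    "Core i5-9400T": {"fp32": 48, "int8_default": 62, "int8_atvm": 79},
}


def speedup_over_fp32(row, key):
    """Ratio of the given int8 FPS figure to the FP32 FPS figure (hypothetical helper)."""
    return row[key] / row["fp32"]


speedups = {
    name: {
        key: round(speedup_over_fp32(row, key), 2)
        for key in ("int8_default", "int8_atvm")
    }
    for name, row in fps.items()
}

for name, ratios in speedups.items():
    print(f"{name}: default {ratios['int8_default']}x, atvm {ratios['int8_atvm']}x")
```

On these figures only the VNNI column exceeds 2x; the other columns fall between roughly 1.15x and 1.76x.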


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] elvin-n edited a comment on pull request #8897: Add sse4/avx2 support for fast x86 int8 (vpmaddubsw/vpmaddwd/vpaddd)

Posted by GitBox <gi...@apache.org>.

elvin-n edited a comment on pull request #8897:
URL: https://github.com/apache/tvm/pull/8897#issuecomment-912435557


   The change in get_fp32_len affected the ARM flow: it now blocks by 4 instead of the previous default of 8. This should not affect performance, since the NEON SIMD vector size is 64 or 128 bits, but it will affect the knowledge database of tuned kernels.
   
   I will verify the performance aspect on ARM. Backward compatibility is still an open question. So far my impression is that we do not care about it that much.
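
The reasoning about NEON comes down to simple lane arithmetic: a register of a given width holds width / 32 fp32 values. A minimal sketch follows; `fp32_lanes` is a hypothetical helper written for this example and is not the `get_fp32_len` implementation from the PR.

```python
def fp32_lanes(vector_bits):
    """Number of 32-bit float lanes that fit in a SIMD register of the given width."""
    if vector_bits % 32 != 0:
        raise ValueError("vector width must be a multiple of 32 bits")
    return vector_bits // 32


# Register widths in bits: NEON (64 or 128), AVX2 (256), AVX-512 (512).
for name, bits in [("NEON 64-bit", 64), ("NEON 128-bit", 128), ("AVX2", 256), ("AVX-512", 512)]:
    print(f"{name}: {fp32_lanes(bits)} fp32 lanes")
```

A 128-bit NEON register holds 4 fp32 lanes, so a blocking factor of 4 already fills it, while 8 lanes correspond to a 256-bit register.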



[GitHub] [tvm] elvin-n commented on pull request #8897: Add sse4/avx2 support for fast x86 int8 (vpmaddubsw/vpmaddwd/vpaddd)

Posted by GitBox <gi...@apache.org>.

elvin-n commented on pull request #8897:
URL: https://github.com/apache/tvm/pull/8897#issuecomment-912588508


   I verified the ARM flow and confirm that it now uses 4 channel values instead of 8 for blocking, and that this did not affect performance in any way (as I expected).



[GitHub] [tvm] jcf94 commented on a change in pull request #8897: Add sse4/avx2 support for fast x86 int8 (vpmaddubsw/vpmaddwd/vpaddd)

Posted by GitBox <gi...@apache.org>.

jcf94 commented on a change in pull request #8897:
URL: https://github.com/apache/tvm/pull/8897#discussion_r701523344



##########
File path: tests/python/relay/test_op_level2.py
##########
@@ -1687,7 +1692,7 @@ def _has_fast_int8_instructions(asm, target):
             dtypes=fast_int8_dtypes,
         )
         # Check that vector int mult and add instructions are generated.
-        assert "vpmulld" in asm and "vpadd" in asm
+        assert "pmulhw" in asm and "paddd" in asm

Review comment:
       I'm not so familiar with these specific instructions. Are `pmulhw` and `paddd` still vector instructions in this test?

##########
File path: python/tvm/topi/x86/utils.py
##########
@@ -18,9 +18,95 @@
 import tvm
 
 
-def get_fp32_len():
+def target_has_sse42(target):

Review comment:
       Just curious why it's named `sse42`. Does it stand for `sse4 & avx2`, as in the PR title, or for the minor version, as in `sse 4.2`?
   
   An unimportant suggestion, which you can ignore: merge all of the functions below into something like `target_has_attr(target, attr)` and list the candidate values of attr in the docstring.
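
For illustration, a merged helper along the lines of this suggestion might look roughly like the sketch below. Everything in it is an assumption made for the example: the `mcpu` string argument, the feature table, and the simplification that each feature level implies the ones before it. The CPU table is deliberately tiny and non-exhaustive, and this is not the code in `python/tvm/topi/x86/utils.py`.

```python
# Illustrative, non-exhaustive table: feature -> example LLVM -mcpu names
# whose microarchitecture introduced that feature (assumption for this sketch).
_FEATURE_CPUS = {
    "sse4.2": {"nehalem"},
    "avx2": {"haswell"},
    "avx512": {"skylake-avx512"},
    "vnni": {"cascadelake"},
}

# Simplifying assumption: a CPU listed at a later level also has all earlier features.
_FEATURE_ORDER = ["sse4.2", "avx2", "avx512", "vnni"]


def target_has_attr(mcpu, attr):
    """Check whether the CPU named `mcpu` supports `attr` (hypothetical helper).

    Candidate values of `attr`: "sse4.2", "avx2", "avx512", "vnni".
    """
    if attr not in _FEATURE_CPUS:
        raise ValueError(f"unknown attribute: {attr}")
    start = _FEATURE_ORDER.index(attr)
    # The CPU qualifies if it is listed at the requested level or any later one.
    return any(mcpu in _FEATURE_CPUS[name] for name in _FEATURE_ORDER[start:])


print(target_has_attr("haswell", "sse4.2"))   # haswell is listed under avx2, a later level
print(target_has_attr("haswell", "avx512"))
```

The single entry point keeps the list of supported attributes in one place, at the cost of string-typed arguments that are only checked at run time.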





[GitHub] [tvm] elvin-n commented on a change in pull request #8897: Add sse4/avx2 support for fast x86 int8 (vpmaddubsw/vpmaddwd/vpaddd)

Posted by GitBox <gi...@apache.org>.

elvin-n commented on a change in pull request #8897:
URL: https://github.com/apache/tvm/pull/8897#discussion_r701616017



##########
File path: tests/python/relay/test_op_level2.py
##########
@@ -1687,7 +1692,7 @@ def _has_fast_int8_instructions(asm, target):
             dtypes=fast_int8_dtypes,
         )
         # Check that vector int mult and add instructions are generated.
-        assert "vpmulld" in asm and "vpadd" in asm
+        assert "pmulhw" in asm and "paddd" in asm

Review comment:
       They are vector instructions from the mid-90s :) They are MMX instructions working on 64 bits.
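
A side note on how the assertion in the hunk behaves: it is a plain substring check on the assembly text. The sketch below uses two hand-written assembly snippets (not output from TVM) and a hypothetical `has_instructions` helper to show that the new strings also match the VEX-prefixed spellings `vpmulhw` and `vpaddd`, whereas the old `vpmulld` string matches neither snippet.

```python
def has_instructions(asm, mnemonics):
    """True if every mnemonic occurs somewhere in the assembly text (substring match)."""
    return all(m in asm for m in mnemonics)


# Hand-written snippets for illustration only, not compiler output.
legacy_asm = "pmulhw %xmm1, %xmm0\npaddd %xmm2, %xmm0\n"
vex_asm = "vpmulhw %ymm1, %ymm0, %ymm0\nvpaddd %ymm2, %ymm0, %ymm0\n"

new_check = ["pmulhw", "paddd"]
old_check = ["vpmulld", "vpadd"]

print(has_instructions(legacy_asm, new_check))  # legacy SSE-style spelling
print(has_instructions(vex_asm, new_check))     # "pmulhw" is a substring of "vpmulhw"
print(has_instructions(vex_asm, old_check))     # no "vpmulld" in this snippet
```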





[GitHub] [tvm] elvin-n commented on a change in pull request #8897: Add sse4/avx2 support for fast x86 int8 (vpmaddubsw/vpmaddwd/vpaddd)

Posted by GitBox <gi...@apache.org>.

elvin-n commented on a change in pull request #8897:
URL: https://github.com/apache/tvm/pull/8897#discussion_r701614252



##########
File path: python/tvm/topi/x86/utils.py
##########
@@ -18,9 +18,95 @@
 import tvm
 
 
-def get_fp32_len():
+def target_has_sse42(target):

Review comment:
       SSE4.2 is the latest standard of SSE-type instructions and is supported in more processors than sse4/sse3 or sse2. It continues to be used in the latest Intel edge devices for IoT or low-end segments (Atom-based). If we ever need to distinguish more precisely, we will probably have to redesign this part.
   
   As for the suggestion to introduce a single function instead of several, I consider it a valuable comment, and it should be done in the future if we want to check more features. I am not sure it makes sense to do it in this PR.





[GitHub] [tvm] masahi merged pull request #8897: Add sse4/avx2 support for fast x86 int8 (vpmaddubsw/vpmaddwd/vpaddd)

Posted by GitBox <gi...@apache.org>.

masahi merged pull request #8897:
URL: https://github.com/apache/tvm/pull/8897


   

