You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/06/28 10:37:54 UTC

[GitHub] [doris] englefly opened a new pull request, #10472: [Enhancement] a better vec version for count_zero_num

englefly opened a new pull request, #10472:
URL: https://github.com/apache/doris/pull/10472

   # Proposed changes
   a better version of count_zero_num
   
   in test, we count zero number from  600,000,000 int8_t vector.
   former version: 138ms
   new version: 30ms
   
   Issue Number: close #xxx
   
   ## Problem Summary:
   
   Describe the overview of changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes/No/I Don't know)
   2. Has unit tests been added: (Yes/No/No Need)
   3. Has document been added or modified: (Yes/No/No Need)
   4. Does it need to update dependencies: (Yes/No)
   5. Are there any changes that cannot be rolled back: (Yes/No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] BiteTheDDDDt commented on a diff in pull request #10472: [Enhancement] a better vec version for count_zero_num

Posted by GitBox <gi...@apache.org>.
BiteTheDDDDt commented on code in PR #10472:
URL: https://github.com/apache/doris/pull/10472#discussion_r909126908


##########
be/src/util/simd/bits.h:
##########
@@ -59,10 +59,28 @@ inline uint32_t bytes32_mask_to_bits32_mask(const bool* data) {
     return bytes32_mask_to_bits32_mask(reinterpret_cast<const uint8_t*>(data));
 }
 
-// compiler will make this SIMD automatically
 inline size_t count_zero_num(const int8_t* __restrict data, size_t size) {
     size_t num = 0;
     const int8_t* end = data + size;
+#if defined(__SSE2__) && defined(__POPCNT__)
+    const __m128i zero16 = _mm_setzero_si128();
+    const int8_t* end64 = data + (size / 64 * 64);
+
+    for (; data < end64; data += 64) {
+        num += __builtin_popcountll(
+                static_cast<uint64_t>(_mm_movemask_epi8(_mm_cmpeq_epi8(

Review Comment:
   Sorry, I find another way, we can shift bit directly.
   But there doesn't seem to be a significant difference in performance either.
   
   This should get the correct number of `1`:
   ```cpp
   __builtin_popcountll(
                   static_cast<uint64_t>(
                           _mm_movemask_epi8(*reinterpret_cast<const __m128i*>(data) << 7)) |
                   (static_cast<uint64_t>(
                            _mm_movemask_epi8(*reinterpret_cast<const __m128i*>(data + 16) << 7))
                    << 16) |
                   (static_cast<uint64_t>(
                            _mm_movemask_epi8(*reinterpret_cast<const __m128i*>(data + 32) << 7))
                    << 32) |
                   (static_cast<uint64_t>(
                            _mm_movemask_epi8(*reinterpret_cast<const __m128i*>(data + 48) << 7))
                    << 48));
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] englefly commented on a diff in pull request #10472: [Enhancement] a better vec version for count_zero_num

Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #10472:
URL: https://github.com/apache/doris/pull/10472#discussion_r909111109


##########
be/src/util/simd/bits.h:
##########
@@ -59,10 +59,28 @@ inline uint32_t bytes32_mask_to_bits32_mask(const bool* data) {
     return bytes32_mask_to_bits32_mask(reinterpret_cast<const uint8_t*>(data));
 }
 
-// compiler will make this SIMD automatically
 inline size_t count_zero_num(const int8_t* __restrict data, size_t size) {
     size_t num = 0;
     const int8_t* end = data + size;
+#if defined(__SSE2__) && defined(__POPCNT__)
+    const __m128i zero16 = _mm_setzero_si128();
+    const int8_t* end64 = data + (size / 64 * 64);
+
+    for (; data < end64; data += 64) {
+        num += __builtin_popcountll(
+                static_cast<uint64_t>(_mm_movemask_epi8(_mm_cmpeq_epi8(

Review Comment:
   According to my test, it gives wrong answer.
   I checked the intel docs for `_mm_slli_epi16`. Maybe the above codes need more intrinsics work together.
   
   Here is the spec for `_mm_slli_epi16`:
   FOR j := 0 to 7
   	i := j*16
   	IF imm8[7:0] > 15
   		dst[i+15:i] := 0
   	ELSE
   		dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0])
   	FI
   ENDFOR
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] BiteTheDDDDt commented on a diff in pull request #10472: [Enhancement] a better vec version for count_zero_num

Posted by GitBox <gi...@apache.org>.
BiteTheDDDDt commented on code in PR #10472:
URL: https://github.com/apache/doris/pull/10472#discussion_r908491767


##########
be/src/util/simd/bits.h:
##########
@@ -59,10 +59,28 @@ inline uint32_t bytes32_mask_to_bits32_mask(const bool* data) {
     return bytes32_mask_to_bits32_mask(reinterpret_cast<const uint8_t*>(data));
 }
 
-// compiler will make this SIMD automatically
 inline size_t count_zero_num(const int8_t* __restrict data, size_t size) {
     size_t num = 0;
     const int8_t* end = data + size;
+#if defined(__SSE2__) && defined(__POPCNT__)
+    const __m128i zero16 = _mm_setzero_si128();
+    const int8_t* end64 = data + (size / 64 * 64);
+
+    for (; data < end64; data += 64) {
+        num += __builtin_popcountll(
+                static_cast<uint64_t>(_mm_movemask_epi8(_mm_cmpeq_epi8(

Review Comment:
   Maybe we can set num=size at start, and subtract the number of `1`?
   In this way we can avoid using `_mm_cmpeq_epi8`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Gabriel39 commented on a diff in pull request #10472: [Enhancement] a better vec version for count_zero_num

Posted by GitBox <gi...@apache.org>.
Gabriel39 commented on code in PR #10472:
URL: https://github.com/apache/doris/pull/10472#discussion_r908335273


##########
be/src/util/simd/bits.h:
##########
@@ -59,10 +59,28 @@ inline uint32_t bytes32_mask_to_bits32_mask(const bool* data) {
     return bytes32_mask_to_bits32_mask(reinterpret_cast<const uint8_t*>(data));
 }
 
-// compiler will make this SIMD automatically
 inline size_t count_zero_num(const int8_t* __restrict data, size_t size) {
     size_t num = 0;
     const int8_t* end = data + size;
+

Review Comment:
   #if defined(__SSE2__)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] englefly commented on a diff in pull request #10472: [Enhancement] a better vec version for count_zero_num

Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #10472:
URL: https://github.com/apache/doris/pull/10472#discussion_r909111109


##########
be/src/util/simd/bits.h:
##########
@@ -59,10 +59,28 @@ inline uint32_t bytes32_mask_to_bits32_mask(const bool* data) {
     return bytes32_mask_to_bits32_mask(reinterpret_cast<const uint8_t*>(data));
 }
 
-// compiler will make this SIMD automatically
 inline size_t count_zero_num(const int8_t* __restrict data, size_t size) {
     size_t num = 0;
     const int8_t* end = data + size;
+#if defined(__SSE2__) && defined(__POPCNT__)
+    const __m128i zero16 = _mm_setzero_si128();
+    const int8_t* end64 = data + (size / 64 * 64);
+
+    for (; data < end64; data += 64) {
+        num += __builtin_popcountll(
+                static_cast<uint64_t>(_mm_movemask_epi8(_mm_cmpeq_epi8(

Review Comment:
   According to my test, it gives wrong answer.
   I checked the intel docs for `_mm_slli_epi16`. Maybe the above codes need more intrinsics work together.
   
   Here is the spec for `_mm_slli_epi16`:
   ```
   FOR j := 0 to 7
   	i := j*16
   	IF imm8[7:0] > 15
   		dst[i+15:i] := 0
   	ELSE
   		dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0])
   	FI
   ENDFOR
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] englefly commented on a diff in pull request #10472: [Enhancement] a better vec version for count_zero_num

Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #10472:
URL: https://github.com/apache/doris/pull/10472#discussion_r908567844


##########
be/src/util/simd/bits.h:
##########
@@ -59,10 +59,28 @@ inline uint32_t bytes32_mask_to_bits32_mask(const bool* data) {
     return bytes32_mask_to_bits32_mask(reinterpret_cast<const uint8_t*>(data));
 }
 
-// compiler will make this SIMD automatically
 inline size_t count_zero_num(const int8_t* __restrict data, size_t size) {
     size_t num = 0;
     const int8_t* end = data + size;
+#if defined(__SSE2__) && defined(__POPCNT__)
+    const __m128i zero16 = _mm_setzero_si128();
+    const int8_t* end64 = data + (size / 64 * 64);
+
+    for (; data < end64; data += 64) {
+        num += __builtin_popcountll(
+                static_cast<uint64_t>(_mm_movemask_epi8(_mm_cmpeq_epi8(

Review Comment:
   could you explain more? without _mm_cmpeq_epi8, how to figure out if the data is zero or not?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] englefly commented on a diff in pull request #10472: [Enhancement] a better vec version for count_zero_num

Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #10472:
URL: https://github.com/apache/doris/pull/10472#discussion_r908397829


##########
be/src/util/simd/bits.h:
##########
@@ -59,10 +59,28 @@ inline uint32_t bytes32_mask_to_bits32_mask(const bool* data) {
     return bytes32_mask_to_bits32_mask(reinterpret_cast<const uint8_t*>(data));
 }
 
-// compiler will make this SIMD automatically
 inline size_t count_zero_num(const int8_t* __restrict data, size_t size) {
     size_t num = 0;
     const int8_t* end = data + size;
+

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] BiteTheDDDDt commented on a diff in pull request #10472: [Enhancement] a better vec version for count_zero_num

Posted by GitBox <gi...@apache.org>.
BiteTheDDDDt commented on code in PR #10472:
URL: https://github.com/apache/doris/pull/10472#discussion_r908740012


##########
be/src/util/simd/bits.h:
##########
@@ -59,10 +59,28 @@ inline uint32_t bytes32_mask_to_bits32_mask(const bool* data) {
     return bytes32_mask_to_bits32_mask(reinterpret_cast<const uint8_t*>(data));
 }
 
-// compiler will make this SIMD automatically
 inline size_t count_zero_num(const int8_t* __restrict data, size_t size) {
     size_t num = 0;
     const int8_t* end = data + size;
+#if defined(__SSE2__) && defined(__POPCNT__)
+    const __m128i zero16 = _mm_setzero_si128();
+    const int8_t* end64 = data + (size / 64 * 64);
+
+    for (; data < end64; data += 64) {
+        num += __builtin_popcountll(
+                static_cast<uint64_t>(_mm_movemask_epi8(_mm_cmpeq_epi8(

Review Comment:
   we can shift lowest bit to highest, then movemask can get result.
   ```cpp
   __builtin_popcountll(static_cast<uint64_t>(_mm_movemask_epi8(_mm_slli_epi16(
                                               *reinterpret_cast<const __m128i*>(data), 15))) |
                                       (static_cast<uint64_t>(_mm_movemask_epi8(_mm_slli_epi16(
                                                *reinterpret_cast<const __m128i*>(data + 16), 15)))
                                        << 16u) |
                                       (static_cast<uint64_t>(_mm_movemask_epi8(_mm_slli_epi16(
                                                *reinterpret_cast<const __m128i*>(data + 32), 15)))
                                        << 32u) |
                                       (static_cast<uint64_t>(_mm_movemask_epi8(_mm_slli_epi16(
                                                *reinterpret_cast<const __m128i*>(data + 48), 15)))
                                        << 48u));
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei merged pull request #10472: [Enhancement] a better vec version for count_zero_num

Posted by GitBox <gi...@apache.org>.
yiguolei merged PR #10472:
URL: https://github.com/apache/doris/pull/10472


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #10472: [Enhancement] a better vec version for count_zero_num

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #10472:
URL: https://github.com/apache/doris/pull/10472#issuecomment-1169446684

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org