Posted to commits@doris.apache.org by "ZhangYu0123 (via GitHub)" <gi...@apache.org> on 2023/04/02 21:50:08 UTC

[GitHub] [doris] ZhangYu0123 opened a new pull request, #18321: [opt](string) optimize constant empty string compare

ZhangYu0123 opened a new pull request, #18321:
URL: https://github.com/apache/doris/pull/18321

   # Proposed changes
   Optimize comparison against a constant empty string:
   (1) When the constant is the empty string '' (size 0), the result can be decided by comparing adjacent offsets directly, which SIMD can do, instead of comparing string bytes.
    
   ````
   q10: SELECT MobilePhoneModel, COUNT(DISTINCT UserID) AS u FROM hits WHERE MobilePhoneModel <> '' GROUP BY MobilePhoneModel ORDER BY u DESC LIMIT 10;
   q11: SELECT MobilePhone, MobilePhoneModel, COUNT(DISTINCT UserID) AS u FROM hits WHERE MobilePhoneModel <> '' GROUP BY MobilePhone, MobilePhoneModel ORDER BY u DESC LIMIT 10;
   q12: SELECT SearchPhrase, COUNT(*) AS c FROM hits WHERE SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10;
   q13: SELECT SearchPhrase, COUNT(DISTINCT UserID) AS u FROM hits WHERE SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY u DESC LIMIT 10;
   q14: SELECT SearchEngineID, SearchPhrase, COUNT(*) AS c FROM hits WHERE SearchPhrase <> '' GROUP BY SearchEngineID, SearchPhrase ORDER BY c DESC LIMIT 10;
   ````
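
The offset comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the PR: `equals_empty` is a hypothetical name, and a plain `std::vector<uint32_t>` stands in for the column's offsets array, where `offsets[i]` is taken to be the end position of row `i` in a shared byte buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a string column's offsets: offsets[i] is the end
// position of row i, so row i is empty exactly when offsets[i] equals the
// previous end position (0 for the first row).
void equals_empty(const std::vector<uint32_t>& offsets, std::vector<uint8_t>& out) {
    assert(out.size() == offsets.size());
    const size_t n = offsets.size();
    if (n == 0) return;
    // Row 0 has no predecessor; its start position is 0.
    out[0] = (offsets[0] == 0);
    // No string bytes are read, and each iteration is independent of the
    // others, which is what lets a compiler vectorize this loop.
    for (size_t i = 1; i < n; ++i) {
        out[i] = (offsets[i] == offsets[i - 1]);
    }
}
```

For the `<> ''` predicate used in the queries above, the same loop applies with `!=` in place of `==`.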
   
   Issue Number: close #xxx
   
   ## Problem summary
   
   Describe your changes.
   
   ## Checklist(Required)
   
   * [ ] Does it affect the original behavior
   * [ ] Have unit tests been added
   * [ ] Has documentation been added or modified
   * [ ] Does it need to update dependencies
   * [ ] Does this PR support rollback (If NO, please explain WHY)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #18321: [Optimization](string) optimize constant empty string compare ( column='', column!='')

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18321:
URL: https://github.com/apache/doris/pull/18321#issuecomment-1494254076

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #18321: [opt](string) optimize constant empty string compare

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18321:
URL: https://github.com/apache/doris/pull/18321#issuecomment-1493448673

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] HappenLee commented on a diff in pull request #18321: [Optimization](string) optimize constant empty string compare ( column='', column!='')

Posted by "HappenLee (via GitHub)" <gi...@apache.org>.
HappenLee commented on code in PR #18321:
URL: https://github.com/apache/doris/pull/18321#discussion_r1155789374


##########
be/src/vec/functions/functions_comparison.h:
##########
@@ -211,15 +211,28 @@ struct StringEqualsImpl {
                                                  ColumnString::Offset b_size,
                                                  PaddedPODArray<UInt8>& c) {
         size_t size = a_offsets.size();
-        ColumnString::Offset prev_a_offset = 0;
-
-        for (size_t i = 0; i < size; ++i) {
-            auto a_size = a_offsets[i] - prev_a_offset;
-
-            c[i] = positive == memequal_small_allow_overflow15(a_data.data() + prev_a_offset,
-                                                               a_size, b_data.data(), b_size);
-
-            prev_a_offset = a_offsets[i];
+        if (b_size == 0) {
+            auto* __restrict data = c.data();
+            auto* __restrict offsets = a_offsets.data();
+            if (positive) {
+                for (size_t i = 0; i < size; ++i) {
+                    data[i] = (offsets[i] == offsets[i - 1]);

Review Comment:
   `data[i] = positive ? (offsets[i] == offsets[i - 1]) : (offsets[i] != offsets[i - 1]);`
   
   Written as a single expression like this, the loop can still be auto-vectorized (SIMD) by the compiler.
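
One way to read this suggestion is sketched below, assuming `positive` is a compile-time constant (here a template parameter), so the ternary folds away and a single loop serves both `=` and `<>`. `compare_with_empty` is a hypothetical name, and `std::vector` stands in for the padded array types in the diff. Note that the diff reads `offsets[i - 1]` at `i == 0`, which presumably relies on the array having readable zero padding before its first element; a plain `std::vector` has no such padding, so this sketch handles row 0 explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// positive == true  computes column =  ''
// positive == false computes column <> ''
template <bool positive>
void compare_with_empty(const std::vector<uint32_t>& offsets, std::vector<uint8_t>& out) {
    assert(out.size() == offsets.size());
    const size_t n = offsets.size();
    if (n == 0) return;
    // Row 0 starts at position 0, so it is empty when its end offset is 0.
    out[0] = positive ? (offsets[0] == 0) : (offsets[0] != 0);
    // `positive` is known at compile time, so the ternary leaves a
    // branch-free loop body for the compiler to vectorize.
    for (size_t i = 1; i < n; ++i) {
        out[i] = positive ? (offsets[i] == offsets[i - 1])
                          : (offsets[i] != offsets[i - 1]);
    }
}
```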





[GitHub] [doris] ZhangYu0123 commented on pull request #18321: [opt](string) optimize constant empty string compare

Posted by "ZhangYu0123 (via GitHub)" <gi...@apache.org>.
ZhangYu0123 commented on PR #18321:
URL: https://github.com/apache/doris/pull/18321#issuecomment-1493509740

   run p1




[GitHub] [doris] HappenLee commented on a diff in pull request #18321: [Optimization](string) optimize constant empty string compare ( column='', column!='')

Posted by "HappenLee (via GitHub)" <gi...@apache.org>.
HappenLee commented on code in PR #18321:
URL: https://github.com/apache/doris/pull/18321#discussion_r1156169074


##########
be/src/vec/functions/functions_comparison.h:
##########
@@ -211,15 +211,28 @@ struct StringEqualsImpl {
                                                  ColumnString::Offset b_size,
                                                  PaddedPODArray<UInt8>& c) {
         size_t size = a_offsets.size();
-        ColumnString::Offset prev_a_offset = 0;
-
-        for (size_t i = 0; i < size; ++i) {
-            auto a_size = a_offsets[i] - prev_a_offset;
-
-            c[i] = positive == memequal_small_allow_overflow15(a_data.data() + prev_a_offset,
-                                                               a_size, b_data.data(), b_size);
-
-            prev_a_offset = a_offsets[i];
+        if (b_size == 0) {
+            auto* __restrict data = c.data();
+            auto* __restrict offsets = a_offsets.data();
+            if (positive) {
+                for (size_t i = 0; i < size; ++i) {
+                    data[i] = (offsets[i] == offsets[i - 1]);

Review Comment:
   `data[i] = positive ? (offsets[i] == offsets[i - 1]) : (offsets[i] != offsets[i - 1]);`
   
   Written as a single expression like this, the loop can still be auto-vectorized (SIMD) by the compiler.





[GitHub] [doris] hello-stephen commented on pull request #18321: [opt](string) optimize constant empty string compare

Posted by "hello-stephen (via GitHub)" <gi...@apache.org>.
hello-stephen commented on PR #18321:
URL: https://github.com/apache/doris/pull/18321#issuecomment-1493453125

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 33.68 seconds
    stream load tsv:          453 seconds loaded 74807831229 Bytes, about 157 MB/s
    stream load json:         21 seconds loaded 2358488459 Bytes, about 107 MB/s
    stream load orc:          72 seconds loaded 1101869774 Bytes, about 14 MB/s
    stream load parquet:          31 seconds loaded 861443392 Bytes, about 26 MB/s
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230402221646_clickbench_pr_123839.html




[GitHub] [doris] yiguolei merged pull request #18321: [Optimization](string) optimize constant empty string compare ( column='', column!='')

Posted by "yiguolei (via GitHub)" <gi...@apache.org>.
yiguolei merged PR #18321:
URL: https://github.com/apache/doris/pull/18321




[GitHub] [doris] ZhangYu0123 commented on a diff in pull request #18321: [Optimization](string) optimize constant empty string compare ( column='', column!='')

Posted by "ZhangYu0123 (via GitHub)" <gi...@apache.org>.
ZhangYu0123 commented on code in PR #18321:
URL: https://github.com/apache/doris/pull/18321#discussion_r1156499051


##########
be/src/vec/functions/functions_comparison.h:
##########
@@ -211,15 +211,28 @@ struct StringEqualsImpl {
                                                  ColumnString::Offset b_size,
                                                  PaddedPODArray<UInt8>& c) {
         size_t size = a_offsets.size();
-        ColumnString::Offset prev_a_offset = 0;
-
-        for (size_t i = 0; i < size; ++i) {
-            auto a_size = a_offsets[i] - prev_a_offset;
-
-            c[i] = positive == memequal_small_allow_overflow15(a_data.data() + prev_a_offset,
-                                                               a_size, b_data.data(), b_size);
-
-            prev_a_offset = a_offsets[i];
+        if (b_size == 0) {
+            auto* __restrict data = c.data();
+            auto* __restrict offsets = a_offsets.data();
+            if (positive) {
+                for (size_t i = 0; i < size; ++i) {
+                    data[i] = (offsets[i] == offsets[i - 1]);

Review Comment:
   done





[GitHub] [doris] ZhangYu0123 commented on pull request #18321: [opt](string) optimize constant empty string compare

Posted by "ZhangYu0123 (via GitHub)" <gi...@apache.org>.
ZhangYu0123 commented on PR #18321:
URL: https://github.com/apache/doris/pull/18321#issuecomment-1493448534

   run buildall




[GitHub] [doris] jacktengg commented on pull request #18321: [Optimization](string) optimize constant empty string compare ( column='', column!='')

Posted by "jacktengg (via GitHub)" <gi...@apache.org>.
jacktengg commented on PR #18321:
URL: https://github.com/apache/doris/pull/18321#issuecomment-1500817700

   LGTM




[GitHub] [doris] ZhangYu0123 commented on pull request #18321: [Optimization](string) optimize constant empty string compare ( column='', column!='')

Posted by "ZhangYu0123 (via GitHub)" <gi...@apache.org>.
ZhangYu0123 commented on PR #18321:
URL: https://github.com/apache/doris/pull/18321#issuecomment-1494256071

   run buildall




[GitHub] [doris] ZhangYu0123 commented on pull request #18321: [opt](string) optimize constant empty string compare

Posted by "ZhangYu0123 (via GitHub)" <gi...@apache.org>.
ZhangYu0123 commented on PR #18321:
URL: https://github.com/apache/doris/pull/18321#issuecomment-1493480859

   run p1

