Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2023/01/15 15:18:56 UTC

[GitHub] [doris] HappenLee opened a new pull request, #15945: [Exec](opt) Opt the vexplode_split function performance

HappenLee opened a new pull request, #15945:
URL: https://github.com/apache/doris/pull/15945

   ## Problem summary
   
   After the optimization, on 20 million rows of data of the form "1920X1080", the query time drops from 2.91 s to 1.70 s:
   
   ```
   before:
   
   mysql> select  count(name) from t lateral view explode_split(resolution,'x') resolution as name;
   +---------------+
   | count(`name`) |
   +---------------+
   |      20000000 |
   +---------------+
   1 row in set (2.91 sec)
   
   after:
   mysql> select  count(name) from t lateral view explode_split(resolution,'x') resolution as name;
   +---------------+
   | count(`name`) |
   +---------------+
   |      20000000 |
   +---------------+
   1 row in set (1.70 sec)
   ```
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #15945: [Exec](opt) Opt the vexplode_split function performance

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #15945:
URL: https://github.com/apache/doris/pull/15945#issuecomment-1383515221

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 35.69 seconds
    load time: 494 seconds
    storage size: 17121622481 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230116053159_clickbench_pr_80818.html




[GitHub] [doris] HappenLee merged pull request #15945: [Exec](opt) Opt the vexplode_split function performance

Posted by GitBox <gi...@apache.org>.
HappenLee merged PR #15945:
URL: https://github.com/apache/doris/pull/15945




[GitHub] [doris] github-actions[bot] commented on pull request #15945: [Exec](opt) Opt the vexplode_split function performance

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15945:
URL: https://github.com/apache/doris/pull/15945#issuecomment-1384807659

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on a diff in pull request #15945: [Exec](opt) Opt the vexplode_split function performance

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on code in PR #15945:
URL: https://github.com/apache/doris/pull/15945#discussion_r1070627284


##########
be/src/vec/exprs/table_function/vexplode_split.cpp:
##########
@@ -63,17 +80,31 @@ Status VExplodeSplitTableFunction::process_row(size_t row_idx) {
     _is_current_empty = false;
     _eos = false;
 
-    StringRef text = _text_column->get_data_at(row_idx);
-    StringRef delimiter = _delimiter_column->get_data_at(row_idx);
-
-    if (text.data == nullptr) {
+    if ((!_test_null_map and _test_null_map[row_idx]) || _delimiter.data == nullptr) {
         _is_current_empty = true;
         _cur_size = 0;
         _cur_offset = 0;
     } else {
-        //TODO: implement non-copy split string reference
-        _backup = strings::Split(StringPiece((char*)text.data, text.size),
-                                 StringPiece((char*)delimiter.data, delimiter.size));
+        auto split = [](std::string_view strv, std::string_view delims = " ") {
+            std::vector<std::string_view> output;
+            auto first = strv.begin();
+
+            while (first != strv.end()) {
+                const auto second = std::find_first_of(first, std::cend(strv), std::cbegin(delims),
+                                                       std::cend(delims));
+                if (first != second) {
+                    output.emplace_back(strv.substr(std::distance(strv.begin(), first),
+                                                    std::distance(first, second)));
+                }
+
+                if (second == strv.end()) break;

Review Comment:
   warning: statement should be inside braces [readability-braces-around-statements]
   
   ```suggestion
                   if (second == strv.end()) {
                       break;
                   }
   ```
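
   A note on the call used in this hunk: `std::find_first_of` stops at the first character that matches any single character of `delims`, so a multi-character delimiter behaves like a set of one-character delimiters. `std::search` instead looks for the whole sequence. The sketch below contrasts the two; the helper names `first_of_pos` and `search_pos` are hypothetical and are not part of the PR.

   ```cpp
   #include <algorithm>
   #include <cassert>
   #include <cstddef>
   #include <iterator>
   #include <string_view>

   // Offset of the first character of `s` that equals ANY character in `delims`
   // (s.size() if there is none).
   inline std::size_t first_of_pos(std::string_view s, std::string_view delims) {
       const auto it = std::find_first_of(s.begin(), s.end(), delims.begin(), delims.end());
       return static_cast<std::size_t>(std::distance(s.begin(), it));
   }

   // Offset of the first occurrence of `delim` as a whole substring
   // (s.size() if there is none). Precondition: `delim` is not empty.
   inline std::size_t search_pos(std::string_view s, std::string_view delim) {
       assert(!delim.empty());
       const auto it = std::search(s.begin(), s.end(), delim.begin(), delim.end());
       return static_cast<std::size_t>(std::distance(s.begin(), it));
   }
   ```

   For the input `a:b::c` and the delimiter `::`, `first_of_pos` returns 1 while `search_pos` returns 3.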
   





[GitHub] [doris] BiteTheDDDDt commented on a diff in pull request #15945: [Exec](opt) Opt the vexplode_split function performance

Posted by GitBox <gi...@apache.org>.
BiteTheDDDDt commented on code in PR #15945:
URL: https://github.com/apache/doris/pull/15945#discussion_r1071687390


##########
be/src/vec/exprs/table_function/vexplode_split.cpp:
##########
@@ -63,17 +80,34 @@ Status VExplodeSplitTableFunction::process_row(size_t row_idx) {
     _is_current_empty = false;
     _eos = false;
 
-    StringRef text = _text_column->get_data_at(row_idx);
-    StringRef delimiter = _delimiter_column->get_data_at(row_idx);
-
-    if (text.data == nullptr) {
+    if ((!_test_null_map and _test_null_map[row_idx]) || _delimiter.data == nullptr) {
         _is_current_empty = true;
         _cur_size = 0;
         _cur_offset = 0;
     } else {
-        //TODO: implement non-copy split string reference
-        _backup = strings::Split(StringPiece((char*)text.data, text.size),
-                                 StringPiece((char*)delimiter.data, delimiter.size));
+        auto split = [](std::string_view strv, std::string_view delims = " ") {
+            std::vector<std::string_view> output;
+            auto first = strv.begin();
+            auto last = strv.end();
+
+            do {
+                const auto second =
+                        std::search(first, last, std::cbegin(delims), std::cend(delims));
+                if (first != second) {
+                    output.emplace_back(strv.substr(std::distance(strv.begin(), first),
+                                                    std::distance(first, second)));
+                    first = std::next(second);
+                } else {
+                    output.emplace_back("", 0);
+                    first = std::next(second, delims.size());
+                }
+
+                if (second == last) break;

Review Comment:
   This needs a fix.
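
   For reference, a minimal sketch of how the loop in this hunk could be restructured so that the position always advances by the full delimiter length and never moves past the end of the input. This is an illustration under assumptions, not the code that was merged: the name `split_views` is hypothetical, empty fields are kept (as in the `else` branch of the hunk), and an empty delimiter is assumed to yield the whole input as a single field.

   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <string_view>
   #include <vector>

   // Splits `text` on every occurrence of the whole substring `delim`.
   // The returned views point into `text`; no characters are copied.
   inline std::vector<std::string_view> split_views(std::string_view text,
                                                    std::string_view delim) {
       std::vector<std::string_view> output;
       if (delim.empty()) {
           // Assumption: an empty delimiter means "do not split".
           output.push_back(text);
           return output;
       }

       std::size_t first = 0;
       while (true) {
           // Invariant: `first` never moves past the end of the input.
           assert(first <= text.size());
           const std::size_t second = text.find(delim, first);
           if (second == std::string_view::npos) {
               // No further delimiter: the rest of the input is the last field.
               output.push_back(text.substr(first));
               break;
           }
           // Field between the previous delimiter and this one (may be empty).
           output.push_back(text.substr(first, second - first));
           first = second + delim.size();
       }
       return output;
   }
   ```

   With this sketch, `split_views("1920x1080", "x")` yields the two views `1920` and `1080`, and `split_views("a::b::", "::")` yields `a`, `b` and a trailing empty field.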





[GitHub] [doris] github-actions[bot] commented on pull request #15945: [Exec](opt) Opt the vexplode_split function performance

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15945:
URL: https://github.com/apache/doris/pull/15945#issuecomment-1384842604

   PR approved by at least one committer and no changes requested.




[GitHub] [doris] github-actions[bot] commented on pull request #15945: [Exec](opt) Opt the vexplode_split function performance

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15945:
URL: https://github.com/apache/doris/pull/15945#issuecomment-1384842639

   PR approved by anyone and no changes requested.




[GitHub] [doris] github-actions[bot] commented on a diff in pull request #15945: [Exec](opt) Opt the vexplode_split function performance

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on code in PR #15945:
URL: https://github.com/apache/doris/pull/15945#discussion_r1071192372


##########
be/src/vec/exprs/table_function/vexplode_split.cpp:
##########
@@ -63,17 +80,34 @@ Status VExplodeSplitTableFunction::process_row(size_t row_idx) {
     _is_current_empty = false;
     _eos = false;
 
-    StringRef text = _text_column->get_data_at(row_idx);
-    StringRef delimiter = _delimiter_column->get_data_at(row_idx);
-
-    if (text.data == nullptr) {
+    if ((!_test_null_map and _test_null_map[row_idx]) || _delimiter.data == nullptr) {
         _is_current_empty = true;
         _cur_size = 0;
         _cur_offset = 0;
     } else {
-        //TODO: implement non-copy split string reference
-        _backup = strings::Split(StringPiece((char*)text.data, text.size),
-                                 StringPiece((char*)delimiter.data, delimiter.size));
+        auto split = [](std::string_view strv, std::string_view delims = " ") {
+            std::vector<std::string_view> output;
+            auto first = strv.begin();
+            auto last = strv.end();
+
+            do {
+                const auto second =
+                        std::search(first, last, std::cbegin(delims), std::cend(delims));
+                if (first != second) {
+                    output.emplace_back(strv.substr(std::distance(strv.begin(), first),
+                                                    std::distance(first, second)));
+                    first = std::next(second);
+                } else {
+                    output.emplace_back("", 0);
+                    first = std::next(second, delims.size());
+                }
+
+                if (second == last) break;

Review Comment:
   warning: statement should be inside braces [readability-braces-around-statements]
   
   ```suggestion
                   if (second == last) {
                       break;
                   }
   ```
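
   Separately from the braces warning, the condition at the top of this hunk, `(!_test_null_map and _test_null_map[row_idx])`, indexes the map only when the pointer is null, which looks inverted. A small sketch of the guard as it was presumably intended follows; the function name `row_is_null` and the `unsigned char` element type are assumptions made for illustration and are not taken from the PR.

   ```cpp
   #include <cstddef>

   // Returns true when a null map is present and marks `row_idx` as NULL.
   // A null pointer is taken to mean "no row is NULL" (assumption).
   inline bool row_is_null(const unsigned char* null_map, std::size_t row_idx) {
       return null_map != nullptr && null_map[row_idx] != 0;
   }
   ```

   Checking the pointer first means the map is only indexed when it exists, so the short-circuit `&&` protects the array access.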
   





[GitHub] [doris] github-actions[bot] commented on a diff in pull request #15945: [Exec](opt) Opt the vexplode_split function performance

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on code in PR #15945:
URL: https://github.com/apache/doris/pull/15945#discussion_r1070621821


##########
be/src/vec/exprs/table_function/vexplode_split.cpp:
##########
@@ -63,17 +80,35 @@ Status VExplodeSplitTableFunction::process_row(size_t row_idx) {
     _is_current_empty = false;
     _eos = false;
 
-    StringRef text = _text_column->get_data_at(row_idx);
-    StringRef delimiter = _delimiter_column->get_data_at(row_idx);
-
-    if (text.data == nullptr) {
+    if ((!_test_null_map and _test_null_map[row_idx]) || _delimiter.data == nullptr) {
         _is_current_empty = true;
         _cur_size = 0;
         _cur_offset = 0;
     } else {
-        //TODO: implement non-copy split string reference
-        _backup = strings::Split(StringPiece((char*)text.data, text.size),
-                                 StringPiece((char*)delimiter.data, delimiter.size));
+        auto split =[] (std::string_view strv, std::string_view delims = " ")
+        {
+            std::vector<std::string_view> output;
+            auto first = strv.begin();
+
+            while (first != strv.end())
+            {
+                const auto second = std::find_first_of(first, std::cend(strv),
+                                                       std::cbegin(delims), std::cend(delims));
+                //std::cout << first << ", " << second << '\n';
+                if (first != second)
+                {
+                    output.emplace_back(strv.substr(std::distance(strv.begin(), first), std::distance(first, second)));
+                }
+
+                if (second == strv.end())
+                    break;

Review Comment:
   warning: statement should be inside braces [readability-braces-around-statements]
   
   ```suggestion
                   if (second == strv.end()) {
                       break;
                   }
   ```
   


