Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2023/01/15 15:18:56 UTC
[GitHub] [doris] HappenLee opened a new pull request, #15945: [Exec](opt) Opt the vexplode_split function performance
HappenLee opened a new pull request, #15945:
URL: https://github.com/apache/doris/pull/15945
## Problem summary
After the optimization, on 20 million rows (2000w) of the value "1920X1080":
```
before:
mysql> select count(name) from t lateral view explode_split(resolution,'x') resolution as name;
+---------------+
| count(`name`) |
+---------------+
| 20000000 |
+---------------+
1 row in set (2.91 sec)
after:
mysql> select count(name) from t lateral view explode_split(resolution,'x') resolution as name;
+---------------+
| count(`name`) |
+---------------+
| 20000000 |
+---------------+
1 row in set (1.70 sec)
```
Describe your changes.
## Checklist(Required)
1. Does it affect the original behavior:
- [ ] Yes
- [ ] No
- [ ] I don't know
2. Has unit tests been added:
- [ ] Yes
- [ ] No
- [ ] No Need
3. Has document been added or modified:
- [ ] Yes
- [ ] No
- [ ] No Need
4. Does it need to update dependencies:
- [ ] Yes
- [ ] No
5. Are there any changes that cannot be rolled back:
- [ ] Yes (If Yes, please explain WHY)
- [ ] No
## Further comments
If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org
[GitHub] [doris] hello-stephen commented on pull request #15945: [Exec](opt) Opt the vexplode_split function performance
Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #15945:
URL: https://github.com/apache/doris/pull/15945#issuecomment-1383515221
TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 35.69 seconds
load time: 494 seconds
storage size: 17121622481 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230116053159_clickbench_pr_80818.html
[GitHub] [doris] HappenLee merged pull request #15945: [Exec](opt) Opt the vexplode_split function performance
Posted by GitBox <gi...@apache.org>.
HappenLee merged PR #15945:
URL: https://github.com/apache/doris/pull/15945
[GitHub] [doris] github-actions[bot] commented on pull request #15945: [Exec](opt) Opt the vexplode_split function performance
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15945:
URL: https://github.com/apache/doris/pull/15945#issuecomment-1384807659
clang-tidy review says "All clean, LGTM! :+1:"
[GitHub] [doris] github-actions[bot] commented on a diff in pull request #15945: [Exec](opt) Opt the vexplode_split function performance
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on code in PR #15945:
URL: https://github.com/apache/doris/pull/15945#discussion_r1070627284
##########
be/src/vec/exprs/table_function/vexplode_split.cpp:
##########
@@ -63,17 +80,31 @@ Status VExplodeSplitTableFunction::process_row(size_t row_idx) {
_is_current_empty = false;
_eos = false;
- StringRef text = _text_column->get_data_at(row_idx);
- StringRef delimiter = _delimiter_column->get_data_at(row_idx);
-
- if (text.data == nullptr) {
+ if ((!_test_null_map and _test_null_map[row_idx]) || _delimiter.data == nullptr) {
_is_current_empty = true;
_cur_size = 0;
_cur_offset = 0;
} else {
- //TODO: implement non-copy split string reference
- _backup = strings::Split(StringPiece((char*)text.data, text.size),
- StringPiece((char*)delimiter.data, delimiter.size));
+ auto split = [](std::string_view strv, std::string_view delims = " ") {
+ std::vector<std::string_view> output;
+ auto first = strv.begin();
+
+ while (first != strv.end()) {
+ const auto second = std::find_first_of(first, std::cend(strv), std::cbegin(delims),
+ std::cend(delims));
+ if (first != second) {
+ output.emplace_back(strv.substr(std::distance(strv.begin(), first),
+ std::distance(first, second)));
+ }
+
+ if (second == strv.end()) break;
Review Comment:
warning: statement should be inside braces [readability-braces-around-statements]
```suggestion
if (second == strv.end()) { break;
}
```
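For context on the lambda quoted above: `std::find_first_of` matches any single character from `delims`, whereas `std::search` matches `delims` as one whole substring, so the two give different results for multi-character delimiters. A minimal sketch of the difference follows; the helper names `first_any_of` and `first_whole` are hypothetical and do not come from the PR.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string_view>

// Hypothetical helper: offset of the first character in `s` that equals
// ANY character of `delims` (returns s.size() if there is none).
std::size_t first_any_of(std::string_view s, std::string_view delims) {
    const auto it = std::find_first_of(s.begin(), s.end(), delims.begin(), delims.end());
    return static_cast<std::size_t>(std::distance(s.begin(), it));
}

// Hypothetical helper: offset of the first occurrence of `delims` as a
// WHOLE substring of `s` (returns s.size() if there is none).
std::size_t first_whole(std::string_view s, std::string_view delims) {
    const auto it = std::search(s.begin(), s.end(), delims.begin(), delims.end());
    return static_cast<std::size_t>(std::distance(s.begin(), it));
}
```

With the input `"a,b, c"` and the delimiter `", "`, `first_any_of` stops at the first comma, while `first_whole` stops only where the comma is followed by a space.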
[GitHub] [doris] BiteTheDDDDt commented on a diff in pull request #15945: [Exec](opt) Opt the vexplode_split function performance
Posted by GitBox <gi...@apache.org>.
BiteTheDDDDt commented on code in PR #15945:
URL: https://github.com/apache/doris/pull/15945#discussion_r1071687390
##########
be/src/vec/exprs/table_function/vexplode_split.cpp:
##########
@@ -63,17 +80,34 @@ Status VExplodeSplitTableFunction::process_row(size_t row_idx) {
_is_current_empty = false;
_eos = false;
- StringRef text = _text_column->get_data_at(row_idx);
- StringRef delimiter = _delimiter_column->get_data_at(row_idx);
-
- if (text.data == nullptr) {
+ if ((!_test_null_map and _test_null_map[row_idx]) || _delimiter.data == nullptr) {
_is_current_empty = true;
_cur_size = 0;
_cur_offset = 0;
} else {
- //TODO: implement non-copy split string reference
- _backup = strings::Split(StringPiece((char*)text.data, text.size),
- StringPiece((char*)delimiter.data, delimiter.size));
+ auto split = [](std::string_view strv, std::string_view delims = " ") {
+ std::vector<std::string_view> output;
+ auto first = strv.begin();
+ auto last = strv.end();
+
+ do {
+ const auto second =
+ std::search(first, last, std::cbegin(delims), std::cend(delims));
+ if (first != second) {
+ output.emplace_back(strv.substr(std::distance(strv.begin(), first),
+ std::distance(first, second)));
+ first = std::next(second);
+ } else {
+ output.emplace_back("", 0);
+ first = std::next(second, delims.size());
+ }
+
+ if (second == last) break;
Review Comment:
This needs a fix.
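One possible reading of the concern: in the quoted loop, the non-empty branch advances with `std::next(second)`, which skips one character rather than `delims.size()`, and `first` is advanced before the `second == last` check. The sketch below shows a non-copying split that avoids both issues. It is not the code merged in the PR: `split_views` is a hypothetical name, it uses `std::string_view::find` instead of `std::search`, and it assumes that empty pieces are kept and that an empty delimiter yields the whole input as a single piece.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper, not the PR's implementation.
// Splits `text` on every occurrence of the whole `delim` substring and
// returns views into `text`, so no piece is copied. Empty pieces are kept.
std::vector<std::string_view> split_views(std::string_view text, std::string_view delim) {
    std::vector<std::string_view> out;
    if (delim.empty()) {
        // Assumption: an empty delimiter means "do not split".
        out.push_back(text);
        return out;
    }
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = text.find(delim, start);
        if (pos == std::string_view::npos) {
            // No further delimiter: the remainder is the last piece.
            out.push_back(text.substr(start));
            break;
        }
        out.push_back(text.substr(start, pos - start));
        // Skip the full delimiter, not just one character.
        start = pos + delim.size();
    }
    return out;
}
```

The returned views are valid only while the buffer behind `text` is alive, which is the usual trade-off of a zero-copy split.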
[GitHub] [doris] github-actions[bot] commented on pull request #15945: [Exec](opt) Opt the vexplode_split function performance
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15945:
URL: https://github.com/apache/doris/pull/15945#issuecomment-1384842604
PR approved by at least one committer and no changes requested.
[GitHub] [doris] github-actions[bot] commented on pull request #15945: [Exec](opt) Opt the vexplode_split function performance
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15945:
URL: https://github.com/apache/doris/pull/15945#issuecomment-1384842639
PR approved by anyone and no changes requested.
[GitHub] [doris] github-actions[bot] commented on a diff in pull request #15945: [Exec](opt) Opt the vexplode_split function performance
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on code in PR #15945:
URL: https://github.com/apache/doris/pull/15945#discussion_r1071192372
##########
be/src/vec/exprs/table_function/vexplode_split.cpp:
##########
@@ -63,17 +80,34 @@ Status VExplodeSplitTableFunction::process_row(size_t row_idx) {
_is_current_empty = false;
_eos = false;
- StringRef text = _text_column->get_data_at(row_idx);
- StringRef delimiter = _delimiter_column->get_data_at(row_idx);
-
- if (text.data == nullptr) {
+ if ((!_test_null_map and _test_null_map[row_idx]) || _delimiter.data == nullptr) {
_is_current_empty = true;
_cur_size = 0;
_cur_offset = 0;
} else {
- //TODO: implement non-copy split string reference
- _backup = strings::Split(StringPiece((char*)text.data, text.size),
- StringPiece((char*)delimiter.data, delimiter.size));
+ auto split = [](std::string_view strv, std::string_view delims = " ") {
+ std::vector<std::string_view> output;
+ auto first = strv.begin();
+ auto last = strv.end();
+
+ do {
+ const auto second =
+ std::search(first, last, std::cbegin(delims), std::cend(delims));
+ if (first != second) {
+ output.emplace_back(strv.substr(std::distance(strv.begin(), first),
+ std::distance(first, second)));
+ first = std::next(second);
+ } else {
+ output.emplace_back("", 0);
+ first = std::next(second, delims.size());
+ }
+
+ if (second == last) break;
Review Comment:
warning: statement should be inside braces [readability-braces-around-statements]
```suggestion
if (second == last) { break;
}
```
[GitHub] [doris] github-actions[bot] commented on a diff in pull request #15945: [Exec](opt) Opt the vexplode_split function performance
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on code in PR #15945:
URL: https://github.com/apache/doris/pull/15945#discussion_r1070621821
##########
be/src/vec/exprs/table_function/vexplode_split.cpp:
##########
@@ -63,17 +80,35 @@ Status VExplodeSplitTableFunction::process_row(size_t row_idx) {
_is_current_empty = false;
_eos = false;
- StringRef text = _text_column->get_data_at(row_idx);
- StringRef delimiter = _delimiter_column->get_data_at(row_idx);
-
- if (text.data == nullptr) {
+ if ((!_test_null_map and _test_null_map[row_idx]) || _delimiter.data == nullptr) {
_is_current_empty = true;
_cur_size = 0;
_cur_offset = 0;
} else {
- //TODO: implement non-copy split string reference
- _backup = strings::Split(StringPiece((char*)text.data, text.size),
- StringPiece((char*)delimiter.data, delimiter.size));
+ auto split =[] (std::string_view strv, std::string_view delims = " ")
+ {
+ std::vector<std::string_view> output;
+ auto first = strv.begin();
+
+ while (first != strv.end())
+ {
+ const auto second = std::find_first_of(first, std::cend(strv),
+ std::cbegin(delims), std::cend(delims));
+ //std::cout << first << ", " << second << '\n';
+ if (first != second)
+ {
+ output.emplace_back(strv.substr(std::distance(strv.begin(), first), std::distance(first, second)));
+ }
+
+ if (second == strv.end())
+ break;
Review Comment:
warning: statement should be inside braces [readability-braces-around-statements]
```suggestion
if (second == strv.end()) {
break;
}
```