Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2023/01/05 16:51:41 UTC
[GitHub] [doris] xiaokang opened a new pull request, #15663: [Improvement](topn) order by key topn query optimization
xiaokang opened a new pull request, #15663:
URL: https://github.com/apache/doris/pull/15663
# Proposed changes
Issue Number: close #xxx
## Problem summary
Describe your changes.
Optimize ORDER BY key top-n queries such as `SELECT * FROM table1 ORDER BY k1, k2 LIMIT n`, where k1 and k2 form a prefix of the table's sort key.
This optimization applies only to tables using DUP_KEYS, or UNIQUE_KEYS with merge-on-write.
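The idea behind reading in key order can be illustrated with a minimal sketch. It is not the code in this PR: `Row` and `topn_from_sorted_runs` are hypothetical names, and each "run" stands in for a chunk of data already stored in sort-key order. Under that assumption, only the first n rows of each run can reach the global top n, so the remaining rows never need to be compared.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical row type: (k1, k2) stands in for the sort-key prefix.
using Row = std::pair<int, int>;

// Assumption: every run is already sorted by (k1, k2), like data stored in key order.
std::vector<Row> topn_from_sorted_runs(const std::vector<std::vector<Row>>& runs, std::size_t n) {
    std::vector<Row> candidates;
    for (const auto& run : runs) {
        // Only the first n rows of a sorted run can appear in the global top n.
        const std::size_t take = std::min(n, run.size());
        candidates.insert(candidates.end(), run.begin(),
                          run.begin() + static_cast<std::ptrdiff_t>(take));
    }
    // Order just the first `keep` candidates; the tail is discarded unsorted.
    const std::size_t keep = std::min(n, candidates.size());
    std::partial_sort(candidates.begin(),
                      candidates.begin() + static_cast<std::ptrdiff_t>(keep),
                      candidates.end());
    candidates.resize(keep);
    return candidates;
}
```

With r runs, this sketch compares at most r * n rows regardless of how many rows the runs hold in total.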
## Checklist(Required)
1. Does it affect the original behavior:
- [ ] Yes
- [ ] No
- [ ] I don't know
2. Has unit tests been added:
- [ ] Yes
- [ ] No
- [ ] No Need
3. Has document been added or modified:
- [ ] Yes
- [ ] No
- [ ] No Need
4. Does it need to update dependencies:
- [ ] Yes
- [ ] No
5. Are there any changes that cannot be rolled back:
- [ ] Yes (If Yes, please explain WHY)
- [ ] No
## Further comments
If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org
[GitHub] [doris] github-actions[bot] commented on a diff in pull request #15663: [Improvement](topn) order by key topn query optimization
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on code in PR #15663:
URL: https://github.com/apache/doris/pull/15663#discussion_r1071032157
##########
be/src/vec/olap/vcollect_iterator.h:
##########
@@ -72,6 +72,26 @@ class VCollectIterator {
}
private:
+ // next for topn query
+ Status topn_next(Block* block);
+
+ class BlockRowposComparator {
+ public:
+ BlockRowposComparator(MutableBlock* mutable_block,
+ const std::vector<uint32_t>* compare_columns, bool is_reverse)
+ : _mutable_block(mutable_block),
+ _compare_columns(compare_columns),
+ _is_reverse(is_reverse) {}
+
+ bool operator()(const size_t& lpos, const size_t& rpos) const;
+
+ private:
+ const MutableBlock* _mutable_block = nullptr;
+ const std::vector<uint32_t>* _compare_columns;
+ // reverse the compare order
+ const bool _is_reverse = false;
Review Comment:
warning: private field '_is_reverse' is not used [clang-diagnostic-unused-private-field]
```cpp
const bool _is_reverse = false;
^
```
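For context, a row-position comparator would typically consume such a flag roughly as sketched below. This is an illustration only, not the PR's implementation: `Columns` and `RowPosComparator` are hypothetical stand-ins, and cells are plain integers rather than Doris columns.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a block: columns[c][row] holds an integer cell.
using Columns = std::vector<std::vector<int>>;

class RowPosComparator {
public:
    RowPosComparator(const Columns* columns, const std::vector<std::size_t>* compare_columns,
                     bool is_reverse)
            : _columns(columns), _compare_columns(compare_columns), _is_reverse(is_reverse) {}

    // Returns true when the row at lpos is ordered before the row at rpos.
    bool operator()(std::size_t lpos, std::size_t rpos) const {
        for (std::size_t col : *_compare_columns) {
            const int l = (*_columns)[col][lpos];
            const int r = (*_columns)[col][rpos];
            if (l != r) {
                // Flip the comparison for descending order.
                return _is_reverse ? l > r : l < r;
            }
        }
        return false; // equal on all compared columns: neither row comes first
    }

private:
    const Columns* _columns;
    const std::vector<std::size_t>* _compare_columns;
    const bool _is_reverse;
};
```

Returning false for equal rows keeps the comparator a strict weak ordering, which standard sorting and heap algorithms require.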
[GitHub] [doris] github-actions[bot] commented on pull request #15663: [Improvement](topn) order by key topn query optimization
Posted by github-actions.
github-actions[bot] commented on PR #15663:
URL: https://github.com/apache/doris/pull/15663#issuecomment-1412319422
clang-tidy review says "All clean, LGTM! :+1:"
[GitHub] [doris] Gabriel39 merged pull request #15663: [Improvement](topn) order by key topn query optimization
Posted by "Gabriel39 (via GitHub)" <gi...@apache.org>.
Gabriel39 merged PR #15663:
URL: https://github.com/apache/doris/pull/15663
[GitHub] [doris] Gabriel39 commented on a diff in pull request #15663: [Improvement](topn) order by key topn query optimization
Posted by GitBox <gi...@apache.org>.
Gabriel39 commented on code in PR #15663:
URL: https://github.com/apache/doris/pull/15663#discussion_r1071696041
##########
be/src/vec/olap/vcollect_iterator.h:
##########
@@ -72,6 +72,26 @@ class VCollectIterator {
}
private:
+ // next for topn query
+ Status topn_next(Block* block);
Review Comment:
_topn_next
##########
be/src/vec/olap/vcollect_iterator.h:
##########
@@ -72,6 +72,26 @@ class VCollectIterator {
}
private:
+ // next for topn query
+ Status topn_next(Block* block);
+
+ class BlockRowposComparator {
Review Comment:
BlockRowPosComparator
##########
be/src/vec/olap/vcollect_iterator.cpp:
##########
@@ -185,13 +195,170 @@ Status VCollectIterator::next(IteratorRowRef* ref) {
}
Status VCollectIterator::next(Block* block) {
+ if (_topn_limit > 0) {
+ return topn_next(block);
+ }
+
if (LIKELY(_inner_iter)) {
return _inner_iter->next(block);
} else {
return Status::Error<END_OF_FILE>();
}
}
+Status VCollectIterator::topn_next(Block* block) {
+ if (_topn_eof) {
+ return Status::Error<END_OF_FILE>();
+ }
+
+ auto cloneBlock = block->clone_without_columns();
Review Comment:
clone_block
[GitHub] [doris] github-actions[bot] commented on a diff in pull request #15663: [Improvement](topn) order by key topn query optimization
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on code in PR #15663:
URL: https://github.com/apache/doris/pull/15663#discussion_r1063242736
##########
be/src/vec/core/block.cpp:
##########
@@ -1075,4 +1081,33 @@ void MutableBlock::clear_column_data() noexcept {
}
}
+void MutableBlock::initialize_index_by_name() {
+ for (size_t i = 0, size = _names.size(); i < size; ++i) {
+ index_by_name[_names[i]] = i;
+ }
+}
+
+bool MutableBlock::has(const std::string& name) const {
+ return index_by_name.end() != index_by_name.find(name);
+}
+
+size_t MutableBlock::get_position_by_name(const std::string& name) const {
+ auto it = index_by_name.find(name);
+ if (index_by_name.end() == it) {
+ LOG(FATAL) << fmt::format("Not found column {} in block. There are only columns: {}", name,
+ dump_names());
+ }
+
+ return it->second;
+}
+
+std::string MutableBlock::dump_names() const {
+ std::stringstream out;
+ for (auto it = _names.begin(); it != _names.end(); ++it) {
+ if (it != _names.begin()) out << ", ";
Review Comment:
warning: statement should be inside braces [readability-braces-around-statements]
```suggestion
if (it != _names.begin()) {
    out << ", ";
}
```
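The name bookkeeping in this hunk can be tried in isolation with the sketch below. `NameIndex` is a hypothetical stand-in, not Doris's `MutableBlock`; it mirrors only the index-by-name map and the comma-joined dump, with the braces the warning asks for.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the name bookkeeping shown in the diff.
class NameIndex {
public:
    explicit NameIndex(std::vector<std::string> names) : _names(std::move(names)) {
        // Map each column name to its position; a repeated name keeps the last position.
        for (std::size_t i = 0; i < _names.size(); ++i) {
            _index_by_name[_names[i]] = i;
        }
    }

    bool has(const std::string& name) const { return _index_by_name.count(name) != 0; }

    // Joins all names with ", " and no trailing separator.
    std::string dump_names() const {
        std::ostringstream out;
        for (auto it = _names.begin(); it != _names.end(); ++it) {
            if (it != _names.begin()) {
                out << ", ";
            }
            out << *it;
        }
        return out.str();
    }

private:
    std::vector<std::string> _names;
    std::unordered_map<std::string, std::size_t> _index_by_name;
};
```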
[GitHub] [doris] github-actions[bot] commented on pull request #15663: [Improvement](topn) order by key topn query optimization
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15663:
URL: https://github.com/apache/doris/pull/15663#issuecomment-1413373172
clang-tidy review says "All clean, LGTM! :+1:"
[GitHub] [doris] hello-stephen commented on pull request #15663: [Improvement](topn) order by key topn query optimization
Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #15663:
URL: https://github.com/apache/doris/pull/15663#issuecomment-1372656453
TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 36 seconds
load time: 470 seconds
storage size: 17122168178 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230105194532_clickbench_pr_74490.html
[GitHub] [doris] github-actions[bot] commented on pull request #15663: [Improvement](topn) order by key topn query optimization
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15663:
URL: https://github.com/apache/doris/pull/15663#issuecomment-1372477544
clang-tidy review says "All clean, LGTM! :+1:"
[GitHub] [doris] github-actions[bot] commented on pull request #15663: [Improvement](topn) order by key topn query optimization
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15663:
URL: https://github.com/apache/doris/pull/15663#issuecomment-1374505544
clang-tidy review says "All clean, LGTM! :+1:"
[GitHub] [doris] github-actions[bot] commented on pull request #15663: [Improvement](topn) order by key topn query optimization
Posted by github-actions.
github-actions[bot] commented on PR #15663:
URL: https://github.com/apache/doris/pull/15663#issuecomment-1412316013
clang-tidy review says "All clean, LGTM! :+1:"