Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2023/01/05 16:51:41 UTC

[GitHub] [doris] xiaokang opened a new pull request, #15663: [Improvement](topn) order by key topn query optimization

xiaokang opened a new pull request, #15663:
URL: https://github.com/apache/doris/pull/15663

   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem summary
   
   Optimize ORDER BY key top-n queries such as `SELECT * FROM table1 ORDER BY k1, k2 LIMIT n`, where `k1` and `k2` form a prefix of the table's sort key.
   
   This optimization applies only to DUP_KEYS tables and to UNIQUE_KEYS tables with merge-on-write enabled.
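   
   To illustrate why a sort-key prefix helps, here is a minimal sketch. It is not the code in this PR. If every input run is already sorted by (k1, k2), the first n rows overall can be produced by a k-way merge that stops after n rows, without a full sort. `Row`, `topn_from_sorted_runs`, and the in-memory `runs` are hypothetical names used for illustration only.
   
   ```cpp
   #include <cstddef>
   #include <functional>
   #include <queue>
   #include <tuple>
   #include <utility>
   #include <vector>

   // Hypothetical row type: only the two leading sort-key columns (k1, k2).
   using Row = std::pair<int, int>;

   // Returns the first `limit` rows in (k1, k2) order.
   // Assumption: each run is already sorted by (k1, k2).
   std::vector<Row> topn_from_sorted_runs(const std::vector<std::vector<Row>>& runs,
                                          std::size_t limit) {
       // Heap entry: (row, run index, offset inside that run).
       using Entry = std::tuple<Row, std::size_t, std::size_t>;
       std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

       // Seed the min-heap with the head of every non-empty run.
       for (std::size_t i = 0; i < runs.size(); ++i) {
           if (!runs[i].empty()) {
               heap.push(Entry(runs[i][0], i, std::size_t{0}));
           }
       }

       std::vector<Row> result;
       // Stop as soon as `limit` rows are produced; later rows are never read.
       while (result.size() < limit && !heap.empty()) {
           const Entry top = heap.top();
           heap.pop();
           const std::size_t run = std::get<1>(top);
           const std::size_t next = std::get<2>(top) + 1;
           result.push_back(std::get<0>(top));
           if (next < runs[run].size()) {
               heap.push(Entry(runs[run][next], run, next));
           }
       }
       return result;
   }
   ```
   
   In this sketch at most `limit` rows plus one pending head row per run are ever looked at, which is the saving the optimization is after.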
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: 
       - [ ] Yes
       - [ ] No
       - [ ] I don't know
   2. Has unit tests been added:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   3. Has document been added or modified:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   4. Does it need to update dependencies:
       - [ ] Yes
       - [ ] No
   5. Are there any changes that cannot be rolled back:
       - [ ] Yes (If Yes, please explain WHY)
       - [ ] No
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on a diff in pull request #15663: [Improvement](topn) order by key topn query optimization

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on code in PR #15663:
URL: https://github.com/apache/doris/pull/15663#discussion_r1071032157


##########
be/src/vec/olap/vcollect_iterator.h:
##########
@@ -72,6 +72,26 @@ class VCollectIterator {
     }
 
 private:
+    // next for topn query
+    Status topn_next(Block* block);
+
+    class BlockRowposComparator {
+    public:
+        BlockRowposComparator(MutableBlock* mutable_block,
+                              const std::vector<uint32_t>* compare_columns, bool is_reverse)
+                : _mutable_block(mutable_block),
+                  _compare_columns(compare_columns),
+                  _is_reverse(is_reverse) {}
+
+        bool operator()(const size_t& lpos, const size_t& rpos) const;
+
+    private:
+        const MutableBlock* _mutable_block = nullptr;
+        const std::vector<uint32_t>* _compare_columns;
+        // reverse the compare order
+        const bool _is_reverse = false;

Review Comment:
   warning: private field '_is_reverse' is not used [clang-diagnostic-unused-private-field]
   ```cpp
           const bool _is_reverse = false;
                      ^
   ```
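   
   For context, a minimal sketch of how a row-position comparator could consume such a flag, so that it is no longer unused. This is an assumption about intent, not the PR's implementation: `Columns` is a hypothetical stand-in for `MutableBlock`, and `RowPosComparator` is an illustrative name.
   
   ```cpp
   #include <cstddef>
   #include <cstdint>
   #include <vector>

   // Hypothetical stand-in for a block: columns[c][r] is column c at row r.
   using Columns = std::vector<std::vector<int>>;

   class RowPosComparator {
   public:
       RowPosComparator(const Columns* columns,
                        const std::vector<std::uint32_t>* compare_columns, bool is_reverse)
               : _columns(columns),
                 _compare_columns(compare_columns),
                 _is_reverse(is_reverse) {}

       // Strict weak ordering over row positions, column by column.
       bool operator()(std::size_t lpos, std::size_t rpos) const {
           for (std::uint32_t col : *_compare_columns) {
               const int l = (*_columns)[col][lpos];
               const int r = (*_columns)[col][rpos];
               if (l != r) {
                   // The flag flips the direction of the first differing column.
                   return _is_reverse ? l > r : l < r;
               }
           }
           return false; // rows are equal on every compared column
       }

   private:
       const Columns* _columns;
       const std::vector<std::uint32_t>* _compare_columns;
       // reverse the compare order
       const bool _is_reverse;
   };
   ```
   
   Returning `false` for equal rows keeps the ordering strict, which ordered containers such as `std::multiset` require.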
   





[GitHub] [doris] github-actions[bot] commented on pull request #15663: [Improvement](topn) order by key topn query optimization

Posted by github-actions.
github-actions[bot] commented on PR #15663:
URL: https://github.com/apache/doris/pull/15663#issuecomment-1412319422

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] Gabriel39 merged pull request #15663: [Improvement](topn) order by key topn query optimization

Posted by "Gabriel39 (via GitHub)" <gi...@apache.org>.
Gabriel39 merged PR #15663:
URL: https://github.com/apache/doris/pull/15663




[GitHub] [doris] Gabriel39 commented on a diff in pull request #15663: [Improvement](topn) order by key topn query optimization

Posted by GitBox <gi...@apache.org>.
Gabriel39 commented on code in PR #15663:
URL: https://github.com/apache/doris/pull/15663#discussion_r1071696041


##########
be/src/vec/olap/vcollect_iterator.h:
##########
@@ -72,6 +72,26 @@ class VCollectIterator {
     }
 
 private:
+    // next for topn query
+    Status topn_next(Block* block);

Review Comment:
   _topn_next



##########
be/src/vec/olap/vcollect_iterator.h:
##########
@@ -72,6 +72,26 @@ class VCollectIterator {
     }
 
 private:
+    // next for topn query
+    Status topn_next(Block* block);
+
+    class BlockRowposComparator {

Review Comment:
   BlockRowPosComparator



##########
be/src/vec/olap/vcollect_iterator.cpp:
##########
@@ -185,13 +195,170 @@ Status VCollectIterator::next(IteratorRowRef* ref) {
 }
 
 Status VCollectIterator::next(Block* block) {
+    if (_topn_limit > 0) {
+        return topn_next(block);
+    }
+
     if (LIKELY(_inner_iter)) {
         return _inner_iter->next(block);
     } else {
         return Status::Error<END_OF_FILE>();
     }
 }
 
+Status VCollectIterator::topn_next(Block* block) {
+    if (_topn_eof) {
+        return Status::Error<END_OF_FILE>();
+    }
+
+    auto cloneBlock = block->clone_without_columns();

Review Comment:
   clone_block





[GitHub] [doris] github-actions[bot] commented on a diff in pull request #15663: [Improvement](topn) order by key topn query optimization

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on code in PR #15663:
URL: https://github.com/apache/doris/pull/15663#discussion_r1063242736


##########
be/src/vec/core/block.cpp:
##########
@@ -1075,4 +1081,33 @@ void MutableBlock::clear_column_data() noexcept {
     }
 }
 
+void MutableBlock::initialize_index_by_name() {
+    for (size_t i = 0, size = _names.size(); i < size; ++i) {
+        index_by_name[_names[i]] = i;
+    }
+}
+
+bool MutableBlock::has(const std::string& name) const {
+    return index_by_name.end() != index_by_name.find(name);
+}
+
+size_t MutableBlock::get_position_by_name(const std::string& name) const {
+    auto it = index_by_name.find(name);
+    if (index_by_name.end() == it) {
+        LOG(FATAL) << fmt::format("Not found column {} in block. There are only columns: {}", name,
+                                  dump_names());
+    }
+
+    return it->second;
+}
+
+std::string MutableBlock::dump_names() const {
+    std::stringstream out;
+    for (auto it = _names.begin(); it != _names.end(); ++it) {
+        if (it != _names.begin()) out << ", ";

Review Comment:
   warning: statement should be inside braces [readability-braces-around-statements]
   
   ```suggestion
           if (it != _names.begin()) {
               out << ", ";
           }
   ```
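   
   A self-contained sketch of the same name-index pattern with the braces applied. The assumptions are labeled: `NameIndex` is a hypothetical class rather than `MutableBlock`, the part of `dump_names` after the flagged line is guessed because the diff is cut off there, and a missing name throws `std::out_of_range` here, where the diff uses `LOG(FATAL)`.
   
   ```cpp
   #include <cstddef>
   #include <sstream>
   #include <stdexcept>
   #include <string>
   #include <unordered_map>
   #include <utility>
   #include <vector>

   // Hypothetical stand-alone analogue of the name index in the diff above.
   class NameIndex {
   public:
       explicit NameIndex(std::vector<std::string> names) : _names(std::move(names)) {
           for (std::size_t i = 0, size = _names.size(); i < size; ++i) {
               _index_by_name[_names[i]] = i;
           }
       }

       bool has(const std::string& name) const {
           return _index_by_name.end() != _index_by_name.find(name);
       }

       // Assumption: throw instead of aborting the process on a missing name.
       std::size_t get_position_by_name(const std::string& name) const {
           auto it = _index_by_name.find(name);
           if (_index_by_name.end() == it) {
               throw std::out_of_range("Not found column " + name +
                                       " in block. There are only columns: " + dump_names());
           }
           return it->second;
       }

       // Joins the column names with ", "; the separator is skipped for the first name.
       std::string dump_names() const {
           std::stringstream out;
           for (auto it = _names.begin(); it != _names.end(); ++it) {
               if (it != _names.begin()) {
                   out << ", ";
               }
               out << *it;
           }
           return out.str();
       }

   private:
       std::vector<std::string> _names;
       std::unordered_map<std::string, std::size_t> _index_by_name;
   };
   ```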
   





[GitHub] [doris] github-actions[bot] commented on pull request #15663: [Improvement](topn) order by key topn query optimization

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15663:
URL: https://github.com/apache/doris/pull/15663#issuecomment-1413373172

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] hello-stephen commented on pull request #15663: [Improvement](topn) order by key topn query optimization

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #15663:
URL: https://github.com/apache/doris/pull/15663#issuecomment-1372656453

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 36 seconds
    load time: 470 seconds
    storage size: 17122168178 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230105194532_clickbench_pr_74490.html




[GitHub] [doris] github-actions[bot] commented on pull request #15663: [Improvement](topn) order by key topn query optimization

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15663:
URL: https://github.com/apache/doris/pull/15663#issuecomment-1372477544

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #15663: [Improvement](topn) order by key topn query optimization

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15663:
URL: https://github.com/apache/doris/pull/15663#issuecomment-1374505544

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #15663: [Improvement](topn) order by key topn query optimization

Posted by github-actions.
github-actions[bot] commented on PR #15663:
URL: https://github.com/apache/doris/pull/15663#issuecomment-1412316013

   clang-tidy review says "All clean, LGTM! :+1:"

