You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by "hust-hhb (via GitHub)" <gi...@apache.org> on 2023/04/14 11:17:47 UTC

[GitHub] [doris] hust-hhb opened a new pull request, #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

hust-hhb opened a new pull request, #18686:
URL: https://github.com/apache/doris/pull/18686

   … dup keys
   
   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem summary
   
   Describe your changes.
   
   ## Checklist(Required)
   
   * [ ] Does it affect the original behavior
   * [ ] Has unit tests been added
   * [ ] Has document been added or modified
   * [ ] Does it need to update dependencies
   * [ ] Is this PR support rollback (If NO, please explain WHY)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] zhannngchen commented on a diff in pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "zhannngchen (via GitHub)" <gi...@apache.org>.
zhannngchen commented on code in PR #18686:
URL: https://github.com/apache/doris/pull/18686#discussion_r1169566576


##########
be/src/olap/memtable.cpp:
##########
@@ -243,11 +243,16 @@ void MemTable::_collect_vskiplist_results() {
     VecTable::Iterator it(_vec_skip_list.get());
     vectorized::Block in_block = _input_mutable_block.to_block();
     if (_keys_type == KeysType::DUP_KEYS) {
+        RowComparator _cmp(_schema);
+        vectorized::MutableBlock mutable_block =
+                vectorized::MutableBlock::build_mutable_block(&in_block);
+        _cmp.set_block(&mutable_block);
+        std::sort(_vec_row.begin(), _vec_row.end(), _cmp);

Review Comment:
   you can use the `_vec_row_comparator` with lambda directly, like this way:
   ```
           std::sort(_vec_row.begin(), _vec_row.end(),
                     [this](const RowInBlock* l, const RowInBlock* r) -> bool {
                         return (*(this->_vec_row_comparator))(l, r) < 0;
                     });
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] zhannngchen commented on a diff in pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "zhannngchen (via GitHub)" <gi...@apache.org>.
zhannngchen commented on code in PR #18686:
URL: https://github.com/apache/doris/pull/18686#discussion_r1170674446


##########
be/src/olap/memtable.cpp:
##########
@@ -182,17 +182,16 @@ void MemTable::insert(const vectorized::Block* input_block, const std::vector<in
     _mem_usage += input_size;
     _insert_mem_tracker->consume(input_size);
     for (int i = 0; i < num_rows; i++) {
-        _row_in_blocks.emplace_back(new RowInBlock {cursor_in_mutableblock + i});
-        _insert_one_row_from_block(_row_in_blocks.back());
+        _row_in_blocks.emplace(_row_in_blocks.begin(), new RowInBlock {cursor_in_mutableblock + i});

Review Comment:
   Why insert elements to the head of `_row_in_blocks`? It's quite low efficient for vector



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hust-hhb commented on pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "hust-hhb (via GitHub)" <gi...@apache.org>.
hust-hhb commented on PR #18686:
URL: https://github.com/apache/doris/pull/18686#issuecomment-1514628360

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] zhannngchen commented on a diff in pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "zhannngchen (via GitHub)" <gi...@apache.org>.
zhannngchen commented on code in PR #18686:
URL: https://github.com/apache/doris/pull/18686#discussion_r1172121910


##########
be/src/olap/memtable.cpp:
##########
@@ -243,11 +242,22 @@ void MemTable::_collect_vskiplist_results() {
     VecTable::Iterator it(_vec_skip_list.get());
     vectorized::Block in_block = _input_mutable_block.to_block();
     if (_keys_type == KeysType::DUP_KEYS) {
+        vectorized::MutableBlock mutable_block =
+                vectorized::MutableBlock::build_mutable_block(&in_block);
+        _vec_row_comparator->set_block(&mutable_block);
+        std::sort(_row_in_blocks.begin(), _row_in_blocks.end(),
+                  [this](const RowInBlock* l, const RowInBlock* r) -> bool {
+                      if ((*(this->_vec_row_comparator))(l, r) == 0) {
+                          return l->_row_pos - r->_row_pos > 0;
+                      } else {
+                          return (*(this->_vec_row_comparator))(l, r) < 0;

Review Comment:
   +1



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hust-hhb commented on pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "hust-hhb (via GitHub)" <gi...@apache.org>.
hust-hhb commented on PR #18686:
URL: https://github.com/apache/doris/pull/18686#issuecomment-1514358502

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18686:
URL: https://github.com/apache/doris/pull/18686#issuecomment-1517319712

   clang-tidy review says "All clean, LGTM! :+1:"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] zhannngchen commented on pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "zhannngchen (via GitHub)" <gi...@apache.org>.
zhannngchen commented on PR #18686:
URL: https://github.com/apache/doris/pull/18686#issuecomment-1512358898

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18686:
URL: https://github.com/apache/doris/pull/18686#issuecomment-1511276652

   clang-tidy review says "All clean, LGTM! :+1:"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18686:
URL: https://github.com/apache/doris/pull/18686#issuecomment-1511266607

   clang-tidy review says "All clean, LGTM! :+1:"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] carlvinhust2012 commented on a diff in pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "carlvinhust2012 (via GitHub)" <gi...@apache.org>.
carlvinhust2012 commented on code in PR #18686:
URL: https://github.com/apache/doris/pull/18686#discussion_r1169873051


##########
be/src/olap/memtable.h:
##########
@@ -88,6 +88,26 @@ class MemTable {
         char* agg_places(size_t offset) const { return _agg_mem + _agg_state_offset[offset]; }
     };
 
+    class RowComparator {
+    public:
+        RowComparator(const Schema* schema) : _schema(schema) {}
+
+        void set_block(vectorized::MutableBlock* pblock) { _pblock = pblock; }
+
+        int compare(const RowInBlock* left, const RowInBlock* right) const {
+            return _pblock->compare_at(left->_row_pos, right->_row_pos, _schema->num_key_columns(),
+                                       *_pblock, -1);
+        }
+
+        bool operator()(const RowInBlock* left, const RowInBlock* right) const {
+            return compare(left, right) < 0;
+        }
+
+    private:
+        const Schema* _schema;
+        vectorized::MutableBlock* _pblock; // 对应Memtable::_input_mutable_block

Review Comment:
   here may be use English comment is more better



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] liaoxin01 commented on a diff in pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "liaoxin01 (via GitHub)" <gi...@apache.org>.
liaoxin01 commented on code in PR #18686:
URL: https://github.com/apache/doris/pull/18686#discussion_r1172491844


##########
be/src/olap/memtable.cpp:
##########
@@ -243,11 +242,23 @@ void MemTable::_collect_vskiplist_results() {
     VecTable::Iterator it(_vec_skip_list.get());
     vectorized::Block in_block = _input_mutable_block.to_block();
     if (_keys_type == KeysType::DUP_KEYS) {
+        vectorized::MutableBlock mutable_block =
+                vectorized::MutableBlock::build_mutable_block(&in_block);
+        _vec_row_comparator->set_block(&mutable_block);
+        std::sort(_row_in_blocks.begin(), _row_in_blocks.end(),

Review Comment:
   Can we use `std::stable_sort` here directly? 
    `std::stable_sort(_row_in_blocks.begin(), _row_in_blocks.end(), _vec_row_comparator);`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18686:
URL: https://github.com/apache/doris/pull/18686#issuecomment-1516194562

   clang-tidy review says "All clean, LGTM! :+1:"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hust-hhb commented on a diff in pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "hust-hhb (via GitHub)" <gi...@apache.org>.
hust-hhb commented on code in PR #18686:
URL: https://github.com/apache/doris/pull/18686#discussion_r1172506516


##########
be/src/olap/memtable.cpp:
##########
@@ -243,11 +242,23 @@ void MemTable::_collect_vskiplist_results() {
     VecTable::Iterator it(_vec_skip_list.get());
     vectorized::Block in_block = _input_mutable_block.to_block();
     if (_keys_type == KeysType::DUP_KEYS) {
+        vectorized::MutableBlock mutable_block =
+                vectorized::MutableBlock::build_mutable_block(&in_block);
+        _vec_row_comparator->set_block(&mutable_block);
+        std::sort(_row_in_blocks.begin(), _row_in_blocks.end(),

Review Comment:
   compare _row_pos can ensure that the sorting is stable, in my test the output order is equal with skiplist, so i think std::sort is enough.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18686:
URL: https://github.com/apache/doris/pull/18686#issuecomment-1511330714

   clang-tidy review says "All clean, LGTM! :+1:"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hust-hhb commented on pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "hust-hhb (via GitHub)" <gi...@apache.org>.
hust-hhb commented on PR #18686:
URL: https://github.com/apache/doris/pull/18686#issuecomment-1512931503

   @zhannngchen @carlvinhust2012 Thanks for comment, now i use _row_in_blocks to store and sort dup keys, which is already exist in memtable, it no need to create a new vector and RowComparator to do this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hust-hhb commented on pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "hust-hhb (via GitHub)" <gi...@apache.org>.
hust-hhb commented on PR #18686:
URL: https://github.com/apache/doris/pull/18686#issuecomment-1512933061

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hust-hhb commented on pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "hust-hhb (via GitHub)" <gi...@apache.org>.
hust-hhb commented on PR #18686:
URL: https://github.com/apache/doris/pull/18686#issuecomment-1517319868

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18686:
URL: https://github.com/apache/doris/pull/18686#issuecomment-1508357648

   clang-tidy review says "All clean, LGTM! :+1:"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18686:
URL: https://github.com/apache/doris/pull/18686#issuecomment-1514364726

   clang-tidy review says "All clean, LGTM! :+1:"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] zhannngchen commented on a diff in pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "zhannngchen (via GitHub)" <gi...@apache.org>.
zhannngchen commented on code in PR #18686:
URL: https://github.com/apache/doris/pull/18686#discussion_r1169577868


##########
be/src/olap/memtable.cpp:
##########
@@ -191,8 +191,8 @@ void MemTable::_insert_one_row_from_block(RowInBlock* row_in_block) {
     _rows++;
     bool overwritten = false;
     if (_keys_type == KeysType::DUP_KEYS) {
-        // TODO: dup keys only need sort opertaion. Rethink skiplist is the beat way to sort columns?
-        _vec_skip_list->Insert(row_in_block, &overwritten);
+        // dup keys only need sort operation, use vector to sort is faster than skiplist
+        _vec_row.emplace_back(row_in_block);

Review Comment:
   reserve space at `insert` will be faster.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hust-hhb commented on pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "hust-hhb (via GitHub)" <gi...@apache.org>.
hust-hhb commented on PR #18686:
URL: https://github.com/apache/doris/pull/18686#issuecomment-1516194865

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hust-hhb commented on pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "hust-hhb (via GitHub)" <gi...@apache.org>.
hust-hhb commented on PR #18686:
URL: https://github.com/apache/doris/pull/18686#issuecomment-1517308133

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] zhannngchen commented on a diff in pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "zhannngchen (via GitHub)" <gi...@apache.org>.
zhannngchen commented on code in PR #18686:
URL: https://github.com/apache/doris/pull/18686#discussion_r1169566576


##########
be/src/olap/memtable.cpp:
##########
@@ -243,11 +243,16 @@ void MemTable::_collect_vskiplist_results() {
     VecTable::Iterator it(_vec_skip_list.get());
     vectorized::Block in_block = _input_mutable_block.to_block();
     if (_keys_type == KeysType::DUP_KEYS) {
+        RowComparator _cmp(_schema);
+        vectorized::MutableBlock mutable_block =
+                vectorized::MutableBlock::build_mutable_block(&in_block);
+        _cmp.set_block(&mutable_block);
+        std::sort(_vec_row.begin(), _vec_row.end(), _cmp);

Review Comment:
   you can use the `_vec_row_comparator` with lambda function directly, like this way:
   ```
           std::sort(_vec_row.begin(), _vec_row.end(),
                     [this](const RowInBlock* l, const RowInBlock* r) -> bool {
                         return (*(this->_vec_row_comparator))(l, r) < 0;
                     });
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] dataroaring merged pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "dataroaring (via GitHub)" <gi...@apache.org>.
dataroaring merged PR #18686:
URL: https://github.com/apache/doris/pull/18686


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] dataroaring commented on a diff in pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "dataroaring (via GitHub)" <gi...@apache.org>.
dataroaring commented on code in PR #18686:
URL: https://github.com/apache/doris/pull/18686#discussion_r1172094084


##########
be/src/olap/memtable.cpp:
##########
@@ -243,11 +242,22 @@ void MemTable::_collect_vskiplist_results() {
     VecTable::Iterator it(_vec_skip_list.get());
     vectorized::Block in_block = _input_mutable_block.to_block();
     if (_keys_type == KeysType::DUP_KEYS) {
+        vectorized::MutableBlock mutable_block =
+                vectorized::MutableBlock::build_mutable_block(&in_block);
+        _vec_row_comparator->set_block(&mutable_block);
+        std::sort(_row_in_blocks.begin(), _row_in_blocks.end(),
+                  [this](const RowInBlock* l, const RowInBlock* r) -> bool {
+                      if ((*(this->_vec_row_comparator))(l, r) == 0) {
+                          return l->_row_pos - r->_row_pos > 0;
+                      } else {
+                          return (*(this->_vec_row_comparator))(l, r) < 0;

Review Comment:
   two times comparator?



##########
be/src/olap/memtable.cpp:
##########
@@ -191,8 +191,7 @@ void MemTable::_insert_one_row_from_block(RowInBlock* row_in_block) {
     _rows++;
     bool overwritten = false;
     if (_keys_type == KeysType::DUP_KEYS) {
-        // TODO: dup keys only need sort opertaion. Rethink skiplist is the beat way to sort columns?
-        _vec_skip_list->Insert(row_in_block, &overwritten);
+        // dup keys store in vector _row_in_blocks and sort it on flush stage

Review Comment:
   store row_in_blocks in vector.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #18686: [performance](stream-load) use vector instead of skiplist when insert dup keys

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18686:
URL: https://github.com/apache/doris/pull/18686#issuecomment-1512927976

   clang-tidy review says "All clean, LGTM! :+1:"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org