You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/04/20 04:45:52 UTC

[GitHub] [incubator-doris] yangzhg opened a new pull request, #9123: [enhancement](load) optimize load string data and dict page write

yangzhg opened a new pull request, #9123:
URL: https://github.com/apache/incubator-doris/pull/9123

   # Proposed changes
   
   1.  reduce condition check when memtable insert
   2. reduce virtual function call of BinaryDictPageBuilder
   3. remove temparte `_dict_items`  add to `dict_builder` directly
   
   ## Problem Summary:
   
   Describe the overview of changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes/No/I Don't know)
   4. Has unit tests been added: (Yes/No/No Need)
   5. Has document been added or modified: (Yes/No/No Need)
   6. Does it need to update dependencies: (Yes/No)
   7. Are there any changes that cannot be rolled back: (Yes/No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] github-actions[bot] commented on pull request #9123: [enhancement](load) optimize load string data and dict page write

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #9123:
URL: https://github.com/apache/incubator-doris/pull/9123#issuecomment-1120097816

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] zbtzbtzbt commented on a diff in pull request #9123: [enhancement](load) optimize load string data and dict page write

Posted by GitBox <gi...@apache.org>.
zbtzbtzbt commented on code in PR #9123:
URL: https://github.com/apache/incubator-doris/pull/9123#discussion_r855141656


##########
be/src/olap/rowset/segment_v2/binary_plain_page.h:
##########
@@ -47,14 +47,13 @@ namespace segment_v2 {
 class BinaryPlainPageBuilder : public PageBuilder {
 public:
     BinaryPlainPageBuilder(const PageBuilderOptions& options)
-            : _size_estimate(0), _prepared_size(0), _options(options) {
+            : _size_estimate(0), _options(options) {
         reset();
     }
 
     bool is_page_full() override {
         // data_page_size is 0, do not limit the page size
-        return _options.data_page_size != 0 && (_size_estimate > _options.data_page_size ||
-                                                _prepared_size > _options.data_page_size);
+        return _options.data_page_size != 0 && (_size_estimate > _options.data_page_size);

Review Comment:
   better
   `return (_options.data_page_size != 0 && _size_estimate > _options.data_page_size);`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yangzhg commented on a diff in pull request #9123: [enhancement](load) optimize load string data and dict page write

Posted by GitBox <gi...@apache.org>.
yangzhg commented on code in PR #9123:
URL: https://github.com/apache/incubator-doris/pull/9123#discussion_r855706646


##########
be/src/olap/rowset/segment_v2/binary_dict_page.cpp:
##########
@@ -90,13 +92,18 @@ Status BinaryDictPageBuilder::add(const uint8_t* vals, size_t* count) {
                     dict_item.relocate(item_mem);
                 }
                 value_code = _dictionary.size();
+                size_t add_count = 1;
+                RETURN_IF_ERROR(_dict_builder->add(reinterpret_cast<const uint8_t*>(&dict_item),
+                                                   &add_count));
+                if (add_count == 0) {
+                    // current dict page is full, stop processing remaining inputs
+                    break;
+                }
                 _dictionary.emplace(dict_item, value_code);
-                _dict_items.push_back(dict_item);
-                _dict_builder->update_prepared_size(dict_item.size);
             }
             size_t add_count = 1;
-            RETURN_IF_ERROR(_data_page_builder->add(reinterpret_cast<const uint8_t*>(&value_code),
-                                                    &add_count));
+            RETURN_IF_ERROR(actual_builder->single_add(

Review Comment:
   The reason for adding add internal is that I want to unify the logic of add and single add, and I don't want to add a separate function, so add internal doesn't want to be exposed for use



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] zbtzbtzbt commented on a diff in pull request #9123: [enhancement](load) optimize load string data and dict page write

Posted by GitBox <gi...@apache.org>.
zbtzbtzbt commented on code in PR #9123:
URL: https://github.com/apache/incubator-doris/pull/9123#discussion_r855124864


##########
be/src/olap/rowset/segment_v2/binary_plain_page.h:
##########
@@ -127,11 +125,16 @@ class BinaryPlainPageBuilder : public PageBuilder {
         return Status::OK();
     }
 
-    void update_prepared_size(size_t added_size) {
-        _prepared_size += added_size;
-        _prepared_size += sizeof(uint32_t);
+    inline Slice operator[](std::size_t idx) const {

Review Comment:
   size_t



##########
be/src/olap/rowset/segment_v2/bitshuffle_page.h:
##########
@@ -93,13 +93,52 @@ class BitshufflePageBuilder : public PageBuilder {
     bool is_page_full() override { return _remain_element_capacity == 0; }
 
     Status add(const uint8_t* vals, size_t* count) override {
+        return add_internal<false>(vals, count);
+    }
+
+    Status single_add(const uint8_t* vals, size_t* count) {

Review Comment:
   can we delete this function?



##########
be/src/olap/rowset/segment_v2/bitshuffle_page.h:
##########
@@ -93,13 +93,52 @@ class BitshufflePageBuilder : public PageBuilder {
     bool is_page_full() override { return _remain_element_capacity == 0; }
 
     Status add(const uint8_t* vals, size_t* count) override {
+        return add_internal<false>(vals, count);

Review Comment:
   can we delete this function?



##########
be/src/olap/memtable.cpp:
##########
@@ -65,44 +75,46 @@ int MemTable::RowCursorComparator::operator()(const char* left, const char* righ
     return compare_row(lhs_row, rhs_row);
 }
 
-void MemTable::insert(const Tuple* tuple) {
+// For non-DUP models, for the data rows passed from the upper layer, when copying the data,
+// we first allocate from _buffer_mem_pool, and then check whether it already exists in
+// _skiplist.  If it exists, we aggregate the new row into the row in skiplist.
+// otherwise, we need to copy it into _table_mem_pool before we can insert it.
+void MemTable::_insert_agg(const Tuple* tuple) {

Review Comment:
   we may change a name, such as `_insert_no_dup`? and put  `_insert_dup` before `_insert_no_dup`.
   becasue this kind of `insert` agg and unique data.
   



##########
be/src/olap/rowset/segment_v2/bitshuffle_page.h:
##########
@@ -93,13 +93,52 @@ class BitshufflePageBuilder : public PageBuilder {
     bool is_page_full() override { return _remain_element_capacity == 0; }
 
     Status add(const uint8_t* vals, size_t* count) override {
+        return add_internal<false>(vals, count);
+    }
+
+    Status single_add(const uint8_t* vals, size_t* count) {
+        return add_internal<true>(vals, count);
+    }
+
+    template <bool single>
+    inline Status add_internal(const uint8_t* vals, size_t* count) {
         DCHECK(!_finished);
+        if (_remain_element_capacity <= 0) {
+            *count = 0;
+            return Status::RuntimeError("page is full.");
+        }
         int to_add = std::min<int>(_remain_element_capacity, *count);
-        _data.append(vals, to_add * SIZE_OF_TYPE);
+        int to_add_size = to_add * SIZE_OF_TYPE;
+        size_t orig_size = _data.size();
+        _data.resize(orig_size + to_add_size);
         _count += to_add;
         _remain_element_capacity -= to_add;
         // return added number through count
         *count = to_add;
+        if constexpr (single) {
+            if constexpr (SIZE_OF_TYPE == 1) {
+                *reinterpret_cast<uint8_t*>(&_data[orig_size]) = *vals;
+                return Status::OK();
+            }
+            if constexpr (SIZE_OF_TYPE == 2) {
+                *reinterpret_cast<uint16_t*>(&_data[orig_size]) =
+                        *reinterpret_cast<const uint16_t*>(vals);
+                return Status::OK();
+            }
+            if constexpr (SIZE_OF_TYPE == 4) {
+                *reinterpret_cast<uint32_t*>(&_data[orig_size]) =
+                        *reinterpret_cast<const uint32_t*>(vals);
+                return Status::OK();
+            }
+            if constexpr (SIZE_OF_TYPE == 8) {
+                *reinterpret_cast<uint64_t*>(&_data[orig_size]) =
+                        *reinterpret_cast<const uint64_t*>(vals);
+                return Status::OK();
+            }
+            memcpy(&_data[orig_size], vals, to_add_size);

Review Comment:
   else



##########
be/src/olap/rowset/segment_v2/binary_dict_page.cpp:
##########
@@ -90,13 +92,18 @@ Status BinaryDictPageBuilder::add(const uint8_t* vals, size_t* count) {
                     dict_item.relocate(item_mem);
                 }
                 value_code = _dictionary.size();
+                size_t add_count = 1;
+                RETURN_IF_ERROR(_dict_builder->add(reinterpret_cast<const uint8_t*>(&dict_item),
+                                                   &add_count));
+                if (add_count == 0) {
+                    // current dict page is full, stop processing remaining inputs
+                    break;
+                }
                 _dictionary.emplace(dict_item, value_code);
-                _dict_items.push_back(dict_item);
-                _dict_builder->update_prepared_size(dict_item.size);
             }
             size_t add_count = 1;
-            RETURN_IF_ERROR(_data_page_builder->add(reinterpret_cast<const uint8_t*>(&value_code),
-                                                    &add_count));
+            RETURN_IF_ERROR(actual_builder->single_add(

Review Comment:
   use `add_internal<true>`



##########
be/src/olap/rowset/segment_v2/bitshuffle_page.h:
##########
@@ -93,13 +93,52 @@ class BitshufflePageBuilder : public PageBuilder {
     bool is_page_full() override { return _remain_element_capacity == 0; }
 
     Status add(const uint8_t* vals, size_t* count) override {
+        return add_internal<false>(vals, count);
+    }
+
+    Status single_add(const uint8_t* vals, size_t* count) {
+        return add_internal<true>(vals, count);
+    }
+
+    template <bool single>
+    inline Status add_internal(const uint8_t* vals, size_t* count) {
         DCHECK(!_finished);
+        if (_remain_element_capacity <= 0) {
+            *count = 0;
+            return Status::RuntimeError("page is full.");
+        }
         int to_add = std::min<int>(_remain_element_capacity, *count);
-        _data.append(vals, to_add * SIZE_OF_TYPE);
+        int to_add_size = to_add * SIZE_OF_TYPE;
+        size_t orig_size = _data.size();
+        _data.resize(orig_size + to_add_size);
         _count += to_add;
         _remain_element_capacity -= to_add;
         // return added number through count
         *count = to_add;
+        if constexpr (single) {
+            if constexpr (SIZE_OF_TYPE == 1) {
+                *reinterpret_cast<uint8_t*>(&_data[orig_size]) = *vals;
+                return Status::OK();
+            }
+            if constexpr (SIZE_OF_TYPE == 2) {

Review Comment:
   else if constexpr



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] github-actions[bot] commented on pull request #9123: [enhancement](load) optimize load string data and dict page write

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #9123:
URL: https://github.com/apache/incubator-doris/pull/9123#issuecomment-1117238270

   PR approved by anyone and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] zbtzbtzbt commented on a diff in pull request #9123: [enhancement](load) optimize load string data and dict page write

Posted by GitBox <gi...@apache.org>.
zbtzbtzbt commented on code in PR #9123:
URL: https://github.com/apache/incubator-doris/pull/9123#discussion_r855722235


##########
be/src/olap/rowset/segment_v2/binary_plain_page.h:
##########
@@ -47,14 +47,13 @@ namespace segment_v2 {
 class BinaryPlainPageBuilder : public PageBuilder {
 public:
     BinaryPlainPageBuilder(const PageBuilderOptions& options)
-            : _size_estimate(0), _prepared_size(0), _options(options) {
+            : _size_estimate(0), _options(options) {
         reset();
     }
 
     bool is_page_full() override {
         // data_page_size is 0, do not limit the page size
-        return _options.data_page_size != 0 && (_size_estimate > _options.data_page_size ||
-                                                _prepared_size > _options.data_page_size);
+        return _options.data_page_size != 0 && (_size_estimate > _options.data_page_size);

Review Comment:
   we might remove the `( )`
   `return _options.data_page_size != 0 && _size_estimate > _options.data_page_size;`
   
   or
   `return (_options.data_page_size != 0) && (_size_estimate > _options.data_page_size);`
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yangzhg merged pull request #9123: [enhancement](load) optimize load string data and dict page write

Posted by GitBox <gi...@apache.org>.
yangzhg merged PR #9123:
URL: https://github.com/apache/incubator-doris/pull/9123


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] github-actions[bot] commented on pull request #9123: [enhancement](load) optimize load string data and dict page write

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #9123:
URL: https://github.com/apache/incubator-doris/pull/9123#issuecomment-1117238230

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yangzhg commented on a diff in pull request #9123: [enhancement](load) optimize load string data and dict page write

Posted by GitBox <gi...@apache.org>.
yangzhg commented on code in PR #9123:
URL: https://github.com/apache/incubator-doris/pull/9123#discussion_r855707706


##########
be/src/olap/rowset/segment_v2/bitshuffle_page.h:
##########
@@ -93,13 +93,52 @@ class BitshufflePageBuilder : public PageBuilder {
     bool is_page_full() override { return _remain_element_capacity == 0; }
 
     Status add(const uint8_t* vals, size_t* count) override {
+        return add_internal<false>(vals, count);
+    }
+
+    Status single_add(const uint8_t* vals, size_t* count) {
+        return add_internal<true>(vals, count);
+    }
+
+    template <bool single>
+    inline Status add_internal(const uint8_t* vals, size_t* count) {
         DCHECK(!_finished);
+        if (_remain_element_capacity <= 0) {
+            *count = 0;
+            return Status::RuntimeError("page is full.");
+        }
         int to_add = std::min<int>(_remain_element_capacity, *count);
-        _data.append(vals, to_add * SIZE_OF_TYPE);
+        int to_add_size = to_add * SIZE_OF_TYPE;
+        size_t orig_size = _data.size();
+        _data.resize(orig_size + to_add_size);
         _count += to_add;
         _remain_element_capacity -= to_add;
         // return added number through count
         *count = to_add;
+        if constexpr (single) {
+            if constexpr (SIZE_OF_TYPE == 1) {
+                *reinterpret_cast<uint8_t*>(&_data[orig_size]) = *vals;
+                return Status::OK();
+            }
+            if constexpr (SIZE_OF_TYPE == 2) {

Review Comment:
   else is no need this is like switch case  every if has a return and,and there is no overlap between if statements



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yangzhg commented on a diff in pull request #9123: [enhancement](load) optimize load string data and dict page write

Posted by GitBox <gi...@apache.org>.
yangzhg commented on code in PR #9123:
URL: https://github.com/apache/incubator-doris/pull/9123#discussion_r855705856


##########
be/src/olap/memtable.cpp:
##########
@@ -65,44 +75,46 @@ int MemTable::RowCursorComparator::operator()(const char* left, const char* righ
     return compare_row(lhs_row, rhs_row);
 }
 
-void MemTable::insert(const Tuple* tuple) {
+// For non-DUP models, for the data rows passed from the upper layer, when copying the data,
+// we first allocate from _buffer_mem_pool, and then check whether it already exists in
+// _skiplist.  If it exists, we aggregate the new row into the row in skiplist.
+// otherwise, we need to copy it into _table_mem_pool before we can insert it.
+void MemTable::_insert_agg(const Tuple* tuple) {

Review Comment:
   The unique table is essentially an Agg table where the aggregation function on the value column is replace, and the use of unique is only for the convenience of optimization.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yangzhg commented on a diff in pull request #9123: [enhancement](load) optimize load string data and dict page write

Posted by GitBox <gi...@apache.org>.
yangzhg commented on code in PR #9123:
URL: https://github.com/apache/incubator-doris/pull/9123#discussion_r855703713


##########
be/src/olap/rowset/segment_v2/binary_plain_page.h:
##########
@@ -47,14 +47,13 @@ namespace segment_v2 {
 class BinaryPlainPageBuilder : public PageBuilder {
 public:
     BinaryPlainPageBuilder(const PageBuilderOptions& options)
-            : _size_estimate(0), _prepared_size(0), _options(options) {
+            : _size_estimate(0), _options(options) {
         reset();
     }
 
     bool is_page_full() override {
         // data_page_size is 0, do not limit the page size
-        return _options.data_page_size != 0 && (_size_estimate > _options.data_page_size ||
-                                                _prepared_size > _options.data_page_size);
+        return _options.data_page_size != 0 && (_size_estimate > _options.data_page_size);

Review Comment:
   why ?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yangzhg commented on a diff in pull request #9123: [enhancement](load) optimize load string data and dict page write

Posted by GitBox <gi...@apache.org>.
yangzhg commented on code in PR #9123:
URL: https://github.com/apache/incubator-doris/pull/9123#discussion_r864724071


##########
be/src/olap/rowset/segment_v2/bitshuffle_page.h:
##########
@@ -93,13 +93,48 @@ class BitshufflePageBuilder : public PageBuilder {
     bool is_page_full() override { return _remain_element_capacity == 0; }
 
     Status add(const uint8_t* vals, size_t* count) override {
+        return add_internal<false>(vals, count);
+    }
+
+    Status single_add(const uint8_t* vals, size_t* count) {
+        return add_internal<true>(vals, count);
+    }
+
+    template <bool single>
+    inline Status add_internal(const uint8_t* vals, size_t* count) {

Review Comment:
   use assign instead of memcpy



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman commented on a diff in pull request #9123: [enhancement](load) optimize load string data and dict page write

Posted by GitBox <gi...@apache.org>.
morningman commented on code in PR #9123:
URL: https://github.com/apache/incubator-doris/pull/9123#discussion_r863887284


##########
be/src/olap/rowset/segment_v2/bitshuffle_page.h:
##########
@@ -93,13 +93,48 @@ class BitshufflePageBuilder : public PageBuilder {
     bool is_page_full() override { return _remain_element_capacity == 0; }
 
     Status add(const uint8_t* vals, size_t* count) override {
+        return add_internal<false>(vals, count);
+    }
+
+    Status single_add(const uint8_t* vals, size_t* count) {
+        return add_internal<true>(vals, count);
+    }
+
+    template <bool single>
+    inline Status add_internal(const uint8_t* vals, size_t* count) {

Review Comment:
   Why this implement faster than `_data.append`?



##########
be/src/olap/memtable.cpp:
##########
@@ -65,44 +75,46 @@ int MemTable::RowCursorComparator::operator()(const char* left, const char* righ
     return compare_row(lhs_row, rhs_row);
 }
 
-void MemTable::insert(const Tuple* tuple) {
+// For non-DUP models, for the data rows passed from the upper layer, when copying the data,
+// we first allocate from _buffer_mem_pool, and then check whether it already exists in
+// _skiplist.  If it exists, we aggregate the new row into the row in skiplist.
+// otherwise, we need to copy it into _table_mem_pool before we can insert it.
+void MemTable::_insert_agg(const Tuple* tuple) {
     _rows++;
-    bool overwritten = false;
-    uint8_t* _tuple_buf = nullptr;
-    if (_keys_type == KeysType::DUP_KEYS) {
-        // Will insert directly, so use memory from _table_mem_pool
-        _tuple_buf = _table_mem_pool->allocate(_schema_size);
-        ContiguousRow row(_schema, _tuple_buf);
-        _tuple_to_row(tuple, &row, _table_mem_pool.get());
-        _skip_list->Insert((TableKey)_tuple_buf, &overwritten);
-        DCHECK(!overwritten) << "Duplicate key model meet overwrite in SkipList";
-        return;
-    }
+    uint8_t* tuple_buf = nullptr;

Review Comment:
   ```suggestion
       uint8_t* tuple_buf = _buffer_mem_pool->allocate(_schema_size);
   ```
   And remove following line.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yangzhg commented on a diff in pull request #9123: [enhancement](load) optimize load string data and dict page write

Posted by GitBox <gi...@apache.org>.
yangzhg commented on code in PR #9123:
URL: https://github.com/apache/incubator-doris/pull/9123#discussion_r855706783


##########
be/src/olap/rowset/segment_v2/bitshuffle_page.h:
##########
@@ -93,13 +93,52 @@ class BitshufflePageBuilder : public PageBuilder {
     bool is_page_full() override { return _remain_element_capacity == 0; }
 
     Status add(const uint8_t* vals, size_t* count) override {
+        return add_internal<false>(vals, count);
+    }
+
+    Status single_add(const uint8_t* vals, size_t* count) {

Review Comment:
   No,The reason for adding add internal is that I want to unify the logic of add and single add, and I don't want to add a separate function, so add internal doesn't want to be exposed for use



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] zbtzbtzbt commented on a diff in pull request #9123: [enhancement](load) optimize load string data and dict page write

Posted by GitBox <gi...@apache.org>.
zbtzbtzbt commented on code in PR #9123:
URL: https://github.com/apache/incubator-doris/pull/9123#discussion_r855719343


##########
be/src/olap/memtable.cpp:
##########
@@ -65,44 +75,46 @@ int MemTable::RowCursorComparator::operator()(const char* left, const char* righ
     return compare_row(lhs_row, rhs_row);
 }
 
-void MemTable::insert(const Tuple* tuple) {
+// For non-DUP models, for the data rows passed from the upper layer, when copying the data,
+// we first allocate from _buffer_mem_pool, and then check whether it already exists in
+// _skiplist.  If it exists, we aggregate the new row into the row in skiplist.
+// otherwise, we need to copy it into _table_mem_pool before we can insert it.
+void MemTable::_insert_agg(const Tuple* tuple) {

Review Comment:
   ok



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yangzhg commented on a diff in pull request #9123: [enhancement](load) optimize load string data and dict page write

Posted by GitBox <gi...@apache.org>.
yangzhg commented on code in PR #9123:
URL: https://github.com/apache/incubator-doris/pull/9123#discussion_r855730553


##########
be/src/olap/rowset/segment_v2/bitshuffle_page.h:
##########
@@ -93,13 +93,52 @@ class BitshufflePageBuilder : public PageBuilder {
     bool is_page_full() override { return _remain_element_capacity == 0; }
 
     Status add(const uint8_t* vals, size_t* count) override {
+        return add_internal<false>(vals, count);

Review Comment:
   no add is pure virtual



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org