Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/05/09 04:21:31 UTC

[GitHub] [incubator-doris] spaces-X opened a new pull request, #9459: [stream-load-vec]: memtable flush only if necessary after aggregated

spaces-X opened a new pull request, #9459:
URL: https://github.com/apache/incubator-doris/pull/9459

   
   # Proposed changes
   
   Currently a memtable is flushed as soon as its input data fills it; nothing checks whether it is still full after aggregation.
   This PR adds that check, so that a flush happens only if the memtable is still full after aggregating. I will run some performance tests later.
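
A minimal sketch of the idea, not the code from this PR: the toy type, the `Action` enum and the fixed "percent kept" ratio are hypothetical. The two thresholds reuse the values quoted later in this thread from `be/src/common/config.h` (`write_buffer_size` and `memtable_max_buffer_size`).

```cpp
#include <cassert>
#include <cstddef>

// Threshold values copied from the config.h hunk quoted in this thread.
constexpr std::size_t kWriteBufferSize = 209715200;        // 200 MiB
constexpr std::size_t kMemtableMaxBufferSize = 419430400;  // 400 MiB

enum class Action { kKeepBuffering, kAggregatedOnly, kFlush };

// Hypothetical toy memtable: it only tracks a byte count and a made-up
// ratio describing how much of the data survives aggregation.
struct ToyMemTable {
    std::size_t mem_usage = 0;
    std::size_t percent_kept_after_agg = 100;  // 100 means nothing merges

    void insert(std::size_t bytes) { mem_usage += bytes; }
    bool need_to_agg() const { return mem_usage >= kMemtableMaxBufferSize; }
    bool should_flush() const { return mem_usage >= kWriteBufferSize; }
    // Divide first so the multiplication cannot overflow for large sizes.
    void shrink_by_agg() { mem_usage = mem_usage / 100 * percent_kept_after_agg; }
};

// Decide what to do after a write: aggregate first, flush only if the
// memtable is still above the flush threshold afterwards.
Action after_write(ToyMemTable& table) {
    if (!table.need_to_agg()) {
        return Action::kKeepBuffering;
    }
    table.shrink_by_agg();
    return table.should_flush() ? Action::kFlush : Action::kAggregatedOnly;
}
```

With data that aggregates well, the memtable drops below the flush threshold and keeps accepting rows; with data that barely aggregates, it is flushed as before.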
   
   ## Problem Summary:
   
   Describe the overview of changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes/No/I Don't know)
   2. Has unit tests been added: (Yes/No/No Need)
   3. Has document been added or modified: (Yes/No/No Need)
   4. Does it need to update dependencies: (Yes/No)
   5. Are there any changes that cannot be rolled back: (Yes/No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] spaces-X commented on a diff in pull request #9459: [stream-load-vec]: memtable flush only if necessary after aggregated

Posted by GitBox <gi...@apache.org>.
spaces-X commented on code in PR #9459:
URL: https://github.com/apache/incubator-doris/pull/9459#discussion_r870024960


##########
be/src/olap/memtable.cpp:
##########
@@ -263,22 +274,56 @@ vectorized::Block MemTable::_collect_vskiplist_results() {
                 auto function = _agg_functions[i];
                 function->insert_result_into(it.key()->_agg_places[i],
                                              *(_output_mutable_block.get_column_by_position(i)));
-                function->destroy(it.key()->_agg_places[i]);
+                if (final) {
+                    function->destroy(it.key()->_agg_places[i]);
+                }
             }
+            // re-index the row_pos in VSkipList
+            it.key()->_row_pos = idx;
+            idx++;
+        }
+        if (!final) {
+            _input_mutable_block.swap(_output_mutable_block);
+            //TODO(weixang):opt here.
+            _output_mutable_block = vectorized::MutableBlock::build_mutable_block(&in_block);
+            _output_mutable_block.clear_column_data();
+        }
+    }
+}
+
+void MemTable::shrink_memtable_by_agg() {
+    {
+        SCOPED_TIMER(_shrunk_agg_time);
+        if (_is_shrunk_by_agg) {
+            return;
         }
+        size_t old_size = _input_mutable_block.allocated_bytes();
+        _collect_vskiplist_to_output(false);
+        size_t new_size = _input_mutable_block.allocated_bytes();
+        // shrink mem usage of memetable after agged.
+        _mem_usage += new_size - old_size;
+        _mem_tracker->consume(new_size - old_size);
+        _is_shrunk_by_agg = true;
     }
-    return _output_mutable_block.to_block();
+}
+
+bool MemTable::is_full() {
+    return memory_usage() >= config::write_buffer_size;
 }
 
 Status MemTable::flush() {
-    VLOG_CRITICAL << "begin to flush memtable for tablet: " << _tablet_id
+    clock_t now = clock();

Review Comment:
   I pushed my debug code by mistake.
   These logs and profiles will be removed later.





[GitHub] [incubator-doris] spaces-X commented on a diff in pull request #9459: [stream-load-vec]: memtable flush only if necessary after aggregated

Posted by GitBox <gi...@apache.org>.
spaces-X commented on code in PR #9459:
URL: https://github.com/apache/incubator-doris/pull/9459#discussion_r876825011


##########
be/src/olap/memtable.cpp:
##########
@@ -263,11 +266,43 @@ vectorized::Block MemTable::_collect_vskiplist_results() {
                 auto function = _agg_functions[i];
                 function->insert_result_into(it.key()->_agg_places[i],
                                              *(_output_mutable_block.get_column_by_position(i)));
-                function->destroy(it.key()->_agg_places[i]);
+                if constexpr (is_final) {
+                    function->destroy(it.key()->_agg_places[i]);
+                }
             }
+            // re-index the row_pos in VSkipList
+            it.key()->_row_pos = idx;
+            idx++;
         }
+        if constexpr (!is_final) {
+            size_t shrunked_after_agg = _output_mutable_block.allocated_bytes();
+            _mem_tracker->consume(shrunked_after_agg - _mem_usage);
+            _mem_usage = shrunked_after_agg;
+            _input_mutable_block.swap(_output_mutable_block);
+            //TODO(weixang):opt here.
+            std::unique_ptr<vectorized::Block> empty_input_block =
+                    std::move(in_block.create_same_struct_block(0));
+            _output_mutable_block =
+                    vectorized::MutableBlock::build_mutable_block(empty_input_block.get());
+            _output_mutable_block.clear_column_data();
+        }
+    }
+}
+
+void MemTable::shrink_memtable_by_agg() {
+    if (_is_shrunk_by_agg) {
+        return;
     }
-    return _output_mutable_block.to_block();
+    _collect_vskiplist_to_output<false>();
+    _is_shrunk_by_agg = true;
+}
+
+bool MemTable::is_flush() {
+    return memory_usage() >= config::write_buffer_size;
+}
+
+bool MemTable::need_to_agg() {
+    return memory_usage() >= config::memtable_max_buffer_size;

Review Comment:
   done



##########
be/src/olap/memtable.cpp:
##########
@@ -126,11 +126,12 @@ void MemTable::insert(const vectorized::Block* block, size_t row_pos, size_t num
         }
     }
     size_t cursor_in_mutableblock = _input_mutable_block.rows();
-    size_t oldsize = _input_mutable_block.allocated_bytes();
     _input_mutable_block.add_rows(block, row_pos, num_rows);
-    size_t newsize = _input_mutable_block.allocated_bytes();
-    _mem_usage += newsize - oldsize;
-    _mem_tracker->consume(newsize - oldsize);
+    size_t input_size = block->allocated_bytes() * num_rows / block->rows();

Review Comment:
   It just calculates the input size of the block.
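
The quoted expression reads as a proportional estimate: the block's allocated bytes scaled by the fraction of its rows being inserted. A small sketch of that arithmetic follows; the function name and the empty-block guard are assumptions added for illustration and are not part of the quoted line.

```cpp
#include <cassert>
#include <cstddef>

// Estimate how many bytes num_rows rows contribute, given a block that
// holds total_rows rows in allocated_bytes bytes.
// The zero check is an added guard (assumption): the quoted line would
// divide by zero for an empty block.
std::size_t estimate_input_size(std::size_t allocated_bytes, std::size_t num_rows,
                                std::size_t total_rows) {
    if (total_rows == 0) {
        return 0;
    }
    // Integer division truncates, so the result is a lower-bound estimate.
    return allocated_bytes * num_rows / total_rows;
}
```

For example, inserting 256 of the 1024 rows of a 4096-byte block is counted as 1024 bytes. The estimate assumes rows are roughly equal in size, which may not hold for variable-length columns.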





[GitHub] [incubator-doris] HappenLee commented on a diff in pull request #9459: [stream-load-vec]: memtable flush only if necessary after aggregated

Posted by GitBox <gi...@apache.org>.
HappenLee commented on code in PR #9459:
URL: https://github.com/apache/incubator-doris/pull/9459#discussion_r869979883


##########
be/src/olap/memtable.cpp:
##########
@@ -98,6 +99,7 @@ void MemTable::_init_agg_functions(const vectorized::Block* block) {
 MemTable::~MemTable() {
     std::for_each(_row_in_blocks.begin(), _row_in_blocks.end(), std::default_delete<RowInBlock>());
     _mem_tracker->release(_mem_usage);
+    print_profile();

Review Comment:
   We should not print the profile every time.



##########
be/src/olap/memtable.cpp:
##########
@@ -115,53 +117,61 @@ int MemTable::RowInBlockComparator::operator()(const RowInBlock* left,
 }
 
 void MemTable::insert(const vectorized::Block* block, size_t row_pos, size_t num_rows) {
-    if (_is_first_insertion) {
-        _is_first_insertion = false;
-        auto cloneBlock = block->clone_without_columns();
-        _input_mutable_block = vectorized::MutableBlock::build_mutable_block(&cloneBlock);
-        _vec_row_comparator->set_block(&_input_mutable_block);
-        _output_mutable_block = vectorized::MutableBlock::build_mutable_block(&cloneBlock);
-        if (_keys_type != KeysType::DUP_KEYS) {
-            _init_agg_functions(block);
+    {
+        SCOPED_TIMER(_insert_time);
+        if (_is_first_insertion) {
+            _is_first_insertion = false;
+            auto cloneBlock = block->clone_without_columns();
+            _input_mutable_block = vectorized::MutableBlock::build_mutable_block(&cloneBlock);
+            _vec_row_comparator->set_block(&_input_mutable_block);
+            _output_mutable_block = vectorized::MutableBlock::build_mutable_block(&cloneBlock);
+            if (_keys_type != KeysType::DUP_KEYS) {
+                _init_agg_functions(block);
+            }
+        }
+        size_t cursor_in_mutableblock = _input_mutable_block.rows();
+        size_t oldsize = _input_mutable_block.allocated_bytes();
+        _input_mutable_block.add_rows(block, row_pos, num_rows);
+        size_t newsize = _input_mutable_block.allocated_bytes();
+        _mem_usage += newsize - oldsize;
+        _mem_tracker->consume(newsize - oldsize);
+        // when new data inserted, the mem_usage of memtable should be re-shrunk again.
+        _is_shrunk_by_agg = false;
+
+        for (int i = 0; i < num_rows; i++) {
+            _row_in_blocks.emplace_back(new RowInBlock {cursor_in_mutableblock + i});
+            _insert_one_row_from_block(_row_in_blocks.back());
         }
-    }
-    size_t cursor_in_mutableblock = _input_mutable_block.rows();
-    size_t oldsize = _input_mutable_block.allocated_bytes();
-    _input_mutable_block.add_rows(block, row_pos, num_rows);
-    size_t newsize = _input_mutable_block.allocated_bytes();
-    _mem_usage += newsize - oldsize;
-    _mem_tracker->consume(newsize - oldsize);
-
-    for (int i = 0; i < num_rows; i++) {
-        _row_in_blocks.emplace_back(new RowInBlock {cursor_in_mutableblock + i});
-        _insert_one_row_from_block(_row_in_blocks.back());
     }
 }
 
 void MemTable::_insert_one_row_from_block(RowInBlock* row_in_block) {
-    _rows++;
-    bool overwritten = false;
-    if (_keys_type == KeysType::DUP_KEYS) {
-        // TODO: dup keys only need sort opertaion. Rethink skiplist is the beat way to sort columns?
-        _vec_skip_list->Insert(row_in_block, &overwritten);
-        DCHECK(!overwritten) << "Duplicate key model meet overwrite in SkipList";
-        return;
-    }
-
-    bool is_exist = _vec_skip_list->Find(row_in_block, &_vec_hint);
-    if (is_exist) {
-        _aggregate_two_row_in_block(row_in_block, _vec_hint.curr->key);
-    } else {
-        row_in_block->init_agg_places(_agg_functions, _schema->num_key_columns());
-        for (auto cid = _schema->num_key_columns(); cid < _schema->num_columns(); cid++) {
-            auto col_ptr = _input_mutable_block.mutable_columns()[cid].get();
-            auto place = row_in_block->_agg_places[cid];
-            _agg_functions[cid]->add(place,
-                                     const_cast<const doris::vectorized::IColumn**>(&col_ptr),
-                                     row_in_block->_row_pos, nullptr);
+    {
+        SCOPED_TIMER(_sort_agg_time);

Review Comment:
   Timers have a certain overhead, so I think we should enable them only when we need to debug. Timing every line will also greatly affect execution efficiency; we need timing that is more efficient, even if it is not as accurate.
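
One way to act on this, sketched under assumptions: `ToyScopedTimer` below is a hypothetical stand-in, not Doris's `SCOPED_TIMER`. It reads the clock only when a compile-time flag is set, and it is placed once per batch rather than once per row.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical scoped timer. When Enabled is false the clock is never
// read, so the disabled variant costs essentially nothing at run time.
template <bool Enabled>
class ToyScopedTimer {
public:
    explicit ToyScopedTimer(std::int64_t* total_ns) : _total_ns(total_ns) {
        if constexpr (Enabled) {
            _start = std::chrono::steady_clock::now();
        }
    }
    ~ToyScopedTimer() {
        if constexpr (Enabled) {
            auto elapsed = std::chrono::steady_clock::now() - _start;
            *_total_ns +=
                    std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
        }
    }
    ToyScopedTimer(const ToyScopedTimer&) = delete;
    ToyScopedTimer& operator=(const ToyScopedTimer&) = delete;

private:
    std::int64_t* _total_ns;
    std::chrono::steady_clock::time_point _start {};
};

// Times the whole batch with a single timer instead of timing each row.
template <bool Profile>
std::int64_t sum_rows(int num_rows, std::int64_t* total_ns) {
    ToyScopedTimer<Profile> timer(total_ns);
    std::int64_t sum = 0;
    for (int i = 0; i < num_rows; ++i) {
        sum += i;
    }
    return sum;
}
```

The trade-off is the one raised in the comment: per-batch timing is coarser than per-row timing, but it keeps the clock reads out of the hot loop.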



##########
be/src/olap/memtable.cpp:
##########
@@ -263,22 +274,56 @@ vectorized::Block MemTable::_collect_vskiplist_results() {
                 auto function = _agg_functions[i];
                 function->insert_result_into(it.key()->_agg_places[i],
                                              *(_output_mutable_block.get_column_by_position(i)));
-                function->destroy(it.key()->_agg_places[i]);
+                if (final) {
+                    function->destroy(it.key()->_agg_places[i]);
+                }
             }
+            // re-index the row_pos in VSkipList
+            it.key()->_row_pos = idx;
+            idx++;
+        }
+        if (!final) {
+            _input_mutable_block.swap(_output_mutable_block);
+            //TODO(weixang):opt here.
+            _output_mutable_block = vectorized::MutableBlock::build_mutable_block(&in_block);
+            _output_mutable_block.clear_column_data();
+        }
+    }
+}
+
+void MemTable::shrink_memtable_by_agg() {
+    {
+        SCOPED_TIMER(_shrunk_agg_time);
+        if (_is_shrunk_by_agg) {
+            return;
         }
+        size_t old_size = _input_mutable_block.allocated_bytes();
+        _collect_vskiplist_to_output(false);
+        size_t new_size = _input_mutable_block.allocated_bytes();
+        // shrink mem usage of memetable after agged.
+        _mem_usage += new_size - old_size;
+        _mem_tracker->consume(new_size - old_size);
+        _is_shrunk_by_agg = true;
     }
-    return _output_mutable_block.to_block();
+}
+
+bool MemTable::is_full() {
+    return memory_usage() >= config::write_buffer_size;
 }
 
 Status MemTable::flush() {
-    VLOG_CRITICAL << "begin to flush memtable for tablet: " << _tablet_id
+    clock_t now = clock();

Review Comment:
   This does too many log operations.



##########
be/src/olap/memtable.cpp:
##########
@@ -263,22 +274,56 @@ vectorized::Block MemTable::_collect_vskiplist_results() {
                 auto function = _agg_functions[i];
                 function->insert_result_into(it.key()->_agg_places[i],
                                              *(_output_mutable_block.get_column_by_position(i)));
-                function->destroy(it.key()->_agg_places[i]);
+                if (final) {

Review Comment:
   Using a template and `if constexpr` here would be more efficient.
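
A sketch of what this suggestion amounts to, using a toy row type rather than the real `MemTable` internals (`ToyRow` and `collect` are hypothetical names). The flag becomes a template parameter, so the branch is resolved at compile time and the non-final instantiation contains no cleanup code at all. This requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a row holding aggregation state.
struct ToyRow {
    int value = 0;
    bool state_destroyed = false;
};

// is_final is a compile-time flag instead of a runtime bool parameter.
template <bool is_final>
std::vector<int> collect(std::vector<ToyRow>& rows) {
    std::vector<int> out;
    out.reserve(rows.size());
    for (auto& row : rows) {
        out.push_back(row.value);
        if constexpr (is_final) {
            // Only the final pass releases per-row state; the
            // intermediate pass keeps it for later insertions.
            row.state_destroyed = true;
        }
    }
    return out;
}
```

Callers then write `collect<false>(rows)` for an intermediate pass and `collect<true>(rows)` for the final one, which matches the `_collect_vskiplist_to_output<false>()` call shape seen in later revisions quoted in this thread.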





[GitHub] [incubator-doris] HappenLee commented on a diff in pull request #9459: [stream-load-vec]: memtable flush only if necessary after aggregated

Posted by GitBox <gi...@apache.org>.
HappenLee commented on code in PR #9459:
URL: https://github.com/apache/incubator-doris/pull/9459#discussion_r878111822


##########
be/src/olap/memtable.cpp:
##########
@@ -263,11 +264,46 @@ vectorized::Block MemTable::_collect_vskiplist_results() {
                 auto function = _agg_functions[i];
                 function->insert_result_into(it.key()->_agg_places[i],
                                              *(_output_mutable_block.get_column_by_position(i)));
-                function->destroy(it.key()->_agg_places[i]);
+                if constexpr (is_final) {
+                    function->destroy(it.key()->_agg_places[i]);
+                }
             }
+            if constexpr (!is_final) {
+                // re-index the row_pos in VSkipList
+                it.key()->_row_pos = idx;
+                idx++;
+            }
+        }
+        if constexpr (!is_final) {
+            // if is not final, we collect the agg results to input_block and then continue to insert
+            size_t shrunked_after_agg = _output_mutable_block.allocated_bytes();
+            _mem_tracker->consume(shrunked_after_agg - _mem_usage);
+            _mem_usage = shrunked_after_agg;
+            _input_mutable_block.swap(_output_mutable_block);
+            //TODO(weixang):opt here.
+            std::unique_ptr<vectorized::Block> empty_input_block =
+                    std::move(in_block.create_same_struct_block(0));
+            _output_mutable_block =
+                    vectorized::MutableBlock::build_mutable_block(empty_input_block.get());
+            _output_mutable_block.clear_column_data();
         }
     }
-    return _output_mutable_block.to_block();
+}
+
+void MemTable::shrink_memtable_by_agg() {
+    if (_keys_type == KeysType::DUP_KEYS) {

Review Comment:
   DCHECK(_keys_type != KeysType::DUP_KEYS)
   
   Dup keys should not call this function.





[GitHub] [incubator-doris] morningman commented on a diff in pull request #9459: [stream-load-vec]: memtable flush only if necessary after aggregated

Posted by GitBox <gi...@apache.org>.
morningman commented on code in PR #9459:
URL: https://github.com/apache/incubator-doris/pull/9459#discussion_r878861395


##########
be/src/common/config.h:
##########
@@ -464,6 +464,8 @@ CONF_Int32(memory_max_alignment, "16");
 // write buffer size before flush
 CONF_mInt64(write_buffer_size, "209715200");
 
+CONF_mInt64(memtable_max_buffer_size, "419430400");

Review Comment:
   Add comment for this new config





[GitHub] [incubator-doris] wangshuo128 commented on a diff in pull request #9459: [stream-load-vec]: memtable flush only if necessary after aggregated

Posted by GitBox <gi...@apache.org>.
wangshuo128 commented on code in PR #9459:
URL: https://github.com/apache/incubator-doris/pull/9459#discussion_r867666484


##########
be/src/olap/memtable.h:
##########
@@ -195,9 +199,11 @@ class MemTable {
     //for vectorized
     vectorized::MutableBlock _input_mutable_block;
     vectorized::MutableBlock _output_mutable_block;
-    vectorized::Block _collect_vskiplist_results();
+    void _collect_vskiplist_to_output(bool final);

Review Comment:
   nit: C++ has a `final` keyword; it's better to use another name, maybe `is_final`?





[GitHub] [incubator-doris] HappenLee commented on a diff in pull request #9459: [stream-load-vec]: memtable flush only if necessary after aggregated

Posted by GitBox <gi...@apache.org>.
HappenLee commented on code in PR #9459:
URL: https://github.com/apache/incubator-doris/pull/9459#discussion_r876733240


##########
be/src/olap/memtable.cpp:
##########
@@ -263,11 +266,43 @@ vectorized::Block MemTable::_collect_vskiplist_results() {
                 auto function = _agg_functions[i];
                 function->insert_result_into(it.key()->_agg_places[i],
                                              *(_output_mutable_block.get_column_by_position(i)));
-                function->destroy(it.key()->_agg_places[i]);
+                if constexpr (is_final) {
+                    function->destroy(it.key()->_agg_places[i]);
+                }
             }
+            // re-index the row_pos in VSkipList
+            it.key()->_row_pos = idx;
+            idx++;
         }
+        if constexpr (!is_final) {
+            size_t shrunked_after_agg = _output_mutable_block.allocated_bytes();
+            _mem_tracker->consume(shrunked_after_agg - _mem_usage);
+            _mem_usage = shrunked_after_agg;
+            _input_mutable_block.swap(_output_mutable_block);
+            //TODO(weixang):opt here.
+            std::unique_ptr<vectorized::Block> empty_input_block =
+                    std::move(in_block.create_same_struct_block(0));
+            _output_mutable_block =
+                    vectorized::MutableBlock::build_mutable_block(empty_input_block.get());
+            _output_mutable_block.clear_column_data();
+        }
+    }
+}
+
+void MemTable::shrink_memtable_by_agg() {
+    if (_is_shrunk_by_agg) {
+        return;
     }
-    return _output_mutable_block.to_block();
+    _collect_vskiplist_to_output<false>();
+    _is_shrunk_by_agg = true;
+}
+
+bool MemTable::is_flush() {
+    return memory_usage() >= config::write_buffer_size;
+}
+
+bool MemTable::need_to_agg() {
+    return memory_usage() >= config::memtable_max_buffer_size;

Review Comment:
   If the keys type is dup keys, just call `is_flush()`.
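
One possible reading of this comment, sketched with a toy type (the struct and its fields are hypothetical, and this is not claimed to be the merged code): duplicate-key tables have nothing to aggregate, so their check falls back to the plain flush threshold instead of the larger aggregation threshold. The threshold values are the ones quoted in this thread from `be/src/common/config.h`.

```cpp
#include <cassert>
#include <cstddef>

enum class KeysType { DUP_KEYS, UNIQUE_KEYS, AGG_KEYS };

// Threshold values copied from the config.h hunk quoted in this thread.
constexpr std::size_t kWriteBufferSize = 209715200;        // 200 MiB
constexpr std::size_t kMemtableMaxBufferSize = 419430400;  // 400 MiB

// Hypothetical toy memtable carrying only what the check needs.
struct ToyMemTable {
    KeysType keys_type;
    std::size_t mem_usage;

    bool is_flush() const { return mem_usage >= kWriteBufferSize; }

    bool need_to_agg() const {
        // Dup keys cannot shrink by aggregation, so use the flush threshold.
        if (keys_type == KeysType::DUP_KEYS) {
            return is_flush();
        }
        return mem_usage >= kMemtableMaxBufferSize;
    }
};
```

Under this reading, a duplicate-key memtable at 300,000,000 bytes already reports `true`, while an aggregate-key memtable of the same size keeps buffering until it reaches the larger threshold.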





[GitHub] [incubator-doris] github-actions[bot] commented on pull request #9459: [stream-load-vec]: memtable flush only if necessary after aggregated

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #9459:
URL: https://github.com/apache/incubator-doris/pull/9459#issuecomment-1132875352

   PR approved by anyone and no changes requested.




[GitHub] [incubator-doris] HappenLee commented on a diff in pull request #9459: [stream-load-vec]: memtable flush only if necessary after aggregated

Posted by GitBox <gi...@apache.org>.
HappenLee commented on code in PR #9459:
URL: https://github.com/apache/incubator-doris/pull/9459#discussion_r869989038


##########
be/src/vec/core/block.cpp:
##########
@@ -846,6 +846,18 @@ size_t MutableBlock::rows() const {
     return 0;
 }
 
+void MutableBlock::swap(MutableBlock& another) noexcept {
+    _columns.swap(another._columns);
+    _data_types.swap(another._data_types);
+}
+
+void MutableBlock::swap(MutableBlock&& another) noexcept {

Review Comment:
   The function should not be named `swap`.





[GitHub] [incubator-doris] github-actions[bot] commented on pull request #9459: [stream-load-vec]: memtable flush only if necessary after aggregated

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #9459:
URL: https://github.com/apache/incubator-doris/pull/9459#issuecomment-1136060229

   PR approved by at least one committer and no changes requested.




[GitHub] [incubator-doris] spaces-X commented on a diff in pull request #9459: [stream-load-vec]: memtable flush only if necessary after aggregated

Posted by GitBox <gi...@apache.org>.
spaces-X commented on code in PR #9459:
URL: https://github.com/apache/incubator-doris/pull/9459#discussion_r874785677


##########
be/src/vec/core/block.cpp:
##########
@@ -846,6 +846,18 @@ size_t MutableBlock::rows() const {
     return 0;
 }
 
+void MutableBlock::swap(MutableBlock& another) noexcept {
+    _columns.swap(another._columns);
+    _data_types.swap(another._data_types);
+}
+
+void MutableBlock::swap(MutableBlock&& another) noexcept {

Review Comment:
   It keeps the same name as the one in `class Block`.





[GitHub] [incubator-doris] spaces-X commented on a diff in pull request #9459: [stream-load-vec]: memtable flush only if necessary after aggregated

Posted by GitBox <gi...@apache.org>.
spaces-X commented on code in PR #9459:
URL: https://github.com/apache/incubator-doris/pull/9459#discussion_r876825481


##########
be/src/olap/delta_writer.cpp:
##########
@@ -222,9 +222,12 @@ Status DeltaWriter::write(const vectorized::Block* block, const std::vector<int>
         }
     }
 
-    if (_mem_table->memory_usage() >= config::write_buffer_size) {
-        RETURN_NOT_OK(_flush_memtable_async());
-        _reset_mem_table();
+    if (_mem_table->need_to_agg()) {
+        _mem_table->shrink_memtable_by_agg();
+        if (UNLIKELY(_mem_table->is_flush())) {

Review Comment:
   Maybe; I'm not sure. I will remove this in the next commit.





[GitHub] [incubator-doris] github-actions[bot] commented on pull request #9459: [stream-load-vec]: memtable flush only if necessary after aggregated

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #9459:
URL: https://github.com/apache/incubator-doris/pull/9459#issuecomment-1132875313

   PR approved by at least one committer and no changes requested.




[GitHub] [incubator-doris] spaces-X commented on a diff in pull request #9459: [stream-load-vec]: memtable flush only if necessary after aggregated

Posted by GitBox <gi...@apache.org>.
spaces-X commented on code in PR #9459:
URL: https://github.com/apache/incubator-doris/pull/9459#discussion_r880541569


##########
be/src/common/config.h:
##########
@@ -464,6 +464,8 @@ CONF_Int32(memory_max_alignment, "16");
 // write buffer size before flush
 CONF_mInt64(write_buffer_size, "209715200");
 
+CONF_mInt64(memtable_max_buffer_size, "419430400");

Review Comment:
   done





[GitHub] [incubator-doris] HappenLee commented on a diff in pull request #9459: [stream-load-vec]: memtable flush only if necessary after aggregated

Posted by GitBox <gi...@apache.org>.
HappenLee commented on code in PR #9459:
URL: https://github.com/apache/incubator-doris/pull/9459#discussion_r876523763


##########
be/src/olap/delta_writer.cpp:
##########
@@ -222,9 +222,12 @@ Status DeltaWriter::write(const vectorized::Block* block, const std::vector<int>
         }
     }
 
-    if (_mem_table->memory_usage() >= config::write_buffer_size) {
-        RETURN_NOT_OK(_flush_memtable_async());
-        _reset_mem_table();
+    if (_mem_table->need_to_agg()) {
+        _mem_table->shrink_memtable_by_agg();
+        if (UNLIKELY(_mem_table->is_flush())) {

Review Comment:
   Why is this unlikely? If it is unlikely, does that mean the `need_to_agg` config is not suitable?



##########
be/src/olap/memtable.cpp:
##########
@@ -126,11 +126,12 @@ void MemTable::insert(const vectorized::Block* block, size_t row_pos, size_t num
         }
     }
     size_t cursor_in_mutableblock = _input_mutable_block.rows();
-    size_t oldsize = _input_mutable_block.allocated_bytes();
     _input_mutable_block.add_rows(block, row_pos, num_rows);
-    size_t newsize = _input_mutable_block.allocated_bytes();
-    _mem_usage += newsize - oldsize;
-    _mem_tracker->consume(newsize - oldsize);
+    size_t input_size = block->allocated_bytes() * num_rows / block->rows();

Review Comment:
   What does this logic mean?





[GitHub] [incubator-doris] HappenLee merged pull request #9459: [stream-load-vec]: memtable flush only if necessary after aggregated

Posted by GitBox <gi...@apache.org>.
HappenLee merged PR #9459:
URL: https://github.com/apache/incubator-doris/pull/9459

