You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/05/26 07:30:09 UTC

[GitHub] [incubator-doris] yiguolei opened a new pull request, #9792: Opt scannode

yiguolei opened a new pull request, #9792:
URL: https://github.com/apache/incubator-doris/pull/9792

   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem Summary:
   
   Describe the overview of changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes/No/I Don't know)
   2. Has unit tests been added: (Yes/No/No Need)
   3. Has document been added or modified: (Yes/No/No Need)
   4. Does it need to update dependencies: (Yes/No)
   5. Are there any changes that cannot be rolled back: (Yes/No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yiguolei merged pull request #9792: [Improvement] optimize scannode concurrency query performance in vectorized engine.

Posted by GitBox <gi...@apache.org>.
yiguolei merged PR #9792:
URL: https://github.com/apache/incubator-doris/pull/9792


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yiguolei commented on a diff in pull request #9792: [Improvement] optimize scannode concurrency query performance in vectorized engine.

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #9792:
URL: https://github.com/apache/incubator-doris/pull/9792#discussion_r884138466


##########
be/src/olap/column_predicate.h:
##########
@@ -91,6 +91,9 @@ class ColumnPredicate {
 protected:
     uint32_t _column_id;
     bool _opposite;
+    mutable uint64_t _evaluated_rows = 1;

Review Comment:
   Yes. My mistake.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yiguolei commented on a diff in pull request #9792: [Improvement] optimize scannode concurrency query performance in vectorized engine.

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #9792:
URL: https://github.com/apache/incubator-doris/pull/9792#discussion_r884138891


##########
be/src/exec/olap_scan_node.cpp:
##########
@@ -689,11 +692,11 @@ Status OlapScanNode::build_scan_key() {
     return Status::OK();
 }
 
-static Status get_hints(TabletSharedPtr table, const TPaloScanRange& scan_range,
-                        int block_row_count, bool is_begin_include, bool is_end_include,
-                        const std::vector<std::unique_ptr<OlapScanRange>>& scan_key_range,
-                        std::vector<std::unique_ptr<OlapScanRange>>* sub_scan_range,
-                        RuntimeProfile* profile) {
+Status OlapScanNode::get_hints(TabletSharedPtr table, const TPaloScanRange& scan_range,

Review Comment:
   I do not like static method. I think we should use member function as much as possible. And I will use it in volapscannode. It is only defined in olap_scan_node.cpp .



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman commented on a diff in pull request #9792: [Improvement] optimize scannode concurrency query performance in vectorized engine.

Posted by GitBox <gi...@apache.org>.
morningman commented on code in PR #9792:
URL: https://github.com/apache/incubator-doris/pull/9792#discussion_r884239554


##########
be/src/vec/exec/volap_scan_node.cpp:
##########
@@ -554,8 +575,11 @@ Block* VOlapScanNode::_alloc_block(bool& get_free_block) {
 int VOlapScanNode::_start_scanner_thread_task(RuntimeState* state, int block_per_scanner) {
     std::list<VOlapScanner*> olap_scanners;
     int assigned_thread_num = _running_thread;
-    size_t max_thread = std::min(_volap_scanners.size(),
-                                 static_cast<size_t>(config::doris_scanner_thread_pool_thread_num));
+    size_t max_thread = config::doris_scanner_queue_size;
+    if (config::doris_scanner_row_num > state->batch_size()) {

Review Comment:
   This logic need refine. Actually, no one would change this `doris_scanner_row_num` config, and this `if` condition is always true. So that this `if` condition is meaningless.



##########
be/src/common/config.h:
##########
@@ -746,6 +746,10 @@ CONF_Int32(object_pool_buffer_size, "100");
 // ParquetReaderWrap prefetch buffer size
 CONF_Int32(parquet_reader_max_buffer_size, "50");
 
+// When the rows number reached this limit, will check the filter rate the of bloomfilter
+// if it is lower than a specific threshold, the predicate will be disabled.
+CONF_Int32(bloom_filter_predicate_check_row_num, "1000");

Review Comment:
   ```suggestion
   CONF_mInt32(bloom_filter_predicate_check_row_num, "1000");
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] github-actions[bot] commented on pull request #9792: [Improvement] optimize scannode concurrency query performance in vectorized engine.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #9792:
URL: https://github.com/apache/incubator-doris/pull/9792#issuecomment-1140834282

   PR approved by anyone and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] github-actions[bot] commented on pull request #9792: [Improvement] optimize scannode concurrency query performance in vectorized engine.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #9792:
URL: https://github.com/apache/incubator-doris/pull/9792#issuecomment-1140834250

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] HappenLee commented on a diff in pull request #9792: [Improvement] optimize scannode concurrency query performance in vectorized engine.

Posted by GitBox <gi...@apache.org>.
HappenLee commented on code in PR #9792:
URL: https://github.com/apache/incubator-doris/pull/9792#discussion_r884111183


##########
be/src/exec/olap_scanner.cpp:
##########
@@ -327,6 +327,7 @@ Status OlapScanner::get_batch(RuntimeState* state, RowBatch* batch, bool* eof) {
                 break;
             }
 
+            SCOPED_TIMER(_parent->_eval_conjuctx_timer);

Review Comment:
   should not do timer each rows, could cause performance problem



##########
be/src/olap/column_predicate.h:
##########
@@ -91,6 +91,9 @@ class ColumnPredicate {
 protected:
     uint32_t _column_id;
     bool _opposite;
+    mutable uint64_t _evaluated_rows = 1;

Review Comment:
   the value only use in bloom filter predicate. it should not define here?



##########
be/src/vec/exec/volap_scan_node.cpp:
##########
@@ -554,8 +576,11 @@ Block* VOlapScanNode::_alloc_block(bool& get_free_block) {
 int VOlapScanNode::_start_scanner_thread_task(RuntimeState* state, int block_per_scanner) {
     std::list<VOlapScanner*> olap_scanners;
     int assigned_thread_num = _running_thread;
-    size_t max_thread = std::min(_volap_scanners.size(),
-                                 static_cast<size_t>(config::doris_scanner_thread_pool_thread_num));
+    size_t max_thread = config::doris_scanner_queue_size;

Review Comment:
   If this code change is not sure to be effective, I think it can not be modified first



##########
be/src/olap/bloom_filter_predicate.h:
##########
@@ -158,6 +160,16 @@ void BloomFilterColumnPredicate<T>::evaluate(vectorized::IColumn& column, uint16
             new_size += _specific_filter->find_olap_engine(cell_value);
         }
     }
+    // If the pass rate is very high, for example > 50%, then the bloomfilter is useless.
+    // Some bloomfilter is useless, for example ssb 4.3, it consumes a lot of cpu but it is
+    // useless.
+    _evaluated_rows += *size;
+    _passed_rows += new_size;
+    if (_evaluated_rows > 100) {

Review Comment:
   `100` should be a config of BE



##########
be/src/exec/olap_scan_node.cpp:
##########
@@ -689,11 +692,11 @@ Status OlapScanNode::build_scan_key() {
     return Status::OK();
 }
 
-static Status get_hints(TabletSharedPtr table, const TPaloScanRange& scan_range,
-                        int block_row_count, bool is_begin_include, bool is_end_include,
-                        const std::vector<std::unique_ptr<OlapScanRange>>& scan_key_range,
-                        std::vector<std::unique_ptr<OlapScanRange>>* sub_scan_range,
-                        RuntimeProfile* profile) {
+Status OlapScanNode::get_hints(TabletSharedPtr table, const TPaloScanRange& scan_range,

Review Comment:
   why change the code to not static



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yiguolei commented on a diff in pull request #9792: [Improvement] optimize scannode concurrency query performance in vectorized engine.

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #9792:
URL: https://github.com/apache/incubator-doris/pull/9792#discussion_r884138206


##########
be/src/vec/exec/volap_scan_node.cpp:
##########
@@ -554,8 +576,11 @@ Block* VOlapScanNode::_alloc_block(bool& get_free_block) {
 int VOlapScanNode::_start_scanner_thread_task(RuntimeState* state, int block_per_scanner) {
     std::list<VOlapScanner*> olap_scanners;
     int assigned_thread_num = _running_thread;
-    size_t max_thread = std::min(_volap_scanners.size(),
-                                 static_cast<size_t>(config::doris_scanner_thread_pool_thread_num));
+    size_t max_thread = config::doris_scanner_queue_size;

Review Comment:
   The previous logic is wrong, For example, if assiged thread num = 13 and volap_scanner.size = 5.
   thread_slot_num = std::min(thread_slot_num, max_thread - assigned_thread_num);
                   if (thread_slot_num <= 0) {
                       thread_slot_num = 1;
                   }
   This logic will set thread_slot_num = 1. Will not put as many as task to queue. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] Gabriel39 commented on a diff in pull request #9792: Opt scannode

Posted by GitBox <gi...@apache.org>.
Gabriel39 commented on code in PR #9792:
URL: https://github.com/apache/incubator-doris/pull/9792#discussion_r882702083


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -705,6 +704,9 @@ void SegmentIterator::_vec_init_lazy_materialization() {
     //  _schema.column_ids() stands for storage layer block schema, so it contains delete columnid
     //  we just regard delete column as common pred column here.
     if (_schema.column_ids().size() > pred_column_ids.size()) {
+        if (pred_column_ids.size() > 0) {

Review Comment:
   remove for-loop codes below



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yiguolei commented on a diff in pull request #9792: [Improvement] optimize scannode concurrency query performance in vectorized engine.

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #9792:
URL: https://github.com/apache/incubator-doris/pull/9792#discussion_r884240440


##########
be/src/vec/exec/volap_scan_node.cpp:
##########
@@ -554,8 +575,11 @@ Block* VOlapScanNode::_alloc_block(bool& get_free_block) {
 int VOlapScanNode::_start_scanner_thread_task(RuntimeState* state, int block_per_scanner) {
     std::list<VOlapScanner*> olap_scanners;
     int assigned_thread_num = _running_thread;
-    size_t max_thread = std::min(_volap_scanners.size(),
-                                 static_cast<size_t>(config::doris_scanner_thread_pool_thread_num));
+    size_t max_thread = config::doris_scanner_queue_size;
+    if (config::doris_scanner_row_num > state->batch_size()) {

Review Comment:
   I think we need this if condition check. If we do not allow anyone modify config:: doris_scanner_row_num, I think it should be a const expr variable not in config file. If it is in conf file, then it maybe modified by someone, so that we should check it.
   And also, maybe nobody will modify doris_scanner_row_num, but somebody could modify batch size, the batch size maybe larger than the config::doris_scanner_row_num.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yiguolei commented on a diff in pull request #9792: [Improvement] optimize scannode concurrency query performance in vectorized engine.

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #9792:
URL: https://github.com/apache/incubator-doris/pull/9792#discussion_r884138601


##########
be/src/exec/olap_scanner.cpp:
##########
@@ -327,6 +327,7 @@ Status OlapScanner::get_batch(RuntimeState* state, RowBatch* batch, bool* eof) {
                 break;
             }
 
+            SCOPED_TIMER(_parent->_eval_conjuctx_timer);

Review Comment:
   my mistake, it should be removed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org