Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/11/03 14:05:04 UTC

[GitHub] [incubator-doris] weizuo93 opened a new pull request #4837: [Optimize] Take 'tablet scan frequency' into consideration when selecting a tablet for compaction

weizuo93 opened a new pull request #4837:
URL: https://github.com/apache/incubator-doris/pull/4837


   ## Proposed changes
   
   A large number of small segment files makes scan operations inefficient. Compaction merges multiple small files into one larger file. So we can take the tablet scan frequency into consideration when selecting a tablet for compaction, and preferentially compact the tablets that have been scanned frequently over a recent period of time.
   
   Following the compaction strategy of `Kudu`, a `scan frequency` can be calculated for each tablet over a recent period of time and taken into consideration when calculating the compaction score.
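
A minimal sketch of how the two signals might be combined when choosing a tablet. The weighted sum is an assumption based on the config comment "coefficient for tablet scan frequency and compaction score"; the exact formula is not visible in the quoted hunks. `TabletStats`, `weighted_score` and `pick_tablet` are hypothetical names used only for illustration and are not Doris APIs.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-tablet inputs; not a Doris type.
struct TabletStats {
    int64_t tablet_id;
    uint32_t compaction_score;  // score from the existing compaction policy
    double scan_frequency;      // scans per minute over the recent window
};

// Assumed combination: a weighted sum of scan frequency and compaction score.
double weighted_score(const TabletStats& t, int32_t scan_frequency_factor,
                      int32_t compaction_score_factor) {
    return scan_frequency_factor * t.scan_frequency +
           compaction_score_factor * static_cast<double>(t.compaction_score);
}

// Returns the id of the highest-scoring tablet, or -1 if no tablet scores above zero.
int64_t pick_tablet(const std::vector<TabletStats>& tablets, int32_t scan_frequency_factor,
                    int32_t compaction_score_factor) {
    double highest_score = 0;
    int64_t best = -1;
    for (const TabletStats& t : tablets) {
        double score = weighted_score(t, scan_frequency_factor, compaction_score_factor);
        if (score > highest_score) {
            highest_score = score;
            best = t.tablet_id;
        }
    }
    return best;
}
```

With the default factors shown in the config hunk (scan frequency factor `0`, compaction score factor `1`), this sketch reduces to picking the tablet with the highest compaction score, which matches the behaviour before the change.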
   
   
   ## Types of changes
   
   What types of changes does your code introduce to Doris?
   _Put an `x` in the boxes that apply_
   
   - [ ] Bugfix (non-breaking change which fixes an issue)
   - [x] New feature (non-breaking change which adds functionality)
   - [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
   - [ ] Documentation Update (if none of the other choices apply)
   - [ ] Code refactor (Modify the code structure, format the code, etc...)
   
   ## Checklist
   
   _Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code._
   
   - [x] I have created an issue (Fix #4834), and have described the bug/feature there in detail
   - [x] Compiling and unit tests pass locally with my changes
   - [ ] I have added tests that prove my fix is effective or that my feature works
   - [x] If this change needs a document change, I have updated the document
   - [x] Any dependent changes have been merged
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] weizuo93 commented on a change in pull request #4837: [Optimize] Take 'tablet scan frequency' into consideration when selecting a tablet for compaction

Posted by GitBox <gi...@apache.org>.
weizuo93 commented on a change in pull request #4837:
URL: https://github.com/apache/incubator-doris/pull/4837#discussion_r519575946



##########
File path: be/src/olap/tablet_manager.cpp
##########
@@ -745,17 +747,22 @@ TabletSharedPtr TabletManager::find_best_tablet_to_compaction(
                     }
                 }
 
-                uint32_t table_score = 0;
+                double tablet_score = 0;
+                uint32_t current_compaction_score = 0;
                 {
                     ReadLock rdlock(tablet_ptr->get_header_lock_ptr());
                     if (compaction_type == CompactionType::BASE_COMPACTION) {
-                        table_score = tablet_ptr->calc_base_compaction_score();
+                        current_compaction_score = tablet_ptr->calc_base_compaction_score();
                     } else if (compaction_type == CompactionType::CUMULATIVE_COMPACTION) {
-                        table_score = tablet_ptr->calc_cumulative_compaction_score();
+                        current_compaction_score = tablet_ptr->calc_cumulative_compaction_score();
                     }
                 }
-                if (table_score > highest_score) {
-                    highest_score = table_score;
+                double scan_frequency = tablet_ptr->calculate_scan_frequency();

Review comment:
       > If `compaction_tablet_scan_frequency_factor` is zero, we can skip calling `calculate_scan_frequency()` to save some CPU.
   
   It's reasonable.
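
A minimal sketch of the suggested guard, assuming the final score is a weighted sum of the two inputs (the combining formula is not shown in the quoted hunk). `combined_score` is a hypothetical helper, and `scan_frequency_fn` stands in for the call to `tablet_ptr->calculate_scan_frequency()`.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical helper; `scan_frequency_fn` stands in for
// tablet_ptr->calculate_scan_frequency() in the quoted diff.
double combined_score(uint32_t compaction_score, int32_t scan_frequency_factor,
                      int32_t compaction_score_factor,
                      const std::function<double()>& scan_frequency_fn) {
    double score = compaction_score_factor * static_cast<double>(compaction_score);
    if (scan_frequency_factor != 0) {
        // Only pay for the frequency calculation when it can affect the result.
        score += scan_frequency_factor * scan_frequency_fn();
    }
    return score;
}
```

Since the factor defaults to `0` in the config hunk, a guard like this would leave the default configuration without the extra call per tablet.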






[GitHub] [incubator-doris] acelyc111 commented on a change in pull request #4837: [Optimize] Take 'tablet scan frequency' into consideration when selecting a tablet for compaction

Posted by GitBox <gi...@apache.org>.
acelyc111 commented on a change in pull request #4837:
URL: https://github.com/apache/incubator-doris/pull/4837#discussion_r517147556



##########
File path: be/src/common/config.h
##########
@@ -329,6 +329,13 @@ namespace config {
     CONF_mInt32(base_compaction_trace_threshold, "10");
     CONF_mInt32(cumulative_compaction_trace_threshold, "2");
 
+    // update tablet scan count in second
+    CONF_mInt64(update_tablet_scan_count_interval_second, "300");
+    // coefficient for tablet scan frequency and compaction score when finding a tablet for compaction
+    CONF_mInt32(compaction_tablet_scan_frequency_factor, "0");
+    CONF_mInt32(compaction_tablet_compaction_score_factor, "1");

Review comment:
       Do they need to be normalized? If needed, you should define them as double.






[GitHub] [incubator-doris] weizuo93 commented on a change in pull request #4837: [Optimize] Take 'tablet scan frequency' into consideration when selecting a tablet for compaction

Posted by GitBox <gi...@apache.org>.
weizuo93 commented on a change in pull request #4837:
URL: https://github.com/apache/incubator-doris/pull/4837#discussion_r519576194



##########
File path: docs/zh-CN/administrator-guide/config/be_config.md
##########
@@ -180,6 +180,10 @@ Metrics: {"filtered_rows":0,"input_row_num":3346807,"input_rowsets_count":42,"in
 
 ### `column_dictionary_key_size_threshold`
 
+### `compaction_tablet_compaction_score_factor`

Review comment:
       > Why is there no documentation for these 2 configs?
   > It would be better to give best practices for them.
   
   done.






[GitHub] [incubator-doris] weizuo93 commented on a change in pull request #4837: [Optimize] Take 'tablet scan frequency' into consideration when selecting a tablet for compaction

Posted by GitBox <gi...@apache.org>.
weizuo93 commented on a change in pull request #4837:
URL: https://github.com/apache/incubator-doris/pull/4837#discussion_r517306989



##########
File path: be/src/olap/tablet.h
##########
@@ -301,6 +303,10 @@ class Tablet : public BaseTablet {
     // cumulative compaction policy
     std::unique_ptr<CumulativeCompactionPolicy> _cumulative_compaction_policy;
     std::string _cumulative_compaction_type;
+
+    int64_t _last_update_scan_count;

Review comment:
       done.






[GitHub] [incubator-doris] weizuo93 commented on a change in pull request #4837: [Optimize] Take 'tablet scan frequency' into consideration when selecting a tablet for compaction

Posted by GitBox <gi...@apache.org>.
weizuo93 commented on a change in pull request #4837:
URL: https://github.com/apache/incubator-doris/pull/4837#discussion_r517193268



##########
File path: be/src/common/config.h
##########
@@ -329,6 +329,13 @@ namespace config {
     CONF_mInt32(base_compaction_trace_threshold, "10");
     CONF_mInt32(cumulative_compaction_trace_threshold, "2");
 
+    // update tablet scan count in second
+    CONF_mInt64(update_tablet_scan_count_interval_second, "300");
+    // coefficient for tablet scan frequency and compaction score when finding a tablet for compaction
+    CONF_mInt32(compaction_tablet_scan_frequency_factor, "0");
+    CONF_mInt32(compaction_tablet_compaction_score_factor, "1");

Review comment:
       > Do they need to be normalized? If needed, you should define them as double.
   
    Normalization is not required.






[GitHub] [incubator-doris] morningman commented on a change in pull request #4837: [Optimize] Take 'tablet scan frequency' into consideration when selecting a tablet for compaction

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #4837:
URL: https://github.com/apache/incubator-doris/pull/4837#discussion_r516629748



##########
File path: be/src/olap/tablet.h
##########
@@ -301,6 +303,10 @@ class Tablet : public BaseTablet {
     // cumulative compaction policy
     std::unique_ptr<CumulativeCompactionPolicy> _cumulative_compaction_policy;
     std::string _cumulative_compaction_type;
+
+    int64_t _last_update_scan_count;

Review comment:
       Add comments for the new fields.








[GitHub] [incubator-doris] morningman merged pull request #4837: [Optimize] Take 'tablet scan frequency' into consideration when selecting a tablet for compaction

Posted by GitBox <gi...@apache.org>.
morningman merged pull request #4837:
URL: https://github.com/apache/incubator-doris/pull/4837


   




[GitHub] [incubator-doris] morningman commented on a change in pull request #4837: [Optimize] Take 'tablet scan frequency' into consideration when selecting a tablet for compaction

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #4837:
URL: https://github.com/apache/incubator-doris/pull/4837#discussion_r519184314



##########
File path: docs/zh-CN/administrator-guide/config/be_config.md
##########
@@ -180,6 +180,10 @@ Metrics: {"filtered_rows":0,"input_row_num":3346807,"input_rowsets_count":42,"in
 
 ### `column_dictionary_key_size_threshold`
 
+### `compaction_tablet_compaction_score_factor`

Review comment:
       Why is there no documentation for these 2 configs?
   It would be better to give best practices for them.

##########
File path: be/src/olap/tablet_manager.cpp
##########
@@ -745,17 +747,22 @@ TabletSharedPtr TabletManager::find_best_tablet_to_compaction(
                     }
                 }
 
-                uint32_t table_score = 0;
+                double tablet_score = 0;
+                uint32_t current_compaction_score = 0;
                 {
                     ReadLock rdlock(tablet_ptr->get_header_lock_ptr());
                     if (compaction_type == CompactionType::BASE_COMPACTION) {
-                        table_score = tablet_ptr->calc_base_compaction_score();
+                        current_compaction_score = tablet_ptr->calc_base_compaction_score();
                     } else if (compaction_type == CompactionType::CUMULATIVE_COMPACTION) {
-                        table_score = tablet_ptr->calc_cumulative_compaction_score();
+                        current_compaction_score = tablet_ptr->calc_cumulative_compaction_score();
                     }
                 }
-                if (table_score > highest_score) {
-                    highest_score = table_score;
+                double scan_frequency = tablet_ptr->calculate_scan_frequency();

Review comment:
       If `compaction_tablet_scan_frequency_factor` is zero, we can skip calling `calculate_scan_frequency()` to save some CPU.

##########
File path: be/src/olap/tablet.cpp
##########
@@ -1309,4 +1311,16 @@ void Tablet::generate_tablet_meta_copy_unlocked(TabletMetaSharedPtr new_tablet_m
     new_tablet_meta->init_from_pb(tablet_meta_pb);
 }
 
+double Tablet::calculate_scan_frequency() {
+    time_t now = time(nullptr);
+    int64_t current_count = query_scan_count->value();
+    double interval = difftime(now, _last_record_scan_count_timestamp);
+    double scan_frequency = (current_count - _last_record_scan_count) * 60 / interval;

Review comment:
       Why multiply by 60?








[GitHub] [incubator-doris] weizuo93 commented on a change in pull request #4837: [Optimize] Take 'tablet scan frequency' into consideration when selecting a tablet for compaction

Posted by GitBox <gi...@apache.org>.
weizuo93 commented on a change in pull request #4837:
URL: https://github.com/apache/incubator-doris/pull/4837#discussion_r519575666



##########
File path: be/src/olap/tablet.cpp
##########
@@ -1309,4 +1311,16 @@ void Tablet::generate_tablet_meta_copy_unlocked(TabletMetaSharedPtr new_tablet_m
     new_tablet_meta->init_from_pb(tablet_meta_pb);
 }
 
+double Tablet::calculate_scan_frequency() {
+    time_t now = time(nullptr);
+    int64_t current_count = query_scan_count->value();
+    double interval = difftime(now, _last_record_scan_count_timestamp);
+    double scan_frequency = (current_count - _last_record_scan_count) * 60 / interval;

Review comment:
       > Why multiply by 60?
   
   Multiplying by 60 gives the average number of tablet scans per minute; otherwise it would be the average number of tablet scans per second.
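
The unit conversion can be sketched as a standalone function. `scans_per_minute` is a hypothetical free-function version of the calculation in the quoted diff; the real method reads the counter and the last recorded values from the `Tablet` object. The zero-interval guard is an added assumption and does not appear in the quoted hunk.

```cpp
#include <cstdint>
#include <ctime>

// Hypothetical free-function version of the calculation in the quoted diff.
double scans_per_minute(int64_t current_count, int64_t last_count, std::time_t now,
                        std::time_t last_time) {
    double interval_seconds = std::difftime(now, last_time);
    if (interval_seconds <= 0) {
        // Assumption: report zero rather than divide by zero when no time has elapsed.
        return 0.0;
    }
    // (scans / seconds) is a per-second rate; multiplying by 60 converts it to per-minute.
    return static_cast<double>(current_count - last_count) * 60.0 / interval_seconds;
}
```

For example, 50 new scans over a 300-second window (the default of `update_tablet_scan_count_interval_second`) gives 50 * 60 / 300 = 10 scans per minute.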



