Posted to commits@doris.apache.org by mo...@apache.org on 2020/06/03 13:58:25 UTC

[incubator-doris] branch master updated: [Config] Add new BE config for tcmalloc (#3732)

This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git


The following commit(s) were added to refs/heads/master by this push:
     new 2ad1b20  [Config] Add new BE config for tcmalloc (#3732)
2ad1b20 is described below

commit 2ad1b20b243330827093569c9c5d97c59a42056f
Author: Mingyu Chen <mo...@gmail.com>
AuthorDate: Wed Jun 3 21:58:13 2020 +0800

    [Config] Add new BE config for tcmalloc (#3732)
    
    Add a new BE config tc_max_total_thread_cache_bytes
---
 be/src/common/config.h                             |  10 +
 be/src/service/doris_main.cpp                      |  11 +-
 docs/en/administrator-guide/config/be_config.md    | 373 +++++++++++----------
 docs/zh-CN/administrator-guide/config/be_config.md |   8 +
 4 files changed, 218 insertions(+), 184 deletions(-)

diff --git a/be/src/common/config.h b/be/src/common/config.h
index b93b72c..ae2c838 100644
--- a/be/src/common/config.h
+++ b/be/src/common/config.h
@@ -44,6 +44,16 @@ namespace config {
     // free memory rate.[0-100]
     CONF_mInt64(tc_free_memory_rate, "20");
 
+    // Bound on the total amount of bytes allocated to thread caches.
+    // This bound is not strict, so it is possible for the cache to go over this bound
+    // in certain circumstances. This value defaults to 1GB.
+    // If you suspect your application is not scaling to many threads due to lock contention in TCMalloc,
+    // you can try increasing this value. This may improve performance, at a cost of extra memory
+    // use by TCMalloc.
+    // reference: https://gperftools.github.io/gperftools/tcmalloc.html: TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
+    //            https://github.com/gperftools/gperftools/issues/1111
+    CONF_Int64(tc_max_total_thread_cache_bytes, "1073741824");
+
     // process memory limit specified as number of bytes
     // ('<int>[bB]?'), megabytes ('<float>[mM]'), gigabytes ('<float>[gG]'),
     // or percentage of the physical memory ('<int>%').
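The `config.h` comment above describes `tc_max_total_thread_cache_bytes` as a soft bound shared by all thread caches, defaulting to 1GB. As a rough intuition for why raising it can help heavily threaded workloads, the C++ sketch below checks that the default is exactly 1 GiB and estimates each thread's share under the simplifying assumption that the bound is split evenly across threads. The helper `approx_per_thread_cache_bytes` is a hypothetical name, and gperftools' real sizing policy also clamps and rebalances per-thread caches, so treat the result only as an order-of-magnitude guide.

```cpp
#include <cstdint>

// Default of tc_max_total_thread_cache_bytes in config.h ("1073741824").
constexpr std::int64_t kDefaultMaxTotalThreadCacheBytes = 1073741824;
static_assert(kDefaultMaxTotalThreadCacheBytes == (std::int64_t{1} << 30),
              "the default bound is exactly 1 GiB");

// Hypothetical helper: rough per-thread share of the total thread cache bound,
// ASSUMING an even split across threads. tcmalloc's real policy differs
// (it clamps and rebalances), so this is only an order-of-magnitude estimate.
std::int64_t approx_per_thread_cache_bytes(std::int64_t total_bytes, int num_threads) {
    if (num_threads < 1) {
        num_threads = 1;  // guard against division by zero
    }
    return total_bytes / num_threads;
}
```

For example, with 512 threads the even-split estimate gives 2 MiB per thread at the default, and doubling the bound to 2147483648 doubles that share, which matches the intuition behind the comment's advice to try a larger value when TCMalloc lock contention limits scaling.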
diff --git a/be/src/service/doris_main.cpp b/be/src/service/doris_main.cpp
index 6a2e778..7f8a238 100644
--- a/be/src/service/doris_main.cpp
+++ b/be/src/service/doris_main.cpp
@@ -126,8 +126,15 @@ int main(int argc, char** argv) {
     }
 
 #if !defined(ADDRESS_SANITIZER) && !defined(LEAK_SANITIZER) && !defined(THREAD_SANITIZER)
-    MallocExtension::instance()->SetNumericProperty("tcmalloc.aggressive_memory_decommit",
-                                                    21474836480);
+    // Aggressive decommit is required so that unused pages in the TCMalloc page heap are
+    // not backed by physical pages and do not contribute towards memory consumption.
+    MallocExtension::instance()->SetNumericProperty("tcmalloc.aggressive_memory_decommit", 1);
+    // Change the total TCMalloc thread cache size if necessary.
+    if (!MallocExtension::instance()->SetNumericProperty(
+                "tcmalloc.max_total_thread_cache_bytes", doris::config::tc_max_total_thread_cache_bytes)) {
+        fprintf(stderr, "Failed to change TCMalloc total thread cache size.\n");
+        return -1;
+    }
 #endif
 
     std::vector<doris::StorePath> paths;
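The startup change above enables aggressive decommit and then applies `tc_max_total_thread_cache_bytes`, aborting startup if tcmalloc rejects the value. The sketch below reproduces that fail-fast flow in self-contained C++. `FakeMallocExtension` is a hypothetical stand-in that only mirrors the shape of gperftools' `bool SetNumericProperty(const char*, size_t)` and `bool GetNumericProperty(const char*, size_t*)`, so the example compiles without tcmalloc; in the BE itself the calls go to `MallocExtension::instance()`. The sketch also adds one guard that the commit does not have: because the config is an `int64` while the allocator API takes `size_t`, a negative value is rejected before the cast instead of silently wrapping to a huge number.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical stand-in for gperftools' MallocExtension. It only mirrors the
// shape of SetNumericProperty/GetNumericProperty so this sketch compiles
// without linking tcmalloc.
class FakeMallocExtension {
public:
    bool SetNumericProperty(const char* property, std::size_t value) {
        const std::string name(property);
        if (name != "tcmalloc.aggressive_memory_decommit" &&
            name != "tcmalloc.max_total_thread_cache_bytes") {
            return false;  // unknown property: report failure to the caller
        }
        props_[name] = value;
        return true;
    }

    bool GetNumericProperty(const char* property, std::size_t* value) const {
        auto it = props_.find(property);
        if (it == props_.end()) {
            return false;
        }
        *value = it->second;
        return true;
    }

private:
    std::map<std::string, std::size_t> props_;
};

// Applies the two tcmalloc settings from doris_main.cpp. Returns false on
// failure so the caller can stop startup (main() returns -1 in the commit).
bool apply_tcmalloc_settings(FakeMallocExtension& ext, std::int64_t max_total_thread_cache_bytes) {
    // Aggressive decommit is treated as a flag; 1 turns it on.
    ext.SetNumericProperty("tcmalloc.aggressive_memory_decommit", 1);

    // Extra guard (not in the commit): int64 config value -> size_t API.
    if (max_total_thread_cache_bytes < 0) {
        std::fprintf(stderr, "Invalid tc_max_total_thread_cache_bytes: %lld\n",
                     static_cast<long long>(max_total_thread_cache_bytes));
        return false;
    }
    if (!ext.SetNumericProperty("tcmalloc.max_total_thread_cache_bytes",
                                static_cast<std::size_t>(max_total_thread_cache_bytes))) {
        std::fprintf(stderr, "Failed to change TCMalloc total thread cache size.\n");
        return false;
    }
    return true;
}
```

Returning a `bool` instead of exiting inside the helper keeps the abort decision in `main()`, matching the commit's `return -1` on failure.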
diff --git a/docs/en/administrator-guide/config/be_config.md b/docs/en/administrator-guide/config/be_config.md
index da4b3bd..36d45ad 100644
--- a/docs/en/administrator-guide/config/be_config.md
+++ b/docs/en/administrator-guide/config/be_config.md
@@ -41,25 +41,25 @@ This document mainly introduces the relevant configuration items of BE.
 
 ## Configurations
 
-### alter_tablet_worker_count
+### `alter_tablet_worker_count`
 
-### base_compaction_check_interval_seconds
+### `base_compaction_check_interval_seconds`
 
-### base_compaction_interval_seconds_since_last_operation
+### `base_compaction_interval_seconds_since_last_operation`
 
-### base_compaction_num_cumulative_deltas
+### `base_compaction_num_cumulative_deltas`
 
-### base_compaction_num_threads_per_disk
+### `base_compaction_num_threads_per_disk`
 
-### base_compaction_write_mbytes_per_sec
+### `base_compaction_write_mbytes_per_sec`
 
-### base_cumulative_delta_ratio
+### `base_cumulative_delta_ratio`
 
-### be_port
+### `be_port`
 
-### be_service_threads
+### `be_service_threads`
 
-### brpc_max_body_size
+### `brpc_max_body_size`
 
 This configuration is mainly used to modify the parameter `max_body_size` of brpc.
 
@@ -77,340 +77,349 @@ Sometimes the query fails and an error message of `The server is overcrowded` wi
 
 Since this is a brpc configuration, users can also modify this parameter directly during operation. Modify by visiting `http://be_host:brpc_port/flags`.
 
-### brpc_port
+### `brpc_port`
 
-### buffer_pool_clean_pages_limit
+### `buffer_pool_clean_pages_limit`
 
-### buffer_pool_limit
+### `buffer_pool_limit`
 
-### check_consistency_worker_count
+### `check_consistency_worker_count`
 
-### chunk_reserved_bytes_limit
+### `chunk_reserved_bytes_limit`
 
-### clear_transaction_task_worker_count
+### `clear_transaction_task_worker_count`
 
-### clone_worker_count
+### `clone_worker_count`
 
-### cluster_id
+### `cluster_id`
 
-### column_dictionary_key_ratio_threshold
+### `column_dictionary_key_ratio_threshold`
 
-### column_dictionary_key_size_threshold
+### `column_dictionary_key_size_threshold`
 
-### compress_rowbatches
+### `compress_rowbatches`
 
-### create_tablet_worker_count
+### `create_tablet_worker_count`
 
-### cumulative_compaction_budgeted_bytes
+### `cumulative_compaction_budgeted_bytes`
 
-### cumulative_compaction_check_interval_seconds
+### `cumulative_compaction_check_interval_seconds`
 
-### cumulative_compaction_num_threads_per_disk
+### `cumulative_compaction_num_threads_per_disk`
 
-### cumulative_compaction_skip_window_seconds
+### `cumulative_compaction_skip_window_seconds`
 
-### default_num_rows_per_column_file_block
+### `default_num_rows_per_column_file_block`
 
-### default_query_options
+### `default_query_options`
 
-### default_rowset_type
+### `default_rowset_type`
 
-### delete_worker_count
+### `delete_worker_count`
 
-### disable_mem_pools
+### `disable_mem_pools`
 
-### disable_storage_page_cache
+### `disable_storage_page_cache`
 
-### disk_stat_monitor_interval
+### `disk_stat_monitor_interval`
 
-### doris_cgroups
+### `doris_cgroups`
 
-### doris_max_pushdown_conjuncts_return_rate
+### `doris_max_pushdown_conjuncts_return_rate`
 
-### doris_max_scan_key_num
+### `doris_max_scan_key_num`
 
-### doris_scan_range_row_count
+### `doris_scan_range_row_count`
 
-### doris_scanner_queue_size
+### `doris_scanner_queue_size`
 
-### doris_scanner_row_num
+### `doris_scanner_row_num`
 
-### doris_scanner_thread_pool_queue_size
+### `doris_scanner_thread_pool_queue_size`
 
-### doris_scanner_thread_pool_thread_num
+### `doris_scanner_thread_pool_thread_num`
 
-### download_low_speed_limit_kbps
+### `download_low_speed_limit_kbps`
 
-### download_low_speed_time
+### `download_low_speed_time`
 
-### download_worker_count
+### `download_worker_count`
 
-### drop_tablet_worker_count
+### `drop_tablet_worker_count`
 
-### enable_metric_calculator
+### `enable_metric_calculator`
 
-### enable_partitioned_aggregation
+### `enable_partitioned_aggregation`
 
-### enable_prefetch
+### `enable_prefetch`
 
-### enable_quadratic_probing
+### `enable_quadratic_probing`
 
-### enable_system_metrics
+### `enable_system_metrics`
 
-### enable_token_check
+### `enable_token_check`
 
-### es_http_timeout_ms
+### `es_http_timeout_ms`
 
-### es_scroll_keepalive
+### `es_scroll_keepalive`
 
-### etl_thread_pool_queue_size
+### `etl_thread_pool_queue_size`
 
-### etl_thread_pool_size
+### `etl_thread_pool_size`
 
-### exchg_node_buffer_size_bytes
+### `exchg_node_buffer_size_bytes`
 
-### file_descriptor_cache_capacity
+### `file_descriptor_cache_capacity`
 
-### file_descriptor_cache_clean_interval
+### `file_descriptor_cache_clean_interval`
 
-### flush_thread_num_per_store
+### `flush_thread_num_per_store`
 
-### force_recovery
+### `force_recovery`
 
-### fragment_pool_queue_size
+### `fragment_pool_queue_size`
 
-### fragment_pool_thread_num
+### `fragment_pool_thread_num`
 
-### heartbeat_service_port
+### `heartbeat_service_port`
 
-### heartbeat_service_thread_count
+### `heartbeat_service_thread_count`
 
-### ignore_broken_disk
+### `ignore_broken_disk`
 
-### inc_rowset_expired_sec
+### `ignore_load_tablet_failure`
 
-### index_stream_cache_capacity
+* Type: boolean
+* Description: Whether BE should continue starting when loading tablets from their headers fails.
+* Default: false
 
-### load_data_reserve_hours
+When BE starts, it launches a separate thread for each data directory to load the tablet header metadata. By default, if any tablet fails to load its header, the startup process is terminated, and the following error message appears in `be.INFO`:
 
-### load_error_log_reserve_hours
+```
+load tablets from header failed, failed tablets size: xxx, path=xxx
+```
 
-### load_process_max_memory_limit_bytes
+This message indicates how many tablets in the data directory failed to load; the log also contains details about each tablet that failed. In this case, manual intervention is required to find the cause of the error. After troubleshooting, there are usually two ways to recover:
 
-### load_process_max_memory_limit_percent
+1. If the tablet information cannot be repaired and the other replicas are healthy, you can delete the faulty tablet with the `meta_tool` tool.
+2. Set `ignore_load_tablet_failure` to true; BE will then ignore the faulty tablets and start normally.
 
-### local_library_dir
+### `inc_rowset_expired_sec`
 
-### log_buffer_level
+### `index_stream_cache_capacity`
 
-### madvise_huge_pages
+### `load_data_reserve_hours`
 
-### make_snapshot_worker_count
+### `load_error_log_reserve_hours`
 
-### max_client_cache_size_per_host
+### `load_process_max_memory_limit_bytes`
 
-### max_compaction_concurrency
+### `load_process_max_memory_limit_percent`
 
-### max_consumer_num_per_group
+### `local_library_dir`
 
-### max_cumulative_compaction_num_singleton_deltas
+### `log_buffer_level`
 
-### max_download_speed_kbps
+### `madvise_huge_pages`
 
-### max_free_io_buffers
+### `make_snapshot_worker_count`
 
-### max_garbage_sweep_interval
+### `max_client_cache_size_per_host`
 
-### max_memory_sink_batch_count
+### `max_compaction_concurrency`
 
-### max_percentage_of_error_disk
+### `max_consumer_num_per_group`
 
-### max_runnings_transactions_per_txn_map
+### `max_cumulative_compaction_num_singleton_deltas`
 
-### max_tablet_num_per_shard
+### `max_download_speed_kbps`
 
-### mem_limit
+### `max_free_io_buffers`
 
-### memory_limitation_per_thread_for_schema_change
+### `max_garbage_sweep_interval`
 
-### memory_maintenance_sleep_time_s
+### `max_memory_sink_batch_count`
 
-### memory_max_alignment
+### `max_percentage_of_error_disk`
 
-### min_buffer_size
+### `max_runnings_transactions_per_txn_map`
 
-### min_compaction_failure_interval_sec
+### `max_tablet_num_per_shard`
 
-### min_cumulative_compaction_num_singleton_deltas
+### `mem_limit`
 
-### min_file_descriptor_number
+### `memory_limitation_per_thread_for_schema_change`
 
-### min_garbage_sweep_interval
+### `memory_maintenance_sleep_time_s`
 
-### mmap_buffers
+### `memory_max_alignment`
 
-### num_cores
+### `min_buffer_size`
 
-### num_disks
+### `min_compaction_failure_interval_sec`
 
-### num_threads_per_core
+### `min_cumulative_compaction_num_singleton_deltas`
 
-### num_threads_per_disk
+### `min_file_descriptor_number`
 
-### number_tablet_writer_threads
+### `min_garbage_sweep_interval`
 
-### path_gc_check
+### `mmap_buffers`
 
-### path_gc_check_interval_second
+### `num_cores`
 
-### path_gc_check_step
+### `num_disks`
 
-### path_gc_check_step_interval_ms
+### `num_threads_per_core`
 
-### path_scan_interval_second
+### `num_threads_per_disk`
 
-### pending_data_expire_time_sec
+### `number_tablet_writer_threads`
 
-### periodic_counter_update_period_ms
+### `path_gc_check`
 
-### plugin_path
+### `path_gc_check_interval_second`
 
-### port
+### `path_gc_check_step`
 
-### pprof_profile_dir
+### `path_gc_check_step_interval_ms`
 
-### priority_networks
+### `path_scan_interval_second`
 
-### priority_queue_remaining_tasks_increased_frequency
+### `pending_data_expire_time_sec`
 
-### publish_version_worker_count
+### `periodic_counter_update_period_ms`
 
-### pull_load_task_dir
+### `plugin_path`
 
-### push_worker_count_high_priority
+### `port`
 
-### push_worker_count_normal_priority
+### `pprof_profile_dir`
 
-### push_write_mbytes_per_sec
+### `priority_networks`
 
-### query_scratch_dirs
+### `priority_queue_remaining_tasks_increased_frequency`
 
-### read_size
+### `publish_version_worker_count`
 
-### release_snapshot_worker_count
+### `pull_load_task_dir`
 
-### report_disk_state_interval_seconds
+### `push_worker_count_high_priority`
 
-### report_tablet_interval_seconds
+### `push_worker_count_normal_priority`
 
-### report_task_interval_seconds
+### `push_write_mbytes_per_sec`
 
-### result_buffer_cancelled_interval_time
+### `query_scratch_dirs`
 
-### routine_load_thread_pool_size
+### `read_size`
 
-### row_nums_check
+### `release_snapshot_worker_count`
 
-### scan_context_gc_interval_min
+### `report_disk_state_interval_seconds`
 
-### scratch_dirs
+### `report_tablet_interval_seconds`
 
-### serialize_batch
+### `report_task_interval_seconds`
 
-### sleep_five_seconds
+### `result_buffer_cancelled_interval_time`
 
-### sleep_one_second
+### `routine_load_thread_pool_size`
 
-### small_file_dir
+### `row_nums_check`
 
-### snapshot_expire_time_sec
+### `scan_context_gc_interval_min`
 
-### sorter_block_size
+### `scratch_dirs`
 
-### status_report_interval
+### `serialize_batch`
 
-### storage_flood_stage_left_capacity_bytes
+### `sleep_five_seconds`
 
-### storage_flood_stage_usage_percent
+### `sleep_one_second`
 
-### storage_medium_migrate_count
+### `small_file_dir`
 
-### storage_page_cache_limit
+### `snapshot_expire_time_sec`
 
-### storage_root_path
+### `sorter_block_size`
 
-### streaming_load_max_mb
+### `status_report_interval`
 
-### streaming_load_rpc_max_alive_time_sec
+### `storage_flood_stage_left_capacity_bytes`
 
-### sync_tablet_meta
+### `storage_flood_stage_usage_percent`
 
-### sys_log_dir
+### `storage_medium_migrate_count`
 
-### sys_log_level
+### `storage_page_cache_limit`
 
-### sys_log_roll_mode
+### `storage_root_path`
 
-### sys_log_roll_num
+### `streaming_load_max_mb`
 
-### sys_log_verbose_level
+### `streaming_load_rpc_max_alive_time_sec`
 
-### sys_log_verbose_modules
+### `sync_tablet_meta`
 
-### tablet_map_shard_size
+### `sys_log_dir`
 
-### tablet_meta_checkpoint_min_interval_secs
+### `sys_log_level`
 
-### tablet_meta_checkpoint_min_new_rowsets_num
+### `sys_log_roll_mode`
 
-### tablet_stat_cache_update_interval_second
+### `sys_log_roll_num`
 
-### tablet_writer_open_rpc_timeout_sec
+### `sys_log_verbose_level`
 
-### tc_free_memory_rate
+### `sys_log_verbose_modules`
 
-### tc_use_memory_min
+### `tablet_map_shard_size`
 
-### thrift_connect_timeout_seconds
+### `tablet_meta_checkpoint_min_interval_secs`
 
-### thrift_rpc_timeout_ms
+### `tablet_meta_checkpoint_min_new_rowsets_num`
 
-### trash_file_expire_time_sec
+### `tablet_stat_cache_update_interval_second`
 
-### txn_commit_rpc_timeout_ms
+### `tablet_writer_open_rpc_timeout_sec`
 
-### txn_map_shard_size
+### `tc_free_memory_rate`
 
-### txn_shard_size
+### `tc_max_total_thread_cache_bytes`
 
-### unused_rowset_monitor_interval
+* Type: int64
+* Description: Limits the total thread cache size in tcmalloc. This is a soft limit, so actual thread cache usage may exceed it. For details, see [TCMALLOC\_MAX\_TOTAL\_THREAD\_CACHE\_BYTES](https://gperftools.github.io/gperftools/tcmalloc.html).
+* Default: 1073741824
 
-### upload_worker_count
+If the system is under heavy load and the BE thread stacks show a large number of threads contending for tcmalloc locks (for example, many `SpinLock`-related stacks), you can try increasing this parameter to improve system performance. [Reference](https://github.com/gperftools/gperftools/issues/1111)
 
-### use_mmap_allocate_chunk
+### `tc_use_memory_min`
 
-### user_function_dir
+### `thrift_connect_timeout_seconds`
 
-### web_log_bytes
+### `thrift_rpc_timeout_ms`
 
-### webserver_num_workers
+### `trash_file_expire_time_sec`
 
-### webserver_port
+### `txn_commit_rpc_timeout_ms`
 
-### write_buffer_size
+### `txn_map_shard_size`
 
-### ignore_load_tablet_failure
-* Type: boolean
-* Description: Whether to continue to start be when load tablet from header failed.
-* Default: false
+### `txn_shard_size`
 
-When the BE starts, it will start a separate thread for each data directory to load the tablet header meta information. In the default configuration, if a tablet fails to load its header, the startup process is terminated. At the same time, you will see the following error message in the `be.INFO`:
+### `unused_rowset_monitor_interval`
 
-```
-load tablets from header failed, failed tablets size: xxx, path=xxx
-```
+### `upload_worker_count`
 
-Indicates how many tablets in this data directory failed to load. At the same time, the log will also have specific information about the tablet that failed to load. In this case, manual intervention is required to troubleshoot the cause of the error. After troubleshooting, there are usually two ways to recover:
+### `use_mmap_allocate_chunk`
 
-1. If the tablet information is not repairable, you can delete the wrong tablet through the `meta_tool` tool under the condition that other copies are normal.
-2. Set `ignore_load_tablet_failure` to true, BE will ignore these wrong tablets and start normally.
+### `user_function_dir`
+
+### `web_log_bytes`
+
+### `webserver_num_workers`
+
+### `webserver_port`
+
+### `write_buffer_size`
\ No newline at end of file
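The `ignore_load_tablet_failure` section that this commit moves within the English `be_config.md` describes a two-stage startup: one thread per data directory loads tablet header metadata, and then BE either stops or continues depending on the flag. The C++ sketch below models only that control flow. `DirLoadResult`, `load_all_dirs`, and the `load_dir` callback are hypothetical names standing in for the real BE loader, and the message is printed to stderr in the shape quoted from `be.INFO` rather than through BE's actual logging.

```cpp
#include <cstddef>
#include <cstdio>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical: outcome of loading all tablet headers under one data directory.
struct DirLoadResult {
    std::string path;
    int failed_tablets = 0;
};

// Sketch of the documented startup flow. `load_dir` is a hypothetical callback
// standing in for the real header loader; it returns the number of tablets in
// that directory whose headers failed to load.
bool load_all_dirs(const std::vector<std::string>& paths,
                   const std::function<int(const std::string&)>& load_dir,
                   bool ignore_load_tablet_failure) {
    std::vector<DirLoadResult> results(paths.size());
    std::vector<std::thread> workers;
    // One thread per data directory; each thread writes only its own slot.
    for (std::size_t i = 0; i < paths.size(); ++i) {
        workers.emplace_back([&, i] {
            results[i].path = paths[i];
            results[i].failed_tablets = load_dir(paths[i]);
        });
    }
    for (auto& t : workers) {
        t.join();
    }

    bool any_failed = false;
    for (const auto& r : results) {
        if (r.failed_tablets > 0) {
            any_failed = true;
            std::fprintf(stderr, "load tablets from header failed, failed tablets size: %d, path=%s\n",
                         r.failed_tablets, r.path.c_str());
        }
    }
    // Continue startup only if nothing failed, or failures are explicitly ignored.
    return !any_failed || ignore_load_tablet_failure;
}
```

With `ignore_load_tablet_failure` left at its default of false, any directory reporting failed tablets makes `load_all_dirs` return false, which corresponds to BE refusing to start until the faulty tablets are removed with `meta_tool` or the flag is set to true.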
diff --git a/docs/zh-CN/administrator-guide/config/be_config.md b/docs/zh-CN/administrator-guide/config/be_config.md
index f461335..63d3606 100644
--- a/docs/zh-CN/administrator-guide/config/be_config.md
+++ b/docs/zh-CN/administrator-guide/config/be_config.md
@@ -367,6 +367,14 @@ under the License.
 
 ### `tc_free_memory_rate`
 
+### `tc_max_total_thread_cache_bytes`
+
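+* Type: int64
+* Description: Limits the total thread cache size in tcmalloc. This is not a hard limit, so actual thread cache usage may exceed it. For details, see [TCMALLOC\_MAX\_TOTAL\_THREAD\_CACHE\_BYTES](https://gperftools.github.io/gperftools/tcmalloc.html)
+* Default: 1073741824
+
+If the system is under heavy load and the BE thread stacks show a large number of threads contending for tcmalloc locks (for example, many `SpinLock`-related stacks), you can try increasing this parameter to improve system performance. [Reference](https://github.com/gperftools/gperftools/issues/1111)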
+
 ### `tc_use_memory_min`
 
 ### `thrift_connect_timeout_seconds`

