Posted to commits@doris.apache.org by yi...@apache.org on 2023/04/17 10:00:56 UTC

[doris] branch master updated: [feature](segcompaction) enable segcompaction by default (#18722)

This is an automated email from the ASF dual-hosted git repository.

yiguolei pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
     new 16cdd9e95a [feature](segcompaction) enable segcompaction by default (#18722)
16cdd9e95a is described below

commit 16cdd9e95aeecc77dfc9ff675fcc5e12d9516605
Author: zhengyu <fr...@gmail.com>
AuthorDate: Mon Apr 17 18:00:49 2023 +0800

    [feature](segcompaction) enable segcompaction by default (#18722)
    
    Signed-off-by: freemandealer <fr...@gmail.com>
---
 be/src/common/config.h                           |  2 +-
 docs/en/docs/admin-manual/config/be-config.md    | 10 +++++-----
 docs/en/docs/faq/data-faq.md                     |  4 ++--
 docs/zh-CN/docs/admin-manual/config/be-config.md | 10 +++++-----
 docs/zh-CN/docs/faq/data-faq.md                  |  6 +++---
 5 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/be/src/common/config.h b/be/src/common/config.h
index 3f2034191c..976dfd3149 100644
--- a/be/src/common/config.h
+++ b/be/src/common/config.h
@@ -875,7 +875,7 @@ CONF_String(be_node_role, "mix");
 // Hide the be config page for webserver.
 CONF_Bool(hide_webserver_config_page, "false");
 
-CONF_Bool(enable_segcompaction, "false"); // currently only support vectorized storage
+CONF_Bool(enable_segcompaction, "true");
 
 // Trigger segcompaction if the num of segments in a rowset exceeds this threshold.
 CONF_Int32(segcompaction_threshold_segment_num, "10");
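
With the default now flipped in code, a deployment that needs different behavior can still pin these values in be.conf. A minimal sketch; the threshold shown is just the shipped default, not a tuning recommendation:

    # be.conf -- pin segcompaction behavior explicitly (values illustrative)
    enable_segcompaction = true
    # trigger segcompaction once a rowset accumulates more than this many segments
    segcompaction_threshold_segment_num = 10
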
diff --git a/docs/en/docs/admin-manual/config/be-config.md b/docs/en/docs/admin-manual/config/be-config.md
index 821f81b03e..c9443394cf 100644
--- a/docs/en/docs/admin-manual/config/be-config.md
+++ b/docs/en/docs/admin-manual/config/be-config.md
@@ -7,7 +7,7 @@
 }
 ---
 
-<!-- 
+<!--
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements.  See the NOTICE file
 distributed with this work for additional information
@@ -246,7 +246,7 @@ There are two ways to configure BE configuration items:
 
 * Description: This configuration is mainly used to modify the brpc parameter `socket_max_unwritten_bytes`.
   - Sometimes a query fails and the error message `The server is overcrowded` appears in the BE log. This means too many messages are buffered on the sender side, which may happen when the SQL needs to send a large bitmap value. You can avoid this error by increasing this configuration.
-    
+
 #### `transfer_large_data_by_brpc`
 
 * Type: bool
@@ -279,7 +279,7 @@ There are two ways to configure BE configuration items:
 
 * Type: string
 * Description: This configuration indicates the service model used by FE's Thrift service. The type is string and is case-insensitive. This parameter needs to be consistent with the setting of FE's `thrift_server_type` parameter. Currently there are two values for this parameter, `THREADED` and `THREAD_POOL`.
-  
+
     - If the parameter is `THREADED`, the model is a non-blocking I/O model.
 
     - If the parameter is `THREAD_POOL`, the model is a blocking I/O model.
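
As a concrete illustration of the brpc buffer described above, the value can be raised in be.conf. A minimal sketch, assuming the BE config key is `brpc_socket_max_unwritten_bytes`; the 1 GB value is illustrative, not a recommendation:

    # be.conf -- enlarge brpc's unwritten-bytes buffer so that queries sending
    # large bitmap values stop failing with "The server is overcrowded"
    brpc_socket_max_unwritten_bytes = 1073741824
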
@@ -628,8 +628,8 @@ Metrics: {"filtered_rows":0,"input_row_num":3346807,"input_rowsets_count":42,"in
 #### `enable_segcompaction`
 
 * Type: bool
-* Description: Enable to use segment compaction during loading
-* Default value: false
+* Description: Whether to enable segment compaction during data load to avoid the -238 error
+* Default value: true
 
 #### `segcompaction_threshold_segment_num`
 
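Because segment compaction takes effect per load job, the flag can also be flipped on a running BE through the webserver's config-update endpoint instead of editing be.conf and restarting. A hedged sketch, assuming the standard `/api/update_config` endpoint on the default webserver port 8040 and that the item is runtime-mutable; the change does not persist across restarts:

    # flip segcompaction on a live BE (placeholder host; not persisted)
    curl -X POST "http://be_host:8040/api/update_config?enable_segcompaction=true"
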
diff --git a/docs/en/docs/faq/data-faq.md b/docs/en/docs/faq/data-faq.md
index e8fd0b30b5..e5926c65c4 100644
--- a/docs/en/docs/faq/data-faq.md
+++ b/docs/en/docs/faq/data-faq.md
@@ -60,7 +60,7 @@ This error usually occurs during data import operations. The error code is -235.
 
 This error is usually caused by an import frequency that is too high, exceeding the compaction speed of the backend data, so versions pile up and eventually exceed the limit. At this point, we can first run the show tablet 27306172 statement, then execute the show proc statement from its result to check the status of each replica of the tablet (see the SQL sketch below). The versionCount in the result represents the number of versions. If you find that a replica has too many versions, reduce the import frequency or stop importing, and observe whether the version count drops. If the version count still does not drop after imports stop, check the be.INFO log on the corresponding BE node, search for the tablet id and the compaction keyword, and confirm that compaction is running normally. For compaction tuning, see the Apache Doris WeChat official account article: Doris Best Practices - Compaction Tuning (3).
 
-The -238 error usually occurs when the same batch of imported data is too large, resulting in too many Segment files for a tablet (default is 200, controlled by the BE parameter `max_segment_num_per_rowset`). At this time, it is recommended to reduce the amount of data imported in one batch, or appropriately increase the BE configuration parameter value to solve the problem.
+The -238 error usually occurs when the same batch of imported data is too large, resulting in too many Segment files for a tablet (default is 200, controlled by the BE parameter `max_segment_num_per_rowset`). At this time, it is recommended to reduce the amount of data imported in one batch, or appropriately increase the BE configuration parameter value to solve the problem. Since version 2.0, users can enable the segment compaction feature to reduce the number of segment files by setting `enable_segcompaction=true` in the BE config.
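
To make the version check above concrete, here is a hedged SQL sketch; tablet id 27306172 is the example from the text, and the SHOW PROC path returned by a real cluster will differ (the one below is hypothetical):

    -- step 1: ask FE for the tablet's detail command
    SHOW TABLET 27306172;
    -- step 2: run the DetailCmd string returned above, for example:
    SHOW PROC '/dbs/10003/27306171/partitions/27306170/27306172';
    -- step 3: a replica whose versionCount is far above its peers is the one
    --         compaction has not caught up with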
 
 ### Q5. tablet 110309738 has few replicas: 1, alive backends: [10003]
 
@@ -152,7 +152,7 @@ broker_timeout_ms = 10000
 
 Adding parameters here requires restarting the FE service.
 
-### Q11. [ Routine load ] ReasonOfStateChanged: ErrorReason{code=errCode = 104, msg='be 10004 abort task with reason: fetch failed due to requested offset not available on the broker: Broker: Offset out of range'} 
+### Q11. [ Routine load ] ReasonOfStateChanged: ErrorReason{code=errCode = 104, msg='be 10004 abort task with reason: fetch failed due to requested offset not available on the broker: Broker: Offset out of range'}
 
 This problem occurs because Kafka's cleanup policy defaults to 7 days. When a routine load task is suspended for some reason and not restored for a long time, the job still remembers the consumption offset from before the pause; if Kafka has already cleaned up that offset by the time the task is resumed, this error appears.
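
When this happens, the job usually has to be repositioned to an offset Kafka still retains before it can be resumed. A hypothetical recovery sketch, assuming a job named example_job consuming a single partition; verify the exact ALTER ROUTINE LOAD syntax against your Doris version:

    -- check why the job was paused
    SHOW ROUTINE LOAD FOR example_job;
    -- move the consumed position to the earliest offset still on the broker
    ALTER ROUTINE LOAD FOR example_job
    FROM KAFKA ("kafka_partitions" = "0", "kafka_offsets" = "OFFSET_BEGINNING");
    -- resume consumption
    RESUME ROUTINE LOAD FOR example_job;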
 
diff --git a/docs/zh-CN/docs/admin-manual/config/be-config.md b/docs/zh-CN/docs/admin-manual/config/be-config.md
index d0aeec1020..0a009d4ce3 100644
--- a/docs/zh-CN/docs/admin-manual/config/be-config.md
+++ b/docs/zh-CN/docs/admin-manual/config/be-config.md
@@ -7,7 +7,7 @@
 }
 ---
 
-<!-- 
+<!--
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements.  See the NOTICE file
 distributed with this work for additional information
@@ -190,7 +190,7 @@ The configuration becomes invalid after BE restarts. To persist the change, use the following
 * Description: When BE starts, it checks all paths under the ``storage_root_path`` configuration.
 
   - `ignore_broken_disk=true`
-  
+
  If a path does not exist, or files cannot be read or written under it (bad disk), the path is ignored, and startup continues as long as other usable paths exist.
 
   - `ignore_broken_disk=false`
@@ -642,8 +642,8 @@ Metrics: {"filtered_rows":0,"input_row_num":3346807,"input_rowsets_count":42,"in
 #### `enable_segcompaction`
 
 * Type: bool
-* Description: Perform segment compaction during data load to reduce the number of segments
-* Default value: false
+* Description: Perform segment compaction during data load to reduce the number of segments and avoid the -238 write error
+* Default value: true
 
 #### `segcompaction_threshold_segment_num`
 
@@ -1303,7 +1303,7 @@ load tablets from header failed, failed tablets size: xxx, path=xxx
 #### `jvm_max_heap_size`
 
 * Type: string
-* Description: Maximum JVM heap size used by BE, i.e. the JVM -Xmx parameter 
+* Description: Maximum JVM heap size used by BE, i.e. the JVM -Xmx parameter
 * Default value: 1024M
 
 </version>
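
The `ignore_broken_disk` and `jvm_max_heap_size` items above both live in be.conf; a minimal sketch with illustrative paths and sizes (placeholders, not recommendations):

    # be.conf -- tolerate a bad disk instead of aborting startup
    storage_root_path = /home/disk1/doris;/home/disk2/doris
    ignore_broken_disk = true
    # cap the BE-side JVM heap (maps to the JVM -Xmx flag)
    jvm_max_heap_size = 2048m
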
diff --git a/docs/zh-CN/docs/faq/data-faq.md b/docs/zh-CN/docs/faq/data-faq.md
index 8c9cbd3d62..45c6629ff6 100644
--- a/docs/zh-CN/docs/faq/data-faq.md
+++ b/docs/zh-CN/docs/faq/data-faq.md
@@ -60,7 +60,7 @@ A table with the Unique Key model is business-friendly because of its unique
 
 This error is usually caused by an import frequency that is too high, exceeding the compaction speed of the backend data, so versions pile up and eventually exceed the limit. At this point, first run the show tablet 27306172 statement, then execute the show proc statement from its result to check the status of each replica of the tablet. The versionCount in the result represents the number of versions. If a replica has too many versions, reduce the import frequency or stop importing and observe whether the version count drops. If it still does not drop after imports stop, check the be.INFO log on the corresponding BE node, search for the tablet id and the compaction keyword, and confirm that compaction is running normally. For compaction tuning, see the Apache Doris WeChat official account article: Doris Best Practices - Compaction Tuning (3).
 
--238 errors usually occur when the same batch of imported data is too large, resulting in too many Segment files for one tablet (default is 200, controlled by the BE parameter `max_segment_num_per_rowset`). In this case, it is recommended to reduce the amount of data imported per batch, or to increase the BE configuration parameter value appropriately.
+-238 errors usually occur when the same batch of imported data is too large, resulting in too many Segment files for one tablet (default is 200, controlled by the BE parameter `max_segment_num_per_rowset`). In this case, it is recommended to reduce the amount of data imported per batch, or to increase the BE configuration parameter value appropriately. Since version 2.0, the segment compaction feature can be enabled to reduce the number of Segment files (enable_segcompaction=true in the BE config).
 
 ### Q5. tablet 110309738 has few replicas: 1, alive backends: [10003]
 
@@ -92,7 +92,7 @@ A table with the Unique Key model is business-friendly because of its unique
 
   You can upgrade to Doris 0.15 or later, where this problem has been fixed.
 
-### Q8. Error -214 reported when executing import or query 
+### Q8. Error -214 reported when executing import or query
 
 When executing import, query, and other operations, you may encounter the following error:
 
@@ -150,7 +150,7 @@ broker_timeout_ms = 10000
 
 Adding parameters here requires restarting the FE service.
 
-### Q11.[ Routine load ] ReasonOfStateChanged: ErrorReason{code=errCode = 104, msg='be 10004 abort task with reason: fetch failed due to requested offset not available on the broker: Broker: Offset out of range'} 
+### Q11.[ Routine load ] ReasonOfStateChanged: ErrorReason{code=errCode = 104, msg='be 10004 abort task with reason: fetch failed due to requested offset not available on the broker: Broker: Offset out of range'}
 
 This problem occurs because Kafka's cleanup policy defaults to 7 days. When a routine load task is suspended for some reason and not restored for a long time, the job still records the consumption offset from before the pause; if Kafka's cleanup policy has already removed that offset when the task is resumed, this problem occurs.
 

