Posted to commits@doris.apache.org by mo...@apache.org on 2020/06/21 08:39:35 UTC

[incubator-doris] branch master updated: [Doc] Fix doc-bug (#3914)

This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git


The following commit(s) were added to refs/heads/master by this push:
     new 03fa1fe  [Doc] Fix doc-bug (#3914)
03fa1fe is described below

commit 03fa1fefa9e90673916323a0f00840b4c91e76a9
Author: YuJun <sk...@outlook.com>
AuthorDate: Sun Jun 21 16:39:27 2020 +0800

    [Doc] Fix doc-bug (#3914)
---
 docs/en/getting-started/data-partition.md    | 2 +-
 docs/zh-CN/getting-started/data-partition.md | 5 ++---
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/docs/en/getting-started/data-partition.md b/docs/en/getting-started/data-partition.md
index 532ef6b..b0f6e1e 100644
--- a/docs/en/getting-started/data-partition.md
+++ b/docs/en/getting-started/data-partition.md
@@ -182,7 +182,7 @@ It is also possible to use only one layer of partitioning. When using a layer pa
     * The bucket column can be multiple columns, but it must be a Key column. The bucket column can be the same or different from the Partition column.
     * The choice of bucket column is a trade-off between **query throughput** and **query concurrency**:
 
-        1. If you select multiple bucket columns, the data is more evenly distributed. However, if the query condition does not include the equivalent condition for all bucket columns, a query will scan all buckets. The throughput of such queries will increase, but the latency of a single query will increase. This method is suitable for large throughput and low concurrent query scenarios.
+        1. If you select multiple bucket columns, the data is more evenly distributed. However, if the query condition does not include the equivalent condition for all bucket columns, a query will scan all buckets. The throughput of such queries will increase, and the latency of a single query will decrease. This method is suitable for large throughput and low concurrent query scenarios.
         2. If you select only one or a few bucket columns, the point query can query only one bucket. This approach is suitable for high-concurrency point query scenarios.
         
     * There is no theoretical limit on the number of buckets.
diff --git a/docs/zh-CN/getting-started/data-partition.md b/docs/zh-CN/getting-started/data-partition.md
index ef93749..9b7346b 100644
--- a/docs/zh-CN/getting-started/data-partition.md
+++ b/docs/zh-CN/getting-started/data-partition.md
@@ -185,9 +185,8 @@ Doris 支持两层的数据划分。第一层是 Partition,仅支持 Range 的
     * 如果使用了 Partition,则 `DISTRIBUTED ...` 语句描述的是数据在**各个分区内**的划分规则。如果不使用 Partition,则描述的是对整个表的数据的划分规则。
     * 分桶列可以是多列,但必须为 Key 列。分桶列可以和 Partition 列相同或不同。
     * 分桶列的选择,是在 **查询吞吐** 和 **查询并发** 之间的一种权衡:
-
-        1. 如果选择多个分桶列,则数据分布更均匀。但如果查询条件不包含所有分桶列的等值条件的话,一个查询会扫描所有分桶。这样查询的吞吐会增加,但是单个查询的延迟也会增加。这个方式适合大吞吐低并发的查询场景。
-        2. 如果仅选择一个或少数分桶列,则点查询可以仅查询一个分桶。这种方式适合高并发的点查询场景。
+        1. 如果选择多个分桶列,则数据分布更均匀。如果一个查询条件不包含所有分桶列的等值条件,那么该查询会触发所有分桶同时扫描,这样查询的吞吐会增加,单个查询的延迟随之降低。这个方式适合大吞吐低并发的查询场景。
+        2. 如果仅选择一个或少数分桶列,则对应的点查询可以仅触发一个分桶扫描。此时,当多个点查询并发时,这些查询有较大的概率分别触发不同的分桶扫描,各个查询之间的IO影响较小(尤其当不同桶分布在不同磁盘上时),所以这种方式适合高并发的点查询场景。
         
     * 分桶的数量理论上没有上限。
 

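The bucket-column trade-off described in the diff above can be illustrated with a small sketch. This is not Doris code: the CRC32-based hash, the bucket count of 8, the column names (`user_id`, `city`), and both function names are assumptions chosen only for illustration. It shows that a query with equality conditions on every bucket column can be routed to one bucket, while a query missing any of them has to scan all buckets.

```python
import zlib

NUM_BUCKETS = 8  # hypothetical bucket count


def bucket_of(values, num_buckets=NUM_BUCKETS):
    """Map a tuple of bucket-column values to a bucket index.

    Assumption: a CRC32 over the joined values stands in for whatever
    hash function the real system uses.
    """
    key = "\x00".join(str(v) for v in values).encode("utf-8")
    return zlib.crc32(key) % num_buckets


def buckets_to_scan(bucket_columns, equality_predicates, num_buckets=NUM_BUCKETS):
    """Return the bucket indexes a query has to scan.

    bucket_columns: list of column names used for bucketing.
    equality_predicates: dict of column name -> value from the WHERE clause.
    """
    if all(col in equality_predicates for col in bucket_columns):
        # Every bucket column is pinned by an equality condition,
        # so the row can live in exactly one bucket.
        values = [equality_predicates[col] for col in bucket_columns]
        return [bucket_of(values, num_buckets)]
    # At least one bucket column is unconstrained: any bucket may hold matches.
    return list(range(num_buckets))


# One bucket column: a point query on user_id touches a single bucket.
single = buckets_to_scan(["user_id"], {"user_id": 42})

# Two bucket columns, but the query only constrains one: all buckets are scanned.
multi = buckets_to_scan(["user_id", "city"], {"user_id": 42})

print(len(single), len(multi))
```

In the single-column case, concurrent point queries tend to land on different buckets, which is why the docs recommend it for high-concurrency point lookups. In the multi-column case, each query fans out across all buckets in parallel, which suits high-throughput, low-concurrency workloads.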
