Posted to commits@doris.apache.org by mo...@apache.org on 2022/04/23 14:04:04 UTC

[incubator-doris] branch master updated: [docs][typo] Fix some typos in "getting-started" content. (#9124)

This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git


The following commit(s) were added to refs/heads/master by this push:
     new 4445d3188d [docs][typo] Fix some typos in "getting-started" content. (#9124)
4445d3188d is described below

commit 4445d3188d3545c50337ba538775aa02f8f1f1ec
Author: liuzhuang2017 <95...@users.noreply.github.com>
AuthorDate: Sat Apr 23 22:03:59 2022 +0800

    [docs][typo] Fix some typos in "getting-started" content. (#9124)
---
 docs/en/getting-started/data-model-rollup.md    | 44 +++++------
 docs/en/getting-started/data-partition.md       | 98 ++++++++++++-------------
 docs/zh-CN/getting-started/data-model-rollup.md |  4 +-
 docs/zh-CN/getting-started/data-partition.md    | 88 +++++++++++-----------
 4 files changed, 117 insertions(+), 117 deletions(-)

diff --git a/docs/en/getting-started/data-model-rollup.md b/docs/en/getting-started/data-model-rollup.md
index 8e3c7057d6..d70e064eb4 100644
--- a/docs/en/getting-started/data-model-rollup.md
+++ b/docs/en/getting-started/data-model-rollup.md
@@ -136,12 +136,12 @@ As you can see, there is only one line of aggregated data left for 10,000 users.
 
The first five columns remain unchanged, starting with column 6, `last_visit_date`:
 
-*`2017-10-01 07:00`: Because the `last_visit_date`column is aggregated by REPLACE, the `2017-10-01 07:00` column has been replaced by `2017-10-01 06:00'.
+* `2017-10-01 07:00`: Because the `last_visit_date` column is aggregated by REPLACE, `2017-10-01 07:00` has replaced `2017-10-01 06:00` as the retained value.
> Note: For data in the same import batch, the order of replacement is not guaranteed for REPLACE aggregation. For example, in this case the retained value may end up being `2017-10-01 06:00`. For data from different import batches, it is guaranteed that data from the later batch will replace that from the earlier batch.
 
-*`35`: Because the aggregation type of the `cost'column is SUM, 35 is accumulated from 20 + 15.
-*`10`: Because the aggregation type of the`max_dwell_time'column is MAX, 10 and 2 take the maximum and get 10.
-*`2`: Because the aggregation type of `min_dwell_time'column is MIN, 10 and 2 take the minimum value and get 2.
+* `35`: Because the aggregation type of the `cost` column is SUM, 35 is accumulated from 20 + 15.
+* `10`: Because the aggregation type of the `max_dwell_time` column is MAX, 10 is the maximum of 10 and 2.
+* `2`: Because the aggregation type of the `min_dwell_time` column is MIN, 2 is the minimum of 10 and 2.
 
 After aggregation, Doris ultimately only stores aggregated data. In other words, detailed data will be lost and users can no longer query the detailed data before aggregation.
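
For reference, a minimal sketch of an aggregate table consistent with the columns discussed above (the database/table name and the column types are assumptions; only the column names and aggregation types come from the example):

```
-- Sketch only: rows whose key columns (user_id, date, city, age) are equal
-- are merged, applying each value column's aggregation type
-- (REPLACE / SUM / MAX / MIN).
CREATE TABLE IF NOT EXISTS example_db.user_behavior
(
    `user_id`         LARGEINT    NOT NULL,
    `date`            DATE        NOT NULL,
    `city`            VARCHAR(20),
    `age`             SMALLINT,
    `last_visit_date` DATETIME    REPLACE,
    `cost`            BIGINT      SUM DEFAULT "0",
    `max_dwell_time`  INT         MAX DEFAULT "0",
    `min_dwell_time`  INT         MIN DEFAULT "99999"
)
AGGREGATE KEY(`user_id`, `date`, `city`, `age`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 1;
```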
 
@@ -329,9 +329,9 @@ DUPLICATE KEY(`timestamp`, `type`)
 ```
 
This data model is different from the Aggregate and Unique models. Data is stored exactly as it appears in the imported file, without any aggregation. Even if two rows of data are identical, both will be retained.
-The DUPLICATE KEY specified in the table building statement is only used to specify which columns the underlying data is sorted according to. (The more appropriate name should be "Sorted Column", where the name "DUPLICATE KEY" is used to specify the data model used. For more explanations of "Sorted Column", see the section ** Prefix Index **. On the choice of DUPLICATE KEY, we recommend that the first 2-4 columns be selected appropriately.
+The DUPLICATE KEY specified in the table creation statement is only used to specify which columns the underlying data is sorted by. (A more appropriate name would be "Sorted Column"; the name "DUPLICATE KEY" is used here only to indicate the data model in use. For more explanation of "Sorted Column", see the section [Prefix Index](https://doris.apache.org/getting-started/data-model-rollup.html#prefix-index).) On the choice of DUPLICATE KEY, we recommend selecting the first 2-4 columns as appropriate.
 
-This data model is suitable for storing raw data without aggregation requirements and primary key uniqueness constraints. For more usage scenarios, see the ** Limitations of the Aggregation Model ** section.
+This data model is suitable for storing raw data without aggregation requirements and primary key uniqueness constraints. For more usage scenarios, see the [Limitations of the Aggregation Model](https://doris.apache.org/getting-started/data-model-rollup.html#limitations-of-aggregation-model) section.
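
A minimal sketch of such a table, assuming the `timestamp` and `type` columns from the DUPLICATE KEY statement above (the remaining columns are illustrative):

```
-- Sketch only: a Duplicate-model table. Rows are stored exactly as
-- imported; DUPLICATE KEY only selects the sort columns.
CREATE TABLE IF NOT EXISTS example_db.example_log
(
    `timestamp`  DATETIME NOT NULL,
    `type`       INT      NOT NULL,
    `error_code` INT,
    `error_msg`  VARCHAR(1024)
)
DUPLICATE KEY(`timestamp`, `type`)
DISTRIBUTED BY HASH(`type`) BUCKETS 1;
```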
 
 ## ROLLUP
 
@@ -427,7 +427,7 @@ After the creation, the data stored in the ROLLUP is as follows:
 
 When we do the following queries:
 
-* Select City, Age, Sum (Cost), Max (Max dwell time), min (min dwell time) from table group by City, age;*
+* `SELECT city, age, sum(cost), max(max_dwell_time), min(min_dwell_time) FROM table GROUP BY city, age;`
 * `SELECT city, sum(cost), max(max_dwell_time), min(min_dwell_time) FROM table GROUP BY city;`
 * `SELECT city, age, sum(cost), min(min_dwell_time) FROM table GROUP BY city, age;`
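
The ROLLUP creation statement itself is elided from this excerpt; for reference, a hedged sketch of what a ROLLUP covering these queries could look like (the ROLLUP name is hypothetical; `table` is this section's placeholder):

```
-- Sketch only: a ROLLUP keyed on (city, age) so the three queries above
-- can be answered from pre-aggregated data instead of the Base table.
ALTER TABLE table ADD ROLLUP rollup_city_age(city, age, cost, max_dwell_time, min_dwell_time);
```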
 
@@ -470,15 +470,15 @@ We use the prefix index of **36 bytes** of a row of data as the prefix index of
 |max\_dwell\_time|DATETIME|
 |min\_dwell\_time|DATETIME|
 
-When our query condition is the prefix of ** prefix index **, it can greatly speed up the query speed. For example, in the first example, we execute the following queries:
+When our query condition is a prefix of the **prefix index**, the query can be greatly accelerated. For example, in the first example, we execute the following query:
 
 `SELECT * FROM table WHERE user_id=1829239 and age=20;`
 
-The efficiency of this query is much higher than that of ** the following queries:
+The efficiency of this query is **much higher** than that of the following query:
 
 `SELECT * FROM table WHERE age=20;`
 
-Therefore, when constructing tables, ** correctly choosing column order can greatly improve query efficiency **.
+Therefore, when constructing tables, **correctly choosing column order can greatly improve query efficiency**.
 
 #### ROLLUP adjusts prefix index
 
@@ -517,8 +517,8 @@ The ROLLUP table is preferred because the prefix index of ROLLUP matches better.
* ROLLUP data is stored in separate physical storage. Therefore, the more ROLLUPs you create, the more disk space you occupy. It also has an impact on import speed (the ETL phase of import automatically generates all ROLLUP data), but it does not reduce query efficiency (it can only improve it).
* Data updates for ROLLUP are fully synchronized with the Base table. Users do not need to worry about this.
* Columns in a ROLLUP are aggregated in exactly the same way as in the Base table. There is no need to specify this when creating a ROLLUP, nor can it be modified.
-* A necessary (inadequate) condition for a query to hit ROLLUP is that all columns ** (including the query condition columns in select list and where) involved in the query exist in the column of the ROLLUP. Otherwise, the query can only hit the Base table.
-* Certain types of queries (such as count (*)) cannot hit ROLLUP under any conditions. See the next section **Limitations of the aggregation model**.
+* A necessary (but not sufficient) condition for a query to hit a ROLLUP is that **all columns** involved in the query (including the query condition columns in the select list and where clause) exist in the ROLLUP's columns. Otherwise, the query can only hit the Base table.
+* Certain types of queries (such as `count(*)`) cannot hit ROLLUP under any conditions. See the next section, **Limitations of the aggregation model**.
 * The query execution plan can be obtained by `EXPLAIN your_sql;` command, and in the execution plan, whether ROLLUP has been hit or not can be checked.
 * Base tables and all created ROLLUP can be displayed by `DESC tbl_name ALL;` statement.
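
For instance, a brief usage sketch of the two commands above (using this section's placeholder table name):

```
-- Inspect the execution plan to see whether a ROLLUP is hit:
EXPLAIN SELECT city, age, sum(cost) FROM table GROUP BY city, age;
-- List the Base table together with all of its ROLLUPs:
DESC table ALL;
```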
 
@@ -574,11 +574,11 @@ The result is 5, not 1.
 
At the same time, this consistency guarantee can greatly reduce query efficiency for some queries.
 
-Let's take the most basic count (*) query as an example:
+Let's take the most basic `count(*)` query as an example:
 
 `SELECT COUNT(*) FROM table;`
 
-In other databases, such queries return results quickly. Because in the implementation, we can get the query result by counting rows at the time of import and saving count statistics information, or by scanning only a column of data to get count value at the time of query, with very little overhead. But in Doris's aggregation model, the overhead of this query ** is very large **.
+In other databases, such queries return results quickly, because the implementation can count rows at import time and save that as statistics, or scan only a single column of data at query time to obtain the count, with very little overhead. But in Doris's aggregation model, the overhead of this query **is very large**.
 
Let's take the above data as an example.
 
@@ -606,11 +606,11 @@ Because the final aggregation result is:
 |10002|2017-11-21|39|
 |10003|2017-11-22|22|
 
-So `select count (*) from table;` The correct result should be **4**. But if we only scan the `user_id'column and add query aggregation, the final result is **3** (10001, 10002, 10003). If aggregated without queries, the result is **5** (a total of five rows in two batches). It can be seen that both results are wrong.
+So the correct result of `select count(*) from table;` should be **4**. But if we only scan the `user_id` column and apply aggregation at query time, the final result is **3** (10001, 10002, 10003). Without query-time aggregation, the result is **5** (five rows in total across the two batches). Both results are therefore wrong.
 
-In order to get the correct result, we must read the data of `user_id` and `date`, and **together with aggregate** when querying, to return the correct result of **4**. That is to say, in the count (*) query, Doris must scan all AGGREGATE KEY columns (here are `user_id` and `date`) and aggregate them to get the semantically correct results. When aggregated columns are large, count (*) queries need to scan a large amount of data.
+In order to get the correct result, we must read both the `user_id` and `date` columns and **aggregate them** when querying to return the correct result of **4**. That is to say, in a `count(*)` query, Doris must scan all AGGREGATE KEY columns (here, `user_id` and `date`) and aggregate them to get the semantically correct result. When there are many aggregated columns, `count(*)` queries need to scan a large amount of data.
 
-Therefore, when there are frequent count (*) queries in the business, we recommend that users simulate count (*) by adding a column with a value of 1 and aggregation type of SUM. As the table structure in the previous example, we modify it as follows:
+Therefore, when there are frequent `count(*)` queries in the business, we recommend that users simulate `count(*)` by adding a column whose value is always 1 and whose aggregation type is SUM. Taking the table structure in the previous example, we modify it as follows:
 
 |ColumnName|Type|AggregationType|Comment|
 |---|---|---|---|
@@ -619,18 +619,18 @@ Therefore, when there are frequent count (*) queries in the business, we recomme
 | Cost | BIGINT | SUM | Total User Consumption|
 | count | BIGINT | SUM | for counting|
 
-Add a count column and import the data with the column value **equal to 1**. The result of `select count (*) from table;`is equivalent to `select sum (count) from table;` The query efficiency of the latter is much higher than that of the former. However, this method also has limitations, that is, users need to guarantee that they will not import rows with the same AGGREGATE KEY column repeatedly. Otherwise, `select sum (count) from table;`can only express the number of rows originally im [...]
+Add a count column, and import data with the value of that column **always equal to 1**. The result of `select count(*) from table;` is then equivalent to that of `select sum(count) from table;`, and the query efficiency of the latter is much higher than that of the former. However, this method also has limitations: users need to guarantee that rows with the same AGGREGATE KEY columns are not imported repeatedly. Otherwise, `select sum(count) from table;` can only express the number of rows originally imported, not the semantics of `select count(*) from table;`.
 
-Another way is to **change the aggregation type of the count column above to REPLACE, and still weigh 1**. Then`select sum (count) from table;` and `select count (*) from table;` the results will be consistent. And in this way, there is no restriction on importing duplicate rows.
+Another way is to **change the aggregation type of the count column above to REPLACE, while still keeping its value as 1**. Then the results of `select sum(count) from table;` and `select count(*) from table;` will be consistent, and in this way there is no restriction on importing duplicate rows.
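
Under the assumptions above, a minimal sketch of both variants (`table` is this section's placeholder; the `ALTER TABLE` form is illustrative, as the column could equally be declared in the original CREATE TABLE):

```
-- Sketch only: add a count column that is always imported with value 1
-- and aggregated with SUM.
ALTER TABLE table ADD COLUMN `count` BIGINT SUM DEFAULT "1";
-- Then this query ...
SELECT SUM(`count`) FROM table;
-- ... matches SELECT COUNT(*) FROM table; as long as rows with identical
-- AGGREGATE KEY values are never imported twice. With the REPLACE variant
-- (aggregation type REPLACE, value still 1), that restriction goes away.
```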
 
 ### Duplicate Model
 
-Duplicate model has no limitation of aggregation model. Because the model does not involve aggregate semantics, when doing count (*) query, we can get the correct semantics by choosing a column of queries arbitrarily.
+The Duplicate model has none of the aggregation model's limitations. Because the model does not involve aggregation semantics, a `count(*)` query can get the semantically correct result by arbitrarily choosing any one column to count.
 
 ## Suggestions for Choosing Data Model
 
-Because the data model was established when the table was built, and **could not be modified **. Therefore, it is very important to select an appropriate data model**.
+Because the data model is established when the table is created and **cannot be modified**, it is **very important** to select an appropriate data model.
 
-1. Aggregate model can greatly reduce the amount of data scanned and the amount of query computation by pre-aggregation. It is very suitable for report query scenarios with fixed patterns. But this model is not very friendly for count (*) queries. At the same time, because the aggregation method on the Value column is fixed, semantic correctness should be considered in other types of aggregation queries.
+1. The Aggregate model can greatly reduce the amount of data scanned and the amount of query computation through pre-aggregation. It is very suitable for report query scenarios with fixed patterns. But this model is not friendly to `count(*)` queries. At the same time, because the aggregation method on the Value columns is fixed, semantic correctness needs to be considered for other types of aggregation queries.
2. The Unique model guarantees primary key uniqueness for scenarios requiring a unique primary key constraint. However, it cannot exploit the query advantage brought by pre-aggregation such as ROLLUP (because the essence is REPLACE, there is no aggregation such as SUM).
3. The Duplicate model is suitable for ad-hoc queries on any dimension. Although it also cannot take advantage of the pre-aggregation feature, it is not constrained by the aggregation model and can take advantage of the column-store model (reading only the relevant columns, rather than all Key columns).
diff --git a/docs/en/getting-started/data-partition.md b/docs/en/getting-started/data-partition.md
index 4ab0241d95..55c3fe2b6c 100644
--- a/docs/en/getting-started/data-partition.md
+++ b/docs/en/getting-started/data-partition.md
@@ -134,8 +134,8 @@ When defining columns, you can refer to the following suggestions:
 
 1. The Key column must precede all Value columns.
2. Try to choose integer types, because calculations and lookups on integer types are much more efficient than on strings.
-3. For the selection principle of integer types of different lengths, follow ** enough to **.
-4. For lengths of type VARCHAR and STRING, follow ** is sufficient.
+3. When selecting among integer types of different lengths, follow the principle of **sufficient is enough**.
+4. For the lengths of the VARCHAR and STRING types, likewise **sufficient is enough** (see the sketch after this list).
 5. The total byte length of all columns (including Key and Value) cannot exceed 100KB.
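
A minimal sketch applying suggestions 1-4 (all names, types, and lengths here are illustrative):

```
CREATE TABLE example_db.suggestion_demo
(
    `id`   INT NOT NULL,            -- Key column first; INT is "enough" for the id range
    `name` VARCHAR(32),             -- length chosen to be just sufficient
    `cost` BIGINT SUM DEFAULT "0"   -- Value column after all Key columns
)
AGGREGATE KEY(`id`, `name`)
DISTRIBUTED BY HASH(`id`) BUCKETS 1;
```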
 
### Partitioning and bucketing
@@ -275,68 +275,68 @@ Doris supports specifying multiple columns as partition columns, examples are as
 
 ##### Range Partition
 
-    ```
+```
     PARTITION BY RANGE(`date`, `id`)
     (
         PARTITION `p201701_1000` VALUES LESS THAN ("2017-02-01", "1000"),
         PARTITION `p201702_2000` VALUES LESS THAN ("2017-03-01", "2000"),
         PARTITION `p201703_all` VALUES LESS THAN ("2017-04-01")
     )
-    ```
-
-    In the above example, we specify `date` (DATE type) and `id` (INT type) as partition columns. The resulting partitions in the above example are as follows:
-
-    ```
-    *p201701_1000: [(MIN_VALUE, MIN_VALUE), ("2017-02-01", "1000") )
-    *p201702_2000: [("2017-02-01", "1000"), ("2017-03-01", "2000") )
-    *p201703_all: [("2017-03-01", "2000"), ("2017-04-01", MIN_VALUE))
-    ```
-
-    Note that the last partition user defaults only the partition value of the `date` column, so the partition value of the `id` column will be filled with `MIN_VALUE` by default. When the user inserts data, the partition column values ​​are compared in order, and the corresponding partition is finally obtained. Examples are as follows:
-
-    ```
-    * Data --> Partition
-    * 2017-01-01, 200   --> p201701_1000
-    * 2017-01-01, 2000  --> p201701_1000
-    * 2017-02-01, 100   --> p201701_1000
-    * 2017-02-01, 2000  --> p201702_2000
-    * 2017-02-15, 5000  --> p201702_2000
-    * 2017-03-01, 2000  --> p201703_all
-    * 2017-03-10, 1     --> p201703_all
-    * 2017-04-01, 1000  --> Unable to import
-    * 2017-05-01, 1000  --> Unable to import
-    ```
+```
+
+In the above example, we specify `date` (DATE type) and `id` (INT type) as partition columns. The resulting partitions are as follows:
+
+```
+p201701_1000: [(MIN_VALUE, MIN_VALUE), ("2017-02-01", "1000") )
+p201702_2000: [("2017-02-01", "1000"), ("2017-03-01", "2000") )
+p201703_all: [("2017-03-01", "2000"), ("2017-04-01", MIN_VALUE))
+```
+
+Note that for the last partition, the user only specified the partition value of the `date` column, so the partition value of the `id` column is filled with `MIN_VALUE` by default. When the user inserts data, the partition column values are compared in order, and the corresponding partition is finally determined. Examples are as follows:
+
+```
+ Data --> Partition
+ 2017-01-01, 200   --> p201701_1000
+ 2017-01-01, 2000  --> p201701_1000
+ 2017-02-01, 100   --> p201701_1000
+ 2017-02-01, 2000  --> p201702_2000
+ 2017-02-15, 5000  --> p201702_2000
+ 2017-03-01, 2000  --> p201703_all
+ 2017-03-10, 1     --> p201703_all
+ 2017-04-01, 1000  --> Unable to import
+ 2017-05-01, 1000  --> Unable to import
+```
 
 ##### List Partition
 
-    ```
+```
     PARTITION BY LIST(`id`, `city`)
     (
         PARTITION `p1_city` VALUES IN (("1", "Beijing"), ("1", "Shanghai")),
         PARTITION `p2_city` VALUES IN (("2", "Beijing"), ("2", "Shanghai")),
         PARTITION `p3_city` VALUES IN (("3", "Beijing"), ("3", "Shanghai"))
     )
-    ```
+```
 
-    In the above example, we specify `id`(INT type) and `city`(VARCHAR type) as partition columns. The above example ends up with the following partitions.
+In the above example, we specify `id` (INT type) and `city` (VARCHAR type) as partition columns. The resulting partitions are as follows:
 
-    ```
-    * p1_city: [("1", "Beijing"), ("1", "Shanghai")]
-    * p2_city: [("2", "Beijing"), ("2", "Shanghai")]
-    * p3_city: [("3", "Beijing"), ("3", "Shanghai")]
-    ```
+```
+ p1_city: [("1", "Beijing"), ("1", "Shanghai")]
+ p2_city: [("2", "Beijing"), ("2", "Shanghai")]
+ p3_city: [("3", "Beijing"), ("3", "Shanghai")]
+```
 
-    When the user inserts data, the partition column values will be compared sequentially in order to finally get the corresponding partition. An example is as follows.
+When the user inserts data, the partition column values are compared in order to determine the corresponding partition. An example is as follows:
 
-    ```
-    * Data ---> Partition
-    * 1, Beijing  ---> p1_city
-    * 1, Shanghai ---> p1_city
-    * 2, Shanghai ---> p2_city
-    * 3, Beijing  ---> p3_city
-    * 1, Tianjin  ---> Unable to import
-    * 4, Beijing  ---> Unable to import
-    ```
+```  
+Data ---> Partition
+1, Beijing  ---> p1_city
+1, Shanghai ---> p1_city
+2, Shanghai ---> p2_city
+3, Beijing  ---> p3_city
+1, Tianjin  ---> Unable to import
+4, Beijing  ---> Unable to import
+```      
 
 ### PROPERTIES
 
@@ -344,10 +344,10 @@ In the last PROPERTIES of the table statement, you can specify the following two
 
1. replication\_num
 
-    * The number of copies per tablet. The default is 3, it is recommended to keep the default. In the build statement, the number of Tablet copies in all Partitions is uniformly specified. When you add a new partition, you can individually specify the number of copies of the tablet in the new partition.
-    * The number of copies can be modified at runtime. It is strongly recommended to keep odd numbers.
-    * The maximum number of copies depends on the number of independent IPs in the cluster (note that it is not the number of BEs). The principle of replica distribution in Doris is that the copies of the same Tablet are not allowed to be distributed on the same physical machine, and the physical machine is identified as IP. Therefore, even if 3 or more BE instances are deployed on the same physical machine, if the BEs have the same IP, you can only set the number of copies to 1.
-    * For some small, and infrequently updated dimension tables, consider setting more copies. In this way, when joining queries, there is a greater probability of local data join.
+  * The number of copies per Tablet. The default is 3, and it is recommended to keep the default. In the table creation statement, the number of Tablet copies is specified uniformly for all Partitions. When adding a new partition, you can individually specify the number of Tablet copies for that partition.
+  * The number of copies can be modified at runtime. It is strongly recommended to keep it an odd number.
+  * The maximum number of copies depends on the number of independent IPs in the cluster (note: not the number of BEs). The principle of replica distribution in Doris is that copies of the same Tablet are never placed on the same physical machine, and a physical machine is identified by its IP. Therefore, even if 3 or more BE instances are deployed on the same physical machine, if those BEs share one IP, the number of copies can only be set to 1.
+  * For some small and infrequently updated dimension tables, consider setting more copies. In this way, join queries have a higher probability of performing a local data join (see the sketch after this list).
 
 2. storage_medium & storage\_cooldown\_time
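
A hedged sketch of how these settings appear in the trailing PROPERTIES clause of CREATE TABLE (values are illustrative; `storage_cooldown_time` is only meaningful when `storage_medium` is SSD):

```
PROPERTIES
(
    "replication_num" = "3",
    "storage_medium" = "SSD",
    "storage_cooldown_time" = "2022-06-01 00:00:00"
)
```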
 
diff --git a/docs/zh-CN/getting-started/data-model-rollup.md b/docs/zh-CN/getting-started/data-model-rollup.md
index 4decc8197a..eda5f075ec 100644
--- a/docs/zh-CN/getting-started/data-model-rollup.md
+++ b/docs/zh-CN/getting-started/data-model-rollup.md
@@ -331,9 +331,9 @@ DUPLICATE KEY(`timestamp`, `type`)
 ```
 
This data model is different from the Aggregate and Unique models. Data is stored exactly as it appears in the imported file, without any aggregation. Even if two rows of data are identical, both will be retained.
-The DUPLICATE KEY specified in the table creation statement is only used to indicate which columns the underlying data is sorted by. (A more appropriate name would be "Sorted Column"; the name "DUPLICATE KEY" here is only meant to clearly indicate the data model in use. For more explanation of "Sorted Column", see the **Prefix Index** section.) On the choice of DUPLICATE KEY, we recommend selecting the first 2-4 columns as appropriate.
+The DUPLICATE KEY specified in the table creation statement is only used to indicate which columns the underlying data is sorted by. (A more appropriate name would be "Sorted Column"; the name "DUPLICATE KEY" here is only meant to clearly indicate the data model in use. For more explanation of "Sorted Column", see the [Prefix Index](https://doris.apache.org/zh-CN/getting-started/data-model-rollup.html#%E5%89%8D%E7%BC%80%E7%B4%A2%E5%BC%95) section.) On the choice of DUPLICATE KEY, we recommend selecting the first 2-4 columns as appropriate.
 
-This data model is suitable for storing raw data that has neither aggregation requirements nor primary key uniqueness constraints. For more usage scenarios, see the **Limitations of the Aggregation Model** section.
+This data model is suitable for storing raw data that has neither aggregation requirements nor primary key uniqueness constraints. For more usage scenarios, see the [Limitations of the Aggregation Model](https://doris.apache.org/zh-CN/getting-started/data-model-rollup.html#%E8%81%9A%E5%90%88%E6%A8%A1%E5%9E%8B%E7%9A%84%E5%B1%80%E9%99%90%E6%80%A7) section.
 
 ## ROLLUP
 
diff --git a/docs/zh-CN/getting-started/data-partition.md b/docs/zh-CN/getting-started/data-partition.md
index 14f6073a1a..7f5a75f12d 100644
--- a/docs/zh-CN/getting-started/data-partition.md
+++ b/docs/zh-CN/getting-started/data-partition.md
@@ -278,68 +278,68 @@ Doris supports specifying multiple columns as partition columns; examples are as follows:
 
##### Range Partition
 
-    ```
+```
     PARTITION BY RANGE(`date`, `id`)
     (
         PARTITION `p201701_1000` VALUES LESS THAN ("2017-02-01", "1000"),
         PARTITION `p201702_2000` VALUES LESS THAN ("2017-03-01", "2000"),
         PARTITION `p201703_all`  VALUES LESS THAN ("2017-04-01")
     )
-    ```
-
-    In the above example, we specify `date` (DATE type) and `id` (INT type) as partition columns. The resulting partitions are as follows:
-
-    ```
-    * p201701_1000:    [(MIN_VALUE,  MIN_VALUE), ("2017-02-01", "1000")   )
-    * p201702_2000:    [("2017-02-01", "1000"),  ("2017-03-01", "2000")   )
-    * p201703_all:     [("2017-03-01", "2000"),  ("2017-04-01", MIN_VALUE)) 
-    ```
-
-    Note that for the last partition, the user only specified the partition value of the `date` column, so the partition value of the `id` column is filled with `MIN_VALUE` by default. When the user inserts data, the partition column values are compared in order, and the corresponding partition is finally determined. Examples are as follows:
-
-    ```
-    * Data  -->  Partition
-    * 2017-01-01, 200     --> p201701_1000
-    * 2017-01-01, 2000    --> p201701_1000
-    * 2017-02-01, 100     --> p201701_1000
-    * 2017-02-01, 2000    --> p201702_2000
-    * 2017-02-15, 5000    --> p201702_2000
-    * 2017-03-01, 2000    --> p201703_all
-    * 2017-03-10, 1       --> p201703_all
-    * 2017-04-01, 1000    --> Unable to import
-    * 2017-05-01, 1000    --> Unable to import
-    ```
+```
+
+In the above example, we specify `date` (DATE type) and `id` (INT type) as partition columns. The resulting partitions are as follows:
+
+```
+ p201701_1000:    [(MIN_VALUE,  MIN_VALUE), ("2017-02-01", "1000")   )
+ p201702_2000:    [("2017-02-01", "1000"),  ("2017-03-01", "2000")   )
+ p201703_all:     [("2017-03-01", "2000"),  ("2017-04-01", MIN_VALUE)) 
+```
+
+Note that for the last partition, the user only specified the partition value of the `date` column, so the partition value of the `id` column is filled with `MIN_VALUE` by default. When the user inserts data, the partition column values are compared in order, and the corresponding partition is finally determined. Examples are as follows:
+
+```
+ Data  -->  Partition
+ 2017-01-01, 200     --> p201701_1000
+ 2017-01-01, 2000    --> p201701_1000
+ 2017-02-01, 100     --> p201701_1000
+ 2017-02-01, 2000    --> p201702_2000
+ 2017-02-15, 5000    --> p201702_2000
+ 2017-03-01, 2000    --> p201703_all
+ 2017-03-10, 1       --> p201703_all
+ 2017-04-01, 1000    --> Unable to import
+ 2017-05-01, 1000    --> Unable to import
+```
 
##### List Partition
 
-    ```
+```
     PARTITION BY LIST(`id`, `city`)
     (
         PARTITION `p1_city` VALUES IN (("1", "Beijing"), ("1", "Shanghai")),
         PARTITION `p2_city` VALUES IN (("2", "Beijing"), ("2", "Shanghai")),
         PARTITION `p3_city` VALUES IN (("3", "Beijing"), ("3", "Shanghai"))
     )
-    ```
+```
 
-    In the above example, we specify `id` (INT type) and `city` (VARCHAR type) as partition columns. The resulting partitions are as follows:
+In the above example, we specify `id` (INT type) and `city` (VARCHAR type) as partition columns. The resulting partitions are as follows:
 
-    ```
-    * p1_city: [("1", "Beijing"), ("1", "Shanghai")]
-    * p2_city: [("2", "Beijing"), ("2", "Shanghai")]
-    * p3_city: [("3", "Beijing"), ("3", "Shanghai")]
-    ```
+```
+ p1_city: [("1", "Beijing"), ("1", "Shanghai")]
+ p2_city: [("2", "Beijing"), ("2", "Shanghai")]
+ p3_city: [("3", "Beijing"), ("3", "Shanghai")]
+```
 
-    When the user inserts data, the partition column values are compared in order to determine the corresponding partition. An example is as follows:
+When the user inserts data, the partition column values are compared in order to determine the corresponding partition. An example is as follows:
 
-    ```
-    * Data  --->  Partition
-    * 1, Beijing     ---> p1_city
-    * 1, Shanghai    ---> p1_city
-    * 2, Shanghai    ---> p2_city
-    * 3, Beijing     ---> p3_city
-    * 1, Tianjin     ---> Unable to import
-    * 4, Beijing     ---> Unable to import
-    ```
+```
+ Data  --->  Partition
+ 1, Beijing     ---> p1_city
+ 1, Shanghai    ---> p1_city
+ 2, Shanghai    ---> p2_city
+ 3, Beijing     ---> p3_city
+ 1, Tianjin     ---> Unable to import
+ 4, Beijing     ---> Unable to import
+```
 
 ### PROPERTIES
 
@@ -367,7 +367,7 @@ Doris supports specifying multiple columns as partition columns; examples are as follows:
 
### Others
 
-    `IF NOT EXISTS` means the table is created only if it does not already exist. Note that this only checks whether the table name exists; it does not check whether the schema of the new table is identical to that of an existing table. So if a table with the same name but a different schema already exists, the command also returns success, but that does not mean a new table with the new schema has been created.
+ `IF NOT EXISTS` means the table is created only if it does not already exist. Note that this only checks whether the table name exists; it does not check whether the schema of the new table is identical to that of an existing table. So if a table with the same name but a different schema already exists, the command also returns success, but that does not mean a new table with the new schema has been created.
 
## FAQ
 


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org