Posted to commits@doris.apache.org by yi...@apache.org on 2022/04/12 00:45:21 UTC

[incubator-doris] branch master updated: DML help doc (#8953)

This is an automated email from the ASF dual-hosted git repository.

yiguolei pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git


The following commit(s) were added to refs/heads/master by this push:
     new 7494a4cbdf DML help doc (#8953)
7494a4cbdf is described below

commit 7494a4cbdfafbd04af5cee8decd25c0c1da628ca
Author: jiafeng.zhang <zh...@gmail.com>
AuthorDate: Tue Apr 12 08:45:16 2022 +0800

    DML help doc (#8953)
    
    DML help doc
---
 .../Manipulation/DELETE.md                         |  51 +++++
 .../Manipulation/INSERT.md                         |  82 +++++++
 .../Manipulation/UPDATE.md                         |  46 ++++
 .../Data-Manipulation-Statements/OUTFILE.md        | 246 ++++++++++++++++++++-
 .../Manipulation/DELETE.md                         |  49 ++++
 .../Manipulation/INSERT.md                         |  82 +++++++
 .../Manipulation/UPDATE.md                         |  46 ++++
 .../Data-Manipulation-Statements/OUTFILE.md        | 127 ++++++++---
 8 files changed, 697 insertions(+), 32 deletions(-)

diff --git a/new-docs/en/sql-manual/sql-reference-v2/Data-Manipulation-Statements/Manipulation/DELETE.md b/new-docs/en/sql-manual/sql-reference-v2/Data-Manipulation-Statements/Manipulation/DELETE.md
index e233d84024..447a64cc78 100644
--- a/new-docs/en/sql-manual/sql-reference-v2/Data-Manipulation-Statements/Manipulation/DELETE.md
+++ b/new-docs/en/sql-manual/sql-reference-v2/Data-Manipulation-Statements/Manipulation/DELETE.md
@@ -26,10 +26,61 @@ under the License.
 
 ## DELETE
 
+### Name
+
+DELETE
+
 ### Description
 
+This statement is used to conditionally delete data in the specified table (base index) partition.
+
+This operation will also delete the data of the rollup index related to this base index.
+
+Syntax:
+
+````SQL
+DELETE FROM table_name [PARTITION partition_name | PARTITIONS (p1, p2)]
+WHERE
+column_name1 op { value | value_list } [ AND column_name2 op { value | value_list } ...];
+````
+
+Notes:
+
+1. The optional types of op include: =, >, <, >=, <=, !=, in, not in
+2. Only conditions on key columns can be specified.
+3. When a selected key column does not exist in some rollup, the delete cannot be performed.
+4. Conditions can only be combined with an "and" relationship. To achieve an "or" relationship, the conditions must be written in two separate DELETE statements (see the example after this list).
+5. If the table is partitioned, you can specify a partition. If no partition is specified and the session variable delete_without_partition is true, the delete is applied to all partitions. For a single-partition table, the partition can be left unspecified.
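+
+As an illustration of note 4, an "or" condition such as `k1 = 3 OR k1 = 5` (the values here are only illustrative, using the `my_table` table from the examples below) is split into two statements:
+
+```sql
+DELETE FROM my_table PARTITION p1 WHERE k1 = 3;
+DELETE FROM my_table PARTITION p1 WHERE k1 = 5;
+```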
+
+Notice:
+
+1. This statement may reduce query efficiency for a period of time after execution.
+2. The degree of impact depends on the number of delete conditions specified in the statement: the more conditions you specify, the greater the impact.
+
 ### Example
 
+1. Delete the rows in my_table partition p1 whose k1 column value is 3
+
+   ```sql
+   DELETE FROM my_table PARTITION p1
+       WHERE k1 = 3;
+   ```
+
+2. Delete the rows in my_table partition p1 whose k1 column value is greater than or equal to 3 and whose k2 column value is "abc"
+
+   ```sql
+   DELETE FROM my_table PARTITION p1
+   WHERE k1 >= 3 AND k2 = "abc";
+   ```
+
+3. Delete the rows in my_table partitions p1 and p2 whose k1 column value is greater than or equal to 3 and whose k2 column value is "abc"
+
+   ```sql
+   DELETE FROM my_table PARTITIONS (p1, p2)
+   WHERE k1 >= 3 AND k2 = "abc";
+   ```
+
 ### Keywords
 
     DELETE
diff --git a/new-docs/en/sql-manual/sql-reference-v2/Data-Manipulation-Statements/Manipulation/INSERT.md b/new-docs/en/sql-manual/sql-reference-v2/Data-Manipulation-Statements/Manipulation/INSERT.md
index 9931392c45..70bf4fa2b1 100644
--- a/new-docs/en/sql-manual/sql-reference-v2/Data-Manipulation-Statements/Manipulation/INSERT.md
+++ b/new-docs/en/sql-manual/sql-reference-v2/Data-Manipulation-Statements/Manipulation/INSERT.md
@@ -26,10 +26,92 @@ under the License.
 
 ## INSERT
 
+### Name
+
+INSERT
+
 ### Description
 
+This statement is used to insert data into a table.
+
+```sql
+INSERT INTO table_name
+    [ PARTITION (p1, ...) ]
+    [ WITH LABEL label]
+    [ (column [, ...]) ]
+    [ [ hint [, ...] ] ]
+    { VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }
+````
+
+ Parameters
+
+> table_name: The destination table for importing data. It can be of the form `db_name.table_name`
+>
+> partitions: Specify the partitions to be imported, which must be partitions that exist in `table_name`. Multiple partition names are separated by commas
+>
+> label: specify a label for the Insert task
+>
+> column_name: The specified destination column, must be a column that exists in `table_name`
+>
+> expression: the corresponding expression that needs to be assigned to a column
+>
+> DEFAULT: let the corresponding column use the default value
+>
+> query: a common query, the result of the query will be written to the target
+>
+> hint: indicators used to hint at the execution behavior of `INSERT`. Both `streaming` and the default non-`streaming` method execute the `INSERT` statement synchronously.
+> The non-`streaming` method returns a label after execution completes, which makes it convenient for users to query the import status through `SHOW LOAD`
+
+Notice:
+
+When executing the `INSERT` statement, the default behavior is to filter out data that does not conform to the target table format, such as strings that are too long. However, for business scenarios that require data not to be filtered, you can set the session variable `enable_insert_strict` to `true` to ensure that `INSERT` fails whenever any data would be filtered out.
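+
+For example, a minimal sketch of enabling strict mode before an import (using the `test` table from the examples below; the values are only illustrative):
+
+```sql
+SET enable_insert_strict = true;
+-- With strict mode on, this INSERT fails if any row would be filtered out,
+-- instead of silently dropping the non-conforming rows.
+INSERT INTO test VALUES (1, 2);
+```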
+
 ### Example
 
+The `test` table contains two columns `c1`, `c2`.
+
+1. Import a row of data into the `test` table
+
+```sql
+INSERT INTO test VALUES (1, 2);
+INSERT INTO test (c1, c2) VALUES (1, 2);
+INSERT INTO test (c1, c2) VALUES (1, DEFAULT);
+INSERT INTO test (c1) VALUES (1);
+````
+
+The first and second statements have the same effect. When no target columns are specified, the column order in the table is used as the default target columns.
+The third and fourth statements have the same effect: the default value of the `c2` column is used to complete the data import.
+
+2. Import multiple rows of data into the `test` table at one time
+
+```sql
+INSERT INTO test VALUES (1, 2), (3, 2 + 2);
+INSERT INTO test (c1, c2) VALUES (1, 2), (3, 2 * 2);
+INSERT INTO test (c1) VALUES (1), (3);
+INSERT INTO test (c1, c2) VALUES (1, DEFAULT), (3, DEFAULT);
+````
+
+The first and second statements have the same effect: they import two rows of data into the `test` table at once.
+The third and fourth statements also have the same effect: they use the default value of the `c2` column to import two rows of data into the `test` table.
+
+3. Import a query result into the `test` table
+
+```sql
+INSERT INTO test SELECT * FROM test2;
+INSERT INTO test (c1, c2) SELECT * from test2;
+````
+
+4. Import a query result into the `test` table, specifying the partition and label
+
+```sql
+INSERT INTO test PARTITION(p1, p2) WITH LABEL `label1` SELECT * FROM test2;
+INSERT INTO test WITH LABEL `label1` (c1, c2) SELECT * from test2;
+````
+
+The asynchronous import is actually a synchronous import wrapped in an asynchronous interface. Specifying `streaming` or omitting it makes **no difference in execution efficiency**.
+
+Since Doris's earlier import methods were all asynchronous, the `INSERT` statement without `streaming` still returns a label to stay compatible with old usage habits. Users need to check the status of the import job for that `label` with the `SHOW LOAD` command.
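+
+For example, a returned label can be checked with a sketch like the following (`label1` stands in for the actual label returned by the statement, and the lookup is assumed to run in the current database):
+
+```sql
+SHOW LOAD WHERE LABEL = "label1";
+```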
+
 ### Keywords
 
     INSERT
diff --git a/new-docs/en/sql-manual/sql-reference-v2/Data-Manipulation-Statements/Manipulation/UPDATE.md b/new-docs/en/sql-manual/sql-reference-v2/Data-Manipulation-Statements/Manipulation/UPDATE.md
index bf608a8849..5605f62d13 100644
--- a/new-docs/en/sql-manual/sql-reference-v2/Data-Manipulation-Statements/Manipulation/UPDATE.md
+++ b/new-docs/en/sql-manual/sql-reference-v2/Data-Manipulation-Statements/Manipulation/UPDATE.md
@@ -26,10 +26,56 @@ under the License.
 
 ## UPDATE
 
+### Name
+
+UPDATE
+
 ### Description
 
+This statement is used to update the data (the update statement currently only supports the Unique Key model).
+
+```sql
+UPDATE table_name
+    SET assignment_list
+    WHERE expression
+
+value:
+    {expr | DEFAULT}
+
+assignment:
+    col_name = value
+
+assignment_list:
+    assignment [, assignment] ...
+````
+
+ Parameters
+
++ table_name: The target table of the data to be updated. Can be of the form 'db_name.table_name'
++ assignment_list: The target column to be updated, in the format 'col_name = value, col_name = value'
++ where expression: the condition used to select the rows to update; any expression that returns true or false is acceptable
+
+ Note
+
+The current UPDATE statement only supports row updates on the Unique model, and concurrent updates may cause data conflicts.
+At present, Doris does not handle such conflicts, so users need to avoid them on the business side.
+
 ### Example
 
+The `test` table is a Unique model table containing four columns: k1, k2, v1, v2, where k1, k2 are the keys and v1, v2 are the values, with the Replace aggregation method.
+
+1. Update the v1 column to 1 for the rows in the 'test' table that satisfy the conditions k1 = 1 and k2 = 2
+
+```sql
+UPDATE test SET v1 = 1 WHERE k1=1 and k2=2;
+````
+
+2. Increment the v1 column by 1 for the rows in the 'test' table where k1 = 1
+
+```sql
+UPDATE test SET v1 = v1+1 WHERE k1=1;
+````
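+
+3. Update the v1 and v2 columns together for rows in the 'test' table where k1 = 1, illustrating the multi-column `assignment_list` form described above (a sketch; the values are only illustrative)
+
+```sql
+UPDATE test SET v1 = 1, v2 = 2 WHERE k1 = 1;
+```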
+
 ### Keywords
 
     UPDATE
diff --git a/new-docs/en/sql-manual/sql-reference-v2/Data-Manipulation-Statements/OUTFILE.md b/new-docs/en/sql-manual/sql-reference-v2/Data-Manipulation-Statements/OUTFILE.md
index 91f7d915b1..fd0978629b 100644
--- a/new-docs/en/sql-manual/sql-reference-v2/Data-Manipulation-Statements/OUTFILE.md
+++ b/new-docs/en/sql-manual/sql-reference-v2/Data-Manipulation-Statements/OUTFILE.md
@@ -25,8 +25,252 @@ under the License.
 -->
 
 # OUTFILE
-## description
+### Name
+
+OUTFILE
+
+## description
+
+This statement is used to export query results to a file using the `SELECT INTO OUTFILE` command. Currently, it supports exporting to remote storage, such as HDFS, S3, BOS, COS (Tencent Cloud), through the Broker process, through the S3 protocol, or directly through the HDFS protocol.
+    
+Syntax:
+
+````
+query_stmt
+INTO OUTFILE "file_path"
+[format_as]
+[properties]
+````
+
+Notes:
+
+1. file_path
+
+    file_path points to the path where the file is stored and the file prefix. Such as `hdfs://path/to/my_file_`.
+
+    The final file name will consist of `my_file_`, the file sequence number, and the file format suffix. The sequence number starts from 0, and the count equals the number of files the result is split into. For example:
+
+    my_file_abcdefg_0.csv
+    my_file_abcdefg_1.csv
+    my_file_abcdefg_2.csv
+
+2. format_as
+
+    FORMAT AS CSV
+
+    Specifies the export format. Default is CSV.
+
+3. properties
+
+    Specify related properties. Currently, exporting via the Broker process or via the S3 protocol is supported.
+
+       Syntax:
+       [PROPERTIES ("key"="value", ...)]
+       The following properties are supported:
+       column_separator: column separator
+       line_delimiter: line delimiter
+       max_file_size: the size limit for a single file; if the result exceeds this value, it will be split into multiple files.
+    
+       Broker related properties need to be prefixed with `broker.`:
+       broker.name: broker name
+       broker.hadoop.security.authentication: specify the authentication method as kerberos
+       broker.kerberos_principal: specifies the principal of kerberos
+       broker.kerberos_keytab: Specifies the path to the kerberos keytab file. The file must be given as an absolute path on the server where the Broker process is located, and it must be accessible by the Broker process
+    
+       HDFS related properties need to be prefixed with `hdfs.`:
+       hdfs.fs.defaultFS: namenode address and port
+       hdfs.hdfs_user: hdfs username
+    
+       For the S3 protocol, you can specify the S3 protocol configuration directly:
+       AWS_ENDPOINT
+       AWS_ACCESS_KEY
+       AWS_SECRET_KEY
+       AWS_REGION
 
 ## example
 
+1. Use the broker method to export, and export the simple query results to the file `hdfs://path/to/result.txt`. Specifies that the export format is CSV. Use `my_broker` and set kerberos authentication information. Specify the column separator as `,` and the row separator as `\n`.
+
+   ```sql
+   SELECT * FROM tbl
+   INTO OUTFILE "hdfs://path/to/result_"
+   FORMAT AS CSV
+   PROPERTIES
+   (
+       "broker.name" = "my_broker",
+       "broker.hadoop.security.authentication" = "kerberos",
+       "broker.kerberos_principal" = "doris@YOUR.COM",
+       "broker.kerberos_keytab" = "/home/doris/my.keytab",
+       "column_separator" = ",",
+       "line_delimiter" = "\n",
+       "max_file_size" = "100MB"
+   );
+   ````
+
+   If the final generated file is not larger than 100MB, it will be: `result_0.csv`.
+   If larger than 100MB, it may be `result_0.csv, result_1.csv, ...`.
+
+2. Export the simple query results to the file `hdfs://path/to/result.parquet`. Specify the export format as PARQUET. Use `my_broker` and set kerberos authentication information.
+
+   ```sql
+   SELECT c1, c2, c3 FROM tbl
+   INTO OUTFILE "hdfs://path/to/result_"
+   FORMAT AS PARQUET
+   PROPERTIES
+   (
+       "broker.name" = "my_broker",
+       "broker.hadoop.security.authentication" = "kerberos",
+       "broker.kerberos_principal" = "doris@YOUR.COM",
+       "broker.kerberos_keytab" = "/home/doris/my.keytab",
+       "schema"="required,int32,c1;required,byte_array,c2;required,byte_array,c2"
+   );
+   ````
+
+   Exporting query results to Parquet files requires an explicit `schema`.
+
+3. Export the query result of the CTE statement to the file `hdfs://path/to/result.txt`. The default export format is CSV. Use `my_broker` and set hdfs high availability information. Use the default row and column separators.
+
+   ```sql
+   WITH
+   x1 AS
+   (SELECT k1, k2 FROM tbl1),
+   x2 AS
+   (SELECT k3 FROM tbl2)
+   SELECT k1 FROM x1 UNION SELECT k3 FROM x2
+   INTO OUTFILE "hdfs://path/to/result_"
+   PROPERTIES
+   (
+       "broker.name" = "my_broker",
+       "broker.username"="user",
+       "broker.password"="passwd",
+       "broker.dfs.nameservices" = "my_ha",
+       "broker.dfs.ha.namenodes.my_ha" = "my_namenode1, my_namenode2",
+       "broker.dfs.namenode.rpc-address.my_ha.my_namenode1" = "nn1_host:rpc_port",
+       "broker.dfs.namenode.rpc-address.my_ha.my_namenode2" = "nn2_host:rpc_port",
+       "broker.dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
+   );
+   ````
+
+   If the final generated file is not larger than 1GB, it will be: `result_0.csv`.
+   If larger than 1GB, it may be `result_0.csv, result_1.csv, ...`.
+
+4. Export the query result of the UNION statement to the file `bos://bucket/result.txt`. Specify the export format as PARQUET. Use `my_broker` and set hdfs high availability information. The PARQUET format does not require a column delimiter to be specified.
+   After the export is complete, a marker file is generated.
+
+   ```sql
+   SELECT k1 FROM tbl1 UNION SELECT k2 FROM tbl1
+   INTO OUTFILE "bos://bucket/result_"
+   FORMAT AS PARQUET
+   PROPERTIES
+   (
+       "broker.name" = "my_broker",
+       "broker.bos_endpoint" = "http://bj.bcebos.com",
+       "broker.bos_accesskey" = "xxxxxxxxxxxxxxxxxxxxxxxxxxx",
+       "broker.bos_secret_accesskey" = "yyyyyyyyyyyyyyyyyyyyyyyyy",
+       "schema"="required,int32,k1;required,byte_array,k2"
+   );
+   ````
+
+5. Export the query result of the select statement to the file `cos://${bucket_name}/path/result.txt`. Specify the export format as csv.
+   After the export is complete, a marker file is generated.
+
+   ```sql
+   select k1,k2,v1 from tbl1 limit 100000
+   into outfile "s3a://my_bucket/export/my_file_"
+   FORMAT AS CSV
+   PROPERTIES
+   (
+       "broker.name" = "hdfs_broker",
+       "broker.fs.s3a.access.key" = "xxx",
+       "broker.fs.s3a.secret.key" = "xxxx",
+       "broker.fs.s3a.endpoint" = "https://cos.xxxxxx.myqcloud.com/",
+       "column_separator" = ",",
+       "line_delimiter" = "\n",
+       "max_file_size" = "1024MB",
+       "success_file_name" = "SUCCESS"
+   )
+   ````
+
+   If the final generated file is not larger than 1GB, it will be: `my_file_0.csv`.
+   If larger than 1GB, it may be `my_file_0.csv, my_file_1.csv, ...`.
+   Verify on COS:
+
+   1. A path that does not exist will be created automatically.
+   2. access.key/secret.key/endpoint need to be confirmed with the COS team. In particular, the value of endpoint should not include the bucket_name.
+
+6. Use the s3 protocol to export to bos, and enable concurrent export.
+
+   ```sql
+   set enable_parallel_outfile = true;
+   select k1 from tb1 limit 1000
+   into outfile "s3://my_bucket/export/my_file_"
+   format as csv
+   properties
+   (
+       "AWS_ENDPOINT" = "http://s3.bd.bcebos.com",
+       "AWS_ACCESS_KEY" = "xxxx",
+       "AWS_SECRET_KEY" = "xxx",
+       "AWS_REGION" = "bd"
+   )
+   ````
+
+   The resulting file is prefixed with `my_file_{fragment_instance_id}_`.
+
+7. Use the s3 protocol to export to bos, with the concurrent-export session variable enabled.
+   Note: since the query statement has a top-level sorting node, this query cannot be exported concurrently even though the concurrent-export session variable is enabled.
+
+   ```sql
+   set enable_parallel_outfile = true;
+   select k1 from tb1 order by k1 limit 1000
+   into outfile "s3://my_bucket/export/my_file_"
+   format as csv
+   properties
+   (
+       "AWS_ENDPOINT" = "http://s3.bd.bcebos.com",
+       "AWS_ACCESS_KEY" = "xxxx",
+       "AWS_SECRET_KEY" = "xxx",
+       "AWS_REGION" = "bd"
+   )
+   ````
+
+8. Use the HDFS method to export a simple query result to the file `hdfs://path/to/result.txt`. Specify the export format as CSV and the user name as work. Specify the column separator as `,` and the row separator as `\n`.
+
+   ```sql
+   SELECT * FROM tbl
+   INTO OUTFILE "hdfs://path/to/result_"
+   FORMAT AS CSV
+   PROPERTIES
+   (
+       "hdfs.fs.defaultFS" = "hdfs://ip:port",
+       "hdfs.hdfs_user" = "work"
+   );
+   ````
+
+   If the final generated file is not larger than 100MB, it will be: `result_0.csv`.
+   If larger than 100MB, it may be `result_0.csv, result_1.csv, ...`.
+
 ## keyword
+
+    SELECT, INTO, OUTFILE
+
+### Best Practice
+
+1. Export data volume and export efficiency
+
+   This function essentially executes a SQL query command, and the final result is written out by a single thread. Therefore, the total export time includes the time of the query itself plus the time needed to write out the final result set. If the query is large, you need to set the session variable `query_timeout` to extend the query timeout appropriately (see the sketch after this list).
+
+2. Management of export files
+
+   Doris does not manage exported files. All exported files, whether from a successful export or left over after a failed export, need to be handled by the user.
+
+3. Export to local file
+
+   The ability to export to a local file is not available to public cloud users, only to private deployments, and it assumes that the default user has full control over the cluster nodes. Doris does not check the validity of the export path filled in by the user. If the Doris process user does not have write permission to the path, or the path does not exist, an error is reported. Also, for security reasons, if a file with the same name already exists in this path, the export will [...]
+
+   Doris does not manage files exported locally, nor does it check disk space and so on. These files need to be managed by the user, for example by cleaning them up.
+
+4. Result integrity guarantee
+
+   This command is a synchronous command, so the connection may be dropped while it is executing, making it impossible to know whether the exported data ended normally or whether it is complete. In that case, you can use the `success_file_name` parameter to require that a success marker file be generated in the directory after the task succeeds. Users can use this file to determine whether the export ended normally.
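+
+A minimal sketch for item 1 above, extending the query timeout before running a large export (the 3600-second value is only illustrative):
+
+```sql
+SET query_timeout = 3600;
+-- then run the SELECT ... INTO OUTFILE statement as in the examples above
+```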
diff --git a/new-docs/zh-CN/sql-manual/sql-reference-v2/Data-Manipulation-Statements/Manipulation/DELETE.md b/new-docs/zh-CN/sql-manual/sql-reference-v2/Data-Manipulation-Statements/Manipulation/DELETE.md
index 6831410090..5a7f0417fb 100644
--- a/new-docs/zh-CN/sql-manual/sql-reference-v2/Data-Manipulation-Statements/Manipulation/DELETE.md
+++ b/new-docs/zh-CN/sql-manual/sql-reference-v2/Data-Manipulation-Statements/Manipulation/DELETE.md
@@ -26,10 +26,59 @@ under the License.
 
 ## DELETE
 
+### Name
+
+DELETE
+
 ### Description
 
+该语句用于按条件删除指定 table(base index) partition 中的数据。
+
+该操作会同时删除和此 base index 相关的 rollup index 的数据。
+
+语法:
+
+```SQL
+DELETE FROM table_name [PARTITION partition_name | PARTITIONS (p1, p2)]
+WHERE
+column_name1 op { value | value_list } [ AND column_name2 op { value | value_list } ...];
+```
+说明:
+
+1. op 的可选类型包括:=, >, <, >=, <=, !=, in, not in
+2.  只能指定 key 列上的条件。
+3.  当选定的 key 列不存在于某个 rollup 中时,无法进行 delete。
+4.  条件之间只能是“与”的关系。若希望达成“或”的关系,需要将条件分写在两个 DELETE 语句中。
+5. 如果为分区表,可以指定分区,如不指定,且会话变量 delete_without_partition 为 true,则会应用到所有分区。如果是单分区表,可以不指定。
+
+注意:
+1. 该语句可能会降低执行后一段时间内的查询效率。
+2. 影响程度取决于语句中指定的删除条件的数量。
+3. 指定的条件越多,影响越大。
+
 ### Example
 
+1. 删除 my_table partition p1 中 k1 列值为 3 的数据行
+    
+    ```sql
+    DELETE FROM my_table PARTITION p1
+        WHERE k1 = 3;
+    ```
+    
+1. 删除 my_table partition p1 中 k1 列值大于等于 3 且 k2 列值为 "abc" 的数据行
+    
+    ```sql
+    DELETE FROM my_table PARTITION p1
+    WHERE k1 >= 3 AND k2 = "abc";
+    ```
+    
+1. 删除 my_table partition p1, p2 中 k1 列值大于等于 3 且 k2 列值为 "abc" 的数据行
+    
+    ```sql
+    DELETE FROM my_table PARTITIONS (p1, p2)
+    WHERE k1 >= 3 AND k2 = "abc";
+    ```
+
 ### Keywords
 
     DELETE
diff --git a/new-docs/zh-CN/sql-manual/sql-reference-v2/Data-Manipulation-Statements/Manipulation/INSERT.md b/new-docs/zh-CN/sql-manual/sql-reference-v2/Data-Manipulation-Statements/Manipulation/INSERT.md
index 535ed628c2..f5efb560ac 100644
--- a/new-docs/zh-CN/sql-manual/sql-reference-v2/Data-Manipulation-Statements/Manipulation/INSERT.md
+++ b/new-docs/zh-CN/sql-manual/sql-reference-v2/Data-Manipulation-Statements/Manipulation/INSERT.md
@@ -26,10 +26,92 @@ under the License.
 
 ## INSERT
 
+### Name
+
+INSERT
+
 ### Description
 
+该语句用于完成数据插入操作。
+
+```sql
+INSERT INTO table_name
+    [ PARTITION (p1, ...) ]
+    [ WITH LABEL label]
+    [ (column [, ...]) ]
+    [ [ hint [, ...] ] ]
+    { VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }
+```
+
+ Parameters
+
+> table_name: 导入数据的目的表。可以是 `db_name.table_name` 形式
+>
+> partitions: 指定待导入的分区,必须是 `table_name` 中存在的分区,多个分区名称用逗号分隔
+>
+> label: 为 Insert 任务指定一个 label
+>
+> column_name: 指定的目的列,必须是 `table_name` 中存在的列
+>
+> expression: 需要赋值给某个列的对应表达式
+>
+> DEFAULT: 让对应列使用默认值
+>
+> query: 一个普通查询,查询的结果会写入到目标中
+>
+> hint: 用于指示 `INSERT` 执行行为的一些指示符。`streaming` 和 默认的非 `streaming` 方式均会使用同步方式完成 `INSERT` 语句执行
+>    非 `streaming` 方式在执行完成后会返回一个 label 方便用户通过 `SHOW LOAD` 查询导入的状态
+
+注意:
+
+当前执行 `INSERT` 语句时,对于有不符合目标表格式的数据,默认的行为是过滤,比如字符串超长等。但是对于有要求数据不能够被过滤的业务场景,可以通过设置会话变量 `enable_insert_strict` 为 `true` 来确保当有数据被过滤掉的时候,`INSERT` 不会被执行成功。
+
 ### Example
 
+`test` 表包含两个列`c1`, `c2`。
+
+1. 向`test`表中导入一行数据
+
+```sql
+INSERT INTO test VALUES (1, 2);
+INSERT INTO test (c1, c2) VALUES (1, 2);
+INSERT INTO test (c1, c2) VALUES (1, DEFAULT);
+INSERT INTO test (c1) VALUES (1);
+```
+
+其中第一条、第二条语句是一样的效果。在不指定目标列时,使用表中的列顺序来作为默认的目标列。
+第三条、第四条语句表达的意思是一样的,使用`c2`列的默认值,来完成数据导入。
+
+2. 向`test`表中一次性导入多行数据
+
+```sql
+INSERT INTO test VALUES (1, 2), (3, 2 + 2);
+INSERT INTO test (c1, c2) VALUES (1, 2), (3, 2 * 2);
+INSERT INTO test (c1) VALUES (1), (3);
+INSERT INTO test (c1, c2) VALUES (1, DEFAULT), (3, DEFAULT);
+```
+
+其中第一条、第二条语句效果一样,向`test`表中一次性导入两条数据
+第三条、第四条语句效果一致,使用`c2`列的默认值向`test`表中导入两条数据
+
+3. 向 `test` 表中导入一个查询语句结果
+
+```sql
+INSERT INTO test SELECT * FROM test2;
+INSERT INTO test (c1, c2) SELECT * from test2;
+```
+
+4. 向 `test` 表中导入一个查询语句结果,并指定 partition 和 label
+
+```sql
+INSERT INTO test PARTITION(p1, p2) WITH LABEL `label1` SELECT * FROM test2;
+INSERT INTO test WITH LABEL `label1` (c1, c2) SELECT * from test2;
+```
+
+异步的导入其实是,一个同步的导入封装成了异步。填写 streaming 和不填写的**执行效率是一样**的。
+
+由于Doris之前的导入方式都是异步导入方式,为了兼容旧有的使用习惯,不加 streaming 的 `INSERT` 语句依旧会返回一个 label,用户需要通过`SHOW LOAD`命令查看此`label`导入作业的状态。
+
 ### Keywords
 
     INSERT
diff --git a/new-docs/zh-CN/sql-manual/sql-reference-v2/Data-Manipulation-Statements/Manipulation/UPDATE.md b/new-docs/zh-CN/sql-manual/sql-reference-v2/Data-Manipulation-Statements/Manipulation/UPDATE.md
index 688275b4bb..21160d424d 100644
--- a/new-docs/zh-CN/sql-manual/sql-reference-v2/Data-Manipulation-Statements/Manipulation/UPDATE.md
+++ b/new-docs/zh-CN/sql-manual/sql-reference-v2/Data-Manipulation-Statements/Manipulation/UPDATE.md
@@ -26,10 +26,56 @@ under the License.
 
 ## UPDATE
 
+### Name
+
+UPDATE
+
 ### Description
 
+该语句用于对数据进行更新的操作(UPDATE 语句目前仅支持 Unique Key 模型)。
+
+```sql
+UPDATE table_name
+    SET assignment_list
+    WHERE expression
+
+value:
+    {expr | DEFAULT}
+
+assignment:
+    col_name = value
+
+assignment_list:
+    assignment [, assignment] ...
+```
+
+ Parameters
+
++ table_name: 待更新数据的目标表。可以是 'db_name.table_name' 形式
++ assignment_list: 待更新的目标列,形如 'col_name = value, col_name = value' 格式
++ where expression: 期望更新的条件,一个返回 true 或者 false 的表达式即可
+
+ Note
+
+当前 UPDATE 语句仅支持在 Unique 模型上的行更新,存在并发更新导致的数据冲突可能。
+目前 Doris 并不处理这类问题,需要用户从业务侧规避这类问题。
+
 ### Example
 
+`test` 表是一个 unique 模型的表,包含: k1, k2, v1, v2  四个列。其中 k1, k2 是 key,v1, v2 是value,聚合方式是 Replace。
+
+1. 将 'test' 表中满足条件 k1 =1 , k2 =2 的 v1 列更新为 1
+
+```sql
+UPDATE test SET v1 = 1 WHERE k1=1 and k2=2;
+```
+
+2. 将 'test' 表中 k1=1 的列的 v1 列自增1
+
+```sql
+UPDATE test SET v1 = v1+1 WHERE k1=1;
+```
+
 ### Keywords
 
     UPDATE
diff --git a/new-docs/zh-CN/sql-manual/sql-reference-v2/Data-Manipulation-Statements/OUTFILE.md b/new-docs/zh-CN/sql-manual/sql-reference-v2/Data-Manipulation-Statements/OUTFILE.md
index dd8af07b06..be32069492 100644
--- a/new-docs/zh-CN/sql-manual/sql-reference-v2/Data-Manipulation-Statements/OUTFILE.md
+++ b/new-docs/zh-CN/sql-manual/sql-reference-v2/Data-Manipulation-Statements/OUTFILE.md
@@ -25,30 +25,45 @@ under the License.
 -->
 
 # OUTFILE
+
+### Name
+
+OUTFILE
+
 ## description
 
-    该语句用于使用 `SELECT INTO OUTFILE` 命令将查询结果的导出为文件。目前支持通过 Broker 进程, 通过 S3 协议, 或直接通过 HDFS 协议,导出到远端存储,如 HDFS,S3,BOS,COS(腾讯云)上。
+该语句用于使用 `SELECT INTO OUTFILE` 命令将查询结果导出为文件。目前支持通过 Broker 进程, 通过 S3 协议, 或直接通过 HDFS 协议,导出到远端存储,如 HDFS,S3,BOS,COS(腾讯云)上。
     
-    语法:
-        query_stmt
-        INTO OUTFILE "file_path"
-        [format_as]
-        [properties]
+语法:
+
+```
+query_stmt
+INTO OUTFILE "file_path"
+[format_as]
+[properties]
+```
+
+说明:
+
+1. file_path
+   
+    ​    file_path 指向文件存储的路径以及文件前缀。如 `hdfs://path/to/my_file_`。
     
-    1. file_path
-        file_path 指向文件存储的路径以及文件前缀。如 `hdfs://path/to/my_file_`。
-        最终的文件名将由 `my_file_`,文件序号以及文件格式后缀组成。其中文件序号由0开始,数量为文件被分割的数量。如:
-            my_file_abcdefg_0.csv
-            my_file_abcdefg_1.csv
-            my_file_abcdegf_2.csv
+    ​    最终的文件名将由 `my_file_`,文件序号以及文件格式后缀组成。其中文件序号由0开始,数量为文件被分割的数量。如:
     
-    2. format_as
-        FORMAT AS CSV
-        指定导出格式。默认为 CSV。
-
+            my_file_abcdefg_0.csv
+            my_file_abcdefg_1.csv
+            my_file_abcdefg_2.csv
+    
+2. format_as
+   
+    ​    FORMAT AS CSV
 
-    3. properties
-        指定相关属性。目前支持通过 Broker 进程, 或通过 S3 协议进行导出。
+    ​    指定导出格式。默认为 CSV。
+    
+3. properties
+   
+    ​    指定相关属性。目前支持通过 Broker 进程, 或通过 S3 协议进行导出。
     
         语法:
         [PROPERTIES ("key"="value", ...)]
@@ -56,17 +71,17 @@ under the License.
         column_separator: 列分隔符
         line_delimiter: 行分隔符
         max_file_size: 单个文件大小限制,如果结果超过这个值,将切割成多个文件。
-    
+        
         Broker 相关属性需加前缀 `broker.`:
         broker.name: broker名称
         broker.hadoop.security.authentication: 指定认证方式为 kerberos
         broker.kerberos_principal: 指定 kerberos 的 principal
         broker.kerberos_keytab: 指定 kerberos 的 keytab 文件路径。该文件必须为 Broker 进程所在服务器上的文件的绝对路径。并且可以被 Broker 进程访问
-    
+        
         HDFS 相关属性需加前缀 `hdfs.`:
         hdfs.fs.defaultFS: namenode 地址和端口
         hdfs.hdfs_user: hdfs 用户名
-    
+        
         S3 协议则直接执行 S3 协议配置即可:
         AWS_ENDPOINT
         AWS_ACCESS_KEY
@@ -75,7 +90,9 @@ under the License.
 
 ## example
 
-    1. 使用 broker 方式导出,将简单查询结果导出到文件 `hdfs://path/to/result.txt`。指定导出格式为 CSV。使用 `my_broker` 并设置 kerberos 认证信息。指定列分隔符为 `,`,行分隔符为 `\n`。
+1. 使用 broker 方式导出,将简单查询结果导出到文件 `hdfs://path/to/result.txt`。指定导出格式为 CSV。使用 `my_broker` 并设置 kerberos 认证信息。指定列分隔符为 `,`,行分隔符为 `\n`。
+    
+    ```sql
     SELECT * FROM tbl
     INTO OUTFILE "hdfs://path/to/result_"
     FORMAT AS CSV
@@ -89,10 +106,14 @@ under the License.
         "line_delimiter" = "\n",
         "max_file_size" = "100MB"
     );
+    ```
+    
     最终生成文件如如果不大于 100MB,则为:`result_0.csv`。
     如果大于 100MB,则可能为 `result_0.csv, result_1.csv, ...`。
     
-    2. 将简单查询结果导出到文件 `hdfs://path/to/result.parquet`。指定导出格式为 PARQUET。使用 `my_broker` 并设置 kerberos 认证信息。
+2. 将简单查询结果导出到文件 `hdfs://path/to/result.parquet`。指定导出格式为 PARQUET。使用 `my_broker` 并设置 kerberos 认证信息。
+    
+    ```sql
     SELECT c1, c2, c3 FROM tbl
     INTO OUTFILE "hdfs://path/to/result_"
     FORMAT AS PARQUET
@@ -104,9 +125,13 @@ under the License.
         "broker.kerberos_keytab" = "/home/doris/my.keytab",
         "schema"="required,int32,c1;required,byte_array,c2;required,byte_array,c2"
     );
+    ```
+    
     查询结果导出到parquet文件需要明确指定`schema`。
     
-    3. 将 CTE 语句的查询结果导出到文件 `hdfs://path/to/result.txt`。默认导出格式为 CSV。使用 `my_broker` 并设置 hdfs 高可用信息。使用默认的行列分隔符。
+3. 将 CTE 语句的查询结果导出到文件 `hdfs://path/to/result.txt`。默认导出格式为 CSV。使用 `my_broker` 并设置 hdfs 高可用信息。使用默认的行列分隔符。
+    
+    ```sql
     WITH
     x1 AS
     (SELECT k1, k2 FROM tbl1),
@@ -125,11 +150,15 @@ under the License.
         "broker.dfs.namenode.rpc-address.my_ha.my_namenode2" = "nn2_host:rpc_port",
         "broker.dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
     );
+    ```
+    
     最终生成文件如如果不大于 1GB,则为:`result_0.csv`。
     如果大于 1GB,则可能为 `result_0.csv, result_1.csv, ...`。
     
-    4. 将 UNION 语句的查询结果导出到文件 `bos://bucket/result.txt`。指定导出格式为 PARQUET。使用 `my_broker` 并设置 hdfs 高可用信息。PARQUET 格式无需指定列分割符。
+4. 将 UNION 语句的查询结果导出到文件 `bos://bucket/result.txt`。指定导出格式为 PARQUET。使用 `my_broker` 并设置 hdfs 高可用信息。PARQUET 格式无需指定列分割符。
     导出完成后,生成一个标识文件。
+    
+    ```sql
     SELECT k1 FROM tbl1 UNION SELECT k2 FROM tbl1
     INTO OUTFILE "bos://bucket/result_"
     FORMAT AS PARQUET
@@ -141,9 +170,12 @@ under the License.
         "broker.bos_secret_accesskey" = "yyyyyyyyyyyyyyyyyyyyyyyyyy",
         "schema"="required,int32,k1;required,byte_array,k2"
     );
+    ```
     
-    5. 将 select 语句的查询结果导出到文件 `cos://${bucket_name}/path/result.txt`。指定导出格式为 csv。
+5. 将 select 语句的查询结果导出到文件 `cos://${bucket_name}/path/result.txt`。指定导出格式为 csv。
     导出完成后,生成一个标识文件。
+    
+    ```sql
     select k1,k2,v1 from tbl1 limit 100000
     into outfile "s3a://my_bucket/export/my_file_"
     FORMAT AS CSV
@@ -158,13 +190,18 @@ under the License.
         "max_file_size" = "1024MB",
         "success_file_name" = "SUCCESS"
     )
+    ```
+    
     最终生成文件如如果不大于 1GB,则为:`my_file_0.csv`。
     如果大于 1GB,则可能为 `my_file_0.csv, result_1.csv, ...`。
     在cos上验证
+    
         1. 不存在的path会自动创建
-        2. access.key/secret.key/endpoint需要和cos的同学确认。尤其是endpoint的值,不需要填写bucket_name。
+        2. access.key/secret.key/endpoint需要和cos的同学确认。尤其是endpoint的值,不需要填写bucket_name。
+    
+6. 使用 s3 协议导出到 bos,并且并发导出开启。
     
-    6. 使用 s3 协议导出到 bos,并且并发导出开启。
+    ```sql
     set enable_parallel_outfile = true;
     select k1 from tb1 limit 1000
     into outfile "s3://my_bucket/export/my_file_"
@@ -176,10 +213,14 @@ under the License.
         "AWS_SECRET_KEY" = "xxx",
         "AWS_REGION" = "bd"
     )
+    ```
+    
     最终生成的文件前缀为 `my_file_{fragment_instance_id}_`。
     
-    7. 使用 s3 协议导出到 bos,并且并发导出 session 变量开启。
+7. 使用 s3 协议导出到 bos,并且并发导出 session 变量开启。
     注意:但由于查询语句带了一个顶层的排序节点,所以这个查询即使开启并发导出的 session 变量,也是无法并发导出的。
+    
+    ```sql
     set enable_parallel_outfile = true;
     select k1 from tb1 order by k1 limit 1000
     into outfile "s3://my_bucket/export/my_file_"
@@ -191,8 +232,11 @@ under the License.
         "AWS_SECRET_KEY" = "xxx",
         "AWS_REGION" = "bd"
     )
+    ```
     
-    8. 使用 hdfs 方式导出,将简单查询结果导出到文件 `hdfs://path/to/result.txt`。指定导出格式为 CSV,用户名为work。指定列分隔符为 `,`,行分隔符为 `\n`。
+8. 使用 hdfs 方式导出,将简单查询结果导出到文件 `hdfs://path/to/result.txt`。指定导出格式为 CSV,用户名为work。指定列分隔符为 `,`,行分隔符为 `\n`。
+    
+    ```sql
     SELECT * FROM tbl
     INTO OUTFILE "hdfs://path/to/result_"
     FORMAT AS CSV
@@ -201,9 +245,30 @@ under the License.
         "hdfs.fs.defaultFS" = "hdfs://ip:port",
         "hdfs.hdfs_user" = "work"
     );
+    ```
+    
     最终生成文件如如果不大于 100MB,则为:`result_0.csv`。
     如果大于 100MB,则可能为 `result_0.csv, result_1.csv, ...`。
 
 ## keyword
-    OUTFILE
+    SELECT, INTO, OUTFILE
+
+### Best Practice
+
+1. 导出数据量和导出效率
+
+   该功能本质上是执行一个 SQL 查询命令。最终的结果是单线程输出的。所以整个导出的耗时包括查询本身的耗时,和最终结果集写出的耗时。如果查询较大,需要设置会话变量 `query_timeout` 适当的延长查询超时时间。
+
+2. 导出文件的管理
+
+   Doris 不会管理导出的文件。包括导出成功的,或者导出失败后残留的文件,都需要用户自行处理。
+
+3. 导出到本地文件
+
+   导出到本地文件的功能不适用于公有云用户,仅适用于私有化部署的用户。并且默认用户对集群节点有完全的控制权限。Doris 对于用户填写的导出路径不会做合法性检查。如果 Doris 的进程用户对该路径无写权限,或路径不存在,则会报错。同时出于安全性考虑,如果该路径已存在同名的文件,则也会导出失败。
+
+   Doris 不会管理导出到本地的文件,也不会检查磁盘空间等。这些文件需要用户自行管理,如清理等。
+
+4. 结果完整性保证
 
+   该命令是一个同步命令,因此有可能在执行过程中任务连接断开了,从而无法获知导出的数据是否正常结束,或是否完整。此时可以使用 `success_file_name` 参数要求任务成功后,在目录下生成一个成功文件标识。用户可以通过这个文件,来判断导出是否正常结束。


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org