Posted to commits@doris.apache.org by mo...@apache.org on 2021/09/25 04:25:00 UTC

[incubator-doris] branch master updated: [Bug][Docs]Fix outfile docs for parquet (#6709)

This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git


The following commit(s) were added to refs/heads/master by this push:
     new e5a4172  [Bug][Docs]Fix outfile docs for parquet  (#6709)
e5a4172 is described below

commit e5a4172b27d4feac3ced3d082f0c873b2c4f986e
Author: weizuo93 <we...@apache.org>
AuthorDate: Sat Sep 25 12:24:52 2021 +0800

    [Bug][Docs]Fix outfile docs for parquet  (#6709)
    
    Update outfile documents for parquet.
---
 docs/en/administrator-guide/outfile.md    | 36 +++++++++++++++++++++--------
 docs/zh-CN/administrator-guide/outfile.md | 38 +++++++++++++++++++++++--------
 2 files changed, 55 insertions(+), 19 deletions(-)
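The `schema` property documented in this patch is a flat string: one `repetition,type,name` triple per exported column, with triples joined by semicolons (as in the examples in the diff below). As a rough illustration of that encoding — the helper below is hypothetical, not part of Apache Doris — building the string programmatically keeps the triples from drifting out of sync with the SELECT list:

```python
# Illustrative sketch only: build_parquet_schema is a hypothetical helper,
# not a Doris API. It assembles the "schema" property string used by
# SELECT ... INTO OUTFILE ... FORMAT AS PARQUET: one "repetition,type,name"
# triple per column, triples joined by semicolons.

def build_parquet_schema(columns):
    """columns: iterable of (repetition, parquet_type, column_name) tuples."""
    parts = []
    for repetition, parquet_type, name in columns:
        # Each triple is comma-separated internally.
        parts.append(f"{repetition},{parquet_type},{name}")
    return ";".join(parts)

schema = build_parquet_schema([
    ("required", "int32", "c1"),
    ("required", "byte_array", "c2"),
    ("required", "byte_array", "c3"),
])
print(schema)
# required,int32,c1;required,byte_array,c2;required,byte_array,c3
```

The resulting string is what the examples pass as `"schema"="..."` in the `PROPERTIES` clause.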

diff --git a/docs/en/administrator-guide/outfile.md b/docs/en/administrator-guide/outfile.md
index b114bde..e2dd8c2 100644
--- a/docs/en/administrator-guide/outfile.md
+++ b/docs/en/administrator-guide/outfile.md
@@ -88,6 +88,7 @@ INTO OUTFILE "file_path"
     * `column_separator`: Column separator, only applicable to CSV format. The default is `\t`.
     * `line_delimiter`: Line delimiter, only applicable to CSV format. The default is `\n`.
     * `max_file_size`: The max size of a single file. Default is 1GB. Range from 5MB to 2GB. Files exceeding this size will be split.
+    * `schema`: schema information for PARQUET, only applicable to the PARQUET format. If the exported file format is PARQUET, `schema` must be specified.
 
 ## Concurrent export
 
@@ -164,6 +165,26 @@ Planning example for concurrent export:
 
 2. Example 2
 
+    Export simple query results to the file `hdfs:/path/to/result.parquet`. Specify the export format as PARQUET. Use `my_broker` and set kerberos authentication information. 
+    
+    ```
+    SELECT c1, c2, c3 FROM tbl
+    INTO OUTFILE "hdfs:/path/to/result_"
+    FORMAT AS PARQUET
+    PROPERTIES
+    (
+        "broker.name" = "my_broker",
+        "broker.hadoop.security.authentication" = "kerberos",
+        "broker.kerberos_principal" = "doris@YOUR.COM",
+        "broker.kerberos_keytab" = "/home/doris/my.keytab",
+        "schema"="required,int32,c1;required,byte_array,c2;required,byte_array,c3"
+    );
+    ```
+   
+   If the exported file format is PARQUET, `schema` must be specified.
+
+3. Example 3
+
     Export the query result of the CTE statement to the file `hdfs:/path/to/result.txt`. The default export format is CSV. Use `my_broker` and set hdfs high availability information. Use the default column separator and line delimiter.
 
     ```
@@ -191,7 +212,7 @@ Planning example for concurrent export:
     
     If larger than 1GB, may be: `result_0.csv, result_1.csv, ...`.
     
-3. Example 3
+4. Example 4
 
     Export the query results of the UNION statement to the file `bos://bucket/result.parquet`. Specify the export format as PARQUET. Use `my_broker` and set the BOS connection information. The PARQUET format does not need a column separator or line delimiter.
     
@@ -204,15 +225,12 @@ Planning example for concurrent export:
         "broker.name" = "my_broker",
         "broker.bos_endpoint" = "http://bj.bcebos.com",
         "broker.bos_accesskey" = "xxxxxxxxxxxxxxxxxxxxxxxxxx",
-        "broker.bos_secret_accesskey" = "yyyyyyyyyyyyyyyyyyyyyyyyyy"
+        "broker.bos_secret_accesskey" = "yyyyyyyyyyyyyyyyyyyyyyyyyy",
+        "schema"="required,int32,k1;required,byte_array,k2"
     );
     ```
-    
-    If the result is less than 1GB, file will be: `result_0.parquet`.
-    
-    If larger than 1GB, may be: `result_0.parquet, result_1.parquet, ...`.
 
-4. Example 4
+5. Example 5
 
     Export simple query results to the file `cos://${bucket_name}/path/result.txt`. Specify the export format as CSV.
     And create a mark file after export finished.
@@ -242,7 +260,7 @@ Planning example for concurrent export:
     1. Paths that do not exist are automatically created.
     2. These parameters (access.key/secret.key/endpoint) need to be confirmed with `Tencent Cloud COS`. In particular, the value of endpoint does not need to include the bucket_name.
 
-5. Example5
+6. Example 6
 
     Use the s3 protocol to export to bos, and concurrent export is enabled.
 
@@ -262,7 +280,7 @@ Planning example for concurrent export:
 
     The final generated file prefix is `my_file_{fragment_instance_id}_`.
 
-6. Example6
+7. Example 7
 
     Use the s3 protocol to export to bos, and enable concurrent export of session variables.
 
diff --git a/docs/zh-CN/administrator-guide/outfile.md b/docs/zh-CN/administrator-guide/outfile.md
index 2352cf8..762ce21 100644
--- a/docs/zh-CN/administrator-guide/outfile.md
+++ b/docs/zh-CN/administrator-guide/outfile.md
@@ -87,6 +87,7 @@ INTO OUTFILE "file_path"
     * `column_separator`: Column separator, only applicable to CSV format. The default is `\t`.
     * `line_delimiter`: Line delimiter, only applicable to CSV format. The default is `\n`.
     * `max_file_size`: The max size of a single file. Default is 1GB. Range from 5MB to 2GB. Files exceeding this size will be split.
+    * `schema`: schema information of the PARQUET file, only applicable to the PARQUET format. If the exported file format is PARQUET, `schema` must be specified.
 
 ## Concurrent export
 
@@ -150,7 +151,7 @@ explain select xxx from xxx where xxx  into outfile "s3://xxx" format as csv pro
         "broker.name" = "my_broker",
         "broker.hadoop.security.authentication" = "kerberos",
         "broker.kerberos_principal" = "doris@YOUR.COM",
-        "broker.kerberos_keytab" = "/home/doris/my.keytab"
+        "broker.kerberos_keytab" = "/home/doris/my.keytab",
         "column_separator" = ",",
         "line_delimiter" = "\n",
         "max_file_size" = "100MB"
@@ -163,6 +164,26 @@ explain select xxx from xxx where xxx  into outfile "s3://xxx" format as csv pro
 
 2. Example 2
 
+    Export simple query results to the file `hdfs:/path/to/result.parquet`. Specify the export format as PARQUET. Use `my_broker` and set kerberos authentication information.
+
+    ```
+    SELECT c1, c2, c3 FROM tbl
+    INTO OUTFILE "hdfs:/path/to/result_"
+    FORMAT AS PARQUET
+    PROPERTIES
+    (
+        "broker.name" = "my_broker",
+        "broker.hadoop.security.authentication" = "kerberos",
+        "broker.kerberos_principal" = "doris@YOUR.COM",
+        "broker.kerberos_keytab" = "/home/doris/my.keytab",
+        "schema"="required,int32,c1;required,byte_array,c2;required,byte_array,c3"
+    );
+    ```
+   
+   When exporting query results to a Parquet file, `schema` must be explicitly specified.
+
+3. Example 3
+
     Export the query result of the CTE statement to the file `hdfs:/path/to/result.txt`. The default export format is CSV. Use `my_broker` and set hdfs high availability information. Use the default column separator and line delimiter.
 
     ```
@@ -190,7 +211,7 @@ explain select xxx from xxx where xxx  into outfile "s3://xxx" format as csv pro
     
     If larger than 1GB, it may be: `result_0.csv, result_1.csv, ...`.
     
-3. Example 3
+4. Example 4
 
     Export the query results of the UNION statement to the file `bos://bucket/result.parquet`. Specify the export format as PARQUET. Use `my_broker` and set the BOS connection information. The PARQUET format does not need a column separator.
     After the export finishes, a mark file is generated.
@@ -204,15 +225,12 @@ explain select xxx from xxx where xxx  into outfile "s3://xxx" format as csv pro
         "broker.name" = "my_broker",
         "broker.bos_endpoint" = "http://bj.bcebos.com",
         "broker.bos_accesskey" = "xxxxxxxxxxxxxxxxxxxxxxxxxx",
-        "broker.bos_secret_accesskey" = "yyyyyyyyyyyyyyyyyyyyyyyyyy"
+        "broker.bos_secret_accesskey" = "yyyyyyyyyyyyyyyyyyyyyyyyyy",
+        "schema"="required,int32,k1;required,byte_array,k2"
     );
     ```
-    
-    If the generated file is not larger than 1GB, it will be: `result_0.parquet`.
-    
-    If larger than 1GB, it may be: `result_0.parquet, result_1.parquet, ...`.
 
-4. Example 4
+5. Example 5
 
     Export the query results of the select statement to the file `cos://${bucket_name}/path/result.txt`. Specify the export format as csv.
     After the export finishes, a mark file is generated.
@@ -241,7 +259,7 @@ explain select xxx from xxx where xxx  into outfile "s3://xxx" format as csv pro
     1. Paths that do not exist are created automatically.
     2. access.key/secret.key/endpoint need to be confirmed with the COS provider. In particular, the value of endpoint does not need to include the bucket_name.
 
-5. Example 5
+6. Example 6
 
     Use the s3 protocol to export to bos, with concurrent export enabled.
 
@@ -261,7 +279,7 @@ explain select xxx from xxx where xxx  into outfile "s3://xxx" format as csv pro
 
     The final generated file prefix is `my_file_{fragment_instance_id}_`.
 
-6. Example 6
+7. Example 7
 
     Use the s3 protocol to export to bos, with the concurrent-export session variable enabled.
 

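Hand-written schema strings like the ones in this patch are easy to get wrong (the examples originally repeated a column name). As an illustrative sanity check — a hypothetical client-side helper, not a Doris API, assuming the property is semicolon-separated `repetition,type,name` triples — one could validate the string against the SELECT list before running the export:

```python
# Hypothetical client-side check, not part of Apache Doris: verify that a
# "schema" property string is well-formed and matches the exported columns.

def check_outfile_schema(schema, select_columns):
    entries = [e.split(",") for e in schema.split(";") if e]
    names = []
    for entry in entries:
        if len(entry) != 3:  # each entry must be repetition,type,name
            raise ValueError("malformed schema entry: " + ",".join(entry))
        names.append(entry[2])
    if len(set(names)) != len(names):
        raise ValueError("duplicate column names in schema: %r" % names)
    if names != list(select_columns):
        raise ValueError("schema columns %r do not match SELECT columns %r"
                         % (names, list(select_columns)))
    return True

check_outfile_schema(
    "required,int32,c1;required,byte_array,c2;required,byte_array,c3",
    ["c1", "c2", "c3"],
)  # passes; repeating "c2" as the last name would raise ValueError
```

A check like this would have flagged the duplicated `c2` that this commit corrects in the documentation examples.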