Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2021/12/28 09:48:43 UTC

[GitHub] [incubator-doris] morningman commented on a change in pull request #7497: Add user manual for hdfs load and transaction.

morningman commented on a change in pull request #7497:
URL: https://github.com/apache/incubator-doris/pull/7497#discussion_r775819294



##########
File path: docs/zh-CN/administrator-guide/outfile.md
##########
@@ -138,167 +138,8 @@ explain select xxx from xxx where xxx  into outfile "s3://xxx" format as csv pro
 
 ## Usage example
 
-1. Example 1
+For details, please refer to the [OUTFILE document](../sql-reference/sql-statements/Data Manipulation/OUTFILE.html).

Review comment:
       ```suggestion
   For details, please refer to the [OUTFILE document](../sql-reference/sql-statements/Data%20Manipulation/OUTFILE.md).
   ```

##########
File path: docs/en/administrator-guide/outfile.md
##########
@@ -139,166 +139,7 @@ Planning example for concurrent export:
 
 ## Usage example
 
-1. Example 1
-
-    Export simple query results to the file `hdfs://path/to/result.txt`. Specify the export format as CSV. Use `my_broker` and set Kerberos authentication information. Specify the column separator as `,` and the line delimiter as `\n`.
-    
-    ```
-    SELECT * FROM tbl
-    INTO OUTFILE "hdfs://path/to/result_"
-    FORMAT AS CSV
-    PROPERTIES
-    (
-        "broker.name" = "my_broker",
-        "broker.hadoop.security.authentication" = "kerberos",
-        "broker.kerberos_principal" = "doris@YOUR.COM",
-        "broker.kerberos_keytab" = "/home/doris/my.keytab",
-        "column_separator" = ",",
-        "line_delimiter" = "\n",
-        "max_file_size" = "100MB"
-    );
-    ```
-    
-    If the result is less than 100MB, the file will be: `result_0.csv`.
-    
-    If it is larger than 100MB, it may be split into: `result_0.csv, result_1.csv, ...`.
-
-2. Example 2
-
-    Export simple query results to the file `hdfs://path/to/result.parquet`. Specify the export format as PARQUET. Use `my_broker` and set Kerberos authentication information.
-    
-    ```
-    SELECT c1, c2, c3 FROM tbl
-    INTO OUTFILE "hdfs://path/to/result_"
-    FORMAT AS PARQUET
-    PROPERTIES
-    (
-        "broker.name" = "my_broker",
-        "broker.hadoop.security.authentication" = "kerberos",
-        "broker.kerberos_principal" = "doris@YOUR.COM",
-        "broker.kerberos_keytab" = "/home/doris/my.keytab",
-        "schema"="required,int32,c1;required,byte_array,c2;required,byte_array,c2"
-    );
-    ```
-   
-   If the exported file format is PARQUET, `schema` must be specified.
-
-3. Example 3
-
-    Export the query result of the CTE statement to the file `hdfs://path/to/result.txt`. The default export format is CSV. Use `my_broker` and set HDFS high-availability information. Use the default column separator and line delimiter.
-
-    ```
-    WITH
-    x1 AS
-    (SELECT k1, k2 FROM tbl1),
-    x2 AS
-    (SELECT k3 FROM tbl2)
-    SELECT k1 FROM x1 UNION SELECT k3 FROM x2
-    INTO OUTFILE "hdfs://path/to/result_"
-    PROPERTIES
-    (
-        "broker.name" = "my_broker",
-        "broker.username"="user",
-        "broker.password"="passwd",
-        "broker.dfs.nameservices" = "my_ha",
-        "broker.dfs.ha.namenodes.my_ha" = "my_namenode1, my_namenode2",
-        "broker.dfs.namenode.rpc-address.my_ha.my_namenode1" = "nn1_host:rpc_port",
-        "broker.dfs.namenode.rpc-address.my_ha.my_namenode2" = "nn2_host:rpc_port",
-        "broker.dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
-    );
-    ```
-    
-    If the result is less than 1GB, the file will be: `result_0.csv`.
-    
-    If it is larger than 1GB, it may be split into: `result_0.csv, result_1.csv, ...`.
-    
-4. Example 4
-
-    Export the query results of the UNION statement to the file `bos://bucket/result.parquet`. Specify the export format as PARQUET. Use `my_broker` and set the BOS access information. The PARQUET format does not require a column separator or line delimiter.
-    
-    ```
-    SELECT k1 FROM tbl1 UNION SELECT k2 FROM tbl1
-    INTO OUTFILE "bos://bucket/result_"
-    FORMAT AS PARQUET
-    PROPERTIES
-    (
-        "broker.name" = "my_broker",
-        "broker.bos_endpoint" = "http://bj.bcebos.com",
-        "broker.bos_accesskey" = "xxxxxxxxxxxxxxxxxxxxxxxxxx",
-        "broker.bos_secret_accesskey" = "yyyyyyyyyyyyyyyyyyyyyyyyyy",
-        "schema"="required,int32,k1;required,byte_array,k2"
-    );
-    ```
-
-5. Example 5
-
-    Export simple query results to the file `cos://${bucket_name}/path/result.txt`. Specify the export format as CSV,
-    and create a marker file after the export finishes.
-    
-    ```
-    select k1,k2,v1 from tbl1 limit 100000
-    into outfile "s3a://my_bucket/export/my_file_"
-    FORMAT AS CSV
-    PROPERTIES
-    (
-       "broker.name" = "hdfs_broker",
-       "broker.fs.s3a.access.key" = "xxx",
-       "broker.fs.s3a.secret.key" = "xxxx",
-       "broker.fs.s3a.endpoint" = "https://cos.xxxxxx.myqcloud.com/",
-       "column_separator" = ",",
-       "line_delimiter" = "\n",
-       "max_file_size" = "1024MB",
-       "success_file_name" = "SUCCESS"
-    )
-    ```
-    
-    If the result is less than 1GB, the file will be: `my_file_0.csv`.
-    
-    If it is larger than 1GB, it may be split into: `my_file_0.csv, my_file_1.csv, ...`.
-    
-    Please note:
-    1. Paths that do not exist are created automatically.
-    2. These parameters (access.key/secret.key/endpoint) need to be confirmed with `Tencent Cloud COS`. In particular, the endpoint value does not need to include the bucket_name.
-
-6. Example 6
-
-    Use the s3 protocol to export to BOS, with concurrent export enabled.
-
-    ```
-    set enable_parallel_outfile = true;
-    select k1 from tb1 limit 1000
-    into outfile "s3://my_bucket/export/my_file_"
-    format as csv
-    properties
-    (
-        "AWS_ENDPOINT" = "http://s3.bd.bcebos.com",
-        "AWS_ACCESS_KEY" = "xxxx",
-        "AWS_SECRET_KEY" = "xxx",
-        "AWS_REGION" = "bd"
-    )
-    ```
-
-    The final generated file prefix is `my_file_{fragment_instance_id}_`.
-
-7. Example 7
-
-    Use the s3 protocol to export to BOS, with the concurrent-export session variable enabled.
-
-    ```
-    set enable_parallel_outfile = true;
-    select k1 from tb1 order by k1 limit 1000
-    into outfile "s3://my_bucket/export/my_file_"
-    format as csv
-    properties
-    (
-        "AWS_ENDPOINT" = "http://s3.bd.bcebos.com",
-        "AWS_ACCESS_KEY" = "xxxx",
-        "AWS_SECRET_KEY" = "xxx",
-        "AWS_REGION" = "bd"
-    )
-    ```
-
-    **However, because the query contains a top-level sort node, it cannot be exported concurrently even though the concurrent-export session variable is enabled.** (See the `explain` sketch after the review comments below.)
+For details, please refer to [OUTFILE Document](../sql-reference/sql-statements/Data Manipulation/OUTFILE.html).

Review comment:
       ```suggestion
   For details, please refer to [OUTFILE Document](../sql-reference/sql-statements/Data%20Manipulation/OUTFILE.md).
   ```
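
A side note on the removed Example 7: as the "Planning example for concurrent export" section referenced in the hunk header explains, you can check whether a query will actually export concurrently by prefixing it with `explain`. Below is a minimal sketch, reusing the placeholder table, endpoint, and keys from the removed example (none are real values):

```
set enable_parallel_outfile = true;

-- Prefix the OUTFILE query with EXPLAIN to inspect the plan.
explain select k1 from tb1 order by k1 limit 1000
into outfile "s3://my_bucket/export/my_file_"
format as csv
properties
(
    "AWS_ENDPOINT" = "http://s3.bd.bcebos.com",
    "AWS_ACCESS_KEY" = "xxxx",
    "AWS_SECRET_KEY" = "xxx",
    "AWS_REGION" = "bd"
);
```

If `RESULT FILE SINK` appears in `PLAN FRAGMENT 1` of the plan, the export runs concurrently; if it appears in `PLAN FRAGMENT 0` (as it will here, because of the top-level `order by`), it does not.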




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


