Posted to commits@doris.apache.org by mo...@apache.org on 2023/01/28 16:36:38 UTC

[doris] branch master updated: [docs](multi-catalog)update en docs (#16160)

This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
     new 46ce66cbd8 [docs](multi-catalog)update en docs (#16160)
46ce66cbd8 is described below

commit 46ce66cbd86890aae82ef2eaa97798d67159d2b8
Author: Hu Yanjun <10...@users.noreply.github.com>
AuthorDate: Sun Jan 29 00:36:31 2023 +0800

    [docs](multi-catalog)update en docs (#16160)
---
 docs/en/docs/lakehouse/multi-catalog/dlf.md        |  78 ++++++++++-
 docs/en/docs/lakehouse/multi-catalog/hive.md       | 147 ++++++++++++++++++++-
 docs/en/docs/lakehouse/multi-catalog/hudi.md       |  26 +++-
 docs/en/docs/lakehouse/multi-catalog/iceberg.md    |  49 ++++++-
 .../docs/lakehouse/multi-catalog/multi-catalog.md  |   2 +-
 docs/zh-CN/docs/lakehouse/multi-catalog/hive.md    |  16 +--
 docs/zh-CN/docs/lakehouse/multi-catalog/hudi.md    |   2 +-
 7 files changed, 304 insertions(+), 16 deletions(-)

diff --git a/docs/en/docs/lakehouse/multi-catalog/dlf.md b/docs/en/docs/lakehouse/multi-catalog/dlf.md
index 82bdd1f64d..d533ce943e 100644
--- a/docs/en/docs/lakehouse/multi-catalog/dlf.md
+++ b/docs/en/docs/lakehouse/multi-catalog/dlf.md
@@ -1,6 +1,6 @@
 ---
 {
-    "title": "Aliyun DLF",
+    "title": "Alibaba Cloud DLF",
     "language": "en"
 }
 ---
@@ -25,7 +25,79 @@ under the License.
 -->
 
 
-# Aliyun DLF
+# Alibaba Cloud DLF
+
+Data Lake Formation (DLF) is the unified metadata management service of Alibaba Cloud. It is compatible with the Hive Metastore protocol.
+
+> [What is DLF](https://www.alibabacloud.com/product/datalake-formation)
+
+Doris can access DLF the same way as it accesses Hive Metastore.
+
+## Connect to DLF
+
+1. Create `hive-site.xml`
+
+   Create the `hive-site.xml` file and put it in the `fe/conf` directory.
+
+   ```xml
+   <?xml version="1.0"?>
+   <configuration>
+       <!--Set to use dlf client-->
+       <property>
+           <name>hive.metastore.type</name>
+           <value>dlf</value>
+       </property>
+       <property>
+           <name>dlf.catalog.endpoint</name>
+           <value>dlf-vpc.cn-beijing.aliyuncs.com</value>
+       </property>
+       <property>
+           <name>dlf.catalog.region</name>
+           <value>cn-beijing</value>
+       </property>
+       <property>
+           <name>dlf.catalog.proxyMode</name>
+           <value>DLF_ONLY</value>
+       </property>
+       <property>
+           <name>dlf.catalog.uid</name>
+           <value>20000000000000000</value>
+       </property>
+       <property>
+           <name>dlf.catalog.accessKeyId</name>
+           <value>XXXXXXXXXXXXXXX</value>
+       </property>
+       <property>
+           <name>dlf.catalog.accessKeySecret</name>
+           <value>XXXXXXXXXXXXXXXXX</value>
+       </property>
+   </configuration>
+   ```
+
+   * `dlf.catalog.endpoint`: DLF Endpoint. See [Regions and Endpoints of DLF](https://www.alibabacloud.com/help/en/data-lake-formation/latest/regions-and-endpoints).
+   * `dlf.catalog.region`: DLF Region. See [Regions and Endpoints of DLF](https://www.alibabacloud.com/help/en/data-lake-formation/latest/regions-and-endpoints).
+   * `dlf.catalog.uid`: Alibaba Cloud account ID, i.e., the "Account ID" shown in the upper right corner of the Alibaba Cloud console.
+   * `dlf.catalog.accessKeyId`: AccessKey ID, which you can create and manage on the [Alibaba Cloud console](https://ram.console.aliyun.com/manage/ak).
+   * `dlf.catalog.accessKeySecret`: AccessKey Secret, which you can create and manage on the [Alibaba Cloud console](https://ram.console.aliyun.com/manage/ak).
+
+   Other configuration items are fixed and require no modifications.
+
+2. Restart FE and create a Catalog via the `CREATE CATALOG` statement.
+
+   Doris will read and parse `fe/conf/hive-site.xml`.
+
+   ```sql
+   CREATE CATALOG hive_with_dlf PROPERTIES (
+       "type"="hms",
+       "hive.metastore.uris" = "thrift://127.0.0.1:9083"
+   )
+   ```
+
+   `type` should always be `hms`. `hive.metastore.uris` can be an arbitrary value since it is not actually used, but it must follow the format of a Hive Metastore Thrift URI.
+
+   After the above steps, you can access metadata in DLF the same way as you access Hive Metastore.
+
+   Doris supports accessing Hive/Iceberg/Hudi metadata in DLF.
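+
+   For example, to quickly verify that the Catalog works (a minimal sketch; `dlf_db` and `dlf_tbl` are hypothetical names standing in for your own DLF database and table):
+
+   ```sql
+   -- Switch to the newly created Catalog and browse its metadata
+   SWITCH hive_with_dlf;
+   SHOW DATABASES;
+   -- Query a table through the Catalog (hypothetical names)
+   SELECT * FROM dlf_db.dlf_tbl LIMIT 10;
+   ```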
+
 
-TODO: translate
 
diff --git a/docs/en/docs/lakehouse/multi-catalog/hive.md b/docs/en/docs/lakehouse/multi-catalog/hive.md
index fd3bfd8191..18ae073160 100644
--- a/docs/en/docs/lakehouse/multi-catalog/hive.md
+++ b/docs/en/docs/lakehouse/multi-catalog/hive.md
@@ -26,4 +26,149 @@ under the License.
 
 # Hive
 
-TODO: translate
+By connecting to Hive Metastore, or a metadata service compatible with Hive Metastore, Doris can automatically obtain Hive database and table information and conduct data queries.
+
+Besides Hive, many other systems, such as Iceberg and Hudi, use Hive Metastore to keep their metadata. Thus, Doris can also access these systems via Hive Catalog. 
+
+## Usage
+
+When connecting to Hive, Doris:
+
+1. Supports Hive version 1/2/3;
+2. Supports both Managed Table and External Table;
+3. Can identify metadata of Hive, Iceberg, and Hudi stored in Hive Metastore;
+4. Supports Hive tables with data stored in JuiceFS, which can be used the same way as normal Hive tables (put `juicefs-hadoop-x.x.x.jar` in `fe/lib/` and `apache_hdfs_broker/lib/`).
+
+## Create Catalog
+
+```sql
+CREATE CATALOG hive PROPERTIES (
+    'type'='hms',
+    'hive.metastore.uris' = 'thrift://172.21.0.1:7004'
+);
+```
+
+In addition to `type` and `hive.metastore.uris`, which are required, you can specify other parameters regarding the connection.
+
+For example, to specify HDFS HA:
+
+```sql
+CREATE CATALOG hive PROPERTIES (
+    'type'='hms',
+    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
+    'hadoop.username' = 'hive',
+    'dfs.nameservices'='your-nameservice',
+    'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
+    'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
+    'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
+    'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
+);
+```
+
+To specify HDFS HA and Kerberos authentication information:
+
+```sql
+CREATE CATALOG hive PROPERTIES (
+    'type'='hms',
+    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
+    'hive.metastore.sasl.enabled' = 'true',
+    'dfs.nameservices'='your-nameservice',
+    'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
+    'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
+    'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
+    'hadoop.security.authentication' = 'kerberos',
+    'hadoop.kerberos.keytab' = '/your-keytab-filepath/your.keytab',   
+    'hadoop.kerberos.principal' = 'your-principal@YOUR.COM',
+    'yarn.resourcemanager.address' = 'your-rm-address:your-rm-port',    
+    'yarn.resourcemanager.principal' = 'your-rm-principal/_HOST@YOUR.COM'
+);
+```
+
+To provide Hadoop KMS encrypted transmission information:
+
+```sql
+CREATE CATALOG hive PROPERTIES (
+    'type'='hms',
+    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
+    'dfs.encryption.key.provider.uri' = 'kms://http@kms_host:kms_port/kms'
+);
+```
+
+Or to connect to Hive data stored in JuiceFS:
+
+```sql
+CREATE CATALOG hive PROPERTIES (
+    'type'='hms',
+    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
+    'hadoop.username' = 'root',
+    'fs.jfs.impl' = 'io.juicefs.JuiceFileSystem',
+    'fs.AbstractFileSystem.jfs.impl' = 'io.juicefs.JuiceFS',
+    'juicefs.meta' = 'xxx'
+);
+```
+
+In Doris 1.2.1 and newer, you can create a Resource that contains all these parameters, and reuse the Resource when creating new Catalogs. Here is an example:
+
+```sql
+# 1. Create Resource
+CREATE RESOURCE hms_resource PROPERTIES (
+    'type'='hms',
+    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
+    'hadoop.username' = 'hive',
+    'dfs.nameservices'='your-nameservice',
+    'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
+    'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
+    'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
+    'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
+);
+
+# 2. Create Catalog using an existing Resource. The key-value pairs specified here will overwrite the corresponding information in the Resource.
+CREATE CATALOG hive WITH RESOURCE hms_resource PROPERTIES(
+    'key' = 'value'
+);
+```
+
+You can also put the `hive-site.xml` file in the `conf` directories of FE and BE. Doris will then automatically read information from `hive-site.xml`. Information is overwritten based on the following rules:
+
+* Information in the Resource overwrites that in `hive-site.xml`.
+* Information in `CREATE CATALOG PROPERTIES` overwrites that in the Resource.
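+
+As a concrete illustration of these rules (a sketch; the overriding URI below is a hypothetical value):
+
+```sql
+-- hive-site.xml provides the defaults; hms_resource overrides them;
+-- the PROPERTIES clause overrides the Resource in turn.
+CREATE CATALOG hive_prod WITH RESOURCE hms_resource PROPERTIES(
+    'hive.metastore.uris' = 'thrift://172.21.0.9:7004'
+);
+```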
+
+### Hive Versions
+
+Doris can access Hive Metastore in all Hive versions. By default, Doris uses the interface compatible with Hive 2.3 to access Hive Metastore. You can specify the Hive version when creating a Catalog, for example:
+
+```sql 
+CREATE CATALOG hive PROPERTIES (
+    'type'='hms',
+    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
+    'hive.version' = '1.1.0'
+);
+```
+
+## Column Type Mapping
+
+This is applicable to Hive/Iceberg/Hudi.
+
+| HMS Type      | Doris Type    | Comment                                           |
+| ------------- | ------------- | ------------------------------------------------- |
+| boolean       | boolean       |                                                   |
+| tinyint       | tinyint       |                                                   |
+| smallint      | smallint      |                                                   |
+| int           | int           |                                                   |
+| bigint        | bigint        |                                                   |
+| date          | date          |                                                   |
+| timestamp     | datetime      |                                                   |
+| float         | float         |                                                   |
+| double        | double        |                                                   |
+| char          | char          |                                                   |
+| varchar       | varchar       |                                                   |
+| decimal       | decimal       |                                                   |
+| `array<type>` | `array<type>` | Support nested array, such as `array<array<int>>` |
+| other         | unsupported   |                                                   |
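+
+To see how the columns of an existing Hive table are mapped, you can describe it through the Catalog (a sketch; `hive`, `db`, and `tbl` are hypothetical names for your own Catalog, database, and table):
+
+```sql
+SWITCH hive;
+USE db;
+-- Hive column types are displayed as their mapped Doris types
+DESC tbl;
+```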
diff --git a/docs/en/docs/lakehouse/multi-catalog/hudi.md b/docs/en/docs/lakehouse/multi-catalog/hudi.md
index 79e351f994..21f093dcde 100644
--- a/docs/en/docs/lakehouse/multi-catalog/hudi.md
+++ b/docs/en/docs/lakehouse/multi-catalog/hudi.md
@@ -27,4 +27,28 @@ under the License.
 
 # Hudi
 
-TODO: translate
+## Usage
+
+1. Currently, Doris supports Snapshot Query on Copy-on-Write Hudi tables and Read Optimized Query on Merge-on-Read tables. In the future, it will support Snapshot Query on Merge-on-Read tables and Incremental Query.
+2. Doris only supports Hive Metastore Catalogs currently. The usage is basically the same as that of Hive Catalogs. More types of Catalogs will be supported in future versions.
+
+## Create Catalog
+
+Same as creating Hive Catalogs. A simple example is provided here. See [Hive](./hive) for more information.
+
+```sql
+CREATE CATALOG hudi PROPERTIES (
+    'type'='hms',
+    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
+    'hadoop.username' = 'hive',
+    'dfs.nameservices'='your-nameservice',
+    'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
+    'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
+    'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
+    'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
+);
+```
+
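+Once the Catalog is created, Hudi tables can be queried like regular tables (a minimal sketch; `hudi_db` and `hudi_tbl` are hypothetical names). On Copy-on-Write tables this performs a Snapshot Query; on Merge-on-Read tables, a Read Optimized Query:
+
+```sql
+SWITCH hudi;
+-- Snapshot Query (COW) / Read Optimized Query (MOR)
+SELECT * FROM hudi_db.hudi_tbl LIMIT 10;
+```
+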
+## Column Type Mapping
+
+Same as that in Hive Catalogs. See the relevant section in [Hive](./hive).
diff --git a/docs/en/docs/lakehouse/multi-catalog/iceberg.md b/docs/en/docs/lakehouse/multi-catalog/iceberg.md
index bff7672543..67ce750066 100644
--- a/docs/en/docs/lakehouse/multi-catalog/iceberg.md
+++ b/docs/en/docs/lakehouse/multi-catalog/iceberg.md
@@ -27,4 +27,51 @@ under the License.
 
 # Iceberg
 
-TODO: translate
+## Usage
+
+When connecting to Iceberg, Doris:
+
+1. Supports Iceberg V1/V2 table formats;
+2. Supports Position Delete but not Equality Delete for V2 format;
+3. Only supports Hive Metastore Catalogs. The usage is the same as that of Hive Catalogs.
+
+## Create Catalog
+
+Same as creating Hive Catalogs. A simple example is provided here. See [Hive](./hive) for more information.
+
+```sql
+CREATE CATALOG iceberg PROPERTIES (
+    'type'='hms',
+    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
+    'hadoop.username' = 'hive',
+    'dfs.nameservices'='your-nameservice',
+    'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
+    'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
+    'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
+    'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
+);
+```
+
+## Column Type Mapping
+
+Same as that in Hive Catalogs. See the relevant section in [Hive](./hive).
+
+## Time Travel
+
+<version since="dev">
+
+Doris supports reading the specified Snapshot of Iceberg tables.
+
+</version>
+
+Each write operation to an Iceberg table will generate a new Snapshot.
+
+By default, a read request will only read the latest Snapshot.
+
+You can read historical versions of a table using the `FOR TIME AS OF` or `FOR VERSION AS OF` clauses, based on the Snapshot ID or the time point at which the Snapshot was generated. For example:
+
+`SELECT * FROM iceberg_tbl FOR TIME AS OF "2022-10-07 17:20:37";`
+
+`SELECT * FROM iceberg_tbl FOR VERSION AS OF 868895038966572;`
+
+You can use the [iceberg_meta](https://doris.apache.org/docs/dev/sql-manual/sql-functions/table-functions/iceberg_meta/) table function to view the Snapshot details of the specified table.
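+
+For example, to list the Snapshots of a table before picking a version to read (a sketch based on the function's parameters; the table name is hypothetical):
+
+```sql
+SELECT * FROM iceberg_meta(
+    "table" = "iceberg.db.iceberg_tbl",
+    "query_type" = "snapshots"
+);
+```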
diff --git a/docs/en/docs/lakehouse/multi-catalog/multi-catalog.md b/docs/en/docs/lakehouse/multi-catalog/multi-catalog.md
index 5118a62509..61dc900978 100644
--- a/docs/en/docs/lakehouse/multi-catalog/multi-catalog.md
+++ b/docs/en/docs/lakehouse/multi-catalog/multi-catalog.md
@@ -261,7 +261,7 @@ See [Hudi](./hudi)
 
 ### Connect to Elasticsearch
 
-See [Elasticsearch](./elasticsearch)
+See [Elasticsearch](./es)
 
 ### Connect to JDBC
 
diff --git a/docs/zh-CN/docs/lakehouse/multi-catalog/hive.md b/docs/zh-CN/docs/lakehouse/multi-catalog/hive.md
index 50fc541ada..aa9a7bc53d 100644
--- a/docs/zh-CN/docs/lakehouse/multi-catalog/hive.md
+++ b/docs/zh-CN/docs/lakehouse/multi-catalog/hive.md
@@ -28,7 +28,7 @@ under the License.
 
 通过连接 Hive Metastore,或者兼容 Hive Metatore 的元数据服务,Doris 可以自动获取 Hive 的库表信息,并进行数据查询。
 
-除了 Hive 外,很多其他系统也会使用 Hive Metastore 存储元数据。所以通过 Hive Catalog,我们不仅能方位 Hive,也能访问使用 Hive Metastore 作为元数据存储的系统。如 Iceberg、Hudi 等。
+除了 Hive 外,很多其他系统也会使用 Hive Metastore 存储元数据。所以通过 Hive Catalog,我们不仅能访问 Hive,也能访问使用 Hive Metastore 作为元数据存储的系统。如 Iceberg、Hudi 等。
 
 ## 使用限制
 
@@ -38,7 +38,7 @@ under the License.
 4. 支持数据存储在 Juicefs 上的 hive 表,用法如下(需要把juicefs-hadoop-x.x.x.jar放在 fe/lib/ 和 apache_hdfs_broker/lib/ 下)。
 
 ## 创建 Catalog
-	
+
 ```sql
 CREATE CATALOG hive PROPERTIES (
     'type'='hms',
@@ -51,7 +51,7 @@ CREATE CATALOG hive PROPERTIES (
     'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
 );
 ```
-	
+
 除了 `type` 和 `hive.metastore.uris` 两个必须参数外,还可以通过更多参数来传递连接所需要的信息。
 	
 如提供 HDFS HA 信息,示例如下:
@@ -68,7 +68,7 @@ CREATE CATALOG hive PROPERTIES (
     'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
 );
 ```
-	
+
 同时提供 HDFS HA 信息和 Kerberos 认证信息,示例如下:
 	
 ```sql
@@ -87,7 +87,7 @@ CREATE CATALOG hive PROPERTIES (
     'yarn.resourcemanager.principal' = 'your-rm-principal/_HOST@YOUR.COM'
 );
 ```
-	
+
 提供 Hadoop KMS 加密传输信息,示例如下:
 	
 ```sql
@@ -110,7 +110,7 @@ CREATE CATALOG hive PROPERTIES (
     'juicefs.meta' = 'xxx'
 );
 ```
-	
+
 在 1.2.1 版本之后,我们也可以将这些信息通过创建一个 Resource 统一存储,然后在创建 Catalog 时使用这个 Resource。示例如下:
 	
 ```sql
@@ -126,12 +126,12 @@ CREATE RESOURCE hms_resource PROPERTIES (
     'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
 );
 	
-# 2. 创建 Catalog 并使用 Resource,这里的 Key Value 信息回覆盖 Resource 中的信息。
+# 2. 创建 Catalog 并使用 Resource,这里的 Key Value 信息会覆盖 Resource 中的信息。
 CREATE CATALOG hive WITH RESOURCE hms_resource PROPERTIES(
 	'key' = 'value'
 );
 ```
-	
+
 我们也可以直接将 hive-site.xml 放到 FE 和 BE 的 conf 目录下,系统也会自动读取 hive-site.xml 中的信息。信息覆盖的规则如下:
 	
 * Resource 中的信息覆盖 hive-site.xml 中的信息。
diff --git a/docs/zh-CN/docs/lakehouse/multi-catalog/hudi.md b/docs/zh-CN/docs/lakehouse/multi-catalog/hudi.md
index 0f958a813e..5de988a9b1 100644
--- a/docs/zh-CN/docs/lakehouse/multi-catalog/hudi.md
+++ b/docs/zh-CN/docs/lakehouse/multi-catalog/hudi.md
@@ -37,7 +37,7 @@ under the License.
 和 Hive Catalog 基本一致,这里仅给出简单示例。其他示例可参阅 [Hive Catalog](./hive)。
 
 ```sql
-CREATE CATALOG iceberg PROPERTIES (
+CREATE CATALOG hudi PROPERTIES (
     'type'='hms',
     'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
     'hadoop.username' = 'hive',


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org