Posted to commits@doris.apache.org by mo...@apache.org on 2023/01/28 16:36:38 UTC
[doris] branch master updated: [docs](multi-catalog)update en docs (#16160)
This is an automated email from the ASF dual-hosted git repository.
morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
new 46ce66cbd8 [docs](multi-catalog)update en docs (#16160)
46ce66cbd8 is described below
commit 46ce66cbd86890aae82ef2eaa97798d67159d2b8
Author: Hu Yanjun <10...@users.noreply.github.com>
AuthorDate: Sun Jan 29 00:36:31 2023 +0800
[docs](multi-catalog)update en docs (#16160)
---
docs/en/docs/lakehouse/multi-catalog/dlf.md | 78 ++++++++++-
docs/en/docs/lakehouse/multi-catalog/hive.md | 147 ++++++++++++++++++++-
docs/en/docs/lakehouse/multi-catalog/hudi.md | 26 +++-
docs/en/docs/lakehouse/multi-catalog/iceberg.md | 49 ++++++-
.../docs/lakehouse/multi-catalog/multi-catalog.md | 2 +-
docs/zh-CN/docs/lakehouse/multi-catalog/hive.md | 16 +--
docs/zh-CN/docs/lakehouse/multi-catalog/hudi.md | 2 +-
7 files changed, 304 insertions(+), 16 deletions(-)
diff --git a/docs/en/docs/lakehouse/multi-catalog/dlf.md b/docs/en/docs/lakehouse/multi-catalog/dlf.md
index 82bdd1f64d..d533ce943e 100644
--- a/docs/en/docs/lakehouse/multi-catalog/dlf.md
+++ b/docs/en/docs/lakehouse/multi-catalog/dlf.md
@@ -1,6 +1,6 @@
---
{
- "title": "Aliyun DLF",
+ "title": "Alibaba Cloud DLF",
"language": "en"
}
---
@@ -25,7 +25,79 @@ under the License.
-->
-# Aliyun DLF
+# Alibaba Cloud DLF
+
+Data Lake Formation (DLF) is the unified metadata management service of Alibaba Cloud. It is compatible with the Hive Metastore protocol.
+
+> [What is DLF](https://www.alibabacloud.com/product/datalake-formation)
+
+Doris can access DLF the same way as it accesses Hive Metastore.
+
+## Connect to DLF
+
+1. Create `hive-site.xml`
+
+ Create the `hive-site.xml` file, and put it in the `fe/conf` directory.
+
+    ```xml
+ <?xml version="1.0"?>
+ <configuration>
+ <!--Set to use dlf client-->
+ <property>
+ <name>hive.metastore.type</name>
+ <value>dlf</value>
+ </property>
+ <property>
+ <name>dlf.catalog.endpoint</name>
+ <value>dlf-vpc.cn-beijing.aliyuncs.com</value>
+ </property>
+ <property>
+ <name>dlf.catalog.region</name>
+ <value>cn-beijing</value>
+ </property>
+ <property>
+ <name>dlf.catalog.proxyMode</name>
+ <value>DLF_ONLY</value>
+ </property>
+ <property>
+ <name>dlf.catalog.uid</name>
+ <value>20000000000000000</value>
+ </property>
+ <property>
+ <name>dlf.catalog.accessKeyId</name>
+ <value>XXXXXXXXXXXXXXX</value>
+ </property>
+ <property>
+ <name>dlf.catalog.accessKeySecret</name>
+ <value>XXXXXXXXXXXXXXXXX</value>
+ </property>
+ </configuration>
+ ```
+
+ * `dlf.catalog.endpoint`: DLF Endpoint. See [Regions and Endpoints of DLF](https://www.alibabacloud.com/help/en/data-lake-formation/latest/regions-and-endpoints).
+ * `dlf.catalog.region`: DLF Region. See [Regions and Endpoints of DLF](https://www.alibabacloud.com/help/en/data-lake-formation/latest/regions-and-endpoints).
+    * `dlf.catalog.uid`: Alibaba Cloud account ID, which you can find as "Account ID" in the upper right corner of the Alibaba Cloud console.
+    * `dlf.catalog.accessKeyId`: AccessKey ID, which you can create and manage on the [Alibaba Cloud console](https://ram.console.aliyun.com/manage/ak).
+    * `dlf.catalog.accessKeySecret`: AccessKey Secret, which you can create and manage on the [Alibaba Cloud console](https://ram.console.aliyun.com/manage/ak).
+
+ Other configuration items are fixed and require no modifications.
+
+2. Restart FE, and create a Catalog via the `CREATE CATALOG` statement.
+
+ Doris will read and parse `fe/conf/hive-site.xml`.
+
+ ```sql
+ CREATE CATALOG hive_with_dlf PROPERTIES (
+ "type"="hms",
+ "hive.metastore.uris" = "thrift://127.0.0.1:9083"
+ )
+ ```
+
+    `type` must always be `hms`. The value of `hive.metastore.uris` can be arbitrary, since it is not actually used, but it must follow the format of a Hive Metastore Thrift URI.
+
+    After the above steps, you can access metadata in DLF the same way as you access Hive Metastore.
+
+ Doris supports accessing Hive/Iceberg/Hudi metadata in DLF.
+
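+    With the Catalog created, a quick way to verify the DLF connection is to switch to it and browse its databases. A minimal sketch; the database and table names below are hypothetical:
+
+    ```sql
+    SWITCH hive_with_dlf;
+    SHOW DATABASES;
+    SELECT * FROM your_db.your_table LIMIT 10;
+    ```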
-TODO: translate
diff --git a/docs/en/docs/lakehouse/multi-catalog/hive.md b/docs/en/docs/lakehouse/multi-catalog/hive.md
index fd3bfd8191..18ae073160 100644
--- a/docs/en/docs/lakehouse/multi-catalog/hive.md
+++ b/docs/en/docs/lakehouse/multi-catalog/hive.md
@@ -26,4 +26,149 @@ under the License.
# Hive
-TODO: translate
+By connecting to Hive Metastore, or to a metadata service compatible with Hive Metastore, Doris can automatically obtain Hive database and table information and run queries against the data.
+
+Besides Hive, many other systems, such as Iceberg and Hudi, use Hive Metastore to keep their metadata. Thus, Doris can also access these systems via Hive Catalog.
+
+## Usage
+
+When connecting to Hive, Doris:
+
+1. Supports Hive version 1/2/3;
+2. Supports both Managed Table and External Table;
+3. Can identify metadata of Hive, Iceberg, and Hudi stored in Hive Metastore;
+4. Supports Hive tables with data stored in JuiceFS, which can be used the same way as normal Hive tables (put `juicefs-hadoop-x.x.x.jar` in `fe/lib/` and `apache_hdfs_broker/lib/`).
+
+## Create Catalog
+
+```sql
+CREATE CATALOG hive PROPERTIES (
+    'type'='hms',
+    'hive.metastore.uris' = 'thrift://172.21.0.1:7004'
+);
+```
+
+In addition to `type` and `hive.metastore.uris`, which are required, you can specify other parameters regarding the connection.
+
+For example, to specify HDFS HA:
+
+```sql
+CREATE CATALOG hive PROPERTIES (
+ 'type'='hms',
+ 'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
+ 'hadoop.username' = 'hive',
+ 'dfs.nameservices'='your-nameservice',
+ 'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
+ 'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
+ 'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
+ 'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
+);
+```
+
+To specify HDFS HA and Kerberos authentication information:
+
+```sql
+CREATE CATALOG hive PROPERTIES (
+ 'type'='hms',
+ 'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
+ 'hive.metastore.sasl.enabled' = 'true',
+ 'dfs.nameservices'='your-nameservice',
+ 'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
+ 'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
+ 'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
+ 'hadoop.security.authentication' = 'kerberos',
+ 'hadoop.kerberos.keytab' = '/your-keytab-filepath/your.keytab',
+ 'hadoop.kerberos.principal' = 'your-principal@YOUR.COM',
+ 'yarn.resourcemanager.address' = 'your-rm-address:your-rm-port',
+ 'yarn.resourcemanager.principal' = 'your-rm-principal/_HOST@YOUR.COM'
+);
+```
+
+To provide Hadoop KMS encrypted transmission information:
+
+```sql
+CREATE CATALOG hive PROPERTIES (
+ 'type'='hms',
+ 'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
+ 'dfs.encryption.key.provider.uri' = 'kms://http@kms_host:kms_port/kms'
+);
+```
+
+Or to connect to Hive data stored in JuiceFS:
+
+```sql
+CREATE CATALOG hive PROPERTIES (
+ 'type'='hms',
+ 'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
+ 'hadoop.username' = 'root',
+ 'fs.jfs.impl' = 'io.juicefs.JuiceFileSystem',
+ 'fs.AbstractFileSystem.jfs.impl' = 'io.juicefs.JuiceFS',
+ 'juicefs.meta' = 'xxx'
+);
+```
+
+In Doris 1.2.1 and newer, you can create a Resource that contains all these parameters, and reuse the Resource when creating new Catalogs. Here is an example:
+
+```sql
+# 1. Create Resource
+CREATE RESOURCE hms_resource PROPERTIES (
+ 'type'='hms',
+ 'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
+ 'hadoop.username' = 'hive',
+ 'dfs.nameservices'='your-nameservice',
+ 'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
+ 'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
+ 'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
+ 'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
+);
+
+# 2. Create Catalog and use an existing Resource. Properties specified here will overwrite the corresponding ones in the Resource.
+CREATE CATALOG hive WITH RESOURCE hms_resource PROPERTIES(
+ 'key' = 'value'
+);
+```
+
+You can also put the `hive-site.xml` file in the `conf` directories of FE and BE. This will enable Doris to automatically read information from `hive-site.xml`. The relevant information will be overwritten based on the following rules:
+
+* Information in Resource will overwrite that in `hive-site.xml`.
+* Information in `CREATE CATALOG PROPERTIES` will overwrite that in Resource.
+
+### Hive Versions
+
+Doris can access Hive Metastore in all Hive versions. By default, Doris uses the interface compatible with Hive 2.3 to access Hive Metastore. You can specify a certain Hive version when creating Catalogs, for example:
+
+```sql
+CREATE CATALOG hive PROPERTIES (
+ 'type'='hms',
+ 'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
+ 'hive.version' = '1.1.0'
+);
+```
+
+## Column Type Mapping
+
+This is applicable for Hive/Iceberg/Hudi.
+
+| HMS Type | Doris Type | Comment |
+| ------------- | ------------- | ------------------------------------------------- |
+| boolean | boolean | |
+| tinyint | tinyint | |
+| smallint | smallint | |
+| int | int | |
+| bigint | bigint | |
+| date | date | |
+| timestamp | datetime | |
+| float | float | |
+| double | double | |
+| char | char | |
+| varchar | varchar | |
+| decimal | decimal | |
+| `array<type>` | `array<type>` | Support nested array, such as `array<array<int>>` |
+| other | unsupported | |
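+
+With the Catalog created, Hive tables can be queried like internal Doris tables. A minimal sketch, with hypothetical database and table names:
+
+```sql
+-- Switch to the catalog created above, then browse and query
+SWITCH hive;
+SHOW DATABASES;
+SELECT count(*) FROM hive_db.hive_tbl;
+```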
diff --git a/docs/en/docs/lakehouse/multi-catalog/hudi.md b/docs/en/docs/lakehouse/multi-catalog/hudi.md
index 79e351f994..21f093dcde 100644
--- a/docs/en/docs/lakehouse/multi-catalog/hudi.md
+++ b/docs/en/docs/lakehouse/multi-catalog/hudi.md
@@ -27,4 +27,28 @@ under the License.
# Hudi
-TODO: translate
+## Usage
+
+1. Currently, Doris supports Snapshot Query on Copy-on-Write Hudi tables and Read Optimized Query on Merge-on-Read tables. In the future, it will support Snapshot Query on Merge-on-Read tables and Incremental Query.
+2. Doris only supports Hive Metastore Catalogs currently. The usage is basically the same as that of Hive Catalogs. More types of Catalogs will be supported in future versions.
+
+## Create Catalog
+
+Same as creating Hive Catalogs. A simple example is provided here. See [Hive](./hive) for more information.
+
+```sql
+CREATE CATALOG hudi PROPERTIES (
+ 'type'='hms',
+ 'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
+ 'hadoop.username' = 'hive',
+ 'dfs.nameservices'='your-nameservice',
+ 'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
+ 'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
+ 'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
+ 'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
+);
+```
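+
+Once created, the Hudi Catalog is queried the same way as a Hive Catalog. A minimal sketch, with hypothetical database and table names:
+
+```sql
+SELECT * FROM hudi.hudi_db.hudi_tbl LIMIT 10;
+```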
+
+## Column Type Mapping
+
+Same as that in Hive Catalogs. See the relevant section in [Hive](./hive).
diff --git a/docs/en/docs/lakehouse/multi-catalog/iceberg.md b/docs/en/docs/lakehouse/multi-catalog/iceberg.md
index bff7672543..67ce750066 100644
--- a/docs/en/docs/lakehouse/multi-catalog/iceberg.md
+++ b/docs/en/docs/lakehouse/multi-catalog/iceberg.md
@@ -27,4 +27,51 @@ under the License.
# Iceberg
-TODO: translate
+## Usage
+
+When connecting to Iceberg, Doris:
+
+1. Supports Iceberg V1/V2 table formats;
+2. Supports Position Delete but not Equality Delete for V2 format;
+3. Only supports Hive Metastore Catalogs. The usage is the same as that of Hive Catalogs.
+
+## Create Catalog
+
+Same as creating Hive Catalogs. A simple example is provided here. See [Hive](./hive) for more information.
+
+```sql
+CREATE CATALOG iceberg PROPERTIES (
+ 'type'='hms',
+ 'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
+ 'hadoop.username' = 'hive',
+ 'dfs.nameservices'='your-nameservice',
+ 'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
+ 'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
+ 'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
+ 'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
+);
+```
+
+## Column Type Mapping
+
+Same as that in Hive Catalogs. See the relevant section in [Hive](./hive).
+
+## Time Travel
+
+<version since="dev">
+
+Doris supports reading the specified Snapshot of Iceberg tables.
+
+</version>
+
+Each write operation to an Iceberg table will generate a new Snapshot.
+
+By default, a read request will only read the latest Snapshot.
+
+You can read data of historical table versions using the `FOR TIME AS OF` or `FOR VERSION AS OF` clauses, based on the Snapshot ID or the time point when the Snapshot was generated. For example:
+
+```sql
+SELECT * FROM iceberg_tbl FOR TIME AS OF "2022-10-07 17:20:37";
+
+SELECT * FROM iceberg_tbl FOR VERSION AS OF 868895038966572;
+```
+`SELECT * FROM iceberg_tbl FOR VERSION AS OF 868895038966572;`
+
+You can use the [iceberg_meta](https://doris.apache.org/docs/dev/sql-manual/sql-functions/table-functions/iceberg_meta/) table function to view the Snapshot details of the specified table.
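+
+A minimal sketch of inspecting Snapshots via `iceberg_meta` (the catalog, database, and table names are hypothetical):
+
+```sql
+SELECT * FROM iceberg_meta(
+    "table" = "iceberg.iceberg_db.iceberg_tbl",
+    "query_type" = "snapshots"
+);
+```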
diff --git a/docs/en/docs/lakehouse/multi-catalog/multi-catalog.md b/docs/en/docs/lakehouse/multi-catalog/multi-catalog.md
index 5118a62509..61dc900978 100644
--- a/docs/en/docs/lakehouse/multi-catalog/multi-catalog.md
+++ b/docs/en/docs/lakehouse/multi-catalog/multi-catalog.md
@@ -261,7 +261,7 @@ See [Hudi](./hudi)
### Connect to Elasticsearch
-See [Elasticsearch](./elasticsearch)
+See [Elasticsearch](./es)
### Connect to JDBC
diff --git a/docs/zh-CN/docs/lakehouse/multi-catalog/hive.md b/docs/zh-CN/docs/lakehouse/multi-catalog/hive.md
index 50fc541ada..aa9a7bc53d 100644
--- a/docs/zh-CN/docs/lakehouse/multi-catalog/hive.md
+++ b/docs/zh-CN/docs/lakehouse/multi-catalog/hive.md
@@ -28,7 +28,7 @@ under the License.
通过连接 Hive Metastore,或者兼容 Hive Metatore 的元数据服务,Doris 可以自动获取 Hive 的库表信息,并进行数据查询。
-除了 Hive 外,很多其他系统也会使用 Hive Metastore 存储元数据。所以通过 Hive Catalog,我们不仅能方位 Hive,也能访问使用 Hive Metastore 作为元数据存储的系统。如 Iceberg、Hudi 等。
+除了 Hive 外,很多其他系统也会使用 Hive Metastore 存储元数据。所以通过 Hive Catalog,我们不仅能访问 Hive,也能访问使用 Hive Metastore 作为元数据存储的系统。如 Iceberg、Hudi 等。
## 使用限制
@@ -38,7 +38,7 @@ under the License.
4. 支持数据存储在 Juicefs 上的 hive 表,用法如下(需要把juicefs-hadoop-x.x.x.jar放在 fe/lib/ 和 apache_hdfs_broker/lib/ 下)。
## 创建 Catalog
-
+
```sql
CREATE CATALOG hive PROPERTIES (
'type'='hms',
@@ -51,7 +51,7 @@ CREATE CATALOG hive PROPERTIES (
'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
```
-
+
除了 `type` 和 `hive.metastore.uris` 两个必须参数外,还可以通过更多参数来传递连接所需要的信息。
如提供 HDFS HA 信息,示例如下:
@@ -68,7 +68,7 @@ CREATE CATALOG hive PROPERTIES (
'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
```
-
+
同时提供 HDFS HA 信息和 Kerberos 认证信息,示例如下:
```sql
@@ -87,7 +87,7 @@ CREATE CATALOG hive PROPERTIES (
'yarn.resourcemanager.principal' = 'your-rm-principal/_HOST@YOUR.COM'
);
```
-
+
提供 Hadoop KMS 加密传输信息,示例如下:
```sql
@@ -110,7 +110,7 @@ CREATE CATALOG hive PROPERTIES (
'juicefs.meta' = 'xxx'
);
```
-
+
在 1.2.1 版本之后,我们也可以将这些信息通过创建一个 Resource 统一存储,然后在创建 Catalog 时使用这个 Resource。示例如下:
```sql
@@ -126,12 +126,12 @@ CREATE RESOURCE hms_resource PROPERTIES (
'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
-# 2. 创建 Catalog 并使用 Resource,这里的 Key Value 信息回覆盖 Resource 中的信息。
+# 2. 创建 Catalog 并使用 Resource,这里的 Key Value 信息会覆盖 Resource 中的信息。
CREATE CATALOG hive WITH RESOURCE hms_resource PROPERTIES(
'key' = 'value'
);
```
-
+
我们也可以直接将 hive-site.xml 放到 FE 和 BE 的 conf 目录下,系统也会自动读取 hive-site.xml 中的信息。信息覆盖的规则如下:
* Resource 中的信息覆盖 hive-site.xml 中的信息。
diff --git a/docs/zh-CN/docs/lakehouse/multi-catalog/hudi.md b/docs/zh-CN/docs/lakehouse/multi-catalog/hudi.md
index 0f958a813e..5de988a9b1 100644
--- a/docs/zh-CN/docs/lakehouse/multi-catalog/hudi.md
+++ b/docs/zh-CN/docs/lakehouse/multi-catalog/hudi.md
@@ -37,7 +37,7 @@ under the License.
和 Hive Catalog 基本一致,这里仅给出简单示例。其他示例可参阅 [Hive Catalog](./hive)。
```sql
-CREATE CATALOG iceberg PROPERTIES (
+CREATE CATALOG hudi PROPERTIES (
'type'='hms',
'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
'hadoop.username' = 'hive',