Posted to commits@inlong.apache.org by GitBox <gi...@apache.org> on 2022/11/10 07:30:07 UTC

[GitHub] [inlong-website] thesumery opened a new pull request, #592: [INLONG-591][Doc] Add document for multiple sink of iceberg

thesumery opened a new pull request, #592:
URL: https://github.com/apache/inlong-website/pull/592

   [INLONG-591][Doc] Add document for multiple sink of iceberg
   
   ### Prepare a Pull Request
   - Add document for multiple sink of iceberg
   - Fixes #591 
   
   ### Motivation
   
   *Add document for multiple sink of iceberg.*
   
   ### Modifications
   
   *Add document for multiple sink of iceberg.*


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@inlong.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [inlong-website] EMsnap commented on a diff in pull request #592: [INLONG-591][Doc] Add document for multiple sink of iceberg

Posted by GitBox <gi...@apache.org>.
EMsnap commented on code in PR #592:
URL: https://github.com/apache/inlong-website/pull/592#discussion_r1023578305


##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/iceberg.md:
##########
@@ -148,6 +148,98 @@ TODO
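 ### Usage for InLong Manager Client
 TODO
 
+## Feature
+### Multiple table sink
+Iceberg currently supports writing to multiple tables at the same time. To enable it, add `'sink.multiple.enable' = 'true'` to the FLINK SQL table creation parameters; the schema of the target table
+can only be defined as `BINARY` or `STRING`. The following is an example of a table creation statement: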
+```
+CREATE TABLE `table_2`(
+    `data` STRING)
+WITH (
+    'connector'='iceberg-inlong',
+    'catalog-name'='hive_prod',
+    'uri'='thrift://localhost:9083',
+    'warehouse'='hdfs://localhost:8020/hive/warehouse',
+    'sink.multiple.enable' = 'true',
+    'sink.multiple.format' = 'canal-json',
+    'sink.multiple.add-column.policy' = 'TRY_IT_BEST',
+    'sink.multiple.database-pattern' = '${database}',
+    'sink.multiple.table-pattern' = 'test_${table}'
+);
+```
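+Multiple table sink also requires setting the serialization format of the upstream data (via the option 'sink.multiple.format';
+currently only [canal-json|debezium-json] is supported).
+
+### Dynamic table name mapping
+When writing to multiple tables, Iceberg lets you customize the rules that map database and table names: fill in placeholders, then add a prefix or suffix to change the name of the target table.
+Iceberg Load Node parses `'sink.multiple.database-pattern'` as the destination database name and `'sink.multiple.table-pattern'`
+as the destination table name. Placeholders are resolved from the data. A variable is strictly written as '${VARIABLE_NAME}', and its value comes from the data itself:
+it can be a metadata field of the Format specified by `'sink.multiple.format'`, or a physical field in the data.
+Examples of 'topic-pattern' are as follows:
+- 'sink.multiple.format' is 'canal-json':
+
+The upstream data is: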
+```
+{
+  "data": [
+    {
+      "id": "111",
+      "name": "scooter",
+      "description": "Big 2-wheel scooter",
+      "weight": "5.18"
+    }
+  ],
+  "database": "inventory",
+  "es": 1589373560000,
+  "id": 9,
+  "isDdl": false,
+  "mysqlType": {
+    "id": "INTEGER",
+    "name": "VARCHAR(255)",
+    "description": "VARCHAR(512)",
+    "weight": "FLOAT"
+  },
+  "old": [
+    {
+      "weight": "5.15"
+    }
+  ],
+  "pkNames": [
+    "id"
+  ],
+  "sql": "",
+  "sqlType": {
+    "id": 4,
+    "name": 12,
+    "description": 12,
+    "weight": 7
+  },
+  "table": "products",
+  "ts": 1589373560798,
+  "type": "UPDATE"
+} 
+```
+When 'topic-pattern' is '{database}_${table}', the extracted Topic is 'inventory_products' ('database' and 'table' are metadata fields,
+'id' is a physical field)
+
+When 'topic-pattern' is '{database}_${table}_${id}', the extracted Topic is 'inventory_products_4' ('database', 'table' 

Review Comment:
   111 -> 4
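
To make the placeholder discussion concrete, here is a minimal sketch of how a `${VARIABLE_NAME}` pattern could be expanded against the canal-json record in the hunk above. It is not InLong's implementation: `resolve_pattern`, `PLACEHOLDER` and `METADATA_FIELDS` are hypothetical names, and it assumes that only `database` and `table` are treated as metadata fields while every other variable is read from the first row of `data`.

```python
import re

# Hypothetical helper, not InLong code: matches ${NAME} placeholders.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")

# Assumption: only these top-level canal-json keys count as metadata fields.
METADATA_FIELDS = {"database", "table"}


def resolve_pattern(pattern, record):
    """Replace each ${NAME} with a metadata field or a physical field."""
    row = record["data"][0]  # assumption: physical fields come from the first row

    def lookup(match):
        name = match.group(1)
        if name in METADATA_FIELDS:
            return str(record[name])
        return str(row[name])  # raises KeyError for an unknown variable

    return PLACEHOLDER.sub(lookup, pattern)


# Trimmed copy of the upstream record quoted in the diff.
record = {
    "data": [{"id": "111", "name": "scooter"}],
    "database": "inventory",
    "id": 9,
    "sqlType": {"id": 4},
    "table": "products",
}

print(resolve_pattern("${database}_${table}", record))        # inventory_products
print(resolve_pattern("${database}_${table}_${id}", record))  # inventory_products_111
```

Under these assumptions `${id}` expands to the physical value `111`; the `4` in the record is the JDBC type code stored under `sqlType.id`.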





[GitHub] [inlong-website] EMsnap merged pull request #592: [INLONG-591][Doc] Add document for multiple sink of iceberg

Posted by GitBox <gi...@apache.org>.
EMsnap merged PR #592:
URL: https://github.com/apache/inlong-website/pull/592




[GitHub] [inlong-website] gong commented on a diff in pull request #592: [INLONG-591][Doc] Add document for multiple sink of iceberg

Posted by GitBox <gi...@apache.org>.
gong commented on code in PR #592:
URL: https://github.com/apache/inlong-website/pull/592#discussion_r1018748669


##########
docs/data_node/load_node/iceberg.md:
##########
@@ -147,6 +147,103 @@ TODO
 ### Usage for InLong Manager Client
 TODO
 
+## Feature
+### Multiple table sink
+Currently Iceberg supports sinking to multiple tables. This requires adding
+`'sink.multiple.enable' = 'true'` to the FLINK SQL create table parameters, and the target table schema can only be defined as `BINARY` or `STRING`

Review Comment:
   `BINARY` should be changed to `BYTES`





[GitHub] [inlong-website] EMsnap commented on a diff in pull request #592: [INLONG-591][Doc] Add document for multiple sink of iceberg

Posted by GitBox <gi...@apache.org>.
EMsnap commented on code in PR #592:
URL: https://github.com/apache/inlong-website/pull/592#discussion_r1018738270


##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/iceberg.md:
##########
@@ -148,6 +148,98 @@ TODO
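 ### Usage for InLong Manager Client
 TODO
 
+## Feature
+### Multiple table sink
+Iceberg currently supports writing to multiple tables at the same time. To enable it, add `'sink.multiple.enable' = 'true'` to the FLINK SQL table creation parameters; the schema of the target table
+can only be defined as `BINARY` or `STRING`. The following is an example of a table creation statement: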
+```
+CREATE TABLE `table_2`(
+    `data` STRING)
+WITH (
+    'connector'='iceberg-inlong',
+    'catalog-name'='hive_prod',
+    'uri'='thrift://localhost:9083',
+    'warehouse'='hdfs://localhost:8020/hive/warehouse',
+    'sink.multiple.enable' = 'true',
+    'sink.multiple.format' = 'canal-json',
+    'sink.multiple.add-column.policy' = 'TRY_IT_BEST',
+    'sink.multiple.database-pattern' = '${database}',
+    'sink.multiple.table-pattern' = 'test_${table}'
+);
+```
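+Multiple table sink also requires setting the serialization format of the upstream data (via the option 'sink.multiple.format';
+currently only [canal-json|debezium-json] is supported).
+
+### Dynamic table name mapping
+When writing to multiple tables, Iceberg lets you customize the rules that map database and table names: fill in placeholders, then add a prefix or suffix to change the name of the target table.
+Iceberg Load Node parses `'sink.multiple.database-pattern'` as the destination database name and `'sink.multiple.table-pattern'`
+as the destination table name. Placeholders are resolved from the data. A variable is strictly written as '${VARIABLE_NAME}', and its value comes from the data itself:
+it can be a metadata field of the Format specified by `'sink.multiple.format'`, or a physical field in the data.
+Examples of 'topic-pattern' are as follows:
+- 'sink.multiple.format' is 'canal-json':
+
+The upstream data is: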
+```
+{
+  "data": [
+    {
+      "id": "111",
+      "name": "scooter",
+      "description": "Big 2-wheel scooter",
+      "weight": "5.18"
+    }
+  ],
+  "database": "inventory",
+  "es": 1589373560000,
+  "id": 9,
+  "isDdl": false,
+  "mysqlType": {
+    "id": "INTEGER",
+    "name": "VARCHAR(255)",
+    "description": "VARCHAR(512)",
+    "weight": "FLOAT"
+  },
+  "old": [
+    {
+      "weight": "5.15"
+    }
+  ],
+  "pkNames": [
+    "id"
+  ],
+  "sql": "",
+  "sqlType": {
+    "id": 4,
+    "name": 12,
+    "description": 12,
+    "weight": 7
+  },
+  "table": "products",
+  "ts": 1589373560798,
+  "type": "UPDATE"
+} 
+```
+When 'topic-pattern' is '{database}_${table}', the extracted Topic is 'inventory_products' ('database' and 'table' are metadata fields,
+'id' is a physical field)
+
+When 'topic-pattern' is '{database}_${table}_${id}', the extracted Topic is 'inventory_products_4' ('database', 'table' 

Review Comment:
   111 ?
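
The question comes down to which `id` in the sample record the placeholder refers to, since the key appears in several places. The sketch below simply lists every occurrence in a trimmed copy of the record from the hunk above; `find_key` is a hypothetical helper written for this illustration, and the `$.`-style paths are only a display convention.

```python
import json

# Trimmed copy of the canal-json record from the hunk above.
record = json.loads("""
{
  "data": [{"id": "111", "name": "scooter"}],
  "database": "inventory",
  "id": 9,
  "mysqlType": {"id": "INTEGER"},
  "sqlType": {"id": 4},
  "table": "products"
}
""")


def find_key(node, key, path="$"):
    """Yield (path, value) for every occurrence of key in nested JSON."""
    if isinstance(node, dict):
        for k, v in node.items():
            child = f"{path}.{k}"
            if k == key:
                yield child, v
            yield from find_key(v, key, child)
    elif isinstance(node, list):
        for i, v in enumerate(node):
            yield from find_key(v, key, f"{path}[{i}]")


candidates = dict(find_key(record, "id"))
for where, value in candidates.items():
    print(where, "=", repr(value))
```

The record carries `id` as a row value (`"111"`), as a top-level field (`9`), and as type descriptors under `mysqlType` and `sqlType` (`"INTEGER"` and `4`), so the documented result depends on which of these the doc means by "physical field".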





[GitHub] [inlong-website] gong commented on a diff in pull request #592: [INLONG-591][Doc] Add document for multiple sink of iceberg

Posted by GitBox <gi...@apache.org>.
gong commented on code in PR #592:
URL: https://github.com/apache/inlong-website/pull/592#discussion_r1018751143


##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/iceberg.md:
##########
@@ -148,6 +148,98 @@ TODO
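 ### Usage for InLong Manager Client
 TODO
 
+## Feature
+### Multiple table sink
+Iceberg currently supports writing to multiple tables at the same time. To enable it, add `'sink.multiple.enable' = 'true'` to the FLINK SQL table creation parameters; the schema of the target table
+can only be defined as `BINARY` or `STRING`. The following is an example of a table creation statement: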
+```

Review Comment:
   `BINARY` should be `BYTES`


