Posted to commits@inlong.apache.org by zi...@apache.org on 2022/11/08 06:31:57 UTC

[inlong-website] branch master updated: [INLONG-578][Doc] Add doc for MySQL connector for filtering and allmigrate (#582)

This is an automated email from the ASF dual-hosted git repository.

zirui pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/inlong-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 621a384521 [INLONG-578][Doc] Add doc for MySQL connector for filtering and allmigrate (#582)
621a384521 is described below

commit 621a3845212bfa2ffaef0d9943c989783924f288
Author: Schnapps <zp...@connect.ust.hk>
AuthorDate: Tue Nov 8 14:31:52 2022 +0800

    [INLONG-578][Doc] Add doc for MySQL connector for filtering and allmigrate (#582)
---
 docs/data_node/extract_node/kafka.md               |  2 +-
 docs/data_node/extract_node/mysql-cdc.md           | 91 ++++++++++++++++------
 .../current/data_node/extract_node/mysql-cdc.md    | 60 ++++++++++++--
 3 files changed, 124 insertions(+), 29 deletions(-)

diff --git a/docs/data_node/extract_node/kafka.md b/docs/data_node/extract_node/kafka.md
index 1a7a23b8ab..5c88e49649 100644
--- a/docs/data_node/extract_node/kafka.md
+++ b/docs/data_node/extract_node/kafka.md
@@ -15,7 +15,7 @@ upsert fashion. The `upsert-kafka` connector produces a `changelog stream`, wher
 
 | Extract Node                | Kafka version |                                                                                                                                                                                                                                                                                                                                                                                           
 |-----------------------------|---------------|
-| [Kafka](./kafka.md)         | 0.10+         |  
+| [Kafka](./kafka.md)         | 0.10+         |
 
 ## Dependencies  
 
diff --git a/docs/data_node/extract_node/mysql-cdc.md b/docs/data_node/extract_node/mysql-cdc.md
index d5694b9341..f1f42e9ceb 100644
--- a/docs/data_node/extract_node/mysql-cdc.md
+++ b/docs/data_node/extract_node/mysql-cdc.md
@@ -293,7 +293,18 @@ TODO: It will be supported in the future.
           <td>optional</td>
           <td style={{wordWrap: 'break-word'}}>false</td>
           <td>Boolean</td>
-          <td>Whether it is a whole library migration, Whether it is a whole database migration scenario, if true, it compresses physical fields and other meta fields supported by MySQL Extract Node into a special meta field `data` in canal-json format.</td>
+          <td>Whether this is a whole-database migration scenario. If 'true', the MySQL Extract Node compresses the physical fields and other meta fields of the table
+              into a special JSON-formatted 'data' meta field. Two data formats are currently supported: for data in 'canal-json' format,
+              use the 'data_canal' metadata field; for data in 'debezium-json' format, use the 'data_debezium' metadata field.</td>
+    </tr>
+    <tr>
+          <td>row-kinds-filtered</td>
+          <td>optional</td>
+          <td style={{wordWrap: 'break-word'}}>false</td>
+          <td>Boolean</td>
+          <td>The operation types (row kinds) to retain: +I is inserted data (existing snapshot data is also emitted as insert type), -U is the data before an update,
+              +U is the data after an update, and -D is deleted data. To retain multiple operation types, join them with &.
+              For example, with +I&-D the connector outputs only inserted and deleted data, and updated data is not output.</td>
     </tr>
     <tr>
       <td>debezium.*</td>
@@ -349,9 +360,14 @@ The following format metadata can be exposed as read-only (VIRTUAL) columns in a
       <td>Type of database operation, such as INSERT/DELETE, etc.</td>
     </tr>
     <tr>
-      <td>meta.data</td>
-      <td>STRING</td>
-      <td>Data of the row that format by `canal-json` only exists when the option `migrate-all` is 'true'.</td>
+      <td>meta.data_canal</td>
+      <td>STRING/BYTES</td>
+      <td>Row data in `canal-json` format. Only present when the `migrate-all` option is 'true'.</td>
+    </tr>
+    <tr>
+      <td>meta.data_debezium</td>
+      <td>STRING/BYTES</td>
+      <td>Row data in `debezium-json` format. Only present when the `migrate-all` option is 'true'.</td>
     </tr>
     <tr>
       <td>meta.is_ddl</td>
@@ -394,30 +410,31 @@ The following format metadata can be exposed as read-only (VIRTUAL) columns in a
 The extended CREATE TABLE example demonstrates the syntax for exposing these metadata fields:
 ```sql
 CREATE TABLE `mysql_extract_node` (
-      `id` INT,
-      `name` STRING,
-      `database_name` string METADATA FROM 'meta.database_name',
-      `table_name`    string METADATA FROM 'meta.table_name',
-      `op_ts`         timestamp(3) METADATA FROM 'meta.op_ts',
-      `op_type` string METADATA FROM 'meta.op_type',
-      `batch_id` bigint METADATA FROM 'meta.batch_id',
-      `is_ddl` boolean METADATA FROM 'meta.is_ddl',
-      `update_before` ARRAY<MAP<STRING, STRING>> METADATA FROM 'meta.update_before',
-      `mysql_type` MAP<STRING, STRING> METADATA FROM 'meta.mysql_type',
-      `pk_names` ARRAY<STRING> METADATA FROM 'meta.pk_names',
-      `data` STRING METADATA FROM 'meta.data',
-      `sql_type` MAP<STRING, INT> METADATA FROM 'meta.sql_type',
-      `ingestion_ts` TIMESTAMP(3) METADATA FROM 'meta.ts',
-      PRIMARY KEY (`id`) NOT ENFORCED 
+     `id` INT,
+     `name` STRING,
+     `database_name` string METADATA FROM 'meta.database_name',
+     `table_name`    string METADATA FROM 'meta.table_name',
+     `op_ts`         timestamp(3) METADATA FROM 'meta.op_ts',
+     `op_type` string METADATA FROM 'meta.op_type',
+     `batch_id` bigint METADATA FROM 'meta.batch_id',
+     `is_ddl` boolean METADATA FROM 'meta.is_ddl',
+     `update_before` ARRAY<MAP<STRING, STRING>> METADATA FROM 'meta.update_before',
+     `mysql_type` MAP<STRING, STRING> METADATA FROM 'meta.mysql_type',
+     `pk_names` ARRAY<STRING> METADATA FROM 'meta.pk_names',
+     `data` STRING METADATA FROM 'meta.data_canal',
+     `sql_type` MAP<STRING, INT> METADATA FROM 'meta.sql_type',
+     `ingestion_ts` TIMESTAMP(3) METADATA FROM 'meta.ts',
+     PRIMARY KEY (`id`) NOT ENFORCED
 ) WITH (
-      'connector' = 'mysql-cdc-inlong', 
+      'connector' = 'mysql-cdc-inlong',
       'hostname' = 'YourHostname',
       'migrate-all' = 'true',
-      'port' = '3306',                
+      'port' = '3306',
       'username' = 'YourUsername',
       'password' = 'YourPassword',
       'database-name' = 'YourDatabase',
-      'table-name' = 'YourTable' 
+      'table-name' = 'YourTable',
+      'row-kinds-filtered' = '+I'
       );
 ```
 
@@ -615,3 +632,33 @@ CREATE TABLE `mysql_extract_node` (
 </table>
 </div>
 
+## Features
+
+### Multi-database multi-table synchronization
+
+The MySQL Extract Node supports whole-database and multi-table synchronization. When this feature is enabled, the MySQL Extract Node compresses the physical fields of each table into the special meta field 'data_canal' in 'canal-json' format, or alternatively into the metadata field 'data_debezium' in 'debezium-json' format.
+
+Configuration parameters:
+
+| Parameter     | Required | Default Value | Data Type | Description |
+|---------------|----------|---------------|-----------|-------------|
+| migrate-all   | optional | false         | Boolean   | Enables whole-database migration mode; all physical fields are obtained through the data_canal field |
+| table-name    | optional | false         | String    | Regular expressions for the tables to read; use "\." to separate database and table, and "," to separate multiple regular expressions |
+| database-name | optional | false         | String    | Regular expressions for the databases to read; separate multiple regular expressions with "," |
+
+The following CREATE TABLE example demonstrates the syntax for this feature:
+
+```sql
+CREATE TABLE `table_1`(
+`data` STRING METADATA FROM 'meta.data_canal' VIRTUAL)
+WITH (
+'inlong.metric.labels' = 'groupId=1&streamId=1&nodeId=1',
+'migrate-all' = 'true',
+'connector' = 'mysql-cdc-inlong',
+'hostname' = 'localhost',
+'database-name' = 'test,test01',
+'username' = 'root',
+'password' = 'inlong',
+'table-name' = 'test01\.a{2}[0-9]$, test\.[\s\S]*'
+)
+```
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/mysql-cdc.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/mysql-cdc.md
index 46b6e03b7a..40932bf45f 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/mysql-cdc.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/mysql-cdc.md
@@ -289,7 +289,18 @@ TODO: It will be supported in the future.
           <td>optional</td>
           <td style={{wordWrap: 'break-word'}}>false</td>
           <td>Boolean</td>
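-          <td>Whether this is a whole-database migration scenario. If 'true', the MySQL Extract Node compresses the physical fields and other meta fields of the table into a special 'data' meta field in 'canal-json' format.</td>
+          <td>Whether this is a whole-database migration scenario. If 'true', the MySQL Extract Node compresses the physical fields and other meta fields of the table
+              into a special JSON-formatted 'data' meta field. Two data formats are currently supported: for data in 'canal-json' format,
+              use the 'data_canal' metadata field; for data in 'debezium-json' format, use the 'data_debezium' metadata field.</td>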
+    </tr>
+    <tr>
+          <td>row-kinds-filtered</td>
+          <td>optional</td>
+          <td style={{wordWrap: 'break-word'}}>false</td>
+          <td>Boolean</td>
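+          <td>The operation types (row kinds) to retain: +I is inserted data (existing snapshot data is also emitted as insert type), -U is the data before an update,
+              +U is the data after an update, and -D is deleted data. To retain multiple operation types, join them with &.
+              For example, with +I&-D the connector outputs only inserted and deleted data, and updated data is not output.</td>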
     </tr>
     <tr>
       <td>debezium.*</td>
@@ -345,10 +356,15 @@ TODO: 将在未来支持此功能。
       <td>Type of database operation, such as INSERT/DELETE, etc.</td>
     </tr>
     <tr>
-      <td>meta.data</td>
-      <td>STRING</td>
+      <td>meta.data_canal</td>
+      <td>STRING/BYTES</td>
       <td>Row data in `canal-json` format. Only present when the `migrate-all` option is 'true'.</td>
     </tr>
+    <tr>
+      <td>meta.data_debezium</td>
+      <td>STRING/BYTES</td>
+      <td>Row data in `debezium-json` format. Only present when the `migrate-all` option is 'true'.</td>
+    </tr>
     <tr>
       <td>meta.is_ddl</td>
       <td>BOOLEAN</td>
@@ -402,7 +418,7 @@ CREATE TABLE `mysql_extract_node` (
       `update_before` ARRAY<MAP<STRING, STRING>> METADATA FROM 'meta.update_before',
       `mysql_type` MAP<STRING, STRING> METADATA FROM 'meta.mysql_type',
       `pk_names` ARRAY<STRING> METADATA FROM 'meta.pk_names',
-      `data` STRING METADATA FROM 'meta.data',
+      `data` STRING METADATA FROM 'meta.data_canal',
       `sql_type` MAP<STRING, INT> METADATA FROM 'meta.sql_type',
       `ingestion_ts` TIMESTAMP(3) METADATA FROM 'meta.ts',
       PRIMARY KEY (`id`) NOT ENFORCED 
@@ -414,8 +430,9 @@ CREATE TABLE `mysql_extract_node` (
       'username' = 'YourUsername',
       'password' = 'YourPassword',
       'database-name' = 'YourDatabase',
-      'table-name' = 'YourTable' 
-      );
+      'table-name' = 'YourTable',
+      'row-kinds-filtered' = '+I'
+ );
 ```
 
 ## Data Type Mapping
@@ -612,3 +629,34 @@ CREATE TABLE `mysql_extract_node` (
 </table>
 </div>
 
+
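+## Features
+
+### Multi-database multi-table synchronization
+
+The MySQL Extract Node supports whole-database and multi-table synchronization. When this feature is enabled, the MySQL Extract Node compresses the physical fields of each table into the special meta field 'data_canal' in 'canal-json' format, or alternatively into the metadata field 'data_debezium' in 'debezium-json' format.
+
+Configuration parameters:
+
+| Parameter     | Required | Default Value | Data Type | Description |
+|---------------|----------|---------------|-----------|-------------|
+| migrate-all   | optional | false         | Boolean   | Enables whole-database migration mode; all physical fields are obtained through the data_canal field |
+| table-name    | optional | false         | String    | Regular expressions for the tables to read; use "\." to separate database and table, and "," to separate multiple regular expressions |
+| database-name | optional | false         | String    | Regular expressions for the databases to read; separate multiple regular expressions with "," |
+
+The following CREATE TABLE example demonstrates the syntax for this feature: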
+
+```sql
+CREATE TABLE `table_1`(
+`data` STRING METADATA FROM 'meta.data_canal' VIRTUAL)
+WITH (
+'inlong.metric.labels' = 'groupId=1&streamId=1&nodeId=1',
+'migrate-all' = 'true',
+'connector' = 'mysql-cdc-inlong',
+'hostname' = 'localhost',
+'database-name' = 'test,test01',
+'username' = 'root',
+'password' = 'inlong',
+'table-name' = 'test01\.a{2}[0-9]$, test\.[\s\S]*'
+)
+```
\ No newline at end of file
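
Reviewer note on the `table-name` value used in the multi-table example above: the sketch below is a rough illustration of which `database.table` names the two comma-separated regular expressions would select. It is not the connector's code. The helper names (`compile_table_patterns`, `is_captured`), the trimming of whitespace around each expression, and the use of a full match against the `database.table` string are all assumptions made for illustration; the connector's actual matching rules should be checked against its source.

```python
import re
from typing import List, Pattern


def compile_table_patterns(table_name_option: str) -> List[Pattern[str]]:
    """Split a 'table-name' option value on ',' and compile each expression.

    Assumption: whitespace around each expression is ignored.
    """
    return [
        re.compile(piece.strip())
        for piece in table_name_option.split(",")
        if piece.strip()
    ]


def is_captured(full_name: str, patterns: List[Pattern[str]]) -> bool:
    """Return True if 'database.table' fully matches any of the patterns.

    Assumption: the whole 'database.table' string must match.
    """
    return any(p.fullmatch(full_name) for p in patterns)


# The option value from the CREATE TABLE example in the commit.
TABLE_NAME_OPTION = r"test01\.a{2}[0-9]$, test\.[\s\S]*"
PATTERNS = compile_table_patterns(TABLE_NAME_OPTION)

if __name__ == "__main__":
    candidates = ["test01.aa1", "test01.aa12", "test.orders", "test01.orders"]
    for name in candidates:
        print(f"{name}: {is_captured(name, PATTERNS)}")
```

Under these assumptions, the first expression selects only tables in `test01` named `aa` followed by a single digit, while the second selects every table in `test`.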