You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by mo...@apache.org on 2021/11/16 04:12:53 UTC

[incubator-doris] branch master updated: [Blog] Example of binlog load usage (#7080)

This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git


The following commit(s) were added to refs/heads/master by this push:
     new 5710cf8  [Blog] Example of binlog load usage (#7080)
5710cf8 is described below

commit 5710cf8febece396e1e4bff19ca3687e349b7b08
Author: jiafeng.zhang <zh...@gmail.com>
AuthorDate: Tue Nov 16 12:12:44 2021 +0800

    [Blog] Example of binlog load usage (#7080)
    
    Example of binlog load usage
---
 .../images/binlog/image-20211110145044815.png      | Bin 0 -> 146900 bytes
 .../images/binlog/image-20211110160106602.png      | Bin 0 -> 52259 bytes
 .../images/binlog/image-20211110160331479.png      | Bin 0 -> 67624 bytes
 .../images/binlog/image-20211110160710709.png      | Bin 0 -> 56885 bytes
 docs/en/article/articles/doris-binlog-load.md      | 418 ++++++++++++++++++++
 docs/zh-CN/article/articles/doris-binlog-load.md   | 421 +++++++++++++++++++++
 6 files changed, 839 insertions(+)

diff --git a/docs/.vuepress/public/images/binlog/image-20211110145044815.png b/docs/.vuepress/public/images/binlog/image-20211110145044815.png
new file mode 100644
index 0000000..c701de1
Binary files /dev/null and b/docs/.vuepress/public/images/binlog/image-20211110145044815.png differ
diff --git a/docs/.vuepress/public/images/binlog/image-20211110160106602.png b/docs/.vuepress/public/images/binlog/image-20211110160106602.png
new file mode 100644
index 0000000..0a3b9a7
Binary files /dev/null and b/docs/.vuepress/public/images/binlog/image-20211110160106602.png differ
diff --git a/docs/.vuepress/public/images/binlog/image-20211110160331479.png b/docs/.vuepress/public/images/binlog/image-20211110160331479.png
new file mode 100644
index 0000000..6adbe59
Binary files /dev/null and b/docs/.vuepress/public/images/binlog/image-20211110160331479.png differ
diff --git a/docs/.vuepress/public/images/binlog/image-20211110160710709.png b/docs/.vuepress/public/images/binlog/image-20211110160710709.png
new file mode 100644
index 0000000..c17c211
Binary files /dev/null and b/docs/.vuepress/public/images/binlog/image-20211110160710709.png differ
diff --git a/docs/en/article/articles/doris-binlog-load.md b/docs/en/article/articles/doris-binlog-load.md
new file mode 100644
index 0000000..6511551
--- /dev/null
+++ b/docs/en/article/articles/doris-binlog-load.md
@@ -0,0 +1,418 @@
+---
+{
+    "title": "How to use Apache Doris Binlog Load and examples",
+    "description": "Binlog Load provides a CDC (Change Data Capture) function that enables Doris to incrementally synchronize the user's data update operation in the Mysql database, making it more convenient for users to complete the import of Mysql data.",
+    "date": "2021-11-10",
+    "metaTitle": "How to use Apache Doris Binlog Load and examples",
+    "language": "en",
+    "isArticle": true,
+    "author": "张家锋",
+    "layout": "Article",
+    "sidebar": false
+}
+---
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+Binlog Load provides a CDC (Change Data Capture) function that enables Doris to incrementally synchronize the user's data update operation in the Mysql database, making it more convenient for users to complete the import of Mysql data.
+
+>Note:
+>
+>This function needs to be used in 0.15 and later versions
+
+## 1. Install and configure Mysql
+
+1. Install Mysql
+
+   Quickly use Docker to install and configure Mysql, refer to the following link for details
+
+   https://segmentfault.com/a/1190000021523570
+
+   If it is installed on a physical machine, please refer to the following connection:
+
+   [在 CentOS 7 中安装 MySQL 8 的教程详解](https://cloud.tencent.com/developer/article/1721575)
+
+2. enable Mysql binlog
+
+   Enter the Docker container or modify the `/etc/my.cnf` file on the physical machine, and add the following content under [mysqld],
+
+   ```
+   log_bin=mysql_bin
+   binlog-format=Row
+   server-id=1
+   ```
+
+   Then restart Mysql
+
+   ```
+   systemctl restart mysqld
+   ```
+
+3. Create Mysql table
+
+   ```sql
+   create database demo;
+   
+    CREATE TABLE `test_cdc` (
+     `id` int NOT NULL AUTO_INCREMENT,
+     `sex` TINYINT(1) DEFAULT NULL,
+     `name` varchar(20) DEFAULT NULL,
+     `address` varchar(255) DEFAULT NULL,
+     PRIMARY KEY (`id`)
+    ) ENGINE=InnoDB
+   ```
+
+
+## 2. Install and configure Canal
+
+Download canal-1.1.5: https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.deployer-1.1.5.tar.gz
+
+1. Unzip Canal to the specified directory:
+
+   ```shell
+   tar zxvf canal.deployer-1.1.5.tar.gz -C ./canal
+   ```
+
+2. Create a new directory in the conf folder and rename it as the root directory of the instance. You can name the directory for easy identification.
+
+   For example, my name here is consistent with the name of my database library: demo 
+
+   ```shell
+   vi conf/demo/instance.properties
+   ```
+
+   Given below is a sample configuration of mine:
+
+   For the parameter description, please refer to Canal official documentation:[QuickStart ](https://github.com/alibaba/canal/wiki/QuickStart)
+
+   ```ini
+   #################################################
+   ## mysql serverId , v1.0.26+ will autoGen
+   canal.instance.mysql.slaveId=12115
+   
+   # enable gtid use true/false
+   canal.instance.gtidon=false
+   
+   # position info
+   canal.instance.master.address=10.220.146.11:3306
+   canal.instance.master.journal.name=
+   canal.instance.master.position=
+   canal.instance.master.timestamp=
+   canal.instance.master.gtid=
+   
+   # rds oss binlog
+   canal.instance.rds.accesskey=
+   canal.instance.rds.secretkey=
+   canal.instance.rds.instanceId=
+   
+   # table meta tsdb info
+   canal.instance.tsdb.enable=true
+   #canal.instance.tsdb.url=jdbc:mysql://127.0.0.1:3306/canal_tsdb
+   #canal.instance.tsdb.dbUsername=canal
+   #canal.instance.tsdb.dbPassword=canal
+   
+   #canal.instance.standby.address =
+   #canal.instance.standby.journal.name =
+   #canal.instance.standby.position =
+   #canal.instance.standby.timestamp =
+   #canal.instance.standby.gtid=
+   
+   # username/password
+   canal.instance.dbUsername=zhangfeng
+   canal.instance.dbPassword=zhangfeng800729)(*Q
+   canal.instance.connectionCharset = UTF-8
+   # enable druid Decrypt database password
+   canal.instance.enableDruid=false
+   #canal.instance.pwdPublicKey=MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBALK4BUxdDltRRE5/zXpVEVPUgunvscYFtEip3pmLlhrWpacX7y7GCMo2/JM6LeHmiiNdH1FWgGCpUfircSwlWKUCAwEAAQ==
+   
+   # table regex
+   canal.instance.filter.regex=demo\\..*
+   # table black regex
+   canal.instance.filter.black.regex=
+   # table field filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
+   #canal.instance.filter.field=test1.t_product:id/subject/keywords,test2.t_company:id/name/contact/ch
+   # table field black filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
+   #canal.instance.filter.black.field=test1.t_product:subject/product_image,test2.t_company:id/name/contact/ch
+   
+   # mq config
+   #canal.mq.topic=
+   # dynamic topic route by schema or table regex
+   #canal.mq.dynamicTopic=mytest1.user,mytest2\\..*,.*\\..*
+   #canal.mq.partition=0
+   # hash partition config
+   #canal.mq.partitionsNum=3
+   #canal.mq.partitionHash=test.table:id^name,.*\\..*
+   #################################################
+   ```
+
+   
+
+3. Start Canal
+
+   ```shell
+   sh bin/startup.sh
+   ```
+
+> Note: canal instance user/passwd
+>
+>Version 1.1.5, add these two configurations in canal.properties
+>
+>canal.user = canal
+>canal.passwd = E3619321C1A937C46A0D8BD1DAC39F93B27D4458
+>
+>The default password is canal/canal, and the password value of canal.passwd can be obtained by selecting password("xxx")
+
+4. Verify whether the startup is successful
+
+   ```shell
+   tail -200f logs/demo/demo.log
+   ```
+
+   ![image-20211110145044815](/images/binlog/image-20211110145044815.png)
+
+## 3.Start synchronizing data
+
+### 3.1 Create Doris target table
+
+The user needs to first create the target table corresponding to the Mysql side on the Doris side
+
+Binlog Load can only support unique target tables, and the Batch Delete function of the target table must be activated.
+
+For the method of enabling Batch Delete, please refer to the batch delete function in `help alter table`.
+
+```sql
+CREATE TABLE `doris_mysql_binlog_demo` (
+  `id` int NOT NULL,
+  `sex` TINYINT(1),
+  `name` varchar(20),
+  `address` varchar(255) 
+) ENGINE=OLAP
+UNIQUE KEY(`id`,sex)
+COMMENT "OLAP"
+DISTRIBUTED BY HASH(`sex`) BUCKETS 1
+PROPERTIES (
+"replication_allocation" = "tag.location.default: 3",
+"in_memory" = "false",
+"storage_format" = "V2"
+);
+
+-- enable batch delete
+ALTER TABLE test_2.doris_mysql_binlog_demo ENABLE FEATURE "BATCH_DELETE";
+```
+
+
+
+### 3.1 Create sync job
+
+#### 3.1.1 Create Sync Job syntax description
+
+The data synchronization (Sync Job) function supports the user to submit a resident data synchronization job. By reading the Binlog log from the specified remote address, incrementally synchronize the user's CDC (Change Data Capture) of the data update operation in the Mysql database Function.
+
+Currently, the data synchronization job only supports Canal docking. The parsed Binlog data is obtained from Canal Server and imported into Doris.
+
+Users can check the status of data synchronization job through `SHOW SYNC JOB`.
+
+grammar:
+
+```	
+CREATE SYNC [db.]job_name
+ (
+ 	channel_desc, 
+ 	channel_desc
+ 	...
+ )
+binlog_desc
+```
+
+1. `job_name`
+
+	The name of the synchronization job is the unique identifier of the job in the current database. Only one job with the same `job_name` can be running.
+   
+2. `channel_desc`
+
+	The data channel under the job is used to describe the mapping relationship between the mysql source table and the doris target table.
+	
+	grammar:  		
+	
+	```
+	FROM mysql_db.src_tbl INTO des_tbl
+	[partitions]
+	[columns_mapping]
+	```
+	
+	1. `mysql_db.src_tbl`
+
+        Specify the database and source table on the mysql side.
+		
+	2. `des_tbl`
+
+        Specify the target table on the doris side. Only unique tables are supported, and the batch delete function of the table needs to be enabled (for the opening method, please refer to the "batch delete function" of help alter table).
+	
+	3. `partitions`
+
+        Specify which partitions of the destination table to import. If not specified, it will be automatically imported into the corresponding partition.
+		
+        Example:
+	
+        ```
+        PARTITION(p1, p2, p3)
+        ```
+		
+	4. `column_mapping`
+
+        Specify the mapping relationship between the columns of the mysql source table and the doris target table. If not specified, FE will default the source table and target table columns in a one-to-one correspondence.
+		
+        The form of col_name = expr is not supported to represent columns.
+		
+        Example:
+		
+        ```
+        Suppose the target table is listed as (k1, k2, v1),
+	     
+        Change the order of columns k1 and k2
+        COLUMNS(k2, k1, v1)
+	     
+        Ignore the fourth column of the source data
+        COLUMNS(k2, k1, v1, dummy_column)
+        ```
+	
+3. `binlog_desc`
+
+    Used to describe the remote data source, currently only supports one type of canal.
+
+    grammar:
+
+    ```
+    FROM BINLOG
+    (
+        "key1" = "value1", 
+        "key2" = "value2"
+    )
+    ```
+
+    1. The attribute corresponding to the Canal data source, prefixed with `canal.`
+
+      1. canal.server.ip: the address of the canal server
+      2. canal.server.port: canal server port
+      3. canal.destination: the identity of the instance
+      4. canal.batchSize: the maximum value of the batch size obtained, the default is 8192
+      5. canal.username: instance username
+      6. canal.password: instance password
+      7. canal.debug: Optional, when set to true, the batch and detailed information of each row of data will be printed out
+          Examples:
+
+1. Simply create a data synchronization job named `job1` for `test_tbl` of `test_db`, connect to the local Canal server, and correspond to the Mysql source table `mysql_db1.tbl1`.
+
+		CREATE SYNC `test_db`.`job1`
+		(
+			FROM `mysql_db1`.`tbl1` INTO `test_tbl `
+		)
+		FROM BINLOG 
+		(
+			"type" = "canal",
+			"canal.server.ip" = "127.0.0.1",
+			"canal.server.port" = "11111",
+			"canal.destination" = "example",
+			"canal.username" = "",
+			"canal.password" = ""
+		);
+	
+2. Create a data synchronization job named `job1` for multiple tables of `test_db`, corresponding to multiple Mysql source tables one by one, and explicitly specify the column mapping.
+
+		CREATE SYNC `test_db`.`job1` 
+		(
+			FROM `mysql_db`.`t1` INTO `test1` COLUMNS(k1, k2, v1) PARTITIONS (p1, p2),
+			FROM `mysql_db`.`t2` INTO `test2` COLUMNS(k3, k4, v2) PARTITION p1
+		) 
+		FROM BINLOG 
+		(
+			"type" = "canal", 
+			"canal.server.ip" = "xx.xxx.xxx.xx", 
+			"canal.server.port" = "12111", 
+			"canal.destination" = "example",  
+			"canal.username" = "username", 
+			"canal.password" = "password"
+		);
+
+#### 3.1.2 Start to synchronize data in mysql table to Doris
+
+>Notice:
+>
+>Before creating a synchronization task, first configure enable_create_sync_job=true in fe.conf, this is false by default and not enabled, otherwise the synchronization task cannot be created
+
+```
+CREATE SYNC test_2.doris_mysql_binlog_demo_job 
+(
+	FROM demo.test_cdc INTO doris_mysql_binlog_demo
+) 
+FROM BINLOG 
+(
+	"type" = "canal", 
+	"canal.server.ip" = "10.220.146.10", 
+	"canal.server.port" = "11111", 
+	"canal.destination" = "demo",  
+	"canal.username" = "canal", 
+	"canal.password" = "canal"
+);
+```
+
+#### 3.1.3 View synchronization tasks
+
+```sql
+SHOW SYNC JOB from test_2;
+```
+
+![image-20211110160106602](/images/binlog/image-20211110160106602.png)
+
+#### 3.1.4 View the data in the table
+
+```sql
+select * from doris_mysql_binlog_demo;
+```
+
+![image-20211110160331479](/images/binlog/image-20211110160331479.png)
+
+#### 3.1.5 Delete Data
+
+We delete the data in the Mysql data table, and then look at the changes in the Doris table
+
+```
+delete from test_cdc where id in (12,13)
+```
+
+We are looking at the Doris table, the two data with id 12 and 13 have been deleted
+
+![image-20211110160710709](/images/binlog/image-20211110160710709.png)
+
+#### 3.1.6 Multi-table synchronization
+
+Multi-table synchronization only needs to be written like the following
+
+
+
+```sql
+CREATE SYNC test_2.doris_mysql_binlog_demo_job 
+(
+    FROM demo.test_cdc INTO doris_mysql_binlog_demo,
+    FROM demo.test_cdc_1 INTO doris_mysql_binlog_demo,
+    FROM demo.test_cdc_2 INTO doris_mysql_binlog_demo,
+    FROM demo.test_cdc_3 INTO doris_mysql_binlog_demo
+) 
+```
+
diff --git a/docs/zh-CN/article/articles/doris-binlog-load.md b/docs/zh-CN/article/articles/doris-binlog-load.md
new file mode 100644
index 0000000..796d80d
--- /dev/null
+++ b/docs/zh-CN/article/articles/doris-binlog-load.md
@@ -0,0 +1,421 @@
+---
+{
+    "title": "Apache Doris Binlog Load使用方法及示例",
+    "description": "Binlog Load提供了一种使Doris增量同步用户在Mysql数据库的对数据更新操作的CDC(Change Data Capture)功能,使用户更方便的完成Mysql数据的导入",
+    "date": "2021-11-10",
+    "metaTitle": "Apache Doris Binlog Load使用方法及示例",
+    "language": "zh-CN",
+    "isArticle": true,
+    "author": "张家锋",
+    "layout": "Article",
+    "sidebar": false
+}
+---
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+Binlog Load提供了一种使Doris增量同步用户在Mysql数据库的对数据更新操作的CDC(Change Data Capture)功能,使用户更方便的完成Mysql数据的导入
+
+>注意:
+>
+>该功能需要在0.15及以后的版本里使用
+
+## 1. 安装配置 Mysql 
+
+1. 安装Mysql
+
+   快速使用Docker安装配置Mysql,具体参照下面的连接
+
+   https://segmentfault.com/a/1190000021523570
+
+   如果是在物理机上安装可以参考下面的连接:
+
+   [在 CentOS 7 中安装 MySQL 8 的教程详解](https://cloud.tencent.com/developer/article/1721575)
+
+2. 开启Mysql binlog
+
+   进入 Docker 容器或者物理机上修改/etc/my.cnf 文件,在 [mysqld] 下面添加以下内容,
+
+   ```
+   log_bin=mysql_bin
+   binlog-format=Row
+   server-id=1
+   ```
+
+   然后重启Mysql
+
+   ```
+   systemctl restart mysqld
+   ```
+
+3. 创建 Mysql 表
+
+   ```sql
+   create database demo;
+   
+    CREATE TABLE `test_cdc` (
+     `id` int NOT NULL AUTO_INCREMENT,
+     `sex` TINYINT(1) DEFAULT NULL,
+     `name` varchar(20) DEFAULT NULL,
+     `address` varchar(255) DEFAULT NULL,
+     PRIMARY KEY (`id`)
+    ) ENGINE=InnoDB
+   ```
+
+
+## 2. 安装配置Canal
+
+下载canal-1.1.5: https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.deployer-1.1.5.tar.gz
+
+1. 解压Canal到指定目录:
+
+   ```shell
+   tar zxvf canal.deployer-1.1.5.tar.gz -C ./canal
+   ```
+
+2. 在conf文件夹下新建目录并重命名,作为instance的根目录,目录名你可以自己命名便于识别即可
+
+   例如我这里的命名是和我的数据库库名一致:demo
+
+   ```shell
+   vi conf/demo/instance.properties
+   ```
+
+   下面给出的是一个我的示例配置:
+
+   这里面的参数说明请参考Canal官方文档:[QuickStart ](https://github.com/alibaba/canal/wiki/QuickStart)
+
+   ```ini
+   #################################################
+   ## mysql serverId , v1.0.26+ will autoGen
+   canal.instance.mysql.slaveId=12115
+   
+   # enable gtid use true/false
+   canal.instance.gtidon=false
+   
+   # position info
+   canal.instance.master.address=10.220.146.11:3306
+   canal.instance.master.journal.name=
+   canal.instance.master.position=
+   canal.instance.master.timestamp=
+   canal.instance.master.gtid=
+   
+   # rds oss binlog
+   canal.instance.rds.accesskey=
+   canal.instance.rds.secretkey=
+   canal.instance.rds.instanceId=
+   
+   # table meta tsdb info
+   canal.instance.tsdb.enable=true
+   #canal.instance.tsdb.url=jdbc:mysql://127.0.0.1:3306/canal_tsdb
+   #canal.instance.tsdb.dbUsername=canal
+   #canal.instance.tsdb.dbPassword=canal
+   
+   #canal.instance.standby.address =
+   #canal.instance.standby.journal.name =
+   #canal.instance.standby.position =
+   #canal.instance.standby.timestamp =
+   #canal.instance.standby.gtid=
+   
+   # username/password
+   canal.instance.dbUsername=zhangfeng
+   canal.instance.dbPassword=zhangfeng800729)(*Q
+   canal.instance.connectionCharset = UTF-8
+   # enable druid Decrypt database password
+   canal.instance.enableDruid=false
+   #canal.instance.pwdPublicKey=MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBALK4BUxdDltRRE5/zXpVEVPUgunvscYFtEip3pmLlhrWpacX7y7GCMo2/JM6LeHmiiNdH1FWgGCpUfircSwlWKUCAwEAAQ==
+   
+   # table regex
+   canal.instance.filter.regex=demo\\..*
+   # table black regex
+   canal.instance.filter.black.regex=
+   # table field filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
+   #canal.instance.filter.field=test1.t_product:id/subject/keywords,test2.t_company:id/name/contact/ch
+   # table field black filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
+   #canal.instance.filter.black.field=test1.t_product:subject/product_image,test2.t_company:id/name/contact/ch
+   
+   # mq config
+   #canal.mq.topic=
+   # dynamic topic route by schema or table regex
+   #canal.mq.dynamicTopic=mytest1.user,mytest2\\..*,.*\\..*
+   #canal.mq.partition=0
+   # hash partition config
+   #canal.mq.partitionsNum=3
+   #canal.mq.partitionHash=test.table:id^name,.*\\..*
+   #################################################
+   ```
+
+   
+
+3. 启动Canal
+
+     ```shell
+   sh bin/startup.sh
+   ```
+
+> 注意:canal instance user/passwd
+>
+>1.1.5 版本,在canal.properties里加上这两个配置
+>
+>canal.user = canal
+>canal.passwd = E3619321C1A937C46A0D8BD1DAC39F93B27D4458
+>
+>默认密码为canal/canal,canal.passwd的密码值可以通过select password("xxx") 来获取 
+
+4. 验证是否启动成功
+
+   ```shell
+   tail -200f logs/demo/demo.log
+   ```
+
+   ![image-20211110145044815](/images/binlog/image-20211110145044815.png)
+
+## 3.开始同步数据
+
+### 3.1 创建Doris目标表
+
+用户需要先在Doris端创建好与Mysql端对应的目标表
+
+Binlog Load只能支持Unique类型的目标表,且必须激活目标表的Batch Delete功能。
+
+开启Batch Delete的方法可以参考`help alter table`中的批量删除功能。
+
+```sql
+CREATE TABLE `doris_mysql_binlog_demo` (
+  `id` int NOT NULL,
+  `sex` TINYINT(1),
+  `name` varchar(20),
+  `address` varchar(255) 
+) ENGINE=OLAP
+UNIQUE KEY(`id`,sex)
+COMMENT "OLAP"
+DISTRIBUTED BY HASH(`sex`) BUCKETS 1
+PROPERTIES (
+"replication_allocation" = "tag.location.default: 3",
+"in_memory" = "false",
+"storage_format" = "V2"
+);
+
+-- enable batch delete
+ALTER TABLE test_2.doris_mysql_binlog_demo ENABLE FEATURE "BATCH_DELETE";
+```
+
+
+
+### 3.1 创建同步作业
+
+#### 3.1.1 Create Sync Job 语法说明
+
+Name: 'CREATE SYNC JOB'
+Description:
+
+数据同步(Sync Job)功能,支持用户提交一个常驻的数据同步作业,通过从指定的远端地址读取Binlog日志,增量同步用户在Mysql数据库的对数据更新操作的CDC(Change Data Capture)功能。
+	
+目前数据同步作业只支持对接Canal,从Canal Server上获取解析好的Binlog数据,导入到Doris内。
+	
+用户可通过 `SHOW SYNC JOB` 查看数据同步作业状态。
+	
+语法:
+
+```	
+CREATE SYNC [db.]job_name
+ (
+ 	channel_desc, 
+ 	channel_desc
+ 	...
+ )
+binlog_desc
+```
+
+1. `job_name`
+
+	同步作业名称,是作业在当前数据库内的唯一标识,相同`job_name`的作业只能有一个在运行。
+   
+2. `channel_desc`
+
+	作业下的数据通道,用来描述mysql源表到doris目标表的映射关系。
+	
+	语法:   		
+	
+	```
+	FROM mysql_db.src_tbl INTO des_tbl
+	[partitions]
+	[columns_mapping]
+	```
+	
+	1. `mysql_db.src_tbl`
+
+        指定mysql端的数据库和源表。
+		
+	2. `des_tbl`
+
+        指定doris端的目标表,只支持Unique表,且需开启表的batch delete功能(开启方法请看help alter table的'批量删除功能')。
+	
+	3. `partitions`
+
+        指定导入目的表的哪些 partition 中。如果不指定,则会自动导入到对应的 partition 中。
+		
+        示例:
+	
+        ```
+        PARTITION(p1, p2, p3)
+        ```
+		
+	4. `column_mapping`
+
+        指定mysql源表和doris目标表的列之间的映射关系。如果不指定,FE会默认源表和目标表的列按顺序一一对应。
+		
+        不支持 col_name = expr 的形式表示列。
+		
+        示例:
+		
+        ```
+        假设目标表列为(k1, k2, v1),
+	     
+        改变列k1和k2的顺序
+        COLUMNS(k2, k1, v1)
+	     
+        忽略源数据的第四列
+        COLUMNS(k2, k1, v1, dummy_column)
+        ```
+	
+3. `binlog_desc`
+
+    用来描述远端数据源,目前仅支持canal一种。
+	
+    语法:
+	
+    ```
+    FROM BINLOG
+    (
+        "key1" = "value1", 
+        "key2" = "value2"
+    )
+    ```
+	
+	1. Canal 数据源对应的属性,以`canal.`为前缀
+
+		1. canal.server.ip: canal server的地址
+		2. canal.server.port: canal server的端口
+		3. canal.destination: instance的标识
+		4. canal.batchSize: 获取的batch大小的最大值,默认8192
+		5. canal.username: instance的用户名
+		6. canal.password: instance的密码
+		7. canal.debug: 可选,设置为true时,会将batch和每一行数据的详细信息都打印出来
+Examples:
+
+1. 简单为 `test_db` 的 `test_tbl` 创建一个名为 `job1` 的数据同步作业,连接本地的Canal服务器,对应Mysql源表 `mysql_db1.tbl1`。
+
+		CREATE SYNC `test_db`.`job1`
+		(
+			FROM `mysql_db1`.`tbl1` INTO `test_tbl `
+		)
+		FROM BINLOG 
+		(
+			"type" = "canal",
+			"canal.server.ip" = "127.0.0.1",
+			"canal.server.port" = "11111",
+			"canal.destination" = "example",
+			"canal.username" = "",
+			"canal.password" = ""
+		);
+	
+2. 为 `test_db` 的多张表创建一个名为 `job1` 的数据同步作业,一一对应多张Mysql源表,并显式的指定列映射。
+
+		CREATE SYNC `test_db`.`job1` 
+		(
+			FROM `mysql_db`.`t1` INTO `test1` COLUMNS(k1, k2, v1) PARTITIONS (p1, p2),
+			FROM `mysql_db`.`t2` INTO `test2` COLUMNS(k3, k4, v2) PARTITION p1
+		) 
+		FROM BINLOG 
+		(
+			"type" = "canal", 
+			"canal.server.ip" = "xx.xxx.xxx.xx", 
+			"canal.server.port" = "12111", 
+			"canal.destination" = "example",  
+			"canal.username" = "username", 
+			"canal.password" = "password"
+		);
+
+#### 3.1.2 开始同步mysql表里数据到Doris
+
+>注意:
+>
+>创建同步任务之前,首先要在fe.conf里配置enable_create_sync_job=true,这个默认是false不启用,否则就不能创建同步任务
+
+```
+CREATE SYNC test_2.doris_mysql_binlog_demo_job 
+(
+	FROM demo.test_cdc INTO doris_mysql_binlog_demo
+) 
+FROM BINLOG 
+(
+	"type" = "canal", 
+	"canal.server.ip" = "10.220.146.10", 
+	"canal.server.port" = "11111", 
+	"canal.destination" = "demo",  
+	"canal.username" = "canal", 
+	"canal.password" = "canal"
+);
+```
+
+#### 3.1.3 查看同步任务
+
+```sql
+SHOW SYNC JOB from test_2;
+```
+
+![image-20211110160106602](/images/binlog/image-20211110160106602.png)
+
+#### 3.1.4 查看表里的数据
+
+```sql
+select * from doris_mysql_binlog_demo;
+```
+
+![image-20211110160331479](/images/binlog/image-20211110160331479.png)
+
+#### 3.1.5 删除数据
+
+我们在Mysql 数据表里删除数据,然后看Doris表里的变化
+
+```
+delete from test_cdc where id in (12,13)
+```
+
+我们在去看Doris表里,id是12,13这两条数据已经被删除
+
+![image-20211110160710709](/images/binlog/image-20211110160710709.png)
+
+#### 3.1.6 多表同步
+
+多表同步只需要像下面这样写法就可以了
+
+
+
+```sql
+CREATE SYNC test_2.doris_mysql_binlog_demo_job 
+(
+    FROM demo.test_cdc INTO doris_mysql_binlog_demo,
+    FROM demo.test_cdc_1 INTO doris_mysql_binlog_demo,
+    FROM demo.test_cdc_2 INTO doris_mysql_binlog_demo,
+    FROM demo.test_cdc_3 INTO doris_mysql_binlog_demo
+) 
+```
+

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org