You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2021/07/21 08:17:17 UTC

[GitHub] [incubator-doris] xy720 opened a new issue #6287: [Proposal] Support synchronizing MySQL binlog in real time

xy720 opened a new issue #6287:
URL: https://github.com/apache/incubator-doris/issues/6287


   As we know, Binlog is the basic infrastructure in the Mysql Replication Architecture. The synchronization between replcations is carried out by reading and writing the binary log file (binlog) which stored on the mysql master server.
   
   We know that in mysql cluster mode, only one replication is responsible for writing and the other replications are responsible for reading. Therefore, the repliactions' architecture is usually composed of one master (responsible for writing) and one or more slaves (responsible for reading). 
   
   All data changes on the master node will be firstly write into local binlog, then copied to the slave nodes.
   
   1. On the master node, the binlog file are named like mysql-bin.000001, mysql-bin.000002... MySQL will automatically segment binlog logs.
   
   2. On the slave node, the binlog file name and position (offset) will be saved as a file or table to locate the latest consumption location. 
   
   ```
   ---------------------                                ---------------------
   |       Slave       |                                |      Master       |
   |                   |              read              |                   |
   | FileName/Position | <<<--------------------------- |    Binlog Files   |
   ---------------------                                ---------------------
   ```
   
   In order to get binlogs, the slave node sends the “MySQL binlog dump command” to the master node, and the dump thread of the master server will start to pushing binlog to the slave server continuously.
   
   That is to say, we can get binlogs on the master node by forging this dump command. We can use Alibaba's Canal to achieve this goal.
   
   Canal forge the dump protocol to disguise itself as a slave node to get and parse the master server's binlog log. Then it will store the parsed data in a ring queue in memory, waiting for other clients to subscribe and get it.
   
   Therefore, with canal as the intermediary, Fe can get and synchronize the binlog logs on the master node. The blueprint for the first stage is below:
   
   ```
                                                  ------------------------------------------------------
                                                  | ---------- ---------> channel 1 ---------> table1  |
                Binlog                    Get     | | client | ---------> channel 2 ---------> table2  |
   ---------- ---------> ----------- ------------>| ---------- ---------> channel 3 ---------> table3  |
   |  Mysql |            |  canal  |      Ack     |                                                    |
   |        |            |         | <------------|  Doris
   ----------            -----------              ------------------------------------------------------
   ```
   
   The work we need to do will be divided into two stages:
   
   stage 1: 
   1、Support creating consumption job and data channel in Fe to get parsed data in canal.
   2、Support increasingly synchronizing the data changed in MYSQL, and ensure that the data isn't lost and repeateded.
   
   stage 2:
   1、Support synchronizing and executing MySQL DDL statements.
   2、Embedding the canal into Fe, no longer to independently deploy canal server.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman closed issue #6287: [Proposal] Support synchronizing MySQL binlog in real time

Posted by GitBox <gi...@apache.org>.
morningman closed issue #6287:
URL: https://github.com/apache/incubator-doris/issues/6287


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xy720 edited a comment on issue #6287: [Proposal] Support synchronizing MySQL binlog in real time

Posted by GitBox <gi...@apache.org>.
xy720 edited a comment on issue #6287:
URL: https://github.com/apache/incubator-doris/issues/6287#issuecomment-884136384


   We already finish stage 1 in pr 
   https://github.com/apache/incubator-doris/pull/6289
   
   The proformance we test is below:
   
   ## Environment
   
   |  Cores   | Memory  |
   |  ----  | ----  |
   | 96 | 360G |
   
   ## Test Scenarios
   
   1、Use mysqlslap as tool of MySQL for pressure testing.
   
   ```
   Three source tables with the same structure in MySQL database (t1, t2, t3): 
   (avg row size 35 bytes)
   +-------+---------+------+-----+---------+-------+
   | Field | Type    | Null | Key | Default | Extra |
   +-------+---------+------+-----+---------+-------+
   | a     | int(11) | NO   | PRI | NULL    |       |
   | b     | int(11) | YES  |     | NULL    |       |
   +-------+---------+------+-----+---------+-------+
   ```
   
   ```
   Three target tables of Doris database (test1, test2, test3)
   +-------+------+------+-------+---------+---------+
   | Field | Type | Null | Key   | Default | Extra   |
   +-------+------+------+-------+---------+---------+
   | a     | INT  | No   | true  | NULL    |         |
   | b     | INT  | No   | false | NULL    | REPLACE |
   +-------+------+------+-------+---------+---------+
   ```
   
   ## Test Result
   
   ### scene 1
   
   1. start-up canal
   2. create sync job and start synchronizing
   3. Perform insert operations in three source tables in Mysql
   
   |  stage  |  statement  |  data size  |  cost(s)  |  tps avg  |  speed  |
   |  ----  |  ----  | ---- |  ----  |  ----  |  ----  |
   | mysql insert | 1000w   |  120M  |  74  |  135135  |  1.6M/s |
   | doris synchronization   | 1000w  |  120M  |  81  |  123457  |  1.5M/s  |
   
   ### scene 2
   
   1. start-up canal
   2. Perform insert operations in three source tables in Mysql
   3. create sync job and start synchronizing
   
   |  stage  |  statement  |  data size  |  cost(s)  |  tps avg  |  speed  |
   |  ----  |  ----  | ---- | ----  | ----  |  ----  |
   | mysql insert | 1000w   |  120M  |  76  | 131578  |  1.6M/s |
   | doris synchronization  | 1000w  |  120M  |  72  | 138889  |  1.7M/s  |
   
   
   ### Scene 3
   
   1. create sync job and start synchronizing
   2. Delete data from three source tables
   
   |  stage  |  op  |  row  |  data size  |  cost(s)  |  tps avg  |  speed  |
   |  ----  |  ----  |  ----  |  ----  | ---- | ----  |  ----  |
   | mysql delete | delete   | 10000000   |  120M  |  30  |   33w  |  4M/s |
   | doris synchronization | delete   | 10000000  |  120M  |  35  |  30w  |  3.4M/s  |


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xy720 edited a comment on issue #6287: [Proposal] Support synchronizing MySQL binlog in real time

Posted by GitBox <gi...@apache.org>.
xy720 edited a comment on issue #6287:
URL: https://github.com/apache/incubator-doris/issues/6287#issuecomment-884136384






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xy720 edited a comment on issue #6287: [Proposal] Support synchronizing MySQL binlog in real time

Posted by GitBox <gi...@apache.org>.
xy720 edited a comment on issue #6287:
URL: https://github.com/apache/incubator-doris/issues/6287#issuecomment-884136384


   We already finish stage 1 in pr 
   https://github.com/apache/incubator-doris/pull/6289
   
   The proformance we test is below:
   
   ## Environment
   
   |  Cores   | Memory  |
   |  ----  | ----  |
   | 96 | 360G |
   
   ## Test Scenarios
   
   1、Use mysqlslap as tool of MySQL for pressure testing.
   
   ```
   Three source tables with the same structure in MySQL database (t1, t2, t3): 
   (avg row size 35 bytes)
   +-------+---------+------+-----+---------+-------+
   | Field | Type    | Null | Key | Default | Extra |
   +-------+---------+------+-----+---------+-------+
   | a     | int(11) | NO   | PRI | NULL    |       |
   | b     | int(11) | YES  |     | NULL    |       |
   +-------+---------+------+-----+---------+-------+
   ```
   
   ```
   Three target tables of Doris database (test1, test2, test3)
   +-------+------+------+-------+---------+---------+
   | Field | Type | Null | Key   | Default | Extra   |
   +-------+------+------+-------+---------+---------+
   | a     | INT  | No   | true  | NULL    |         |
   | b     | INT  | No   | false | NULL    | REPLACE |
   +-------+------+------+-------+---------+---------+
   ```
   
   ## Test Result
   
   ### scene 1
   
   1. start-up canal
   2. create sync job and start synchronizing
   3. Perform insert operations in three source tables in Mysql
   
   |  stage  |  statement  |  data size  |  cost(s)  |  tps avg  |  speed  |
   |  ----  |  ----  | ---- |  ----  |  ----  |  ----  |
   | mysql insert | 1000w   |  120M  |  74  |  135135  |  1.6M/s |
   | doris synchronization   | 1000w  |  120M  |  81  |  123457  |  1.5M/s  |
   
   ### scene 2
   
   1. start-up canal
   2. Perform insert operations in three source tables in Mysql
   3. create sync job and start synchronizing
   
   |  stage  |  statement  |  data size  |  cost(s)  |  tps avg  |  speed  |
   |  ----  |  ----  | ---- | ----  | ----  |  ----  |
   | mysql insert | 1000w   |  120M  |  76  | 131578  |  1.6M/s |
   | doris synchronization  | 1000w  |  120M  |  72  | 138889  |  1.7M/s  |
   
   
   ### scene 3
   
   1. create sync job and start synchronizing
   2. Delete data from three source tables
   
   |  stage  |  op  |  row  |  data size  |  cost(s)  |  tps avg  |  speed  |
   |  ----  |  ----  |  ----  |  ----  | ---- | ----  |  ----  |
   | mysql delete | delete   | 10000000   |  120M  |  30  |   33w  |  4M/s |
   | doris synchronization | delete   | 10000000  |  120M  |  35  |  30w  |  3.4M/s  |


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xy720 commented on issue #6287: [Proposal] Support synchronizing MySQL binlog in real time

Posted by GitBox <gi...@apache.org>.
xy720 commented on issue #6287:
URL: https://github.com/apache/incubator-doris/issues/6287#issuecomment-884136384


   We already finish stage 1 in pr 
   https://github.com/apache/incubator-doris/pull/6289
   
   The proformance we test is below:
   
   ## Environment
   
   |  Cores   | Memory  |
   |  ----  | ----  |
   | 96 | 360G |
   
   ## Test Scenarios
   
   1、Use mysqlslap as tool of MySQL for pressure testing.
   
   ```
   Three source tables with the same structure in MySQL database (t1, t2, t3): 
   (avg row size 35 bytes)
   +-------+---------+------+-----+---------+-------+
   | Field | Type    | Null | Key | Default | Extra |
   +-------+---------+------+-----+---------+-------+
   | a     | int(11) | NO   | PRI | NULL    |       |
   | b     | int(11) | YES  |     | NULL    |       |
   +-------+---------+------+-----+---------+-------+
   ```
   
   ```
   Three target tables of Doris database (test1, test2, test3)
   +-------+------+------+-------+---------+---------+
   | Field | Type | Null | Key   | Default | Extra   |
   +-------+------+------+-------+---------+---------+
   | a     | INT  | No   | true  | NULL    |         |
   | b     | INT  | No   | false | NULL    | REPLACE |
   +-------+------+------+-------+---------+---------+
   ```
   
   ## Test Resule
   
   ### scene 1
   
   1. start-up canal
   2. create sync job and start synchronizing
   3. Perform insert operations in three source tables in Mysql
   
   |  stage  |  statement  |  data size  |  cost(s)  |  tps avg  |  speed  |
   |  ----  |  ----  | ---- |  ----  |  ----  |  ----  |
   | mysql insert | 1000w   |  120M  |  74  |  135135  |  1.6M/s |
   | doris synchronization   | 1000w  |  120M  |  81  |  123457  |  1.5M/s  |
   
   #### scene 2
   
   1. start-up canal
   2. Perform insert operations in three source tables in Mysql
   3. create sync job and start synchronizing
   
   |  stage  |  statement  |  data size  |  cost(s)  |  tps avg  |  speed  |
   |  ----  |  ----  | ---- | ----  | ----  |  ----  |
   | mysql insert | 1000w   |  120M  |  76  | 131578  |  0.6M/s |
   | doris synchronization  | 1000w  |  120M  |  72  | 138889  |  1.7M/s  |
   
   
   ### Scene 3
   
   1. create sync job and start synchronizing
   2. Delete data from three source tables
   
   |  stage  |  op  |  row  |  data size  |  cost(s)  |  tps avg  |  speed  |
   |  ----  |  ----  |  ----  |  ----  | ---- | ----  |  ----  |
   | mysql delete | delete   | 10000000   |  120M  |  30  |   33w  |  4M/s |
   | doris synchronization | delete   | 10000000  |  120M  |  35  |  30w  |  3.4M/s  |


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xy720 edited a comment on issue #6287: [Proposal] Support synchronizing MySQL binlog in real time

Posted by GitBox <gi...@apache.org>.
xy720 edited a comment on issue #6287:
URL: https://github.com/apache/incubator-doris/issues/6287#issuecomment-884136384


   We already finish stage 1 in pr 
   https://github.com/apache/incubator-doris/pull/6289
   
   The proformance we test is below:
   
   ## Environment
   
   |  Cores   | Memory  |
   |  ----  | ----  |
   | 96 | 360G |
   
   ## Test Scenarios
   
   1、Use mysqlslap as tool of MySQL for pressure testing.
   
   ```
   Three source tables with the same structure in MySQL database (t1, t2, t3): 
   (avg row size 35 bytes)
   +-------+---------+------+-----+---------+-------+
   | Field | Type    | Null | Key | Default | Extra |
   +-------+---------+------+-----+---------+-------+
   | a     | int(11) | NO   | PRI | NULL    |       |
   | b     | int(11) | YES  |     | NULL    |       |
   +-------+---------+------+-----+---------+-------+
   ```
   
   ```
   Three target tables of Doris database (test1, test2, test3)
   +-------+------+------+-------+---------+---------+
   | Field | Type | Null | Key   | Default | Extra   |
   +-------+------+------+-------+---------+---------+
   | a     | INT  | No   | true  | NULL    |         |
   | b     | INT  | No   | false | NULL    | REPLACE |
   +-------+------+------+-------+---------+---------+
   ```
   
   ## Test Result
   
   ### scene 1
   
   1. start-up canal
   2. create sync job and start synchronizing
   3. Perform insert operations in three source tables in Mysql
   
   |  stage  |  row  |  data size  |  cost(s)  |  tps avg  |  speed  |
   |  ----  |  ----  | ---- |  ----  |  ----  |  ----  |
   | mysql insert | 1000w   |  120M  |  74  |  135135  |  1.6M/s |
   | doris synchronization   | 1000w  |  120M  |  81  |  123457  |  1.5M/s  |
   
   ### scene 2
   
   1. start-up canal
   2. Perform insert operations in three source tables in Mysql
   3. create sync job and start synchronizing
   
   |  stage  |  row  |  data size  |  cost(s)  |  tps avg  |  speed  |
   |  ----  |  ----  | ---- | ----  | ----  |  ----  |
   | mysql insert | 1000w   |  120M  |  76  | 131578  |  1.6M/s |
   | doris synchronization  | 1000w  |  120M  |  72  | 138889  |  1.7M/s  |
   
   
   ### scene 3
   
   1. create sync job and start synchronizing
   2. Delete data from three source tables
   
   |  stage  |  op  |  row  |  data size  |  cost(s)  |  tps avg  |  speed  |
   |  ----  |  ----  |  ----  |  ----  | ---- | ----  |  ----  |
   | mysql delete | delete   | 10000000   |  120M  |  30  |   33w  |  4M/s |
   | doris synchronization | delete   | 10000000  |  120M  |  35  |  30w  |  3.4M/s  |


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org