You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@shardingsphere.apache.org by GitBox <gi...@apache.org> on 2020/04/08 09:52:03 UTC
[GitHub] [incubator-shardingsphere] KomachiSion opened a new issue #5106: Re-design for Sharding-Scaling Job

KomachiSion opened a new issue #5106: Re-design for Sharding-Scaling Job
URL: https://github.com/apache/incubator-shardingsphere/issues/5106
 
 
   ## Current Job Design
   
   Sharding-Scaling will extract `datasources` and `actualDataNodes` information from ShardingSphere's sharding configuration to generate Sharding-Scaling jobs. Each complete ShardingSphere configuration will generate a complete Sharding-Scaling job.
   
   After checking datasources, Sharding-Scaling job will do the first time job spliting according to the datasource. Each datasource will be one task of Sharding-Scaling job.
   
   In datasource sub-task, it also will be splited to inventory task(or named history task) and incremental task (or named real-time task). What's more, the inventory task will still be split further through the tables, and maybe then splited by primary key.
   
   Job structure like following:
   
   ```
   Sharding-Scaling Job
   	|--- Datasource1 Sub-task
   	|		|--- Inventory Task
   	|		|		|--- TableA Task
   	|		|				|--- TableA Task shard #1
   	|		|				|--- TableA Task shard #2
   	|		|				.....
   	|		|		|--- TableB Task
   	|		|			.....
   	|		|--- Incremental Task
   	|--- Datasource2 Sub-task
   		.....
   ``` 
   
   For each table task without sharding, table task shard and incremental task, there are at lease two thread to execute the task. One is `Reader` which query or subscribe change data and the other `Writer` which write data into targer ShardingSphere.
   
   The design is not friendly for distribution:
   
   - The leaf task may be the second layer(incremental task), third layer(table task without sharding and fourth layer(table task shard).
   - Concurrency or thread pool size is hard to control.
   - Limited thread pool may cause problem [Issue#5092](https://github.com/apache/incubator-shardingsphere/issues/5092)
   
   ## New Design
   
   In new design, the Sharding-Scaling job will not be splited with logic of database structure rather by the logic of distributed jobs.
   
   ```
   Sharding-Scaling Job
   	|--- Inventory Task
   			|--- Inventory Task #1
   			|		|--- Inner Inventory Task For ds1.table1
   			|		|--- Inner Inventory Task For ds1.table2 #1
   			|--- Inventory Task #2
   			|		|--- Inner Inventory Task For ds1.table2 #2
   			|		|--- Inner Inventory Task For ds2.table3
   			......
   	|--- Incremental Task
   			|--- Incremental Task #1 For ds1
   			|--- Incremental Task #2 For ds2
   			......
   ```
   
   In new design, Sharding-Scaling split job to inventory and incremental first, and then split inventory task by `job concurrency` users configured. The leaf task is `Inventory Task #n` and `Incremental Task #n`, which can be equally distributed to different Sharding-Scaling nodes.
   
   The inventory table task and inventory table task shard in old design will be as inner task of `Inventory Task #n`, and exeucte one by one.
   
   For `Reader` and `Writer`, they are similar to old design. But for new design, the thread pool size and concurrency become easier to understand and configure. Users only need to understand `job concurrency` property.
   
   Todo list:
   
   - [ ] Rename `Reader` and `Writer` [Issue#4595](https://github.com/apache/incubator-shardingsphere/issues/4595)
   - [ ] Rename `HistoryTask` and `RealtimeTask`
   - [ ] Refactor `ExecuteEngine` in Sharding-Scaling
   - [ ] Refactor `Inventory Task`
   - [ ] Refactor `TaskController`
   - [ ] Refactor `JobController` 
   
   I think the new design is better, and more suitable for development of Sharding-Scaling.
   
   Any suggestions? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services