Posted to notifications@shardingsphere.apache.org by GitBox <gi...@apache.org> on 2020/03/23 04:26:04 UTC

[GitHub] [incubator-shardingsphere] kimmking opened a new issue #4896: [DISCUSS]MetadataCenter Design 5.x

kimmking opened a new issue #4896: [DISCUSS]MetadataCenter Design 5.x
URL: https://github.com/apache/incubator-shardingsphere/issues/4896
 
 
   MetadataCenter Design 5.x is part of the Orchestration 5.x Design (#4515).
   
   ## Metadata Center Design
   
   The purpose of this document is to illustrate the metadata center design of the Apache ShardingSphere governance module.
   
   
   ### 1. Definition
   
   In this document, 'metadata' refers to the metadata of the data sources used by Sharding-JDBC / Sharding-Proxy. This metadata consists of the core data objects that every ShardingSphere component relies on to operate correctly. Currently it is scattered across different parts of the system; it should be reorganized and managed uniformly by a new component, such as a 'metadata center', which also coordinates updates when the metadata changes.
   
   
   
   ### 2. Types
   
   The current metadata object models are mainly defined in:
   
   > org.apache.shardingsphere.sql.parser.binder.metadata
   >
   > ├─column
   > │      ColumnMetaData.java
   > │      ColumnMetaDataLoader.java
   > ├─index
   > │      IndexMetaData.java
   > │      IndexMetaDataLoader.java
   > ├─schema
   > │      SchemaMetaData.java
   > │      SchemaMetaDataLoader.java
   > └─table
   >      TableMetaData.java
   >      TableMetaDataLoader.java
   >
   > 
   >
   > Hierarchical relationship is `schema > table > column + index`
   >
   > At the same time, the scaling module also has a subset of models and loaders that need to be merged (#4866).
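
As a rough illustration of the `schema > table > column + index` hierarchy, here is a minimal Java sketch. The record names (`SchemaMeta`, `TableMeta`, `ColumnMeta`, `IndexMeta`) and their fields are hypothetical, simplified stand-ins, not the actual classes in the binder package, which carry more attributes. The `containsColumn` lookup mirrors the kind of column-existence check the rewrite module performs (described in section 4). Records require Java 16 or later.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for the binder metadata classes.
public class MetadataHierarchySketch {

    // Leaf level: one column of a table.
    record ColumnMeta(String name, String dataType, boolean primaryKey) { }

    // Leaf level: one index of a table.
    record IndexMeta(String name) { }

    // Middle level: a table owns its columns and indexes.
    record TableMeta(List<ColumnMeta> columns, List<IndexMeta> indexes) {

        boolean containsColumn(String columnName) {
            return columns.stream().anyMatch(each -> each.name().equalsIgnoreCase(columnName));
        }
    }

    // Top level: a schema maps lower-case table names to table metadata.
    record SchemaMeta(Map<String, TableMeta> tables) {

        boolean containsColumn(String tableName, String columnName) {
            TableMeta table = tables.get(tableName.toLowerCase());
            return null != table && table.containsColumn(columnName);
        }
    }

    public static void main(String[] args) {
        Map<String, TableMeta> tables = new LinkedHashMap<>();
        tables.put("t_order", new TableMeta(
                List.of(new ColumnMeta("order_id", "BIGINT", true), new ColumnMeta("user_id", "INT", false)),
                List.of(new IndexMeta("idx_user_id"))));
        SchemaMeta schema = new SchemaMeta(tables);
        System.out.println(schema.containsColumn("t_order", "user_id"));  // true
        System.out.println(schema.containsColumn("t_order", "status"));   // false
    }
}
```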
   
   
   
   One open question:
   
   > Question 1: Should sharding rules and other rule data also be managed in the metadata center?
   >
   > It is recommended to handle only data source metadata for now; rule data should stay in the config center. We can revisit this later.
   
   
   
   ### 3. Current Metadata Loading
   
   The unified entry point for loading metadata is:
   
   > org.apache.shardingsphere.sql.parser.binder.metadata.schema.SchemaMetaDataLoader
   
   ![image-20200319190724018](https://user-images.githubusercontent.com/807508/77133013-76df2780-6a9c-11ea-9a98-54d69bd43599.png)
   
   Metadata is loaded in three places:
   
   1. Sharding-JDBC
   
   2. Sharding-Proxy
   
   > Bootstrap.startWithRegistryCenter->LogicSchemas.init/initSchemas->LogicSchemas.initSchemas(for)
   >
   > ->LogicSchemaFactory.newInstance->XXSchema->XXSchema.createMetaData/loadSchemaMetaData
   >
   > ->SchemaMetaDataLoader.load(dataSource, maxConnectionsSizePerQuery)
   
   Each XXSchema class implements a `renew` method annotated with Guava's `@Subscribe`, which refreshes the corresponding rules (rules, not metadata) when an event is received.
   
   ShardingSchema and MasterSlaveSchema additionally handle the disable event (disableEvent) to disable a data source. (This part contains a lot of code that can be optimized.)
   
   Sharding table metadata loading:
   
   > org.apache.shardingsphere.core.metadata.ShardingMetaDataLoader
   
   It loads the sharding (logic) tables and the default tables separately, and then calls SchemaMetaDataLoader to load their metadata according to the hierarchy.
   
   3. Sharding-Scaling
   
   The scaling module has its own independent TableMetaDataLoader/ColumnMetaDataLoader, which load its own TableMetaData/ColumnMetaData.
   
   
   
   ### 4. Metadata Usage 
   
   **Route Module**
   
     * For a SelectStatementContext with a WHERE condition, the metadata is used to obtain the ShardingConditions.
     * For a DDLStatement or DCLStatement, the metadata of the tables it references is read from the metadata.
   
   **Rewrite Module**
   
   - Checks whether a column exists in the metadata. Used by:
     - EncryptPredicateParameterRewriter
     - EncryptPredicateColumnTokenGenerator
     - EncryptPredicateRightValueTokenGenerator
   
   **Execute Module**
   
     * For the following statement types, the table metadata is refreshed:
   
   ```java
   if (sqlStatementContext instanceof CreateTableStatementContext) {
       refreshTableMetaData(runtimeContext, ((CreateTableStatementContext) sqlStatementContext).getSqlStatement());
   } else if (sqlStatementContext instanceof AlterTableStatementContext) {
       refreshTableMetaData(runtimeContext, ((AlterTableStatementContext) sqlStatementContext).getSqlStatement());
   } else if (sqlStatementContext instanceof DropTableStatementContext) {
       refreshTableMetaData(runtimeContext, ((DropTableStatementContext) sqlStatementContext).getSqlStatement());
   } else if (sqlStatementContext instanceof CreateIndexStatementContext) {
       refreshTableMetaData(runtimeContext, ((CreateIndexStatementContext) sqlStatementContext).getSqlStatement());
   } else if (sqlStatementContext instanceof DropIndexStatementContext) {
       refreshTableMetaData(runtimeContext, ((DropIndexStatementContext) sqlStatementContext).getSqlStatement());
   }
   ```
   
   **Merge Module**
   
     * Not used; the merge module obtains metadata directly from the ResultSet returned by the SQL.
   
   
   
   ### 5. Metadata Changes
   
   Currently, metadata is loaded and managed independently by each running Sharding-JDBC or Sharding-Proxy node.
   
   If a DDL statement is executed through one node, the following refresh methods are called directly to refresh the metadata of that node only.
   
   JDBC:
   
   > org.apache.shardingsphere.shardingjdbc.executor.AbstractStatementExecutor.refreshMetaDataIfNeeded
   
   Proxy:
   
   > org.apache.shardingsphere.shardingproxy.backend.schema.impl.ShardingSchema.refreshTableMetaData
   
   Note: the code in these two places is also heavily duplicated.
   
   
   
   ### 6. Metadata Center Design
   
   The sections above point to two areas that need improvement:
   
   Improvement 1:
   
   If multiple nodes start at the same time, the same metadata is loaded repeatedly from the database, which may cause performance issues.
   
   Improvement 2:
   
   Once a node executes a DDL statement, the other nodes do not know that the metadata has changed, which leads to consistency issues.
   
   The metadata center design aims to solve both problems.
   
   #### 6.1 Definition
   
   The metadata center is a mechanism that manages all metadata in one place, providing unified metadata loading, change notification, and data synchronization.
   
   #### 6.2 Features
   
   The plan is to sort out the existing metadata loading logic and all usage scenarios, and to manage metadata uniformly: the first time metadata is loaded, it is persisted to the CenterRepository, and nodes that start later obtain metadata from the metadata center (Improvement 1). When a node executes a DDL operation, its metadata is refreshed and synchronized to the CenterRepository, and all other nodes are then notified to pull the new data from the metadata center (Improvement 2).
   
   #### 6.3 API
   
   After this cleanup, the existing metadata loading logic will be restructured, and some loader code will be migrated to a new metadata center module:
   
   > sharding-orchestration-center-metadata
   
   The following new APIs will be added:
   
   1. MetadataCenter
   2. MetadataLoader
   3. MetadataNode
   4. MetadataListener
   5. MetadataChangeEvent
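
The five APIs above are only named here, so the following is a hypothetical Java sketch of how they might fit together; every method name and signature is an assumption, not the planned interface. An in-memory map stands in for the CenterRepository, and `MetadataNode` assumes a simplified `/{orchestration-name}/metadata/{schema}` path layout derived from section 6.4.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class MetadataCenterApiSketch {

    // MetadataNode (assumed): maps a schema to its repository path.
    record MetadataNode(String orchestrationName) {
        String schemaPath(String schemaName) {
            return "/" + orchestrationName + "/metadata/" + schemaName;
        }
    }

    // MetadataChangeEvent (assumed): the changed schema and its new serialized content.
    record MetadataChangeEvent(String schemaName, String content) { }

    // MetadataListener (assumed): callback invoked when metadata changes.
    interface MetadataListener {
        void onChange(MetadataChangeEvent event);
    }

    // MetadataLoader (assumed): reads the metadata of one schema from the database as text.
    interface MetadataLoader {
        String load(String schemaName);
    }

    // MetadataCenter (assumed): persists metadata and notifies listeners.
    static final class MetadataCenter {
        private final Map<String, String> repository = new HashMap<>();  // stands in for CenterRepository
        private final List<MetadataListener> listeners = new ArrayList<>();
        private final MetadataNode node;

        MetadataCenter(MetadataNode node) {
            this.node = node;
        }

        void addListener(MetadataListener listener) {
            listeners.add(listener);
        }

        Optional<String> get(String schemaName) {
            return Optional.ofNullable(repository.get(node.schemaPath(schemaName)));
        }

        void persist(String schemaName, String content) {
            repository.put(node.schemaPath(schemaName), content);
            MetadataChangeEvent event = new MetadataChangeEvent(schemaName, content);
            listeners.forEach(each -> each.onChange(event));
        }
    }

    public static void main(String[] args) {
        MetadataCenter center = new MetadataCenter(new MetadataNode("orchestration_ds"));
        MetadataLoader loader = schemaName -> "tables: {t_order: {}}";  // fake database loader
        center.addListener(event -> System.out.println("changed: " + event.schemaName()));
        center.persist("sharding_db", loader.load("sharding_db"));  // prints "changed: sharding_db"
        System.out.println(center.get("sharding_db").orElse("<none>"));
    }
}
```

In a real implementation, `persist` and listener registration would more likely delegate to the CenterRepository (for example a ZooKeeper- or etcd-backed one) than to an in-process list.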
   
   
   
   #### 6.4 Data Structure
   
   The current metadata memory structure is as follows:
   
   ![image-20200319194148562](https://user-images.githubusercontent.com/807508/77133014-78105480-6a9c-11ea-972d-e4dad9f7bf08.png)
   
   
   Two styles are available for the metadata structure in the CenterRepository.
   
   CenterRepository metadata structure configuration style one:
   
   > ```
   > ├─orchestration-namespace
   > ├─orchestration-name
   > │  ├─metadata
   > │  │  ├─ip1:port/catalog/schema
   > │  │  │  ├─table1
   > │  │  │  │  ├─columns
   > │  │  │  │  └─indexes
   > │  │  │  ├─table2
   > │  │  │  │  ├─columns
   > │  │  │  │  └─indexes
   > │  │  ├─ip2:port/catalog/schema
   > │  │  │  ├─table3
   > │  │  │  │  ├─columns
   > │  │  │  │  └─indexes
   > │  │  │  ├─table4
   > │  │  │  │  ├─columns
   > │  │  │  │  └─indexes
   > ```
   
   CenterRepository metadata structure configuration style two:
   
   > ```
   > ├─orchestration-namespace
   > ├─orchestration-name
   > │  ├─metadata
   > │  │  ├─ip1:port/catalog/schema
   > │  │  │  ├─ [json/yaml text contents]
   > ```
   
   Style one is intuitive and fine-grained.
   
   Style two is simpler and easier to manage.
   
   Style two is recommended.
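
To make the difference between the two styles concrete, the following hypothetical Java sketch derives the style-one node paths and the style-two text content for the same table. The `Table` record, the method names, and the YAML-like output format are assumptions for illustration only; the root path reuses the `ip1:port/catalog/schema` placeholder from the trees above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetadataLayoutSketch {

    // Hypothetical, simplified table metadata: column names and index names only.
    record Table(List<String> columns, List<String> indexes) { }

    // Style one: one node per table, with child nodes for columns and indexes.
    static List<String> styleOnePaths(String root, Map<String, Table> tables) {
        List<String> result = new ArrayList<>();
        tables.forEach((name, table) -> {
            result.add(root + "/" + name + "/columns");
            result.add(root + "/" + name + "/indexes");
        });
        return result;
    }

    // Style two: the whole schema serialized as YAML-like text under a single node.
    static String styleTwoContent(Map<String, Table> tables) {
        StringBuilder yaml = new StringBuilder("tables:\n");
        tables.forEach((name, table) -> {
            yaml.append("  ").append(name).append(":\n");
            yaml.append("    columns: ").append(table.columns()).append('\n');  // List.toString() gives [a, b]
            yaml.append("    indexes: ").append(table.indexes()).append('\n');
        });
        return yaml.toString();
    }

    public static void main(String[] args) {
        Map<String, Table> tables = new LinkedHashMap<>();
        tables.put("t_order", new Table(List.of("order_id", "user_id"), List.of("idx_user_id")));
        String root = "/orchestration-name/metadata/ip1:port/catalog/schema";
        styleOnePaths(root, tables).forEach(System.out::println);
        System.out.println(root + " =>");
        System.out.print(styleTwoContent(tables));
    }
}
```

Style one needs one write and one watch per table node, while style two rewrites the whole schema text on every change, which is the trade-off between granularity and manageability noted above.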
   
   
   
   One question:
   
   > Question 2: Will logic tables be stored here?
   >
   > I think the answer is no.
   >
   > Only the metadata of actual tables is stored here; logic tables are not part of the metadata.
   
   
   
   #### 6.5 Loading
   
   Current loading process:
   
   > Sharding tables are loaded first, then default tables; depending on the check.metadata.enable parameter, the metadata of all tables is then checked for consistency.
   
   New loading process:
   
   > After metadata is loaded, it is written to the CenterRepository, and then a global notification is triggered.
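
Combined with Improvement 1, the new loading process can be sketched as a load-or-fetch step: a node first looks for metadata in the repository and only queries the database when nothing is stored yet. In this hypothetical Java sketch, `InMemoryRepository` stands in for the CenterRepository and a `Supplier` stands in for the database loader. The check-then-load sequence is not atomic, which is the concurrency concern raised later as Question 4.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class MetadataLoadingSketch {

    // Hypothetical in-memory stand-in for CenterRepository: node path -> serialized metadata.
    static final class InMemoryRepository {
        private final Map<String, String> nodes = new HashMap<>();
        private final List<Consumer<String>> watchers = new ArrayList<>();

        String get(String path) {
            return nodes.get(path);
        }

        void watch(Consumer<String> watcher) {
            watchers.add(watcher);
        }

        // Writes the content, then triggers the global notification for that path.
        void persist(String path, String content) {
            nodes.put(path, content);
            watchers.forEach(each -> each.accept(path));
        }
    }

    // Reuses the repository copy if one exists; otherwise loads from the database,
    // persists the result, and thereby notifies all watchers.
    static String loadOnStartup(InMemoryRepository repository, String path, Supplier<String> databaseLoader) {
        String stored = repository.get(path);
        if (null != stored) {
            return stored;
        }
        String loaded = databaseLoader.get();
        repository.persist(path, loaded);
        return loaded;
    }

    public static void main(String[] args) {
        InMemoryRepository repository = new InMemoryRepository();
        repository.watch(changedPath -> System.out.println("notified: " + changedPath));
        int[] databaseLoads = {0};
        Supplier<String> loader = () -> {
            databaseLoads[0]++;
            return "tables: {t_order: {}}";
        };
        String path = "/orchestration-name/metadata/sharding_db";
        loadOnStartup(repository, path, loader);  // first node: loads from the database
        loadOnStartup(repository, path, loader);  // later node: reads from the repository
        System.out.println("database loads: " + databaseLoads[0]);  // database loads: 1
    }
}
```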
   
   
   
   > Question 3: Should the metadata be written to the CenterRepository after each group or data source is loaded, or once after all metadata has been loaded? (This determines whether one event or several events are triggered.)
   >
   > Implement full synchronization first, then consider persisting and notifying in batches, depending on loading speed.
   
   #### 6.6 Synchronization
   
   After a DDL statement is executed on one node, that node's metadata is refreshed through the event mechanism (instead of a local method call) and synchronized to the CenterRepository, and then a global change notification is sent to the other nodes.
   
   > Question 4: Do we need to block the loading process on other nodes to prevent concurrent changes?
   >
   > Ignore concurrency at first and implement the load and notify functions; solve this problem last, possibly with distributed locks.
   
   ![幻灯片4](https://user-images.githubusercontent.com/807508/77132931-2ff13200-6a9c-11ea-82a4-dbb10195b1c8.PNG)
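
The synchronization steps on the node that executed the DDL (refresh local metadata, synchronize to the repository, notify) might look roughly like the following. This Java sketch uses a small hand-rolled synchronous event bus in place of Guava's EventBus to stay dependency-free; `DdlExecutedEvent`, `LocalNode`, and `CenterRepository` here are hypothetical names, and the sketch deliberately ignores concurrency, as Question 4 suggests.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class DdlSynchronizationSketch {

    // Hypothetical event posted after a CREATE/ALTER TABLE statement succeeds on this node.
    record DdlExecutedEvent(String tableName, List<String> columns) { }

    // Minimal synchronous event bus used instead of Guava's EventBus.
    static final class SimpleEventBus {
        private final List<Consumer<DdlExecutedEvent>> subscribers = new ArrayList<>();

        void register(Consumer<DdlExecutedEvent> subscriber) {
            subscribers.add(subscriber);
        }

        void post(DdlExecutedEvent event) {
            subscribers.forEach(each -> each.accept(event));
        }
    }

    // In-memory stand-in for CenterRepository that records the notifications it would send.
    static final class CenterRepository {
        final Map<String, String> nodes = new HashMap<>();
        final List<String> notifications = new ArrayList<>();

        void persistAndNotify(String path, String content) {
            nodes.put(path, content);
            notifications.add(path);  // a real repository would push a watch event to other nodes
        }
    }

    // The node that executed the DDL: refreshes its own copy, then synchronizes it.
    static final class LocalNode {
        final Map<String, List<String>> tables = new HashMap<>();
        private final String schemaName;
        private final CenterRepository repository;

        LocalNode(String schemaName, CenterRepository repository) {
            this.schemaName = schemaName;
            this.repository = repository;
        }

        void onDdlExecuted(DdlExecutedEvent event) {
            tables.put(event.tableName(), List.copyOf(event.columns()));  // 1. refresh local metadata
            String path = schemaName + "/" + event.tableName();
            repository.persistAndNotify(path, String.join(",", event.columns()));  // 2. sync, 3. notify
        }
    }

    public static void main(String[] args) {
        CenterRepository repository = new CenterRepository();
        LocalNode node = new LocalNode("sharding_db", repository);
        SimpleEventBus bus = new SimpleEventBus();
        bus.register(node::onDdlExecuted);
        // Posted, for example, after "ALTER TABLE t_order ADD COLUMN status ..." succeeds.
        bus.post(new DdlExecutedEvent("t_order", List.of("order_id", "user_id", "status")));
        System.out.println(repository.nodes.get("sharding_db/t_order"));  // order_id,user_id,status
    }
}
```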
   
   
   #### 6.7 Notification
   
   Through the event mechanism, the other nodes are notified and then pull the updated metadata from the CenterRepository.
   
   ![幻灯片5](https://user-images.githubusercontent.com/807508/77132933-31225f00-6a9c-11ea-9e97-12593880da3e.PNG)
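
On the receiving side, each node can watch the repository and pull the changed metadata when notified. In this hypothetical Java sketch, the notification carries only the changed path and each `Node` reads the new content back from an in-memory `CenterRepository`; a real registry (for example ZooKeeper or etcd watches) would deliver these notifications asynchronously rather than in the writer's thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class MetadataNotificationSketch {

    // Hypothetical in-memory stand-in for CenterRepository with watch support.
    static final class CenterRepository {
        private final Map<String, String> nodes = new HashMap<>();
        private final List<Consumer<String>> watchers = new ArrayList<>();

        void watch(Consumer<String> watcher) {
            watchers.add(watcher);
        }

        String get(String path) {
            return nodes.get(path);
        }

        // Stores the content, then tells every watcher which path changed.
        void persist(String path, String content) {
            nodes.put(path, content);
            watchers.forEach(each -> each.accept(path));
        }
    }

    // A node that keeps a local copy and refreshes it whenever a watched path changes.
    static final class Node {
        final Map<String, String> localMetadata = new HashMap<>();
        private final CenterRepository repository;

        Node(CenterRepository repository) {
            this.repository = repository;
            repository.watch(this::onChanged);
        }

        // The notification carries only the path; the content is pulled from the repository.
        private void onChanged(String path) {
            localMetadata.put(path, repository.get(path));
        }
    }

    public static void main(String[] args) {
        CenterRepository repository = new CenterRepository();
        Node nodeA = new Node(repository);
        Node nodeB = new Node(repository);
        String path = "/orchestration-name/metadata/sharding_db";
        repository.persist(path, "tables: {t_order: {columns: [order_id, user_id, status]}}");
        System.out.println(nodeA.localMetadata.get(path));
        System.out.println(nodeB.localMetadata.get(path));
    }
}
```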
   
   ### 7. Task list
   
   - [ ] 7.1 Sort out the existing metadata
   - [ ] 7.2 Refactor metadata (code abstraction and cleanup)
   - [ ] 7.3 Add the center-metadata project structure
   - [ ] 7.4 Implement metadata persistence
   - [ ] 7.5 Migrate some loader code
   - [ ] 7.6 Implement the event notification mechanism
   - [ ] 7.7 Implement global synchronization
   - [ ] 7.8 Optimize loading and usage
   - [ ] 7.9 Improve unit tests
   - [ ] 7.10 Implement examples
   - [ ] 7.11 Write documentation
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-shardingsphere] kimmking edited a comment on issue #4896: [DISCUSS]MetadataCenter Design 5.x

Posted by GitBox <gi...@apache.org>.
kimmking edited a comment on issue #4896: [DISCUSS]MetadataCenter Design 5.x
URL: https://github.com/apache/incubator-shardingsphere/issues/4896#issuecomment-611468787
 
 
   Let's talk about/pick up tasks from https://github.com/apache/incubator-shardingsphere/issues/5128
   @ssxlulu @menghaoranss @yu199195 


[GitHub] [incubator-shardingsphere] kimmking commented on issue #4896: [DISCUSS]MetadataCenter Design 5.x

kimmking commented on issue #4896: [DISCUSS]MetadataCenter Design 5.x
URL: https://github.com/apache/incubator-shardingsphere/issues/4896#issuecomment-611463629
 
 
   I have finished some tasks for phases 1 & 2 and merged them into master.
   We can move on to phases 3 & 4.
   


[GitHub] [incubator-shardingsphere] kimmking edited a comment on issue #4896: [DISCUSS]MetadataCenter Design 5.x

kimmking edited a comment on issue #4896: [DISCUSS]MetadataCenter Design 5.x
URL: https://github.com/apache/incubator-shardingsphere/issues/4896#issuecomment-611468787
 
 
   Let's talk about/pick up detail tasks from https://github.com/apache/incubator-shardingsphere/issues/5128
   @ssxlulu @menghaoranss @yu199195 


[GitHub] [incubator-shardingsphere] kimmking commented on issue #4896: [DISCUSS]MetadataCenter Design 5.x

kimmking commented on issue #4896: [DISCUSS]MetadataCenter Design 5.x
URL: https://github.com/apache/incubator-shardingsphere/issues/4896#issuecomment-610920373
 
 
   @ssxlulu @yu199195 @menghaoranss we will move on and each one can take a part of it.


[GitHub] [incubator-shardingsphere] kimmking commented on issue #4896: [DISCUSS]MetadataCenter Design 5.x

kimmking commented on issue #4896: [DISCUSS]MetadataCenter Design 5.x
URL: https://github.com/apache/incubator-shardingsphere/issues/4896#issuecomment-610920017
 
 
   I will split this issue into 3 phases:
   1.  add project structure and integrate center repository
   2.  add load and refresh methods, global notifications
   3.  revise docs and update unittests/examples


[GitHub] [incubator-shardingsphere] yu199195 commented on issue #4896: [DISCUSS]MetadataCenter Design 5.x

yu199195 commented on issue #4896: [DISCUSS]MetadataCenter Design 5.x
URL: https://github.com/apache/incubator-shardingsphere/issues/4896#issuecomment-602486498
 
 
   good issues


[GitHub] [incubator-shardingsphere] kimmking edited a comment on issue #4896: [DISCUSS]MetadataCenter Design 5.x

kimmking edited a comment on issue #4896: [DISCUSS]MetadataCenter Design 5.x
URL: https://github.com/apache/incubator-shardingsphere/issues/4896#issuecomment-610920017
 
 
   I will split this issue into 4 phases:
   
   - [x] 1. add project structure and integrate center repository
   - [x] 2. add metadata loading/recovery and global notifications
   - [ ] 3. add refresh methods triggered by executing DDL statements
   - [ ] 4. revise docs and update unit tests/examples


[GitHub] [incubator-shardingsphere] kimmking commented on issue #4896: [DISCUSS]MetadataCenter Design 5.x

kimmking commented on issue #4896: [DISCUSS]MetadataCenter Design 5.x
URL: https://github.com/apache/incubator-shardingsphere/issues/4896#issuecomment-604846886
 
 
   > @kimmking Hi, I am interested in the metadata module, and I have some experience with metadata management, such as concurrent DDL. I would be honored to take part in it.
   
   U r welcome.


[GitHub] [incubator-shardingsphere] ssxlulu commented on issue #4896: [DISCUSS]MetadataCenter Design 5.x

ssxlulu commented on issue #4896: [DISCUSS]MetadataCenter Design 5.x
URL: https://github.com/apache/incubator-shardingsphere/issues/4896#issuecomment-604251629
 
 
   @kimmking Hi, I am interested in the metadata module, and I have some experience with metadata management, such as concurrent DDL. I would be honored to take part in it.


[GitHub] [incubator-shardingsphere] kimmking commented on issue #4896: [DISCUSS]MetadataCenter Design 5.x

kimmking commented on issue #4896: [DISCUSS]MetadataCenter Design 5.x
URL: https://github.com/apache/incubator-shardingsphere/issues/4896#issuecomment-611468787
 
 
   Let's talk about/pick up tasks from https://github.com/apache/incubator-shardingsphere/issues/5128


[GitHub] [incubator-shardingsphere] ssxlulu commented on issue #4896: [DISCUSS]MetadataCenter Design 5.x

ssxlulu commented on issue #4896: [DISCUSS]MetadataCenter Design 5.x
URL: https://github.com/apache/incubator-shardingsphere/issues/4896#issuecomment-611271452
 
 
   Please assign me phase 2.


[GitHub] [incubator-shardingsphere] menghaoranss commented on issue #4896: [DISCUSS]MetadataCenter Design 5.x

menghaoranss commented on issue #4896: [DISCUSS]MetadataCenter Design 5.x
URL: https://github.com/apache/incubator-shardingsphere/issues/4896#issuecomment-611304647
 
 
   I can try phase 1. Maybe each phase also needs to be split into sub-issues, so we can complete them together.


[GitHub] [incubator-shardingsphere] kimmking edited a comment on issue #4896: [DISCUSS]MetadataCenter Design 5.x

kimmking edited a comment on issue #4896: [DISCUSS]MetadataCenter Design 5.x
URL: https://github.com/apache/incubator-shardingsphere/issues/4896#issuecomment-610920017
 
 
   I will split this issue into 4 phases:
   
   - [x] 1. add project structure and integrate center repository
   - [x] 2. add metadata loading/recovery and global notifications
   - [x] 3. add refresh methods triggered by executing DDL statements
   - [x] 4. revise docs and update unit tests/examples
