Posted to notifications@shardingsphere.apache.org by GitBox <gi...@apache.org> on 2021/03/11 10:40:45 UTC

[GitHub] [shardingsphere] Emitswang commented on issue #9628: Sharding-proxy:a slew of 'config-' files cause proxy start takes a long time

Emitswang commented on issue #9628:
URL: https://github.com/apache/shardingsphere/issues/9628#issuecomment-796641932


   Hi @tristaZero 
   
   I'm glad that my suggestion has been accepted. For the first point, let me briefly explain the background.
   
   Previously, we implemented the sharding rules through a MyBatis interceptor at the business application level. That is to say, the SQL that currently reaches the proxy layer is in the form of `select * from sharding_13_25_db.t_sharding_table_1 where uid = 13000251`.
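   For context, the interceptor works roughly like the sketch below. This is a simplified, hypothetical version rather than our production code, and `resolvePhysicalName` is a made-up placeholder for the real routing logic; the point is only that the logical table name is rewritten into the physical `db.table` before the SQL leaves the application (on MyBatis 3.5+, `plugin()`/`setProperties()` can rely on the interface's default methods):

```java
import java.sql.Connection;
import org.apache.ibatis.executor.statement.StatementHandler;
import org.apache.ibatis.plugin.Interceptor;
import org.apache.ibatis.plugin.Intercepts;
import org.apache.ibatis.plugin.Invocation;
import org.apache.ibatis.plugin.Signature;
import org.apache.ibatis.reflection.MetaObject;
import org.apache.ibatis.reflection.SystemMetaObject;

// Simplified sketch of the application-level sharding interceptor (hypothetical, for illustration only).
@Intercepts(@Signature(type = StatementHandler.class, method = "prepare",
        args = {Connection.class, Integer.class}))
public class ShardingRewriteInterceptor implements Interceptor {

    @Override
    public Object intercept(Invocation invocation) throws Throwable {
        StatementHandler handler = (StatementHandler) invocation.getTarget();
        MetaObject meta = SystemMetaObject.forObject(handler);
        String sql = (String) meta.getValue("delegate.boundSql.sql");
        // Rewrite the logical table into the physical schema.table resolved from the sharding key,
        // e.g. "t_sharding" -> "sharding_13_25_db.t_sharding_table_1".
        String rewritten = sql.replace("t_sharding", resolvePhysicalName(sql));
        meta.setValue("delegate.boundSql.sql", rewritten);
        return invocation.proceed();
    }

    // Hypothetical placeholder: derive the physical db/table from the sharding key in the SQL.
    private String resolvePhysicalName(String sql) {
        return "sharding_13_25_db.t_sharding_table_1";
    }
}
```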
   
   At the current stage, we want to take over all SQL requests with sharding-proxy. Then, without changing the business code, I need to create a config file for every corresponding database so that sharding-proxy can take over all the SQL requests smoothly.
   
   For example, if I only configure a sharding rule config file corresponding to `schemaName: sharding_db`, and the business application sends the SQL request `select * from sharding_13_25_db.t_sharding_table_1 where uid = 13000251`, an exception occurs: `ERROR 1049 (42000): Unknown database`.
   
   Therefore, we need a large number of configuration files so that the proxy can accept all SQL requests normally. In this way we end up with 2000 dataSources to configure, even though in fact they map onto only 20 database instances. We don't use the sharding function provided by the proxy at all; we only do read-write splitting.
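   To make that concrete, every physical schema currently needs its own `config-` file roughly like the sketch below. This is only an illustrative sketch: the data source names, URLs and credentials are invented, and the exact rule type and property names for read-write splitting depend on the proxy version, so please read it as pseudo-configuration. Multiplied across every `sharding_XX_YY_db`, this is where the 2000 dataSource entries come from:

```yaml
# config-sharding-13-25.yaml (hypothetical file name) -- one of roughly a hundred similar files
schemaName: sharding_13_25_db

dataSources:
  write_ds:
    url: jdbc:mysql://mysql-instance-03:3306/sharding_13_25_db?useSSL=false
    username: proxy_user
    password: "******"
    maxPoolSize: 50
  read_ds_0:
    url: jdbc:mysql://mysql-instance-03-replica:3306/sharding_13_25_db?useSSL=false
    username: proxy_user
    password: "******"
    maxPoolSize: 50

rules:
- !READWRITE_SPLITTING        # only read-write splitting, no sharding rule at all
  dataSources:
    pr_ds:
      writeDataSourceName: write_ds
      readDataSourceNames:
        - read_ds_0
```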
   
   You may ask why the business applications don't simply rewrite the SQL to `select * from sharding_db.t_sharding where uid = 13000251`. This is mainly a matter of implementation cost on the business side, which involves the following changes:
   
    1. At present, there are some SQL scenarios in the business where the sharding target cannot be determined from the SQL itself; the application first has to run some association queries against a dictionary mapping table and then rewrite the resolved shard into the SQL.
    2. The business has table-scanning logic that traverses by table name, scanning data one table at a time from `00_db.t_0` to `99_db.t_9` (roughly as in the sketch after this list).
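   The scan logic for the second point is roughly the following (simplified; `runQuery` is a hypothetical stand-in for the real data-access call), so the physical db/table names are generated directly inside the business code:

```java
// Simplified sketch of the existing scan-by-table-name traversal (illustration only).
public class FullTableScanJob {

    public void scanAll() {
        for (int db = 0; db <= 99; db++) {
            for (int table = 0; table <= 9; table++) {
                // Physical names are hard-coded into the traversal: 00_db.t_0 ... 99_db.t_9.
                String qualifiedName = String.format("%02d_db.t_%d", db, table);
                runQuery("SELECT * FROM " + qualifiedName);
            }
        }
    }

    // Hypothetical placeholder for the real JDBC/MyBatis call.
    private void runQuery(String sql) {
    }
}
```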
   
   Another risk I'm worried about is that 2000 table configurations will produce some large objects. Will this cause runtime problems such as frequent full GC or OOM, or even degrade proxy performance?
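   One way to turn that worry into something measurable would be to log heap usage before and after the proxy loads all the schema configurations, e.g. along the lines of the small sketch below (my own helper, not part of ShardingSphere), or simply to enable GC logging with `-Xlog:gc*` on JDK 9+:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Minimal helper for checking heap growth while the proxy loads a large number of schema configurations.
public final class HeapUsageProbe {

    public static void logHeap(String phase) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        System.out.printf("[%s] heap used=%d MB, committed=%d MB, max=%d MB%n",
                phase, toMb(heap.getUsed()), toMb(heap.getCommitted()), toMb(heap.getMax()));
    }

    private static long toMb(long bytes) {
        return bytes / (1024 * 1024);
    }
}
```

   Calling `logHeap("before metadata load")` and `logHeap("after metadata load")` around the startup step should show whether 2000 configurations really translate into problematic heap pressure.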
   
   In theory, parallelization can speed up the `build` process, and I'm preparing to make the relevant modifications to verify it (a rough sketch of the kind of change I have in mind follows after the questions below). If I can, I'd like to become a contributor to the project. However, as I am new to the project, the whole process of submitting a PR is not very clear to me, so I need to learn it first, for example:
   
    1. Should I evaluate the scope of the change myself, or implement it only after your evaluation and confirmation?
    2. If the code I submit does not meet the project's requirements, will the PR wait until I finish revising it?
    3. For the parallelization change, is it necessary to add new test cases, or is running the existing test cases enough?
    4. Is there a deadline requirement?
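
   For reference, the kind of parallelization I have in mind (mentioned above, before these questions) is sketched below. This is not the actual ShardingSphere code path: `SchemaConfiguration`, `SchemaContext` and `buildSchemaContext` are hypothetical placeholders for the real metadata types, and the sketch only illustrates building the per-schema metadata concurrently instead of one schema at a time:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

// Hypothetical sketch of parallelizing the per-schema build step at proxy startup.
public final class ParallelSchemaBuilder {

    // Placeholders for the real configuration / metadata types.
    public static final class SchemaConfiguration {
        final String schemaName;
        public SchemaConfiguration(String schemaName) { this.schemaName = schemaName; }
    }

    public static final class SchemaContext {
        final String schemaName;
        public SchemaContext(String schemaName) { this.schemaName = schemaName; }
    }

    public Map<String, SchemaContext> buildAll(List<SchemaConfiguration> configs) throws Exception {
        int threads = Math.max(1, Math.min(configs.size(), Runtime.getRuntime().availableProcessors() * 2));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Map<String, SchemaContext> result = new ConcurrentHashMap<>();
        try {
            // Submit one build task per schema instead of building the schemas sequentially.
            List<Future<SchemaContext>> futures = configs.stream()
                    .map(config -> pool.submit(() -> {
                        SchemaContext context = buildSchemaContext(config);
                        result.put(config.schemaName, context);
                        return context;
                    }))
                    .collect(Collectors.toList());
            for (Future<SchemaContext> each : futures) {
                each.get();   // wait for completion and propagate any build failure
            }
        } finally {
            pool.shutdown();
        }
        return result;
    }

    private SchemaContext buildSchemaContext(SchemaConfiguration config) {
        // Placeholder: the real step loads table metadata from the backend database, which is the slow part.
        return new SchemaContext(config.schemaName);
    }
}
```

   Whether a fixed thread pool or a parallel stream is the right tool here is exactly the kind of thing I'd like to confirm with you before submitting a PR.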
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org