You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@shardingsphere.apache.org by GitBox <gi...@apache.org> on 2022/10/10 06:40:40 UTC

[GitHub] [shardingsphere] TeslaCN opened a new issue, #18281: Refactoring backend of ShardingSphere-Proxy

TeslaCN opened a new issue, #18281:
URL: https://github.com/apache/shardingsphere/issues/18281

   ## The current problem in Proxy: the backend exposes multiple entries to the frontend
   
   When ShardingSphere-Proxy MySQL was available for the first time, it differentiated Text Protocol and Binary Protocol in the backend module (directly calling `DatabaseCommunicationEngine`). Text Protocol will process all statements, and Binary Protocol will only process DML. This is also the origin of many problems that are reported later, some issues or PRs:
   - https://github.com/apache/shardingsphere/issues/9611
   - https://github.com/apache/shardingsphere/pull/10299
   - https://github.com/apache/shardingsphere/issues/11090
   - https://github.com/apache/shardingsphere/issues/12989
   - https://github.com/apache/shardingsphere/issues/17173
   - https://github.com/apache/shardingsphere/pull/17904 This change resulted in incorrect transaction status. But the deep-seated reason should be that most people are unfamiliar with the protocol and the problem in Proxy backend code structure, so they can hardly know that this change will be problematic.
   - Temporary fix for https://github.com/apache/shardingsphere/pull/18161 #17904: add handle auto commit to all SQL execution entries.
   
   The above list is just the tip of the iceberg. There are many related issues that are not enumerated, and some issues that have no issues, but those who are familiar with the code know that there are problems:
   - MySQL Proxy uses server-side PreparedStatement to execute DDL, which also leads to metadata inconsistency, etc.
   - Proxy uses the connection pool to directly execute `SET` statements, which may cause the connection to be polluted. The Proxy side does not handle the `SET` and other parameter type statements well, some statements will be intercepted, and some statements will be transparently transmitted to a certain storage node, resulting in the pollution of a certain database connection in the connection pool (related issues: Proxy stuck when stress testing: https://github.com/apache/shardingsphere/pull/13682).
   - Taking set encoding as an example, some processing has been done on multiple entries of the Proxy protocol.
   
   Temporary solution in the current code: Choose to use TextProtocolBackendHandler or DatabaseCommunicateEngine by judging the type of SQLStatement. Judging the back-end entry at the front end can temporarily solve the problem, but the code will be ugly and difficult to understand.
   
   In addition, the current Proxy backend implementation is overfitting for MySQL. MySQL distinguishes between Text and Binary protocols, but PostgreSQL does not have this distinction. In fact, the reason is not that other database protocols are not like the MySQL protocol, but that the back-end should provide what the front-end needs through an unified API. As for the specific protocol encoding and decoding, it should be handled by the front-end.
   
   ## Refactoring ideas
   ### gist
   
   - The current `TextProtocolBackendHandlerFactory` carries too much logic and is not the only entry for backend. Its logic needs to be adjusted to the main process of Proxy.
   - `DatabaseCommunicateEngine` no longer distinguishes whether it is Binary or not. Whether `JDBCDriverType` uses Statement or PreparedStatement, JDBC can shield the difference. Data encoding as Text or Binary should be handled by the front-end protocol.
   - The `DatabaseBackendHandler` before refactoring has the following implementations:
     - BroadcastDatabaseBackendHandler iterates over all logical Databases and executes statements (SET-like statements).
     - `UnicastDatabaseBackendHandler` executes the statement in the current logical Database. If no logical Database is specified (MySQL has no use), it will randomly select a data source for execution.
     - `SchemaAssignedDatabaseBackendHandler` almost proxies `DatabaseBackendHandler.`
   `DatabaseCommunicationEngine` can directly replace org.apache.shardingsphere.proxy.backend.text.data.impl.SchemaAssignedDatabaseBackendHandler.
   - Refactored `TextProtocolBackendHandler` to `ProxyBackendHandler`, MySQL's `COM_STMT_EXECUTE` and PostgreSQL/openGauss Portal no longer use `DatabaseCommunicateEngine` directly, and use ProxyBackendHandler as the backend entry uniformly.
   - ProxyBackendHandler separates creation and execution logic and can be created once and executed repeatedly.
   - The separate `PreparedStatementRegistry` for MySQL and PostgreSQL is used as part of the ConnectionSession, and the Registry no longer exists alone. Extract and abstract the key methods of PreparedStatement as interfaces, and add the following states to PreparedStatement to improve PreparedStatement performance:
     - `QueryHeader:` unchanged for the same PreparedStatement;
     - `SQLStatementContext:` Only the parameter part needs to be changed. There is already a method to do that.
   MySQL PreparedStatement needs to maintain a part of the parameter state to support `COM_STMT_SEND_LONG_DATA` (Blob related) and `COM_STMT_FETCH` (cursor-like) in the future;
   PostgreSQL's PreparedStatement has no parameter state, and the specific state is maintained inside the Portal.
   - Detail parameter optimization:
     - MySQL JDBC Driver sets `prepStmtCacheSize` to `200000`, MySQL `max_prepared_stmt_count` is `16382` by default, and the maximum value is `1048576`. `200000` is not a suitable value for most scenarios, and will cause the cache map of each Connection to occupy about 5 MB of memory. Consider modifying it to `8192`.
       - <https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_max_prepared_stmt_count>
   - Use `CompletableFuture` instead of Vert.x Future to decouple asynchronous logic from Vert.x.
   
   ![refactor_proxy_backend_en drawio](https://user-images.githubusercontent.com/20503072/173790535-4c3d3773-3bc3-4756-b576-e198b7bba5dc.png)
   
   ### Resource releasing
   The original resource releasing logic:
   - Not in a transaction, the database connection is returned to pool after each request;
   - In a transaction, the database connection is not returned to pool, but all used `Statement`s and `ResultSet`s are closed (except for the resources occupied by PostgreSQL Portal suspend, MySQL's PreparedStatement life cycle cannot span commands). All resources are not released until the end of the transaction.
   
   How to control resources after refactored:
   `ProxyBackendHandler` needs to have the following conditions:
   - `execute` Repeatable execution.
   - `releaseUnusedResources`: (internally determine whether it needs to be released) method to conditionally releasing Statement and ResultSet.
   
   The following cursors refer to MySQL COM_STMT_FETCH or PostgreSQL Portal
   `BackendConnection` no longer marks `DatabaseCommunicationEngine` usage status, but instead marks usage status of JDBC Connection.
   - Not in a transaction, no cursor:
   `releaseUnusedResource` can release all resources. All JDBC Connections can be released after the command is executed.
   - Not in transaction, with cursor:
   (This does not exist in PostgreSQL, just consider MySQL) During cursor execution, mark the corresponding JDBC connection as using, and only release the unused JDBC connection after the command ends. `releaseUnusedResources` does not release the resources needed by the cursor.
   - In a transaction, no cursor:
   (According to the original logic) After a single command is executed, all JDBC Connections used in the transaction are not released, and are released after the transaction ends. `Statements` and `ResultSets` that are no longer used can be released.
   - In a transaction, there is a cursor:
   (According to the original logic) After a single command is executed, all JDBC Connections used in the transaction are not released, and are released after the transaction ends. The PostgreSQL Portal life cycle is the end of the transaction, regardless of the cursor after the end of the transaction. For MySQL `COM_STMT_FETCH`, since Proxy does not support it originally, it can be considered to support cursors that do not cross transactions after reconstruction.
   - When the client connection is disconnected:
   Release all resources.
   
   ### Handling auto commit
   After refactoring, consider using `DatabaseBackendHandler` as an abstract class to handle auto commit in the execute logic. At this time, the `DatabaseBackendHandler` already knows SQL and `SQLStatement,` so it can process auto commit conditionally based on this information.
   For other implementations of `ProxyBackendHandler` such as `DistSQLHandler` or `TransactionBackendHandler` or `DatabaseAdminExecutor,` there is no need to deal with transaction related logic.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@shardingsphere.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [shardingsphere] github-actions[bot] commented on issue #18281: Refactoring backend of ShardingSphere-Proxy

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #18281:
URL: https://github.com/apache/shardingsphere/issues/18281#issuecomment-1272349634

   Hello , this issue has not received a reply for several days.
   This issue is supposed to be closed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@shardingsphere.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [shardingsphere] github-actions[bot] closed issue #18281: Refactoring backend of ShardingSphere-Proxy

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed issue #18281: Refactoring backend of ShardingSphere-Proxy
URL: https://github.com/apache/shardingsphere/issues/18281


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@shardingsphere.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [shardingsphere] TeslaCN commented on issue #18281: Refactoring backend of ShardingSphere-Proxy

Posted by GitBox <gi...@apache.org>.
TeslaCN commented on issue #18281:
URL: https://github.com/apache/shardingsphere/issues/18281#issuecomment-1298089546

   The main parts of this issue have been finished. Since we have planned to split Vert.x from master branch. The rest of this issue could be planned after Vert.x removed. 
   https://lists.apache.org/thread/0vd7h44bjjszc5fs2hpftktt4oh4hhw5


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@shardingsphere.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [shardingsphere] TeslaCN commented on issue #18281: Refactoring backend of ShardingSphere-Proxy

Posted by GitBox <gi...@apache.org>.
TeslaCN commented on issue #18281:
URL: https://github.com/apache/shardingsphere/issues/18281#issuecomment-1221699001

   We have solved the urgent problems caused by the defective design. The rest of this issue will be finished in later release.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@shardingsphere.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [shardingsphere] TeslaCN commented on issue #18281: Refactoring backend of ShardingSphere-Proxy

Posted by GitBox <gi...@apache.org>.
TeslaCN commented on issue #18281:
URL: https://github.com/apache/shardingsphere/issues/18281#issuecomment-1157265233

   We may move the exception handling in #18359 into backend after refactored.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@shardingsphere.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [shardingsphere] TeslaCN closed issue #18281: Refactoring backend of ShardingSphere-Proxy

Posted by GitBox <gi...@apache.org>.
TeslaCN closed issue #18281: Refactoring backend of ShardingSphere-Proxy
URL: https://github.com/apache/shardingsphere/issues/18281


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@shardingsphere.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org