You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@shardingsphere.apache.org by "sandynz (via GitHub)" <gi...@apache.org> on 2023/02/22 06:57:04 UTC
[GitHub] [shardingsphere] sandynz commented on a diff in pull request #24293: Improve table records count calculation in pipeline job

sandynz commented on code in PR #24293:
URL: https://github.com/apache/shardingsphere/pull/24293#discussion_r1113898689


##########
kernel/data-pipeline/api/src/main/java/org/apache/shardingsphere/data/pipeline/spi/sqlbuilder/PipelineSQLBuilder.java:
##########
@@ -135,6 +135,16 @@ default Optional<String> buildCreateSchemaSQL(String schemaName) {
     // TODO keep it for now, it might be used later
     String buildCountSQL(String schemaName, String tableName);
     
+    /**
+     * Build Estimate count SQL.
+     *
+     * @param databaseName database name
+     * @param schemaName schema name
+     * @param tableName table name
+     * @return estimate count sql
+     */
+    String buildEstimateCountSQL(String databaseName, String schemaName, String tableName);

Review Comment:
   `buildEstimateCountSQL` could be `buildEstimatedCountSQL`



##########
kernel/data-pipeline/core/src/main/java/org/apache/shardingsphere/data/pipeline/core/prepare/InventoryTaskSplitter.java:
##########
@@ -171,12 +171,20 @@ private long getTableRecordsCount(final InventoryIncrementalJobItemContext jobIt
         PipelineJobConfiguration jobConfig = jobItemContext.getJobConfig();
         String schemaName = dumperConfig.getSchemaName(new LogicTableName(dumperConfig.getLogicTableName()));
         String actualTableName = dumperConfig.getActualTableName();
-        // TODO with a large amount of data, count the full table will have performance problem
-        String sql = PipelineTypedSPILoader.getDatabaseTypedService(PipelineSQLBuilder.class, jobConfig.getSourceDatabaseType()).buildCountSQL(schemaName, actualTableName);
+        PipelineSQLBuilder pipelineSQLBuilder = PipelineTypedSPILoader.getDatabaseTypedService(PipelineSQLBuilder.class, jobConfig.getSourceDatabaseType());
+        String sql = pipelineSQLBuilder.buildEstimateCountSQL(dumperConfig.getDataSourceName(), schemaName, actualTableName);
         try (
                 Connection connection = dataSource.getConnection();
                 PreparedStatement preparedStatement = connection.prepareStatement(sql)) {
+            long estimateCount;

Review Comment:
   `estimateCount` could be `result`



##########
kernel/data-pipeline/dialect/postgresql/src/main/java/org/apache/shardingsphere/data/pipeline/postgresql/sqlbuilder/PostgreSQLPipelineSQLBuilder.java:
##########
@@ -85,6 +85,12 @@ private String buildConflictSQL(final DataRecord dataRecord) {
         return result.toString();
     }
     
+    @Override
+    public String buildEstimateCountSQL(final String databaseName, final String schemaName, final String tableName) {
+        // TODO Support estimate count later.
+        return buildCountSQL(schemaName, tableName);
+    }

Review Comment:
   It's not supported, so it should return `Optional<String>`



##########
kernel/data-pipeline/core/src/main/java/org/apache/shardingsphere/data/pipeline/core/prepare/InventoryTaskSplitter.java:
##########
@@ -171,12 +171,20 @@ private long getTableRecordsCount(final InventoryIncrementalJobItemContext jobIt
         PipelineJobConfiguration jobConfig = jobItemContext.getJobConfig();
         String schemaName = dumperConfig.getSchemaName(new LogicTableName(dumperConfig.getLogicTableName()));
         String actualTableName = dumperConfig.getActualTableName();
-        // TODO with a large amount of data, count the full table will have performance problem
-        String sql = PipelineTypedSPILoader.getDatabaseTypedService(PipelineSQLBuilder.class, jobConfig.getSourceDatabaseType()).buildCountSQL(schemaName, actualTableName);
+        PipelineSQLBuilder pipelineSQLBuilder = PipelineTypedSPILoader.getDatabaseTypedService(PipelineSQLBuilder.class, jobConfig.getSourceDatabaseType());
+        String sql = pipelineSQLBuilder.buildEstimateCountSQL(dumperConfig.getDataSourceName(), schemaName, actualTableName);
         try (
                 Connection connection = dataSource.getConnection();
                 PreparedStatement preparedStatement = connection.prepareStatement(sql)) {
+            long estimateCount;
             try (ResultSet resultSet = preparedStatement.executeQuery()) {
+                resultSet.next();
+                estimateCount = resultSet.getLong(1);
+            }
+            if (estimateCount > 0) {
+                return estimateCount;
+            }
+            try (ResultSet resultSet = connection.createStatement().executeQuery(pipelineSQLBuilder.buildCountSQL(schemaName, actualTableName))) {

Review Comment:
   1, `buildCountSQL` might cost too much time, so add some log for it, includes cost time.
   
   2, connection will return to pool but not closed, so will `connection.createStatement()` be closed automatically?
   



##########
kernel/data-pipeline/dialect/mysql/src/main/java/org/apache/shardingsphere/data/pipeline/mysql/sqlbuilder/MySQLPipelineSQLBuilder.java:
##########
@@ -89,6 +89,11 @@ public Optional<String> buildCRC32SQL(final String schemaName, final String tabl
         return Optional.of(String.format("SELECT BIT_XOR(CAST(CRC32(%s) AS UNSIGNED)) AS checksum, COUNT(1) AS cnt FROM %s", quote(column), quote(tableName)));
     }
     
+    @Override
+    public String buildEstimateCountSQL(final String databaseName, final String schemaName, final String tableName) {
+        return String.format("SELECT TABLE_ROWS from INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = '%s' AND TABLE_NAME = '%s'", databaseName, getQualifiedTableName(schemaName, tableName));

Review Comment:
   `from` could be `FROM`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@shardingsphere.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org