You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@shardingsphere.apache.org by "sandynz (via GitHub)" <gi...@apache.org> on 2023/02/22 09:09:56 UTC
[GitHub] [shardingsphere] sandynz commented on a diff in pull request #24293: Improve table records count calculation in pipeline job for MySQL

sandynz commented on code in PR #24293:
URL: https://github.com/apache/shardingsphere/pull/24293#discussion_r1114028641


##########
kernel/data-pipeline/core/src/main/java/org/apache/shardingsphere/data/pipeline/core/prepare/InventoryTaskSplitter.java:
##########
@@ -171,21 +172,48 @@ private long getTableRecordsCount(final InventoryIncrementalJobItemContext jobIt
         PipelineJobConfiguration jobConfig = jobItemContext.getJobConfig();
         String schemaName = dumperConfig.getSchemaName(new LogicTableName(dumperConfig.getLogicTableName()));
         String actualTableName = dumperConfig.getActualTableName();
-        // TODO with a large amount of data, count the full table will have performance problem
-        String sql = PipelineTypedSPILoader.getDatabaseTypedService(PipelineSQLBuilder.class, jobConfig.getSourceDatabaseType()).buildCountSQL(schemaName, actualTableName);
+        PipelineSQLBuilder pipelineSQLBuilder = PipelineTypedSPILoader.getDatabaseTypedService(PipelineSQLBuilder.class, jobConfig.getSourceDatabaseType());
+        Optional<String> estimatedCountSQL = pipelineSQLBuilder.buildEstimatedCountSQL(schemaName, actualTableName);

Review Comment:
   `estimatedCountSQL` could be `sql`



##########
kernel/data-pipeline/core/src/main/java/org/apache/shardingsphere/data/pipeline/core/prepare/InventoryTaskSplitter.java:
##########
@@ -171,21 +172,48 @@ private long getTableRecordsCount(final InventoryIncrementalJobItemContext jobIt
         PipelineJobConfiguration jobConfig = jobItemContext.getJobConfig();
         String schemaName = dumperConfig.getSchemaName(new LogicTableName(dumperConfig.getLogicTableName()));
         String actualTableName = dumperConfig.getActualTableName();
-        // TODO with a large amount of data, count the full table will have performance problem
-        String sql = PipelineTypedSPILoader.getDatabaseTypedService(PipelineSQLBuilder.class, jobConfig.getSourceDatabaseType()).buildCountSQL(schemaName, actualTableName);
+        PipelineSQLBuilder pipelineSQLBuilder = PipelineTypedSPILoader.getDatabaseTypedService(PipelineSQLBuilder.class, jobConfig.getSourceDatabaseType());
+        Optional<String> estimatedCountSQL = pipelineSQLBuilder.buildEstimatedCountSQL(schemaName, actualTableName);
+        try {
+            if (estimatedCountSQL.isPresent()) {
+                long estimatedCount = getEstimatedCountSQLResult(dataSource, estimatedCountSQL.get());
+                return estimatedCount > 0 ? estimatedCount : getCountSQLResult(dataSource, pipelineSQLBuilder.buildCountSQL(schemaName, actualTableName));
+            }
+            return getCountSQLResult(dataSource, pipelineSQLBuilder.buildCountSQL(schemaName, actualTableName));
+        } catch (final SQLException ex) {
+            String uniqueKey = dumperConfig.hasUniqueKey() ? dumperConfig.getUniqueKeyColumns().get(0).getName() : "";
+            throw new SplitPipelineJobByUniqueKeyException(dumperConfig.getActualTableName(), uniqueKey, ex);
+        }
+    }
+    
+    // TODO maybe need refactor after PostgreSQL support estimated count.
+    private long getEstimatedCountSQLResult(final DataSource dataSource, final String estimatedCountSQL) throws SQLException {

Review Comment:
   `getEstimatedCountSQLResult` could be `getEstimatedCount`.
   
   And also `getCountSQLResult` could be `getCount`.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@shardingsphere.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org