You are viewing a plain text version of this content. The canonical link for it is here.

Posted to notifications@shardingsphere.apache.org by "azexcy (via GitHub)" <gi...@apache.org> on 2023/02/22 06:09:01 UTC

[GitHub] [shardingsphere] azexcy opened a new pull request, #24293: Improve table records count calculation in pipeline job

azexcy opened a new pull request, #24293:
URL: https://github.com/apache/shardingsphere/pull/24293

   Fixes #24291.
   
   Changes proposed in this pull request:
     - Improve table records count calculation in pipeline job
   
   ---
   
   Before committing this PR, I'm sure that I have checked the following options:
   - [ ] My code follows the [code of conduct](https://shardingsphere.apache.org/community/en/involved/conduct/code/) of this project.
   - [ ] I have self-reviewed the commit code.
   - [ ] I have (or in comment I request) added corresponding labels for the pull request.
   - [ ] I have passed maven check locally : `./mvnw clean install -B -T1C -Dmaven.javadoc.skip -Dmaven.jacoco.skip -e`.
   - [ ] I have made corresponding changes to the documentation.
   - [ ] I have added corresponding unit tests for my changes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@shardingsphere.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [shardingsphere] sandynz merged pull request #24293: Improve table records count calculation in pipeline job for MySQL

Posted by "sandynz (via GitHub)" <gi...@apache.org>.

sandynz merged PR #24293:
URL: https://github.com/apache/shardingsphere/pull/24293


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@shardingsphere.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [shardingsphere] azexcy commented on a diff in pull request #24293: Improve table records count calculation in pipeline job for MySQL

Posted by "azexcy (via GitHub)" <gi...@apache.org>.

azexcy commented on code in PR #24293:
URL: https://github.com/apache/shardingsphere/pull/24293#discussion_r1113910440


##########
kernel/data-pipeline/dialect/postgresql/src/main/java/org/apache/shardingsphere/data/pipeline/postgresql/sqlbuilder/PostgreSQLPipelineSQLBuilder.java:
##########
@@ -85,6 +85,12 @@ private String buildConflictSQL(final DataRecord dataRecord) {
         return result.toString();
     }
     
+    @Override
+    public String buildEstimateCountSQL(final String databaseName, final String schemaName, final String tableName) {
+        // TODO Support estimate count later.
+        return buildCountSQL(schemaName, tableName);
+    }

Review Comment:
   Improved



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@shardingsphere.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [shardingsphere] codecov-commenter commented on pull request #24293: Improve table records count calculation in pipeline job for MySQL

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.

codecov-commenter commented on PR #24293:
URL: https://github.com/apache/shardingsphere/pull/24293#issuecomment-1439567562

   # [Codecov](https://codecov.io/gh/apache/shardingsphere/pull/24293?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#24293](https://codecov.io/gh/apache/shardingsphere/pull/24293?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (78cc612) into [master](https://codecov.io/gh/apache/shardingsphere/commit/c42e1a33f7d7718ade1aaa6391ab6ff3b94146c0?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c42e1a3) will **decrease** coverage by `0.02%`.
   > The diff coverage is `8.00%`.
   
   > :exclamation: Current head 78cc612 differs from pull request most recent head e149781. Consider uploading reports for the commit e149781 to get more accurate results
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #24293      +/-   ##
   ============================================
   - Coverage     49.97%   49.95%   -0.02%     
     Complexity     1573     1573              
   ============================================
     Files          3253     3253              
     Lines         53440    53454      +14     
     Branches       9854     9856       +2     
   ============================================
   - Hits          26706    26705       -1     
   - Misses        24365    24380      +15     
     Partials       2369     2369              
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/shardingsphere/pull/24293?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ta/pipeline/spi/sqlbuilder/PipelineSQLBuilder.java](https://codecov.io/gh/apache/shardingsphere/pull/24293?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a2VybmVsL2RhdGEtcGlwZWxpbmUvYXBpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9zaGFyZGluZ3NwaGVyZS9kYXRhL3BpcGVsaW5lL3NwaS9zcWxidWlsZGVyL1BpcGVsaW5lU1FMQnVpbGRlci5qYXZh) | `0.00% <ø> (ø)` | |
   | [...a/pipeline/core/prepare/InventoryTaskSplitter.java](https://codecov.io/gh/apache/shardingsphere/pull/24293?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a2VybmVsL2RhdGEtcGlwZWxpbmUvY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc2hhcmRpbmdzcGhlcmUvZGF0YS9waXBlbGluZS9jb3JlL3ByZXBhcmUvSW52ZW50b3J5VGFza1NwbGl0dGVyLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...ngauss/sqlbuilder/OpenGaussPipelineSQLBuilder.java](https://codecov.io/gh/apache/shardingsphere/pull/24293?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a2VybmVsL2RhdGEtcGlwZWxpbmUvZGlhbGVjdC9vcGVuZ2F1c3Mvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3NoYXJkaW5nc3BoZXJlL2RhdGEvcGlwZWxpbmUvb3BlbmdhdXNzL3NxbGJ1aWxkZXIvT3BlbkdhdXNzUGlwZWxpbmVTUUxCdWlsZGVyLmphdmE=) | `88.88% <0.00%> (-5.23%)` | :arrow_down: |
   | [...resql/sqlbuilder/PostgreSQLPipelineSQLBuilder.java](https://codecov.io/gh/apache/shardingsphere/pull/24293?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a2VybmVsL2RhdGEtcGlwZWxpbmUvZGlhbGVjdC9wb3N0Z3Jlc3FsL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9zaGFyZGluZ3NwaGVyZS9kYXRhL3BpcGVsaW5lL3Bvc3RncmVzcWwvc3FsYnVpbGRlci9Qb3N0Z3JlU1FMUGlwZWxpbmVTUUxCdWlsZGVyLmphdmE=) | `95.45% <0.00%> (-4.55%)` | :arrow_down: |
   | [...e/scenario/migration/api/impl/MigrationJobAPI.java](https://codecov.io/gh/apache/shardingsphere/pull/24293?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a2VybmVsL2RhdGEtcGlwZWxpbmUvc2NlbmFyaW8vbWlncmF0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9zaGFyZGluZ3NwaGVyZS9kYXRhL3BpcGVsaW5lL3NjZW5hcmlvL21pZ3JhdGlvbi9hcGkvaW1wbC9NaWdyYXRpb25Kb2JBUEkuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...line/mysql/sqlbuilder/MySQLPipelineSQLBuilder.java](https://codecov.io/gh/apache/shardingsphere/pull/24293?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a2VybmVsL2RhdGEtcGlwZWxpbmUvZGlhbGVjdC9teXNxbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc2hhcmRpbmdzcGhlcmUvZGF0YS9waXBlbGluZS9teXNxbC9zcWxidWlsZGVyL015U1FMUGlwZWxpbmVTUUxCdWlsZGVyLmphdmE=) | `90.00% <100.00%> (+1.11%)` | :arrow_up: |
   | [...handler/distsql/ral/hint/enums/HintSourceType.java](https://codecov.io/gh/apache/shardingsphere/pull/24293?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHJveHkvYmFja2VuZC9jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9zaGFyZGluZ3NwaGVyZS9wcm94eS9iYWNrZW5kL2hhbmRsZXIvZGlzdHNxbC9yYWwvaGludC9lbnVtcy9IaW50U291cmNlVHlwZS5qYXZh) | `0.00% <0.00%> (-42.86%)` | :arrow_down: |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@shardingsphere.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [shardingsphere] sandynz commented on a diff in pull request #24293: Improve table records count calculation in pipeline job

Posted by "sandynz (via GitHub)" <gi...@apache.org>.

sandynz commented on code in PR #24293:
URL: https://github.com/apache/shardingsphere/pull/24293#discussion_r1113898689


##########
kernel/data-pipeline/api/src/main/java/org/apache/shardingsphere/data/pipeline/spi/sqlbuilder/PipelineSQLBuilder.java:
##########
@@ -135,6 +135,16 @@ default Optional<String> buildCreateSchemaSQL(String schemaName) {
     // TODO keep it for now, it might be used later
     String buildCountSQL(String schemaName, String tableName);
     
+    /**
+     * Build Estimate count SQL.
+     *
+     * @param databaseName database name
+     * @param schemaName schema name
+     * @param tableName table name
+     * @return estimate count sql
+     */
+    String buildEstimateCountSQL(String databaseName, String schemaName, String tableName);

Review Comment:
   `buildEstimateCountSQL` could be `buildEstimatedCountSQL`



##########
kernel/data-pipeline/core/src/main/java/org/apache/shardingsphere/data/pipeline/core/prepare/InventoryTaskSplitter.java:
##########
@@ -171,12 +171,20 @@ private long getTableRecordsCount(final InventoryIncrementalJobItemContext jobIt
         PipelineJobConfiguration jobConfig = jobItemContext.getJobConfig();
         String schemaName = dumperConfig.getSchemaName(new LogicTableName(dumperConfig.getLogicTableName()));
         String actualTableName = dumperConfig.getActualTableName();
-        // TODO with a large amount of data, count the full table will have performance problem
-        String sql = PipelineTypedSPILoader.getDatabaseTypedService(PipelineSQLBuilder.class, jobConfig.getSourceDatabaseType()).buildCountSQL(schemaName, actualTableName);
+        PipelineSQLBuilder pipelineSQLBuilder = PipelineTypedSPILoader.getDatabaseTypedService(PipelineSQLBuilder.class, jobConfig.getSourceDatabaseType());
+        String sql = pipelineSQLBuilder.buildEstimateCountSQL(dumperConfig.getDataSourceName(), schemaName, actualTableName);
         try (
                 Connection connection = dataSource.getConnection();
                 PreparedStatement preparedStatement = connection.prepareStatement(sql)) {
+            long estimateCount;

Review Comment:
   `estimateCount` could be `result`



##########
kernel/data-pipeline/dialect/postgresql/src/main/java/org/apache/shardingsphere/data/pipeline/postgresql/sqlbuilder/PostgreSQLPipelineSQLBuilder.java:
##########
@@ -85,6 +85,12 @@ private String buildConflictSQL(final DataRecord dataRecord) {
         return result.toString();
     }
     
+    @Override
+    public String buildEstimateCountSQL(final String databaseName, final String schemaName, final String tableName) {
+        // TODO Support estimate count later.
+        return buildCountSQL(schemaName, tableName);
+    }

Review Comment:
   It's not supported, so it should return `Optional<String>`



##########
kernel/data-pipeline/core/src/main/java/org/apache/shardingsphere/data/pipeline/core/prepare/InventoryTaskSplitter.java:
##########
@@ -171,12 +171,20 @@ private long getTableRecordsCount(final InventoryIncrementalJobItemContext jobIt
         PipelineJobConfiguration jobConfig = jobItemContext.getJobConfig();
         String schemaName = dumperConfig.getSchemaName(new LogicTableName(dumperConfig.getLogicTableName()));
         String actualTableName = dumperConfig.getActualTableName();
-        // TODO with a large amount of data, count the full table will have performance problem
-        String sql = PipelineTypedSPILoader.getDatabaseTypedService(PipelineSQLBuilder.class, jobConfig.getSourceDatabaseType()).buildCountSQL(schemaName, actualTableName);
+        PipelineSQLBuilder pipelineSQLBuilder = PipelineTypedSPILoader.getDatabaseTypedService(PipelineSQLBuilder.class, jobConfig.getSourceDatabaseType());
+        String sql = pipelineSQLBuilder.buildEstimateCountSQL(dumperConfig.getDataSourceName(), schemaName, actualTableName);
         try (
                 Connection connection = dataSource.getConnection();
                 PreparedStatement preparedStatement = connection.prepareStatement(sql)) {
+            long estimateCount;
             try (ResultSet resultSet = preparedStatement.executeQuery()) {
+                resultSet.next();
+                estimateCount = resultSet.getLong(1);
+            }
+            if (estimateCount > 0) {
+                return estimateCount;
+            }
+            try (ResultSet resultSet = connection.createStatement().executeQuery(pipelineSQLBuilder.buildCountSQL(schemaName, actualTableName))) {

Review Comment:
   1, `buildCountSQL` might cost too much time, so add some log for it, includes cost time.
   
   2, connection will return to pool but not closed, so will `connection.createStatement()` be closed automatically?
   



##########
kernel/data-pipeline/dialect/mysql/src/main/java/org/apache/shardingsphere/data/pipeline/mysql/sqlbuilder/MySQLPipelineSQLBuilder.java:
##########
@@ -89,6 +89,11 @@ public Optional<String> buildCRC32SQL(final String schemaName, final String tabl
         return Optional.of(String.format("SELECT BIT_XOR(CAST(CRC32(%s) AS UNSIGNED)) AS checksum, COUNT(1) AS cnt FROM %s", quote(column), quote(tableName)));
     }
     
+    @Override
+    public String buildEstimateCountSQL(final String databaseName, final String schemaName, final String tableName) {
+        return String.format("SELECT TABLE_ROWS from INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = '%s' AND TABLE_NAME = '%s'", databaseName, getQualifiedTableName(schemaName, tableName));

Review Comment:
   `from` could be `FROM`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@shardingsphere.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [shardingsphere] sandynz commented on a diff in pull request #24293: Improve table records count calculation in pipeline job for MySQL

Posted by "sandynz (via GitHub)" <gi...@apache.org>.

sandynz commented on code in PR #24293:
URL: https://github.com/apache/shardingsphere/pull/24293#discussion_r1114028641


##########
kernel/data-pipeline/core/src/main/java/org/apache/shardingsphere/data/pipeline/core/prepare/InventoryTaskSplitter.java:
##########
@@ -171,21 +172,48 @@ private long getTableRecordsCount(final InventoryIncrementalJobItemContext jobIt
         PipelineJobConfiguration jobConfig = jobItemContext.getJobConfig();
         String schemaName = dumperConfig.getSchemaName(new LogicTableName(dumperConfig.getLogicTableName()));
         String actualTableName = dumperConfig.getActualTableName();
-        // TODO with a large amount of data, count the full table will have performance problem
-        String sql = PipelineTypedSPILoader.getDatabaseTypedService(PipelineSQLBuilder.class, jobConfig.getSourceDatabaseType()).buildCountSQL(schemaName, actualTableName);
+        PipelineSQLBuilder pipelineSQLBuilder = PipelineTypedSPILoader.getDatabaseTypedService(PipelineSQLBuilder.class, jobConfig.getSourceDatabaseType());
+        Optional<String> estimatedCountSQL = pipelineSQLBuilder.buildEstimatedCountSQL(schemaName, actualTableName);

Review Comment:
   `estimatedCountSQL` could be `sql`



##########
kernel/data-pipeline/core/src/main/java/org/apache/shardingsphere/data/pipeline/core/prepare/InventoryTaskSplitter.java:
##########
@@ -171,21 +172,48 @@ private long getTableRecordsCount(final InventoryIncrementalJobItemContext jobIt
         PipelineJobConfiguration jobConfig = jobItemContext.getJobConfig();
         String schemaName = dumperConfig.getSchemaName(new LogicTableName(dumperConfig.getLogicTableName()));
         String actualTableName = dumperConfig.getActualTableName();
-        // TODO with a large amount of data, count the full table will have performance problem
-        String sql = PipelineTypedSPILoader.getDatabaseTypedService(PipelineSQLBuilder.class, jobConfig.getSourceDatabaseType()).buildCountSQL(schemaName, actualTableName);
+        PipelineSQLBuilder pipelineSQLBuilder = PipelineTypedSPILoader.getDatabaseTypedService(PipelineSQLBuilder.class, jobConfig.getSourceDatabaseType());
+        Optional<String> estimatedCountSQL = pipelineSQLBuilder.buildEstimatedCountSQL(schemaName, actualTableName);
+        try {
+            if (estimatedCountSQL.isPresent()) {
+                long estimatedCount = getEstimatedCountSQLResult(dataSource, estimatedCountSQL.get());
+                return estimatedCount > 0 ? estimatedCount : getCountSQLResult(dataSource, pipelineSQLBuilder.buildCountSQL(schemaName, actualTableName));
+            }
+            return getCountSQLResult(dataSource, pipelineSQLBuilder.buildCountSQL(schemaName, actualTableName));
+        } catch (final SQLException ex) {
+            String uniqueKey = dumperConfig.hasUniqueKey() ? dumperConfig.getUniqueKeyColumns().get(0).getName() : "";
+            throw new SplitPipelineJobByUniqueKeyException(dumperConfig.getActualTableName(), uniqueKey, ex);
+        }
+    }
+    
+    // TODO maybe need refactor after PostgreSQL support estimated count.
+    private long getEstimatedCountSQLResult(final DataSource dataSource, final String estimatedCountSQL) throws SQLException {

Review Comment:
   `getEstimatedCountSQLResult` could be `getEstimatedCount`.
   
   And also `getCountSQLResult` could be `getCount`.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@shardingsphere.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org