Posted to gitbox@hive.apache.org by "wecharyu (via GitHub)" <gi...@apache.org> on 2023/04/14 18:04:14 UTC

[GitHub] [hive] wecharyu opened a new pull request, #4238: HIVE-27266: Retrieve only partNames if not need drop data in HMSHandler.dropPartitionsAndGetLocations

wecharyu opened a new pull request, #4238:
URL: https://github.com/apache/hive/pull/4238

   ### What changes were proposed in this pull request?
   A small improvement to `HMSHandler.dropPartitionsAndGetLocations`: retrieve only the partition names, rather than partition name and location pairs, when the locations do not need to be checked.
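   
   As a rough illustration of the idea (not the actual `HMSHandler` code), the sketch below uses a hypothetical in-memory `FakeStore` and a hypothetical `planDrop` helper. All class, method, and field names here are assumptions made for the example. It only shows that the name-only path never touches partition locations.
   
   ```java
   import java.util.ArrayList;
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;
   
   // Hypothetical sketch; none of these names exist in Hive.
   final class DropPartitionsSketch {
   
     /** Stand-in for the metastore backend: partition name -> location. */
     static final class FakeStore {
       private final Map<String, String> nameToLocation = new LinkedHashMap<>();
       int locationReads = 0; // counts how many locations were fetched
   
       void add(String partName, String location) {
         nameToLocation.put(partName, location);
       }
   
       /** Cheap path: names only. */
       List<String> listPartitionNames() {
         return new ArrayList<>(nameToLocation.keySet());
       }
   
       /** Expensive path: names together with their locations. */
       Map<String, String> getPartitionLocations() {
         locationReads += nameToLocation.size();
         return new LinkedHashMap<>(nameToLocation);
       }
     }
   
     /** Result of planning a drop: what to delete and which paths to inspect. */
     static final class DropPlan {
       final List<String> partNames;
       final List<String> locationsToCheck;
   
       DropPlan(List<String> partNames, List<String> locationsToCheck) {
         this.partNames = partNames;
         this.locationsToCheck = locationsToCheck;
       }
     }
   
     /** Fetch locations only when the caller actually needs to check them. */
     static DropPlan planDrop(FakeStore store, boolean checkLocation) {
       if (!checkLocation) {
         return new DropPlan(store.listPartitionNames(), new ArrayList<>());
       }
       Map<String, String> locations = store.getPartitionLocations();
       return new DropPlan(new ArrayList<>(locations.keySet()),
           new ArrayList<>(locations.values()));
     }
   
     public static void main(String[] args) {
       FakeStore store = new FakeStore();
       store.add("date=d0", "/warehouse/t/date=d0");
       store.add("date=d1", "/warehouse/t/date=d1");
       DropPlan plan = planDrop(store, false);
       System.out.println(plan.partNames + " locationReads=" + store.locationReads);
     }
   }
   ```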
   
   
   ### Why are the changes needed?
   Performance improvement, especially when the table has a large number of partitions.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   1. All existing tests pass.
   2. Added a new benchmark test, **dropTableMetaOnlyWithPartitions**.
   
   - Before this patch
   ```bash
   Operation                             Mean     Med      Min      Max      Err%
   dropTableMetaOnlyWithPartitions.10    23.70    21.87    19.36    31.73    14.48
   dropTableMetaOnlyWithPartitions.100   54.42    54.15    45.92    76.68    8.891
   dropTableMetaOnlyWithPartitions.1000  462.5    456.1    321.0    654.3    15.96
   ```
   - After this patch
   ```bash
   Operation                             Mean     Med      Min      Max      Err%
   dropTableMetaOnlyWithPartitions.10    21.49    21.24    19.30    27.90    6.661
   dropTableMetaOnlyWithPartitions.100   51.51    48.30    44.86    85.23    16.91
   dropTableMetaOnlyWithPartitions.1000  415.4    407.2    308.8    595.2    14.28
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ merged pull request #4238: HIVE-27266: Retrieve only partNames if not need drop data in HMSHandler.dropPartitionsAndGetLocations

Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ merged PR #4238:
URL: https://github.com/apache/hive/pull/4238




[GitHub] [hive] wecharyu commented on a diff in pull request #4238: HIVE-27266: Retrieve only partNames if not need drop data in HMSHandler.dropPartitionsAndGetLocations

Posted by "wecharyu (via GitHub)" <gi...@apache.org>.
wecharyu commented on code in PR #4238:
URL: https://github.com/apache/hive/pull/4238#discussion_r1167357919


##########
standalone-metastore/metastore-tools/metastore-benchmarks/src/main/java/org/apache/hadoop/hive/metastore/tools/HMSBenchmarks.java:
##########
@@ -123,6 +123,31 @@ static DescriptiveStatistics benchmarkDeleteWithPartitions(@NotNull MicroBenchma
         null);
   }
 
+  static DescriptiveStatistics benchmarkDeleteMetaOnlyWithPartitions(@NotNull MicroBenchmark bench,
+                                                                        @NotNull BenchData data,
+                                                                        int howMany,
+                                                                        int nparams) {
+    final HMSClient client = data.getClient();
+    String dbName = data.dbName;
+    String tableName = data.tableName;
+
+    // Create many parameters
+    Map<String, String> parameters = new HashMap<>(nparams);
+    for (int i = 0; i < nparams; i++) {
+      parameters.put(PARAM_KEY + i, PARAM_VALUE + i);
+    }
+
+    return bench.measure(
+            () -> throwingSupplierWrapper(() -> {
+              BenchmarkUtils.createPartitionedTable(client, dbName, tableName);
+              addManyPartitions(client, dbName, tableName, parameters,
+                      Collections.singletonList("d"), howMany);

Review Comment:
   It's a prefix of the partition value. In this benchmark, the partitions created in MySQL look as follows:
   ```sql
   mysql> select * from PARTITIONS limit 3;
   +---------+-------------+------------------+-----------+--------+--------+----------+
   | PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME | SD_ID  | TBL_ID | WRITE_ID |
   +---------+-------------+------------------+-----------+--------+--------+----------+
   |  301296 |  1681524241 |                0 | date=d0   | 302595 |   1299 |        0 |
   |  301297 |  1681524241 |                0 | date=d1   | 302596 |   1299 |        0 |
   |  301298 |  1681524241 |                0 | date=d2   | 302597 |   1299 |        0 |
   +---------+-------------+------------------+-----------+--------+--------+----------+
   3 rows in set (0.00 sec)
   ```
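   
   To make the naming concrete, here is a minimal sketch of how a value prefix could turn into the `PART_NAME` values shown above. The `partitionNames` helper is hypothetical and is not the benchmark's `addManyPartitions`. It assumes each value is the prefix followed by the partition index, which matches the `date=d0`, `date=d1`, `date=d2` rows.
   
   ```java
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.List;
   
   // Hypothetical sketch; not part of the Hive benchmark tools.
   final class PartitionNameSketch {
   
     /**
      * Builds partition names of the form key=prefix+index, joining
      * multiple partition columns with '/'.
      */
     static List<String> partitionNames(List<String> keys,
                                        List<String> valuePrefixes,
                                        int howMany) {
       if (keys.size() != valuePrefixes.size()) {
         throw new IllegalArgumentException("one value prefix is needed per partition key");
       }
       List<String> names = new ArrayList<>(howMany);
       for (int i = 0; i < howMany; i++) {
         List<String> parts = new ArrayList<>(keys.size());
         for (int k = 0; k < keys.size(); k++) {
           // Assumption: the value is the prefix with the index appended.
           parts.add(keys.get(k) + "=" + valuePrefixes.get(k) + i);
         }
         names.add(String.join("/", parts));
       }
       return names;
     }
   
     public static void main(String[] args) {
       // Mirrors Collections.singletonList("d") with a single "date" key.
       System.out.println(partitionNames(
           Collections.singletonList("date"),
           Collections.singletonList("d"),
           3));
     }
   }
   ```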





[GitHub] [hive] sonarcloud[bot] commented on pull request #4238: HIVE-27266: Retrieve only partNames if not need drop data in HMSHandler.dropPartitionsAndGetLocations

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.
sonarcloud[bot] commented on PR #4238:
URL: https://github.com/apache/hive/pull/4238#issuecomment-1509242730

   Kudos, SonarCloud Quality Gate passed! [Quality Gate passed](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4238)
   
   - [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4238&resolved=false&types=BUG) (rating A)
   - [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4238&resolved=false&types=VULNERABILITY) (rating A)
   - [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4238&resolved=false&types=SECURITY_HOTSPOT) (rating A)
   - [0 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4238&resolved=false&types=CODE_SMELL) (rating A)
   
   - [No Coverage information](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4238&metric=coverage&view=list)
   - [No Duplication information](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4238&metric=duplicated_lines_density&view=list)
   
   




[GitHub] [hive] TuroczyX commented on a diff in pull request #4238: HIVE-27266: Retrieve only partNames if not need drop data in HMSHandler.dropPartitionsAndGetLocations

Posted by "TuroczyX (via GitHub)" <gi...@apache.org>.
TuroczyX commented on code in PR #4238:
URL: https://github.com/apache/hive/pull/4238#discussion_r1167197968


##########
standalone-metastore/metastore-tools/metastore-benchmarks/src/main/java/org/apache/hadoop/hive/metastore/tools/HMSBenchmarks.java:
##########
@@ -123,6 +123,31 @@ static DescriptiveStatistics benchmarkDeleteWithPartitions(@NotNull MicroBenchma
         null);
   }
 
+  static DescriptiveStatistics benchmarkDeleteMetaOnlyWithPartitions(@NotNull MicroBenchmark bench,
+                                                                        @NotNull BenchData data,
+                                                                        int howMany,
+                                                                        int nparams) {
+    final HMSClient client = data.getClient();
+    String dbName = data.dbName;
+    String tableName = data.tableName;
+
+    // Create many parameters
+    Map<String, String> parameters = new HashMap<>(nparams);
+    for (int i = 0; i < nparams; i++) {
+      parameters.put(PARAM_KEY + i, PARAM_VALUE + i);
+    }
+
+    return bench.measure(
+            () -> throwingSupplierWrapper(() -> {
+              BenchmarkUtils.createPartitionedTable(client, dbName, tableName);
+              addManyPartitions(client, dbName, tableName, parameters,
+                      Collections.singletonList("d"), howMany);

Review Comment:
   Sorry for asking a dumb question. What does this "d" mean in `Collections.singletonList`?





[GitHub] [hive] wecharyu commented on pull request #4238: HIVE-27266: Retrieve only partNames if not need drop data in HMSHandler.dropPartitionsAndGetLocations

Posted by "wecharyu (via GitHub)" <gi...@apache.org>.
wecharyu commented on PR #4238:
URL: https://github.com/apache/hive/pull/4238#issuecomment-1517786250

   @deniskuzZ @pvary @veghlaci05 : Could you also help review this PR?

