Posted to gitbox@hive.apache.org by "wecharyu (via GitHub)" <gi...@apache.org> on 2023/04/14 18:04:14 UTC
[GitHub] [hive] wecharyu opened a new pull request, #4238: HIVE-27266: Retrieve only partNames if not need drop data in HMSHandler.dropPartitionsAndGetLocations
wecharyu opened a new pull request, #4238:
URL: https://github.com/apache/hive/pull/4238
### What changes were proposed in this pull request?
A small improvement to `HMSHandler.dropPartitionsAndGetLocations`: retrieve only the partition names, rather than partition name and location pairs, when the locations do not need to be checked.
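The idea can be sketched roughly as follows. This is a minimal illustration, not the actual patch: `PartitionStore`, `InMemoryPartitionStore`, `DropPartitionsSketch`, and `collect` are hypothetical names invented for this example and are not Hive's API. The sketch only shows the branch: fetch name and location pairs when locations must be checked, otherwise fetch names alone.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the metastore's backing store (not the real Hive API).
interface PartitionStore {
  List<String> listPartitionNames(String db, String table);
  Map<String, String> getPartitionLocations(String db, String table);
}

// Tiny in-memory store that counts how often each kind of query is issued.
final class InMemoryPartitionStore implements PartitionStore {
  private final Map<String, String> locationsByName = new LinkedHashMap<>();
  int nameQueries = 0;
  int locationQueries = 0;

  void addPartition(String name, String location) {
    locationsByName.put(name, location);
  }

  @Override
  public List<String> listPartitionNames(String db, String table) {
    nameQueries++;
    return new ArrayList<>(locationsByName.keySet());
  }

  @Override
  public Map<String, String> getPartitionLocations(String db, String table) {
    locationQueries++;
    return new LinkedHashMap<>(locationsByName);
  }
}

final class DropPartitionsSketch {
  // Names are always returned; locations stay empty unless they must be checked.
  static final class Result {
    final List<String> names;
    final Map<String, String> locations;

    Result(List<String> names, Map<String, String> locations) {
      this.names = names;
      this.locations = locations;
    }
  }

  static Result collect(PartitionStore store, String db, String table, boolean checkLocation) {
    if (checkLocation) {
      // Locations are needed, so fetch name and location pairs.
      Map<String, String> locations = store.getPartitionLocations(db, table);
      return new Result(new ArrayList<>(locations.keySet()), locations);
    }
    // Metadata-only drop: the cheaper name-only query is enough.
    return new Result(store.listPartitionNames(db, table), Collections.emptyMap());
  }

  public static void main(String[] args) {
    InMemoryPartitionStore store = new InMemoryPartitionStore();
    store.addPartition("date=d0", "/warehouse/t/date=d0");
    store.addPartition("date=d1", "/warehouse/t/date=d1");

    Result metaOnly = collect(store, "db", "t", false);
    System.out.println(metaOnly.names + ", location queries: " + store.locationQueries);
  }
}
```

In the metadata-only case the location query is never issued, which is where the saving would come from when a table has many partitions.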
### Why are the changes needed?
Performance improvement, especially when the table has a large number of partitions.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
1. Passed all existing tests.
2. Added a new benchmark test, **dropTableMetadataWithPartitions**:
- Before this patch
```bash
Operation                             Mean   Med    Min    Max    Err%
dropTableMetaOnlyWithPartitions.10    23.70  21.87  19.36  31.73  14.48
dropTableMetaOnlyWithPartitions.100   54.42  54.15  45.92  76.68  8.891
dropTableMetaOnlyWithPartitions.1000  462.5  456.1  321.0  654.3  15.96
```
- After this patch
```bash
Operation                             Mean   Med    Min    Max    Err%
dropTableMetaOnlyWithPartitions.10    21.49  21.24  19.30  27.90  6.661
dropTableMetaOnlyWithPartitions.100   51.51  48.30  44.86  85.23  16.91
dropTableMetaOnlyWithPartitions.1000  415.4  407.2  308.8  595.2  14.28
```
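To read the two tables side by side, the relative change in the Mean column can be computed as `(before - after) / before`. The snippet below is only a small arithmetic aid using the Mean values reported above; the class and method names are made up for illustration. It gives roughly 9.3%, 5.3%, and 10.2% for 10, 100, and 1000 partitions respectively.

```java
final class BenchmarkDelta {
  // Relative reduction of `after` compared with `before`, in percent.
  static double reductionPercent(double before, double after) {
    return (before - after) / before * 100.0;
  }

  public static void main(String[] args) {
    int[] partitions = {10, 100, 1000};
    // Mean column from the "Before this patch" and "After this patch" tables.
    double[] before = {23.70, 54.42, 462.5};
    double[] after = {21.49, 51.51, 415.4};

    for (int i = 0; i < partitions.length; i++) {
      System.out.printf("%d partitions: mean reduced by %.1f%%%n",
          partitions[i], reductionPercent(before[i], after[i]));
    }
  }
}
```

The reported Err% values are of a similar magnitude to these differences, so the figures are best read as indicative.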
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org
[GitHub] [hive] deniskuzZ merged pull request #4238: HIVE-27266: Retrieve only partNames if not need drop data in HMSHandler.dropPartitionsAndGetLocations
Posted by "deniskuzZ (via GitHub)" <gi...@apache.org>.
deniskuzZ merged PR #4238:
URL: https://github.com/apache/hive/pull/4238
[GitHub] [hive] wecharyu commented on a diff in pull request #4238: HIVE-27266: Retrieve only partNames if not need drop data in HMSHandler.dropPartitionsAndGetLocations
Posted by "wecharyu (via GitHub)" <gi...@apache.org>.
wecharyu commented on code in PR #4238:
URL: https://github.com/apache/hive/pull/4238#discussion_r1167357919
##########
standalone-metastore/metastore-tools/metastore-benchmarks/src/main/java/org/apache/hadoop/hive/metastore/tools/HMSBenchmarks.java:
##########
@@ -123,6 +123,31 @@ static DescriptiveStatistics benchmarkDeleteWithPartitions(@NotNull MicroBenchma
       null);
   }
+  static DescriptiveStatistics benchmarkDeleteMetaOnlyWithPartitions(@NotNull MicroBenchmark bench,
+                                                                     @NotNull BenchData data,
+                                                                     int howMany,
+                                                                     int nparams) {
+    final HMSClient client = data.getClient();
+    String dbName = data.dbName;
+    String tableName = data.tableName;
+
+    // Create many parameters
+    Map<String, String> parameters = new HashMap<>(nparams);
+    for (int i = 0; i < nparams; i++) {
+      parameters.put(PARAM_KEY + i, PARAM_VALUE + i);
+    }
+
+    return bench.measure(
+        () -> throwingSupplierWrapper(() -> {
+          BenchmarkUtils.createPartitionedTable(client, dbName, tableName);
+          addManyPartitions(client, dbName, tableName, parameters,
+              Collections.singletonList("d"), howMany);
Review Comment:
It's a prefix of the partition value. In this benchmark, the partitions created in MySQL look as follows:
```sql
mysql> select * from PARTITIONS limit 3;
+---------+-------------+------------------+-----------+--------+--------+----------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME | SD_ID  | TBL_ID | WRITE_ID |
+---------+-------------+------------------+-----------+--------+--------+----------+
|  301296 |  1681524241 |                0 | date=d0   | 302595 |   1299 |        0 |
|  301297 |  1681524241 |                0 | date=d1   | 302596 |   1299 |        0 |
|  301298 |  1681524241 |                0 | date=d2   | 302597 |   1299 |        0 |
+---------+-------------+------------------+-----------+--------+--------+----------+
3 rows in set (0.00 sec)
```
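A rough sketch of how such names come about, assuming the partition column is `date` (as the `PART_NAME` values above suggest) and that each value is the prefix followed by a running index. `PartitionNameSketch` and `partitionNames` are hypothetical helpers written for this illustration; they are not the benchmark's `addManyPartitions`.

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionNameSketch {
  // Hypothetical helper: builds names such as "date=d0" from a partition
  // column, a value prefix, and a count. Not part of the Hive benchmark code.
  static List<String> partitionNames(String column, String prefix, int howMany) {
    List<String> names = new ArrayList<>(howMany);
    for (int i = 0; i < howMany; i++) {
      names.add(column + "=" + prefix + i);
    }
    return names;
  }

  public static void main(String[] args) {
    // With prefix "d" and three partitions this prints [date=d0, date=d1, date=d2].
    System.out.println(partitionNames("date", "d", 3));
  }
}
```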
[GitHub] [hive] sonarcloud[bot] commented on pull request #4238: HIVE-27266: Retrieve only partNames if not need drop data in HMSHandler.dropPartitionsAndGetLocations
Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.
sonarcloud[bot] commented on PR #4238:
URL: https://github.com/apache/hive/pull/4238#issuecomment-1509242730
Kudos, SonarCloud Quality Gate passed! [Quality Gate passed](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4238)
[0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4238&resolved=false&types=BUG) (rating A)
[0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4238&resolved=false&types=VULNERABILITY) (rating A)
[0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4238&resolved=false&types=SECURITY_HOTSPOT) (rating A)
[0 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4238&resolved=false&types=CODE_SMELL) (rating A)
[No Coverage information](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4238&metric=coverage&view=list)
[No Duplication information](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4238&metric=duplicated_lines_density&view=list)
[GitHub] [hive] TuroczyX commented on a diff in pull request #4238: HIVE-27266: Retrieve only partNames if not need drop data in HMSHandler.dropPartitionsAndGetLocations
Posted by "TuroczyX (via GitHub)" <gi...@apache.org>.
TuroczyX commented on code in PR #4238:
URL: https://github.com/apache/hive/pull/4238#discussion_r1167197968
##########
standalone-metastore/metastore-tools/metastore-benchmarks/src/main/java/org/apache/hadoop/hive/metastore/tools/HMSBenchmarks.java:
##########
@@ -123,6 +123,31 @@ static DescriptiveStatistics benchmarkDeleteWithPartitions(@NotNull MicroBenchma
       null);
   }
+  static DescriptiveStatistics benchmarkDeleteMetaOnlyWithPartitions(@NotNull MicroBenchmark bench,
+                                                                     @NotNull BenchData data,
+                                                                     int howMany,
+                                                                     int nparams) {
+    final HMSClient client = data.getClient();
+    String dbName = data.dbName;
+    String tableName = data.tableName;
+
+    // Create many parameters
+    Map<String, String> parameters = new HashMap<>(nparams);
+    for (int i = 0; i < nparams; i++) {
+      parameters.put(PARAM_KEY + i, PARAM_VALUE + i);
+    }
+
+    return bench.measure(
+        () -> throwingSupplierWrapper(() -> {
+          BenchmarkUtils.createPartitionedTable(client, dbName, tableName);
+          addManyPartitions(client, dbName, tableName, parameters,
+              Collections.singletonList("d"), howMany);
Review Comment:
Sorry for asking a dumb question. What does this "d" mean in `Collections.singletonList`?
[GitHub] [hive] wecharyu commented on pull request #4238: HIVE-27266: Retrieve only partNames if not need drop data in HMSHandler.dropPartitionsAndGetLocations
Posted by "wecharyu (via GitHub)" <gi...@apache.org>.
wecharyu commented on PR #4238:
URL: https://github.com/apache/hive/pull/4238#issuecomment-1517786250
@deniskuzZ @pvary @veghlaci05 : Could you also help review this PR?