You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "swaminathanmanish (via GitHub)" <gi...@apache.org> on 2023/11/03 16:18:40 UTC
[PR] Adding ability to fetch metadata for specific list of segments [pinot]
swaminathanmanish opened a new pull request, #11949:
URL: https://github.com/apache/pinot/pull/11949
**Problem**: Getting metadata for all segments leads to high heap utilization and even controller OOMs (especially when we have large number of segments and small controller heap). Typically we dont need metadata for all segments at once and this can be done in batches.
Adding an API to fetch metadata for a list of segments. Also unified all APIs to use the same internal function. Did not modify existing APIs for backwards compatiblity.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] Adding ability to fetch metadata for specific list of segments [pinot]
Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #11949:
URL: https://github.com/apache/pinot/pull/11949#issuecomment-1792844057
## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/11949?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
> Merging [#11949](https://app.codecov.io/gh/apache/pinot/pull/11949?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (03c2001) into [master](https://app.codecov.io/gh/apache/pinot/commit/53e80331a4e8cb1593e01b8133a6d90aa39cef22?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (53e8033) will **decrease** coverage by `0.04%`.
> Report is 3 commits behind head on master.
> The diff coverage is `0.00%`.
```diff
@@ Coverage Diff @@
## master #11949 +/- ##
============================================
- Coverage 61.30% 61.26% -0.04%
+ Complexity 1140 1139 -1
============================================
Files 2378 2378
Lines 128865 128887 +22
Branches 19927 19929 +2
============================================
- Hits 78998 78960 -38
- Misses 44147 44198 +51
- Partials 5720 5729 +9
```
| [Flag](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
|---|---|---|
| [custom-integration1](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-0.01%)` | :arrow_down: |
| [integration](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-0.01%)` | :arrow_down: |
| [integration1](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-0.01%)` | :arrow_down: |
| [integration2](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (?)` | |
| [java-11](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-0.01%)` | :arrow_down: |
| [java-21](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `61.26% <0.00%> (-0.04%)` | :arrow_down: |
| [skip-bytebuffers-false](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `34.62% <0.00%> (-26.66%)` | :arrow_down: |
| [skip-bytebuffers-true](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `27.64% <0.00%> (-33.63%)` | :arrow_down: |
| [temurin](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `61.26% <0.00%> (-0.04%)` | :arrow_down: |
| [unittests](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `61.25% <0.00%> (-0.04%)` | :arrow_down: |
| [unittests1](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.41% <ø> (-0.05%)` | :arrow_down: |
| [unittests2](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `27.64% <0.00%> (-0.01%)` | :arrow_down: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Files](https://app.codecov.io/gh/apache/pinot/pull/11949?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
|---|---|---|
| [...che/pinot/controller/util/TableMetadataReader.java](https://app.codecov.io/gh/apache/pinot/pull/11949?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci91dGlsL1RhYmxlTWV0YWRhdGFSZWFkZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | |
... and [25 files with indirect coverage changes](https://app.codecov.io/gh/apache/pinot/pull/11949/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
:mega: Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in [Chrome](https://chrome.google.com/webstore/detail/codecov/gedikamndpbemklijjkncpnolildpbgo) or [Firefox](https://addons.mozilla.org/en-US/firefox/addon/codecov/) today!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] Adding ability to fetch metadata for specific list of segments [pinot]
Posted by "snleee (via GitHub)" <gi...@apache.org>.
snleee commented on code in PR #11949:
URL: https://github.com/apache/pinot/pull/11949#discussion_r1381949812
##########
pinot-controller/src/main/java/org/apache/pinot/controller/util/TableMetadataReader.java:
##########
@@ -55,58 +53,58 @@ public TableMetadataReader(Executor executor, HttpClientConnectionManager connec
}
/**
- * This method retrieves the full segment metadata for a given table.
- * Currently supports only OFFLINE tables.
- * @return a map of segmentName to its metadata
+ * This api takes in list of segments for which we need the metadata.
*/
- public JsonNode getSegmentsMetadata(String tableNameWithType, List<String> columns, int timeoutMs)
+ public JsonNode getSegmentsMetadata(String tableNameWithType, List<String> columns, Set<String> segmentsToInclude,
+ int timeoutMs)
throws InvalidConfigException, IOException {
- final Map<String, List<String>> serverToSegments =
+ return getSegmentsMetadataInternal(tableNameWithType, columns, segmentsToInclude, timeoutMs);
+ }
+
+ private JsonNode getSegmentsMetadataInternal(String tableNameWithType, List<String> columns,
+ Set<String> segmentsToInclude, int timeoutMs)
+ throws InvalidConfigException, IOException {
+ final Map<String, List<String>> serverToSegmentsMap =
_pinotHelixResourceManager.getServerToSegmentsMap(tableNameWithType);
BiMap<String, String> endpoints =
- _pinotHelixResourceManager.getDataInstanceAdminEndpoints(serverToSegments.keySet());
+ _pinotHelixResourceManager.getDataInstanceAdminEndpoints(serverToSegmentsMap.keySet());
ServerSegmentMetadataReader serverSegmentMetadataReader =
new ServerSegmentMetadataReader(_executor, _connectionManager);
- List<String> segmentsMetadata = serverSegmentMetadataReader
- .getSegmentMetadataFromServer(tableNameWithType, serverToSegments, endpoints, columns, timeoutMs);
+ // Filter segments that we need
+ for (Map.Entry<String, List<String>> serverToSegment : serverToSegmentsMap.entrySet()) {
+ List<String> segments = serverToSegment.getValue();
+ if (segmentsToInclude != null && !segmentsToInclude.isEmpty()) {
+ segments.retainAll(segmentsToInclude);
+ }
+ }
+ List<String> segmentsMetadata = serverSegmentMetadataReader
+ .getSegmentMetadataFromServer(tableNameWithType, serverToSegmentsMap, endpoints, columns, timeoutMs);
Map<String, JsonNode> response = new HashMap<>();
for (String segmentMetadata : segmentsMetadata) {
JsonNode responseJson = JsonUtils.stringToJsonNode(segmentMetadata);
response.put(responseJson.get("segmentName").asText(), responseJson);
}
return JsonUtils.objectToJsonNode(response);
}
+ /**
+ * This method retrieves the full segment metadata for a given table.
+ * Currently supports only OFFLINE tables.
+ * @return a map of segmentName to its metadata
+ */
+ public JsonNode getSegmentsMetadata(String tableNameWithType, List<String> columns, int timeoutMs)
+ throws InvalidConfigException, IOException {
+ return getSegmentsMetadataInternal(tableNameWithType, columns, null, timeoutMs);
Review Comment:
(nit) please apply the formatting
##########
pinot-controller/src/main/java/org/apache/pinot/controller/util/TableMetadataReader.java:
##########
@@ -55,58 +53,58 @@ public TableMetadataReader(Executor executor, HttpClientConnectionManager connec
}
/**
- * This method retrieves the full segment metadata for a given table.
- * Currently supports only OFFLINE tables.
- * @return a map of segmentName to its metadata
+ * This api takes in list of segments for which we need the metadata.
*/
- public JsonNode getSegmentsMetadata(String tableNameWithType, List<String> columns, int timeoutMs)
+ public JsonNode getSegmentsMetadata(String tableNameWithType, List<String> columns, Set<String> segmentsToInclude,
+ int timeoutMs)
throws InvalidConfigException, IOException {
- final Map<String, List<String>> serverToSegments =
+ return getSegmentsMetadataInternal(tableNameWithType, columns, segmentsToInclude, timeoutMs);
+ }
+
+ private JsonNode getSegmentsMetadataInternal(String tableNameWithType, List<String> columns,
+ Set<String> segmentsToInclude, int timeoutMs)
+ throws InvalidConfigException, IOException {
+ final Map<String, List<String>> serverToSegmentsMap =
_pinotHelixResourceManager.getServerToSegmentsMap(tableNameWithType);
BiMap<String, String> endpoints =
- _pinotHelixResourceManager.getDataInstanceAdminEndpoints(serverToSegments.keySet());
+ _pinotHelixResourceManager.getDataInstanceAdminEndpoints(serverToSegmentsMap.keySet());
ServerSegmentMetadataReader serverSegmentMetadataReader =
new ServerSegmentMetadataReader(_executor, _connectionManager);
- List<String> segmentsMetadata = serverSegmentMetadataReader
- .getSegmentMetadataFromServer(tableNameWithType, serverToSegments, endpoints, columns, timeoutMs);
+ // Filter segments that we need
+ for (Map.Entry<String, List<String>> serverToSegment : serverToSegmentsMap.entrySet()) {
+ List<String> segments = serverToSegment.getValue();
+ if (segmentsToInclude != null && !segmentsToInclude.isEmpty()) {
+ segments.retainAll(segmentsToInclude);
+ }
+ }
+ List<String> segmentsMetadata = serverSegmentMetadataReader
+ .getSegmentMetadataFromServer(tableNameWithType, serverToSegmentsMap, endpoints, columns, timeoutMs);
Map<String, JsonNode> response = new HashMap<>();
for (String segmentMetadata : segmentsMetadata) {
JsonNode responseJson = JsonUtils.stringToJsonNode(segmentMetadata);
response.put(responseJson.get("segmentName").asText(), responseJson);
}
return JsonUtils.objectToJsonNode(response);
}
+ /**
Review Comment:
(nit) add line
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] Adding ability to fetch metadata for specific list of segments [pinot]
Posted by "swaminathanmanish (via GitHub)" <gi...@apache.org>.
swaminathanmanish commented on PR #11949:
URL: https://github.com/apache/pinot/pull/11949#issuecomment-1792739182
cc @snleee - Please review
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] Adding ability to fetch metadata for specific list of segments [pinot]
Posted by "snleee (via GitHub)" <gi...@apache.org>.
snleee merged PR #11949:
URL: https://github.com/apache/pinot/pull/11949
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] Adding ability to fetch metadata for specific list of segments [pinot]
Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11949:
URL: https://github.com/apache/pinot/pull/11949#discussion_r1391889826
##########
pinot-controller/src/main/java/org/apache/pinot/controller/util/TableMetadataReader.java:
##########
@@ -55,22 +55,34 @@ public TableMetadataReader(Executor executor, HttpClientConnectionManager connec
}
/**
- * This method retrieves the full segment metadata for a given table.
- * Currently supports only OFFLINE tables.
- * @return a map of segmentName to its metadata
+ * This api takes in list of segments for which we need the metadata.
*/
- public JsonNode getSegmentsMetadata(String tableNameWithType, List<String> columns, int timeoutMs)
+ public JsonNode getSegmentsMetadata(String tableNameWithType, List<String> columns, Set<String> segmentsToInclude,
Review Comment:
Add `@Nullable` to `segmentsToInclude`
##########
pinot-controller/src/main/java/org/apache/pinot/controller/util/TableMetadataReader.java:
##########
@@ -55,22 +55,34 @@ public TableMetadataReader(Executor executor, HttpClientConnectionManager connec
}
/**
- * This method retrieves the full segment metadata for a given table.
- * Currently supports only OFFLINE tables.
- * @return a map of segmentName to its metadata
+ * This api takes in list of segments for which we need the metadata.
*/
- public JsonNode getSegmentsMetadata(String tableNameWithType, List<String> columns, int timeoutMs)
+ public JsonNode getSegmentsMetadata(String tableNameWithType, List<String> columns, Set<String> segmentsToInclude,
+ int timeoutMs)
throws InvalidConfigException, IOException {
- final Map<String, List<String>> serverToSegments =
+ return getSegmentsMetadataInternal(tableNameWithType, columns, segmentsToInclude, timeoutMs);
+ }
+
+ private JsonNode getSegmentsMetadataInternal(String tableNameWithType, List<String> columns,
+ Set<String> segmentsToInclude, int timeoutMs)
Review Comment:
Add `@Nullable` to `segmentsToInclude`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org