You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "swaminathanmanish (via GitHub)" <gi...@apache.org> on 2023/11/03 16:18:40 UTC

[PR] Adding ability to fetch metadata for specific list of segments [pinot]

swaminathanmanish opened a new pull request, #11949:
URL: https://github.com/apache/pinot/pull/11949

   **Problem**: Getting metadata for all segments leads to high heap utilization and even controller OOMs (especially when we have large number of segments and small controller heap). Typically we dont need metadata for all segments at once and this can be done in batches. 
   Adding an API to fetch metadata for a list of segments. Also unified all APIs to use the same internal function. Did not modify existing APIs for backwards compatiblity. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Adding ability to fetch metadata for specific list of segments [pinot]

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #11949:
URL: https://github.com/apache/pinot/pull/11949#issuecomment-1792844057

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/11949?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#11949](https://app.codecov.io/gh/apache/pinot/pull/11949?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (03c2001) into [master](https://app.codecov.io/gh/apache/pinot/commit/53e80331a4e8cb1593e01b8133a6d90aa39cef22?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (53e8033) will **decrease** coverage by `0.04%`.
   > Report is 3 commits behind head on master.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #11949      +/-   ##
   ============================================
   - Coverage     61.30%   61.26%   -0.04%     
   + Complexity     1140     1139       -1     
   ============================================
     Files          2378     2378              
     Lines        128865   128887      +22     
     Branches      19927    19929       +2     
   ============================================
   - Hits          78998    78960      -38     
   - Misses        44147    44198      +51     
   - Partials       5720     5729       +9     
   ```
   
   | [Flag](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [custom-integration1](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-0.01%)` | :arrow_down: |
   | [integration](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-0.01%)` | :arrow_down: |
   | [integration1](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-0.01%)` | :arrow_down: |
   | [integration2](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (?)` | |
   | [java-11](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-0.01%)` | :arrow_down: |
   | [java-21](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `61.26% <0.00%> (-0.04%)` | :arrow_down: |
   | [skip-bytebuffers-false](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `34.62% <0.00%> (-26.66%)` | :arrow_down: |
   | [skip-bytebuffers-true](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `27.64% <0.00%> (-33.63%)` | :arrow_down: |
   | [temurin](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `61.26% <0.00%> (-0.04%)` | :arrow_down: |
   | [unittests](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `61.25% <0.00%> (-0.04%)` | :arrow_down: |
   | [unittests1](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.41% <ø> (-0.05%)` | :arrow_down: |
   | [unittests2](https://app.codecov.io/gh/apache/pinot/pull/11949/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `27.64% <0.00%> (-0.01%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Files](https://app.codecov.io/gh/apache/pinot/pull/11949?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [...che/pinot/controller/util/TableMetadataReader.java](https://app.codecov.io/gh/apache/pinot/pull/11949?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci91dGlsL1RhYmxlTWV0YWRhdGFSZWFkZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   
   ... and [25 files with indirect coverage changes](https://app.codecov.io/gh/apache/pinot/pull/11949/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   :mega: Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in [Chrome](https://chrome.google.com/webstore/detail/codecov/gedikamndpbemklijjkncpnolildpbgo) or [Firefox](https://addons.mozilla.org/en-US/firefox/addon/codecov/) today!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Adding ability to fetch metadata for specific list of segments [pinot]

Posted by "snleee (via GitHub)" <gi...@apache.org>.
snleee commented on code in PR #11949:
URL: https://github.com/apache/pinot/pull/11949#discussion_r1381949812


##########
pinot-controller/src/main/java/org/apache/pinot/controller/util/TableMetadataReader.java:
##########
@@ -55,58 +53,58 @@ public TableMetadataReader(Executor executor, HttpClientConnectionManager connec
   }
 
   /**
-   * This method retrieves the full segment metadata for a given table.
-   * Currently supports only OFFLINE tables.
-   * @return a map of segmentName to its metadata
+   * This api takes in list of segments for which we need the metadata.
    */
-  public JsonNode getSegmentsMetadata(String tableNameWithType, List<String> columns, int timeoutMs)
+  public JsonNode getSegmentsMetadata(String tableNameWithType, List<String> columns, Set<String> segmentsToInclude,
+      int timeoutMs)
       throws InvalidConfigException, IOException {
-    final Map<String, List<String>> serverToSegments =
+    return getSegmentsMetadataInternal(tableNameWithType, columns, segmentsToInclude, timeoutMs);
+  }
+
+  private JsonNode getSegmentsMetadataInternal(String tableNameWithType, List<String> columns,
+      Set<String> segmentsToInclude, int timeoutMs)
+      throws InvalidConfigException, IOException {
+    final Map<String, List<String>> serverToSegmentsMap =
         _pinotHelixResourceManager.getServerToSegmentsMap(tableNameWithType);
     BiMap<String, String> endpoints =
-        _pinotHelixResourceManager.getDataInstanceAdminEndpoints(serverToSegments.keySet());
+        _pinotHelixResourceManager.getDataInstanceAdminEndpoints(serverToSegmentsMap.keySet());
     ServerSegmentMetadataReader serverSegmentMetadataReader =
         new ServerSegmentMetadataReader(_executor, _connectionManager);
 
-    List<String> segmentsMetadata = serverSegmentMetadataReader
-        .getSegmentMetadataFromServer(tableNameWithType, serverToSegments, endpoints, columns, timeoutMs);
+    // Filter segments that we need
+    for (Map.Entry<String, List<String>> serverToSegment : serverToSegmentsMap.entrySet()) {
+      List<String> segments = serverToSegment.getValue();
+      if (segmentsToInclude != null && !segmentsToInclude.isEmpty()) {
+        segments.retainAll(segmentsToInclude);
+      }
+    }
 
+    List<String> segmentsMetadata = serverSegmentMetadataReader
+        .getSegmentMetadataFromServer(tableNameWithType, serverToSegmentsMap, endpoints, columns, timeoutMs);
     Map<String, JsonNode> response = new HashMap<>();
     for (String segmentMetadata : segmentsMetadata) {
       JsonNode responseJson = JsonUtils.stringToJsonNode(segmentMetadata);
       response.put(responseJson.get("segmentName").asText(), responseJson);
     }
     return JsonUtils.objectToJsonNode(response);
   }
+  /**
+   * This method retrieves the full segment metadata for a given table.
+   * Currently supports only OFFLINE tables.
+   * @return a map of segmentName to its metadata
+   */
+  public JsonNode getSegmentsMetadata(String tableNameWithType, List<String> columns, int timeoutMs)
+      throws InvalidConfigException, IOException {
+   return getSegmentsMetadataInternal(tableNameWithType, columns, null, timeoutMs);

Review Comment:
   (nit) please apply the formatting



##########
pinot-controller/src/main/java/org/apache/pinot/controller/util/TableMetadataReader.java:
##########
@@ -55,58 +53,58 @@ public TableMetadataReader(Executor executor, HttpClientConnectionManager connec
   }
 
   /**
-   * This method retrieves the full segment metadata for a given table.
-   * Currently supports only OFFLINE tables.
-   * @return a map of segmentName to its metadata
+   * This api takes in list of segments for which we need the metadata.
    */
-  public JsonNode getSegmentsMetadata(String tableNameWithType, List<String> columns, int timeoutMs)
+  public JsonNode getSegmentsMetadata(String tableNameWithType, List<String> columns, Set<String> segmentsToInclude,
+      int timeoutMs)
       throws InvalidConfigException, IOException {
-    final Map<String, List<String>> serverToSegments =
+    return getSegmentsMetadataInternal(tableNameWithType, columns, segmentsToInclude, timeoutMs);
+  }
+
+  private JsonNode getSegmentsMetadataInternal(String tableNameWithType, List<String> columns,
+      Set<String> segmentsToInclude, int timeoutMs)
+      throws InvalidConfigException, IOException {
+    final Map<String, List<String>> serverToSegmentsMap =
         _pinotHelixResourceManager.getServerToSegmentsMap(tableNameWithType);
     BiMap<String, String> endpoints =
-        _pinotHelixResourceManager.getDataInstanceAdminEndpoints(serverToSegments.keySet());
+        _pinotHelixResourceManager.getDataInstanceAdminEndpoints(serverToSegmentsMap.keySet());
     ServerSegmentMetadataReader serverSegmentMetadataReader =
         new ServerSegmentMetadataReader(_executor, _connectionManager);
 
-    List<String> segmentsMetadata = serverSegmentMetadataReader
-        .getSegmentMetadataFromServer(tableNameWithType, serverToSegments, endpoints, columns, timeoutMs);
+    // Filter segments that we need
+    for (Map.Entry<String, List<String>> serverToSegment : serverToSegmentsMap.entrySet()) {
+      List<String> segments = serverToSegment.getValue();
+      if (segmentsToInclude != null && !segmentsToInclude.isEmpty()) {
+        segments.retainAll(segmentsToInclude);
+      }
+    }
 
+    List<String> segmentsMetadata = serverSegmentMetadataReader
+        .getSegmentMetadataFromServer(tableNameWithType, serverToSegmentsMap, endpoints, columns, timeoutMs);
     Map<String, JsonNode> response = new HashMap<>();
     for (String segmentMetadata : segmentsMetadata) {
       JsonNode responseJson = JsonUtils.stringToJsonNode(segmentMetadata);
       response.put(responseJson.get("segmentName").asText(), responseJson);
     }
     return JsonUtils.objectToJsonNode(response);
   }
+  /**

Review Comment:
   (nit) add line



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Adding ability to fetch metadata for specific list of segments [pinot]

Posted by "swaminathanmanish (via GitHub)" <gi...@apache.org>.
swaminathanmanish commented on PR #11949:
URL: https://github.com/apache/pinot/pull/11949#issuecomment-1792739182

   cc @snleee - Please review


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Adding ability to fetch metadata for specific list of segments [pinot]

Posted by "snleee (via GitHub)" <gi...@apache.org>.
snleee merged PR #11949:
URL: https://github.com/apache/pinot/pull/11949


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Adding ability to fetch metadata for specific list of segments [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11949:
URL: https://github.com/apache/pinot/pull/11949#discussion_r1391889826


##########
pinot-controller/src/main/java/org/apache/pinot/controller/util/TableMetadataReader.java:
##########
@@ -55,22 +55,34 @@ public TableMetadataReader(Executor executor, HttpClientConnectionManager connec
   }
 
   /**
-   * This method retrieves the full segment metadata for a given table.
-   * Currently supports only OFFLINE tables.
-   * @return a map of segmentName to its metadata
+   * This api takes in list of segments for which we need the metadata.
    */
-  public JsonNode getSegmentsMetadata(String tableNameWithType, List<String> columns, int timeoutMs)
+  public JsonNode getSegmentsMetadata(String tableNameWithType, List<String> columns, Set<String> segmentsToInclude,

Review Comment:
   Add `@Nullable` to `segmentsToInclude`



##########
pinot-controller/src/main/java/org/apache/pinot/controller/util/TableMetadataReader.java:
##########
@@ -55,22 +55,34 @@ public TableMetadataReader(Executor executor, HttpClientConnectionManager connec
   }
 
   /**
-   * This method retrieves the full segment metadata for a given table.
-   * Currently supports only OFFLINE tables.
-   * @return a map of segmentName to its metadata
+   * This api takes in list of segments for which we need the metadata.
    */
-  public JsonNode getSegmentsMetadata(String tableNameWithType, List<String> columns, int timeoutMs)
+  public JsonNode getSegmentsMetadata(String tableNameWithType, List<String> columns, Set<String> segmentsToInclude,
+      int timeoutMs)
       throws InvalidConfigException, IOException {
-    final Map<String, List<String>> serverToSegments =
+    return getSegmentsMetadataInternal(tableNameWithType, columns, segmentsToInclude, timeoutMs);
+  }
+
+  private JsonNode getSegmentsMetadataInternal(String tableNameWithType, List<String> columns,
+      Set<String> segmentsToInclude, int timeoutMs)

Review Comment:
   Add `@Nullable` to `segmentsToInclude`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org