You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "klsince (via GitHub)" <gi...@apache.org> on 2024/02/20 23:09:29 UTC

[PR] Enter segment preloading at partition level [pinot]

klsince opened a new pull request, #12451:
URL: https://github.com/apache/pinot/pull/12451

   Currently, we trigger the segment preloading only when creating the tableDataMgr on a server, which is less flexible than triggering at the partition level. For example, when moving a partition to a server already hosting some other partitions, the current triggering logic at table level won't preload segments for the new partition. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Enter segment preloading at partition level [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang merged PR #12451:
URL: https://github.com/apache/pinot/pull/12451


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Enter segment preloading at partition level [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #12451:
URL: https://github.com/apache/pinot/pull/12451#discussion_r1514496150


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/BaseTableUpsertMetadataManager.java:
##########
@@ -55,20 +39,11 @@ public abstract class BaseTableUpsertMetadataManager implements TableUpsertMetad
   private static final Logger LOGGER = LoggerFactory.getLogger(BaseTableUpsertMetadataManager.class);
 
   protected String _tableNameWithType;
-  protected TableDataManager _tableDataManager;
-  protected HelixManager _helixManager;
-  protected ExecutorService _segmentPreloadExecutor;

Review Comment:
   these are all breaking changes. Third party code may be extending this class and using these attributes



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Enter segment preloading at partition level [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #12451:
URL: https://github.com/apache/pinot/pull/12451#discussion_r1498371551


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/SegmentUtils.java:
##########
@@ -49,6 +49,24 @@ public static Integer getRealtimeSegmentPartitionId(String segmentName, String r
         ZKMetadataProvider.getSegmentZKMetadata(helixManager.getHelixPropertyStore(), realtimeTableName, segmentName);
     Preconditions.checkState(segmentZKMetadata != null,
         "Failed to find segment ZK metadata for segment: %s of table: %s", segmentName, realtimeTableName);
+    return getRealtimeSegmentPartitionId(segmentZKMetadata, partitionColumn);
+  }
+
+  @Nullable
+  public static Integer getRealtimeSegmentPartitionId(String segmentName, SegmentZKMetadata segmentZKMetadata,
+      @Nullable String partitionColumn) {
+    // A fast path if the segmentName is an LLC segment name: get the partition id from the name directly
+    LLCSegmentName llcSegmentName = LLCSegmentName.of(segmentName);
+    if (llcSegmentName != null) {
+      return llcSegmentName.getPartitionGroupId();
+    }
+    // Otherwise, retrieve the partition id from the segment zk metadata.
+    return getRealtimeSegmentPartitionId(segmentZKMetadata, partitionColumn);
+  }
+
+  @Nullable
+  public static Integer getRealtimeSegmentPartitionId(SegmentZKMetadata segmentZKMetadata,

Review Comment:
   (minor) Make this private? We probably don't want user to directly use this method?



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java:
##########
@@ -102,6 +117,12 @@ public abstract class BasePartitionUpsertMetadataManager implements PartitionUps
   // Initialize with 1 pending operation to indicate the metadata manager can take more operations
   private int _numPendingOperations = 1;
   private boolean _closed;
+  // The lock and boolean flag below ensure only one thread can start preloading and preloading happens only once.
+  private final Lock _preloadLock = new ReentrantLock();
+  private volatile boolean _isPreloaded = false;

Review Comment:
   Why do we need 2 flags? Can we initialize `_isPreloading` based on whether preload is enabled?



##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeTableDataManager.java:
##########
@@ -452,6 +460,12 @@ public void addSegment(String segmentName, IndexLoadingConfig indexLoadingConfig
     _serverMetrics.addValueToTableGauge(_tableNameWithType, ServerGauge.SEGMENT_COUNT, 1L);
   }
 
+  private boolean isReadyToPreload() {

Review Comment:
   Rename this to `isUpsertPreloadEnabled()`. We don't need to check both upsert and this, checking only this should be enough.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Enter segment preloading at partition level [pinot]

Posted by "klsince (via GitHub)" <gi...@apache.org>.
klsince commented on code in PR #12451:
URL: https://github.com/apache/pinot/pull/12451#discussion_r1509628827


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeTableDataManager.java:
##########
@@ -359,6 +357,12 @@ public boolean isPartialUpsertEnabled() {
         && _tableUpsertMetadataManager.getUpsertMode() == UpsertConfig.Mode.PARTIAL;
   }
 
+  private boolean isUpsertPreloadEnabled() {
+    UpsertConfig upsertConfig = _tableConfig.getUpsertConfig();
+    return _tableUpsertMetadataManager != null && _segmentPreloadExecutor != null && upsertConfig != null

Review Comment:
   Good question. I followed the method call below used to create partition mgr, as that's the entry point for preloading. And this is only called in `addSegment()` or `handleUpsert()` methods. So afaik, we should be good to check this at the beginning of the addSegment() method.
   ```
   ... _tableUpsertMetadataManager.getOrCreatePartitionManager(partitionId);
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Enter segment preloading at partition level [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #12451:
URL: https://github.com/apache/pinot/pull/12451#discussion_r1505162788


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeTableDataManager.java:
##########
@@ -359,6 +357,12 @@ public boolean isPartialUpsertEnabled() {
         && _tableUpsertMetadataManager.getUpsertMode() == UpsertConfig.Mode.PARTIAL;
   }
 
+  private boolean isUpsertPreloadEnabled() {
+    UpsertConfig upsertConfig = _tableConfig.getUpsertConfig();
+    return _tableUpsertMetadataManager != null && _segmentPreloadExecutor != null && upsertConfig != null

Review Comment:
   Not introduced in this PR, but do we have checks in other places? Basically we don't want it to skip preloading silently



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/PartitionUpsertMetadataManager.java:
##########
@@ -64,6 +68,14 @@ public interface PartitionUpsertMetadataManager extends Closeable {
    */
   void addSegment(ImmutableSegment segment);
 
+  /**
+   * Preload segments for the table partition. Segments can be added differently during preloading.
+   */
+  void preloadSegments(TableDataManager tableDataManager, IndexLoadingConfig indexLoadingConfig,

Review Comment:
   Suggest only passing in `indexLoadingConfig`. Other properties can be set during initialization. We can consider only setting `tableDataManager` during initialization, and add getters in `TableDataManager` to get `helixManager` and `segmentPreloadExecutor`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Enter segment preloading at partition level [pinot]

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #12451:
URL: https://github.com/apache/pinot/pull/12451#issuecomment-1955449569

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/12451?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   Attention: `50 lines` in your changes are missing coverage. Please review.
   > Comparison is base [(`8c395ff`)](https://app.codecov.io/gh/apache/pinot/commit/8c395ff0788099cdb2e36b768e0694853262b942?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 61.73% compared to head [(`cb76026`)](https://app.codecov.io/gh/apache/pinot/pull/12451?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 61.56%.
   
   | [Files](https://app.codecov.io/gh/apache/pinot/pull/12451?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Patch % | Lines |
   |---|---|---|
   | [...cal/upsert/BasePartitionUpsertMetadataManager.java](https://app.codecov.io/gh/apache/pinot/pull/12451?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91cHNlcnQvQmFzZVBhcnRpdGlvblVwc2VydE1ldGFkYXRhTWFuYWdlci5qYXZh) | 63.75% | [24 Missing and 5 partials :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12451?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
   | [...ata/manager/realtime/RealtimeTableDataManager.java](https://app.codecov.io/gh/apache/pinot/pull/12451?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvUmVhbHRpbWVUYWJsZURhdGFNYW5hZ2VyLmphdmE=) | 0.00% | [15 Missing and 1 partial :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12451?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
   | [...va/org/apache/pinot/common/utils/SegmentUtils.java](https://app.codecov.io/gh/apache/pinot/pull/12451?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvU2VnbWVudFV0aWxzLmphdmE=) | 0.00% | [5 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12451?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
   
   <details><summary>Additional details and impacted files</summary>
   
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #12451      +/-   ##
   ============================================
   - Coverage     61.73%   61.56%   -0.18%     
     Complexity      207      207              
   ============================================
     Files          2436     2436              
     Lines        133173   133196      +23     
     Branches      20631    20635       +4     
   ============================================
   - Hits          82220    81996     -224     
   - Misses        44907    45146     +239     
   - Partials       6046     6054       +8     
   ```
   
   | [Flag](https://app.codecov.io/gh/apache/pinot/pull/12451/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [custom-integration1](https://app.codecov.io/gh/apache/pinot/pull/12451/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (ø)` | |
   | [integration](https://app.codecov.io/gh/apache/pinot/pull/12451/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-0.01%)` | :arrow_down: |
   | [integration1](https://app.codecov.io/gh/apache/pinot/pull/12451/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration2](https://app.codecov.io/gh/apache/pinot/pull/12451/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (ø)` | |
   | [java-11](https://app.codecov.io/gh/apache/pinot/pull/12451/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (-61.69%)` | :arrow_down: |
   | [java-21](https://app.codecov.io/gh/apache/pinot/pull/12451/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `61.56% <50.49%> (-0.06%)` | :arrow_down: |
   | [skip-bytebuffers-false](https://app.codecov.io/gh/apache/pinot/pull/12451/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `34.82% <0.00%> (-26.88%)` | :arrow_down: |
   | [skip-bytebuffers-true](https://app.codecov.io/gh/apache/pinot/pull/12451/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `27.70% <50.49%> (-33.89%)` | :arrow_down: |
   | [temurin](https://app.codecov.io/gh/apache/pinot/pull/12451/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `61.56% <50.49%> (-0.18%)` | :arrow_down: |
   | [unittests](https://app.codecov.io/gh/apache/pinot/pull/12451/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `61.55% <50.49%> (-0.18%)` | :arrow_down: |
   | [unittests1](https://app.codecov.io/gh/apache/pinot/pull/12451/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.65% <0.00%> (-0.20%)` | :arrow_down: |
   | [unittests2](https://app.codecov.io/gh/apache/pinot/pull/12451/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `27.70% <50.49%> (-0.04%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   
   </details>
   
   [:umbrella: View full report in Codecov by Sentry](https://app.codecov.io/gh/apache/pinot/pull/12451?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).   
   :loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Enter segment preloading at partition level [pinot]

Posted by "klsince (via GitHub)" <gi...@apache.org>.
klsince commented on code in PR #12451:
URL: https://github.com/apache/pinot/pull/12451#discussion_r1498426332


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java:
##########
@@ -102,6 +117,12 @@ public abstract class BasePartitionUpsertMetadataManager implements PartitionUps
   // Initialize with 1 pending operation to indicate the metadata manager can take more operations
   private int _numPendingOperations = 1;
   private boolean _closed;
+  // The lock and boolean flag below ensure only one thread can start preloading and preloading happens only once.
+  private final Lock _preloadLock = new ReentrantLock();
+  private volatile boolean _isPreloaded = false;

Review Comment:
   simplified, by setting _isPreloading to true if preloading is enabled, otherwise false



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Enter segment preloading at partition level [pinot]

Posted by "klsince (via GitHub)" <gi...@apache.org>.
klsince commented on code in PR #12451:
URL: https://github.com/apache/pinot/pull/12451#discussion_r1514938270


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/BaseTableUpsertMetadataManager.java:
##########
@@ -55,20 +39,11 @@ public abstract class BaseTableUpsertMetadataManager implements TableUpsertMetad
   private static final Logger LOGGER = LoggerFactory.getLogger(BaseTableUpsertMetadataManager.class);
 
   protected String _tableNameWithType;
-  protected TableDataManager _tableDataManager;
-  protected HelixManager _helixManager;
-  protected ExecutorService _segmentPreloadExecutor;

Review Comment:
   those were added when adding the preloading logic but was made at table level. Those got reverted in this PR to move the preloading logic to table partition level for more flexibility and more explicit/clear locking logic.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org