You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/12/30 07:10:42 UTC

[GitHub] [pinot] klsince opened a new pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

klsince opened a new pull request #7969:
URL: https://github.com/apache/pinot/pull/7969


   This PR extends SegmentDirectoryLoader interface and uses it to refactor BaseTableDataManager, so that the segment loading logic assumes SegmentDirectory interface instead of a concrete local segment folder, to be more extensible.
   
   ## Description
   <!-- Add a description of your PR here.
   A good description should include pointers to an issue or design document, etc.
   -->
   ## Upgrade Notes
   Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
   * [ ] Yes (Please label as **<code>backward-incompat</code>**, and complete the section below on Release Notes)
   
   Does this PR fix a zero-downtime upgrade introduced earlier?
   * [ ] Yes (Please label this as **<code>backward-incompat</code>**, and complete the section below on Release Notes)
   
   Does this PR otherwise need attention when creating release notes? Things to consider:
   - New configuration options
   - Deprecation of configurations
   - Signature changes to public methods/interfaces
   - New plugins added or old plugins removed
   * [ ] Yes (Please label this PR as **<code>release-notes</code>** and complete the section on Release Notes)
   ## Release Notes
   <!-- If you have tagged this as either backward-incompat or release-notes,
   you MUST add text here that you would like to see appear in release notes of the
   next release. -->
   
   <!-- If you have a series of commits adding or enabling a feature, then
   add this section only in final commit that marks the feature completed.
   Refer to earlier release notes to see examples of text.
   -->
   ## Documentation
   <!-- If you have introduced a new feature or configuration, please add it to the documentation as well.
   See https://docs.pinot.apache.org/developers/developers-and-contributors/update-document
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (5ae0dff) into [master](https://codecov.io/gh/apache/pinot/commit/d25a48879ea8b7f7b7595249d412705b670204e5?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d25a488) will **increase** coverage by `0.00%`.
   > The diff coverage is `83.43%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff            @@
   ##             master    #7969   +/-   ##
   =========================================
     Coverage     71.37%   71.38%           
   - Complexity     4194     4201    +7     
   =========================================
     Files          1594     1595    +1     
     Lines         82562    82647   +85     
     Branches      12321    12325    +4     
   =========================================
   + Hits          58927    58995   +68     
   - Misses        19652    19663   +11     
   - Partials       3983     3989    +6     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `29.06% <39.05%> (+0.02%)` | :arrow_up: |
   | integration2 | `27.57% <36.09%> (-0.16%)` | :arrow_down: |
   | unittests1 | `68.27% <83.97%> (+0.02%)` | :arrow_up: |
   | unittests2 | `14.28% <0.00%> (+<0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ata/manager/realtime/RealtimeTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvUmVhbHRpbWVUYWJsZURhdGFNYW5hZ2VyLmphdmE=) | `67.88% <ø> (ø)` | |
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `90.83% <0.00%> (-0.84%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `74.28% <28.57%> (-0.29%)` | :arrow_down: |
   | [...server/starter/helix/HelixInstanceDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zZXJ2ZXIvc3RhcnRlci9oZWxpeC9IZWxpeEluc3RhbmNlRGF0YU1hbmFnZXIuamF2YQ==) | `79.44% <69.23%> (-1.03%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `87.50% <82.75%> (-4.17%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `80.00% <83.33%> (+0.21%)` | :arrow_up: |
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `88.23% <87.50%> (+2.29%)` | :arrow_up: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `100.00% <100.00%> (ø)` | |
   | [...egment/local/segment/index/loader/LoaderUtils.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9Mb2FkZXJVdGlscy5qYXZh) | `95.91% <100.00%> (+1.80%)` | :arrow_up: |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `73.68% <100.00%> (ø)` | |
   | ... and [28 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [d25a488...5ae0dff](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] npawar commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
npawar commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r779813509



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,94 +306,162 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+            ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      }
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
-    }
+    // For local tier backend, try to recover the segment from potential
+    // reload failure. Continue upon any failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
 
-    Preconditions.checkState(allowDownload(segmentName, zkMetadata), "Segment: %s of table: %s does not allow download",
-        segmentName, _tableNameWithType);
+    // Creates the SegmentDirectory object to access the segment metadata that
+    // may be from local tier backend or remote tier backend.
+    SegmentDirectory segmentDirectory = null;
+    try {
+      SegmentDirectoryLoaderContext loaderContext =
+          new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+              segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+      SegmentDirectoryLoader segmentDirectoryLoader =
+          SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+      segmentDirectory = segmentDirectoryLoader.load(indexDir.toURI(), loaderContext);
+    } catch (Exception e) {
+      LOGGER.warn("Failed to create SegmentDirectory for segment: {} of table: {} due to error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
+    }
 
-    // Download segment and replace the local one, either due to failure to recover local segment,
-    // or the segment data is updated and has new CRC now.
-    if (localMetadata == null) {
-      LOGGER.info("Download segment: {} of table: {} as no good one exists locally", segmentName, _tableNameWithType);
-    } else {
-      LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
-          _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
+    // Download the raw segment, reprocess and load it if it is never loaded or its CRC has changed.
+    if (segmentDirectory == null) {
+      LOGGER.info("Segment: {} of table: {} does not exist", segmentName, _tableNameWithType);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
+    }
+    if (!hasSameCRC(zkMetadata, segmentDirectory.getSegmentMetadata())) {
+      LOGGER.info("Segment: {} of table: {} has crc: {} different from new crc: {}", segmentName,
+          _tableNameWithType, segmentDirectory.getSegmentMetadata().getCrc(), zkMetadata.getCrc());
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
     }
-    File indexDir = downloadSegment(segmentName, zkMetadata);
-    addSegment(indexDir, indexLoadingConfig);
-    LOGGER.info("Downloaded and loaded segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
-        zkMetadata.getCrc());
-  }
 
-  protected boolean allowDownload(String segmentName, SegmentZKMetadata zkMetadata) {
-    return true;
+    try {
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        // The loaded segment is still consistent with current table config or schema.
+        LOGGER.info("Segment: {} of table: {} is consistent with table config and schema", segmentName,
+            _tableNameWithType);
+        loadSegment(segmentDirectory, indexLoadingConfig, schema);
+        return;
+      }
+      // If any discrepancy is found, get the segment from tier backend, reprocess and load it.
+      LOGGER.info("Segment: {} of table: {} needs reprocess to reflect latest table config and schema", segmentName,
+          _tableNameWithType);
+      // Close the stale SegmentDirectory object before loading the newly processed segment.
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadTierSegmentAndProcess(segmentName, indexLoadingConfig, schema);
+    } catch (Exception e) {
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      LOGGER.error("Failed to reprocess and load segment: {} of table: {}", segmentName, _tableNameWithType, e);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+    }
   }
 
-  protected File downloadSegment(String segmentName, SegmentZKMetadata zkMetadata)
+  /**
+   * Get the segment from the configured tier backend, reprocess it with the latest table
+   * config and schema and then load it. Please note that the segment is from tier backend,
+   * not deep store, for incremental processing.
+   */
+  private void downloadTierSegmentAndProcess(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      @Nullable Schema schema)
       throws Exception {
-    // TODO: may support download from peer servers for RealTime table.
-    return downloadSegmentFromDeepStore(segmentName, zkMetadata);
+    SegmentDirectoryLoaderContext loaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentDirectoryLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    File indexDir = getSegmentDataDir(segmentName);
+    segmentDirectoryLoader.download(indexDir, loaderContext);

Review comment:
       This is to make Pinot more extensible and future proofing for supporting cases like lazy loading and tiering. Improvement of readability is being done simply to make the review go easier, since that part of code was a lil complex to read to begin with..




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r780029648



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,94 +306,156 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,

Review comment:
       no more raw segment or tier segment. please review the newly updated code.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r780526797



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -480,12 +426,138 @@ File getSegmentDataDir(String segmentName) {
     return new File(_indexDir, segmentName);
   }
 
-  @VisibleForTesting
-  static boolean isNewSegment(SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata) {
-    return localMetadata == null || !hasSameCRC(zkMetadata, localMetadata);
+  /**
+   * Create a backup directory to handle failure of segment reloading.
+   * First rename index directory to segment backup directory so that original segment have all file
+   * descriptors point to the segment backup directory to ensure original segment serves queries properly.
+   * The original index directory is restored lazily, as depending on the conditions,
+   * it may be restored from the backup directory or segment downloaded from deep store.
+   */
+  private void createBackup(File indexDir) {
+    if (!indexDir.exists()) {
+      return;
+    }
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    // Rename index directory to segment backup directory (atomic).
+    Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
+        "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
+  }
+
+  /**
+   * Remove the backup directory to mark the completion of segment reloading.
+   * First rename then delete is as renaming is an atomic operation, but deleting is not.
+   * When we rename the segment backup directory to segment temporary directory, we know the reload
+   * already succeeded, so that we can safely delete the segment temporary directory.
+   */
+  private void removeBackup(File indexDir)
+      throws IOException {
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    if (!segmentBackupDir.exists()) {
+      return;
+    }
+    File segmentTempDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_TEMP_DIR_SUFFIX);
+    // Rename segment backup directory to segment temporary directory (atomic).
+    Preconditions.checkState(segmentBackupDir.renameTo(segmentTempDir),
+        "Failed to rename segment backup directory: %s to segment temporary directory: %s", segmentBackupDir,
+        segmentTempDir);
+    FileUtils.deleteDirectory(segmentTempDir);
+  }
+
+  private boolean tryLoadExistingSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      SegmentZKMetadata zkMetadata) {
+    // Try to recover the segment from potential segment reloading failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
+
+    // Creates the SegmentDirectory object to access the segment metadata.
+    // The metadata is null if the segment doesn't exist yet.
+    SegmentDirectory segmentDirectory = tryGetSegmentDirectory(segmentName, indexLoadingConfig);
+    SegmentMetadataImpl segmentMetadata = (segmentDirectory == null) ? null : segmentDirectory.getSegmentMetadata();
+
+    // If the segment doesn't exist on server or its CRC has changed, then we
+    // need to fall back to download the segment from deep store to load it.
+    if (segmentMetadata == null) {
+      LOGGER.info("Segment: {} of table: {} does not exist", segmentName, _tableNameWithType);
+      return false;
+    }
+    if (!hasSameCRC(zkMetadata, segmentMetadata)) {
+      LOGGER.info("Segment: {} of table: {} has crc change from: {} to: {}", segmentName, _tableNameWithType,
+          segmentMetadata.getCrc(), zkMetadata.getCrc());
+      return false;
+    }
+
+    try {
+      // If the segment is still kept by the server, then we can
+      // either load it directly if it's still consistent with latest table config and schema;
+      // or reprocess it to reflect latest table config and schema before loading.
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        LOGGER.info("Segment: {} of table: {} is consistent with latest table config and schema", segmentName,
+            _tableNameWithType);
+      } else {
+        LOGGER.info("Segment: {} of table: {} needs reprocess to reflect latest table config and schema", segmentName,
+            _tableNameWithType);
+        segmentDirectory.copyTo(indexDir);
+        // Close the stale SegmentDirectory object and recreate it with reprocessed segment.
+        closeSegmentDirectoryQuietly(segmentDirectory);
+        ImmutableSegmentLoader.preprocess(indexDir, indexLoadingConfig, schema);
+        segmentDirectory = getSegmentDirectory(segmentName, indexLoadingConfig);
+      }
+      ImmutableSegment segment = ImmutableSegmentLoader.load(segmentDirectory, indexLoadingConfig, schema);
+      addSegment(segment);
+      LOGGER.info("Loaded existing segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
+          zkMetadata.getCrc());
+      return true;
+    } catch (Exception e) {
+      LOGGER.error("Failed to load existing segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType, e);
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      return false;
+    }
+  }
+
+  private SegmentDirectory tryGetSegmentDirectory(String segmentName, IndexLoadingConfig indexLoadingConfig) {
+    try {
+      return getSegmentDirectory(segmentName, indexLoadingConfig);
+    } catch (Exception e) {
+      LOGGER.warn("Attempt to get SegmentDirectory for segment: {} of table: {} failed with error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
+      return null;
+    }
+  }
+
+  private SegmentDirectory getSegmentDirectory(String segmentName, IndexLoadingConfig indexLoadingConfig)

Review comment:
       I'll rethink the method names: this one gets SegmentDirectory object; the getSegmentDataDir composes seg dir path.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jackjlli commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
jackjlli commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r777714621



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,92 +315,131 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+            ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      }
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
+    // For local tier backend, try to recover the segment from potential
+    // reload failure. Continue upon any failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
+
+    // Creates the SegmentDirectory object to access the segment metadata that
+    // may be from local tier backend or remote tier backend.
+    SegmentDirectoryLoaderContext segmentLoaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    SegmentDirectory segmentDirectory = null;
+    try {
+      segmentDirectory = segmentLoader.load(indexDir.toURI(), segmentLoaderContext);
+    } catch (Exception e) {
+      LOGGER.warn("Failed to create SegmentDirectory for segment: {} of table: {} due to error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
     }
 
-    Preconditions.checkState(allowDownload(segmentName, zkMetadata), "Segment: %s of table: %s does not allow download",
-        segmentName, _tableNameWithType);
-
-    // Download segment and replace the local one, either due to failure to recover local segment,
-    // or the segment data is updated and has new CRC now.
-    if (localMetadata == null) {
-      LOGGER.info("Download segment: {} of table: {} as no good one exists locally", segmentName, _tableNameWithType);
-    } else {
-      LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
-          _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
+    // Download the raw segment, reprocess and load it if it is never loaded or its CRC has changed.
+    if (segmentDirectory == null || !hasSameCRC(zkMetadata, segmentDirectory.getSegmentMetadata())) {
+      LOGGER.info("Segment: {} of table: {} not exist or its crc: {} differs from new crc: {}", segmentName,
+          _tableNameWithType, (segmentDirectory == null) ? "none" : segmentDirectory.getSegmentMetadata().getCrc(),
+          zkMetadata.getCrc());
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
     }
-    File indexDir = downloadSegment(segmentName, zkMetadata);
-    addSegment(indexDir, indexLoadingConfig);
-    LOGGER.info("Downloaded and loaded segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
-        zkMetadata.getCrc());
-  }
 
-  protected boolean allowDownload(String segmentName, SegmentZKMetadata zkMetadata) {
-    return true;
+    try {
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        // The loaded segment is still consistent with current table config or schema.
+        LOGGER.info("Segment: {} of table: {} is consistent with table config and schema", segmentName,
+            _tableNameWithType);
+        addSegment(ImmutableSegmentLoader.load(segmentDirectory, indexLoadingConfig, schema));
+        return;
+      }
+      // If any discrepancy is found, get the segment from tier backend, reprocess and load it.
+      // Please note that the segment is from tier backed, not deep store, for incremental processing.

Review comment:
       `s/backed/backend/`

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -275,53 +278,32 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
 
   @Override
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
-      SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
+      SegmentMetadata segmentMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
+    // Create backup dir to make segment reloading atomic for local tier backend.
+    LoaderUtils.createBackup(indexDir);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
-
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
-      boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
-      if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
-        if (forceDownload) {
-          LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
-        } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
-              _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
-        }
-        indexDir = downloadSegment(segmentName, zkMetadata);
+      boolean shouldDownloadRawSegment = forceDownload || !hasSameCRC(zkMetadata, segmentMetadata);
+      if (shouldDownloadRawSegment && allowDownloadRawSegment(segmentName, zkMetadata)) {
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata, schema);
       } else {
-        LOGGER.info("Reload the local copy of segment: {} of table: {}", segmentName, _tableNameWithType);
-        FileUtils.copyDirectory(segmentBackupDir, indexDir);
+        SegmentDirectoryLoaderContext segmentLoaderContext =
+            new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+                segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+        SegmentDirectoryLoader segmentLoader =
+            SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+        // Get the segment from tier backend, reprocess and load it with current table config and schema.
+        // Please note that the segment is from tier backed, not deep store, for incremental processing.

Review comment:
       `s/backed/backend/`

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,92 +315,131 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+            ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      }
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
+    // For local tier backend, try to recover the segment from potential
+    // reload failure. Continue upon any failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
+
+    // Creates the SegmentDirectory object to access the segment metadata that
+    // may be from local tier backend or remote tier backend.
+    SegmentDirectoryLoaderContext segmentLoaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    SegmentDirectory segmentDirectory = null;
+    try {
+      segmentDirectory = segmentLoader.load(indexDir.toURI(), segmentLoaderContext);
+    } catch (Exception e) {
+      LOGGER.warn("Failed to create SegmentDirectory for segment: {} of table: {} due to error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
     }
 
-    Preconditions.checkState(allowDownload(segmentName, zkMetadata), "Segment: %s of table: %s does not allow download",
-        segmentName, _tableNameWithType);
-
-    // Download segment and replace the local one, either due to failure to recover local segment,
-    // or the segment data is updated and has new CRC now.
-    if (localMetadata == null) {
-      LOGGER.info("Download segment: {} of table: {} as no good one exists locally", segmentName, _tableNameWithType);
-    } else {
-      LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
-          _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
+    // Download the raw segment, reprocess and load it if it is never loaded or its CRC has changed.
+    if (segmentDirectory == null || !hasSameCRC(zkMetadata, segmentDirectory.getSegmentMetadata())) {
+      LOGGER.info("Segment: {} of table: {} not exist or its crc: {} differs from new crc: {}", segmentName,
+          _tableNameWithType, (segmentDirectory == null) ? "none" : segmentDirectory.getSegmentMetadata().getCrc(),
+          zkMetadata.getCrc());
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
     }
-    File indexDir = downloadSegment(segmentName, zkMetadata);
-    addSegment(indexDir, indexLoadingConfig);
-    LOGGER.info("Downloaded and loaded segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
-        zkMetadata.getCrc());
-  }
 
-  protected boolean allowDownload(String segmentName, SegmentZKMetadata zkMetadata) {
-    return true;
+    try {
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        // The loaded segment is still consistent with current table config or schema.
+        LOGGER.info("Segment: {} of table: {} is consistent with table config and schema", segmentName,
+            _tableNameWithType);
+        addSegment(ImmutableSegmentLoader.load(segmentDirectory, indexLoadingConfig, schema));
+        return;
+      }
+      // If any discrepancy is found, get the segment from tier backend, reprocess and load it.
+      // Please note that the segment is from tier backed, not deep store, for incremental processing.
+      LOGGER.info("Segment: {} of table: {} needs reprocess with table config and schema", segmentName,
+          _tableNameWithType);
+      // Close the stale SegmentDirectory object before loading the newly processed segment.
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      segmentLoader.download(indexDir, segmentLoaderContext);
+      processAndLoadSegment(segmentName, indexDir, indexLoadingConfig, schema);
+      LOGGER.info("Segment: {} of table: {} is reprocessed and loaded", segmentName, _tableNameWithType);
+    } catch (Exception e) {
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      LOGGER.error("Failed to reprocess and load segment: {} of table: {}", segmentName, _tableNameWithType, e);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+    }
   }
 
-  protected File downloadSegment(String segmentName, SegmentZKMetadata zkMetadata)
+  /**
+   * This method downloads the raw segment from the deep store and process it, mainly
+   * for cases where segment CRC has changed or the existing segment fails to load.
+   */
+  private void downloadRawSegmentAndProcess(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      SegmentZKMetadata zkMetadata, @Nullable Schema schema)
       throws Exception {
-    // TODO: may support download from peer servers for RealTime table.
-    return downloadSegmentFromDeepStore(segmentName, zkMetadata);
+    Preconditions.checkState(allowDownloadRawSegment(segmentName, zkMetadata),
+        "Segment: %s of table: %s does not allow download raw segment", segmentName, _tableNameWithType);
+
+    File indexDir = downloadRawSegment(segmentName, zkMetadata);
+    processAndLoadSegment(segmentName, indexDir, indexLoadingConfig, schema);
+    LOGGER.info("Downloaded raw segment: {} of table: {} with crc: {} and loaded it", segmentName,
+        _tableNameWithType, zkMetadata.getCrc());
   }
 
-  /**
-   * Server restart during segment reload might leave segment directory in inconsistent state, like the index
-   * directory might not exist but segment backup directory existed. This method tries to recover from reload
-   * failure before checking the existence of the index directory and loading segment metadata from it.
-   */
-  private SegmentMetadata recoverSegmentQuietly(String segmentName) {
-    File indexDir = getSegmentDataDir(segmentName);
+  private void processAndLoadSegment(String segmentName, File indexDir, IndexLoadingConfig indexLoadingConfig,
+      @Nullable Schema schema)
+      throws Exception {
+    // Preprocess the segment locally with current table config and schema.
+    ImmutableSegmentLoader.preprocess(indexDir, indexLoadingConfig, schema);
+
+    SegmentDirectoryLoaderContext segmentLoaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    SegmentDirectory segmentDirectory = null;
     try {
-      LoaderUtils.reloadFailureRecovery(indexDir);
-      if (!indexDir.exists()) {
-        LOGGER.info("Segment: {} of table: {} is not found on disk", segmentName, _tableNameWithType);
-        return null;
-      }
-      SegmentMetadataImpl localMetadata = new SegmentMetadataImpl(indexDir);
-      LOGGER.info("Segment: {} of table: {} with crc: {} from disk is ready for loading", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
-      return localMetadata;
+      // Upload the processed segment to server tier backend, which can be local or remote.
+      segmentLoader.upload(indexDir, segmentLoaderContext);
+      // Create the SegmentDirectory object with the newly processed segment from tier backend.
+      segmentDirectory = segmentLoader.load(indexDir.toURI(), segmentLoaderContext);
+      addSegment(ImmutableSegmentLoader.load(segmentDirectory, indexLoadingConfig, schema));
     } catch (Exception e) {
-      LOGGER.error("Failed to recover segment: {} of table: {} from disk", segmentName, _tableNameWithType, e);
-      FileUtils.deleteQuietly(indexDir);
-      return null;
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      LOGGER.warn("Failed to load newly processed segment: {} of table: {}", segmentName, _tableNameWithType, e);

Review comment:
       Why not use `error` level here since an exception gets thrown?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] snleee edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
snleee edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1010323545


   @klsince I see. That may be a transient issue. I re-triggered the tests https://github.com/apache/pinot/actions/runs/1681129412


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (ac41900) into [master](https://codecov.io/gh/apache/pinot/commit/823aa07d7fc61543ed3d91992749d28aecd192ee?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (823aa07) will **decrease** coverage by `0.08%`.
   > The diff coverage is `75.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7969      +/-   ##
   ============================================
   - Coverage     71.35%   71.27%   -0.09%     
   - Complexity     4218     4221       +3     
   ============================================
     Files          1595     1596       +1     
     Lines         82748    82834      +86     
     Branches      12348    12359      +11     
   ============================================
   - Hits          59046    59039       -7     
   - Misses        19715    19802      +87     
   - Partials       3987     3993       +6     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `28.90% <27.90%> (-0.17%)` | :arrow_down: |
   | integration2 | `27.63% <27.32%> (+0.07%)` | :arrow_up: |
   | unittests1 | `68.16% <78.06%> (-0.03%)` | :arrow_down: |
   | unittests2 | `14.27% <0.00%> (+0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `90.83% <0.00%> (-0.84%)` | :arrow_down: |
   | [...ache/pinot/segment/spi/store/SegmentDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3N0b3JlL1NlZ21lbnREaXJlY3RvcnkuamF2YQ==) | `55.55% <0.00%> (-6.95%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `74.28% <28.57%> (-0.29%)` | :arrow_down: |
   | [...server/starter/helix/HelixInstanceDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zZXJ2ZXIvc3RhcnRlci9oZWxpeC9IZWxpeEluc3RhbmNlRGF0YU1hbmFnZXIuamF2YQ==) | `79.44% <47.05%> (-2.64%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `84.15% <73.52%> (-7.51%)` | :arrow_down: |
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `86.58% <80.24%> (+1.80%)` | :arrow_up: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `80.00% <83.33%> (+0.21%)` | :arrow_up: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `100.00% <100.00%> (ø)` | |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `73.68% <100.00%> (ø)` | |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `63.26% <100.00%> (ø)` | |
   | ... and [34 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [823aa07...ac41900](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mcvsubbu commented on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1011287110


   Overall, I am a -1 on this PR. It has taken what was a bit complex but at least intuitive to read code to a completely confusing piece of code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
klsince commented on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1012564460


   > I think a factory for creating the appropriate segmentDirectory may help. And/or , an interface method at SegmentDirectory level of something like: `boolean isLocal()`
   
   Yes, currently, there is a factory method: SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader, which returns certain type of SegmentDirectoryLoader to initialize specific SegmentDirectory objects.
   
   I'm not so sure about SegmentDirectory.isLocal(). This PR didn't require it, as the segment location is made transparent behind SegmentDirectory interface. It may enable other future extensions and would love to revisit it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f76a05c) into [master](https://codecov.io/gh/apache/pinot/commit/a1f3c30e2ac74dd0acb67087a86d8f03b416f5b2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a1f3c30) will **decrease** coverage by `6.47%`.
   > The diff coverage is `82.71%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7969      +/-   ##
   ============================================
   - Coverage     71.38%   64.91%   -6.48%     
   - Complexity     4200     4208       +8     
   ============================================
     Files          1595     1550      -45     
     Lines         82595    80803    -1792     
     Branches      12326    12128     -198     
   ============================================
   - Hits          58964    52453    -6511     
   - Misses        19645    24615    +4970     
   + Partials       3986     3735     -251     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `68.28% <82.71%> (+0.04%)` | :arrow_up: |
   | unittests2 | `14.29% <0.00%> (-0.02%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ata/manager/realtime/RealtimeTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvUmVhbHRpbWVUYWJsZURhdGFNYW5hZ2VyLmphdmE=) | `11.78% <ø> (-56.10%)` | :arrow_down: |
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `90.83% <0.00%> (-0.84%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `74.28% <28.57%> (-0.29%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `87.50% <82.75%> (-4.17%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `80.00% <83.33%> (+0.21%)` | :arrow_up: |
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `86.78% <84.04%> (+1.88%)` | :arrow_up: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `100.00% <100.00%> (ø)` | |
   | [...egment/local/segment/index/loader/LoaderUtils.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9Mb2FkZXJVdGlscy5qYXZh) | `95.91% <100.00%> (+1.80%)` | :arrow_up: |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `73.68% <100.00%> (ø)` | |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `63.26% <100.00%> (ø)` | |
   | ... and [377 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [a1f3c30...f76a05c](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r777735083



##########
File path: pinot-server/src/main/java/org/apache/pinot/server/starter/helix/HelixInstanceDataManager.java
##########
@@ -293,6 +292,26 @@ private void reloadSegment(String tableNameWithType, SegmentMetadata segmentMeta
     }
   }
 
+  @VisibleForTesting
+  boolean reloadMutableSegment(String tableNameWithType, String segmentName,
+      SegmentDataManager segmentDataManager, @Nullable Schema schema) {
+    IndexSegment segment = segmentDataManager.getSegment();
+    if (segment instanceof ImmutableSegment) {
+      LOGGER.info("Reloading OFFLINE segment: {} in table: {} not using local tier backend", segmentName,

Review comment:
       think I used 'OFFLINE' due to seeing 'REALTIME' in next log msg. But the next log msg actually specified `consuming` for clarity. I'll fix this and refactor per your suggestion. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r778324865



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/LoaderUtils.java
##########
@@ -161,4 +160,36 @@ public static void reloadFailureRecovery(File indexDir)
       FileUtils.forceDelete(segmentTempDir);
     }
   }
+
+  public static void createBackup(File indexDir)
+      throws IOException {
+    if (!indexDir.exists()) {
+      return;
+    }
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    // Rename index directory to segment backup directory (atomic)
+    Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
+        "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
+    // Copy the backup dir back to proceed.
+    FileUtils.copyDirectory(segmentBackupDir, indexDir);

Review comment:
       `renameTo` happens at L172. The logic to create backup dir is not changed but moved to this util method.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (03c234b) into [master](https://codecov.io/gh/apache/pinot/commit/a1f3c30e2ac74dd0acb67087a86d8f03b416f5b2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a1f3c30) will **decrease** coverage by `6.49%`.
   > The diff coverage is `82.92%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7969      +/-   ##
   ============================================
   - Coverage     71.38%   64.89%   -6.50%     
   - Complexity     4200     4209       +9     
   ============================================
     Files          1595     1550      -45     
     Lines         82595    80824    -1771     
     Branches      12326    12132     -194     
   ============================================
   - Hits          58964    52453    -6511     
   - Misses        19645    24638    +4993     
   + Partials       3986     3733     -253     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `68.25% <82.92%> (+0.02%)` | :arrow_up: |
   | unittests2 | `14.28% <0.00%> (-0.02%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ata/manager/realtime/RealtimeTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvUmVhbHRpbWVUYWJsZURhdGFNYW5hZ2VyLmphdmE=) | `11.78% <ø> (-56.10%)` | :arrow_down: |
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `90.83% <0.00%> (-0.84%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `74.28% <28.57%> (-0.29%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `87.50% <82.75%> (-4.17%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `80.00% <83.33%> (+0.21%)` | :arrow_up: |
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `86.89% <84.37%> (+2.00%)` | :arrow_up: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `100.00% <100.00%> (ø)` | |
   | [...egment/local/segment/index/loader/LoaderUtils.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9Mb2FkZXJVdGlscy5qYXZh) | `95.91% <100.00%> (+1.80%)` | :arrow_up: |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `73.68% <100.00%> (ø)` | |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `63.26% <100.00%> (ø)` | |
   | ... and [376 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [a1f3c30...03c234b](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r778455280



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -275,53 +278,23 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
 
   @Override
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
-      SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
+      SegmentMetadata segmentMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
+    // Create backup dir to make segment reloading atomic for local tier backend.
+    LoaderUtils.createBackup(indexDir);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
-
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
-      boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
-      if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
-        if (forceDownload) {
-          LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
-        } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
-              _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
-        }
-        indexDir = downloadSegment(segmentName, zkMetadata);
+      boolean shouldDownloadRawSegment = forceDownload || !hasSameCRC(zkMetadata, segmentMetadata);
+      if (shouldDownloadRawSegment && allowDownloadRawSegment(segmentName, zkMetadata)) {
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata, schema);

Review comment:
       thanks for the pointers. think I should make it clear that it means: a tier of servers with local or remote backend




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r778478593



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/LoaderUtils.java
##########
@@ -161,4 +160,36 @@ public static void reloadFailureRecovery(File indexDir)
       FileUtils.forceDelete(segmentTempDir);
     }
   }
+
+  public static void createBackup(File indexDir)
+      throws IOException {
+    if (!indexDir.exists()) {
+      return;
+    }
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    // Rename index directory to segment backup directory (atomic)
+    Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
+        "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
+    // Copy the backup dir back to proceed.
+    FileUtils.copyDirectory(segmentBackupDir, indexDir);

Review comment:
       It's on purpose to rename and then copy back, so that the subsequent segment preprocessing happens on the copy dir w/o affecting the ongoing queries, which if any are accessing the renamed dir which is kept intact even if segment reloading fails on the copy dir. As context, renameTo is an atomic operation.
   
   > prior to this PR, we just downloaded the segment 
   
   Previously, downloading doesn't always happens. It happens if segment CRC changes or forceDownload option is set to true, when one requests to reload segment (so downloading raw segment is relatively rare). In this PR, I made copy always happens, and if downloading happens, it overwrites the copy dir. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r780030087



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,94 +306,162 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+            ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      }
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
-    }
+    // For local tier backend, try to recover the segment from potential
+    // reload failure. Continue upon any failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
 
-    Preconditions.checkState(allowDownload(segmentName, zkMetadata), "Segment: %s of table: %s does not allow download",
-        segmentName, _tableNameWithType);
+    // Creates the SegmentDirectory object to access the segment metadata that
+    // may be from local tier backend or remote tier backend.
+    SegmentDirectory segmentDirectory = null;
+    try {
+      SegmentDirectoryLoaderContext loaderContext =
+          new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+              segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+      SegmentDirectoryLoader segmentDirectoryLoader =
+          SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+      segmentDirectory = segmentDirectoryLoader.load(indexDir.toURI(), loaderContext);
+    } catch (Exception e) {
+      LOGGER.warn("Failed to create SegmentDirectory for segment: {} of table: {} due to error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
+    }
 
-    // Download segment and replace the local one, either due to failure to recover local segment,
-    // or the segment data is updated and has new CRC now.
-    if (localMetadata == null) {
-      LOGGER.info("Download segment: {} of table: {} as no good one exists locally", segmentName, _tableNameWithType);
-    } else {
-      LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
-          _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
+    // Download the raw segment, reprocess and load it if it is never loaded or its CRC has changed.
+    if (segmentDirectory == null) {
+      LOGGER.info("Segment: {} of table: {} does not exist", segmentName, _tableNameWithType);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
+    }
+    if (!hasSameCRC(zkMetadata, segmentDirectory.getSegmentMetadata())) {
+      LOGGER.info("Segment: {} of table: {} has crc: {} different from new crc: {}", segmentName,
+          _tableNameWithType, segmentDirectory.getSegmentMetadata().getCrc(), zkMetadata.getCrc());
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
     }
-    File indexDir = downloadSegment(segmentName, zkMetadata);
-    addSegment(indexDir, indexLoadingConfig);
-    LOGGER.info("Downloaded and loaded segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
-        zkMetadata.getCrc());
-  }
 
-  protected boolean allowDownload(String segmentName, SegmentZKMetadata zkMetadata) {
-    return true;
+    try {
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        // The loaded segment is still consistent with current table config or schema.
+        LOGGER.info("Segment: {} of table: {} is consistent with table config and schema", segmentName,
+            _tableNameWithType);
+        loadSegment(segmentDirectory, indexLoadingConfig, schema);
+        return;
+      }
+      // If any discrepancy is found, get the segment from tier backend, reprocess and load it.
+      LOGGER.info("Segment: {} of table: {} needs reprocess to reflect latest table config and schema", segmentName,
+          _tableNameWithType);
+      // Close the stale SegmentDirectory object before loading the newly processed segment.
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadTierSegmentAndProcess(segmentName, indexLoadingConfig, schema);
+    } catch (Exception e) {
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      LOGGER.error("Failed to reprocess and load segment: {} of table: {}", segmentName, _tableNameWithType, e);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+    }
   }
 
-  protected File downloadSegment(String segmentName, SegmentZKMetadata zkMetadata)
+  /**
+   * Get the segment from the configured tier backend, reprocess it with the latest table
+   * config and schema and then load it. Please note that the segment is from tier backend,
+   * not deep store, for incremental processing.
+   */
+  private void downloadTierSegmentAndProcess(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      @Nullable Schema schema)
       throws Exception {
-    // TODO: may support download from peer servers for RealTime table.
-    return downloadSegmentFromDeepStore(segmentName, zkMetadata);
+    SegmentDirectoryLoaderContext loaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentDirectoryLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    File indexDir = getSegmentDataDir(segmentName);
+    segmentDirectoryLoader.download(indexDir, loaderContext);

Review comment:
       Refactored with new way proposed above. Please review the updated code. Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a65dbaf) into [master](https://codecov.io/gh/apache/pinot/commit/a1f3c30e2ac74dd0acb67087a86d8f03b416f5b2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a1f3c30) will **decrease** coverage by `0.06%`.
   > The diff coverage is `75.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7969      +/-   ##
   ============================================
   - Coverage     71.38%   71.32%   -0.07%     
   - Complexity     4200     4221      +21     
   ============================================
     Files          1595     1595              
     Lines         82595    82816     +221     
     Branches      12326    12356      +30     
   ============================================
   + Hits          58964    59067     +103     
   - Misses        19645    19748     +103     
   - Partials       3986     4001      +15     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `28.99% <27.90%> (-0.09%)` | :arrow_down: |
   | integration2 | `27.69% <27.32%> (-0.07%)` | :arrow_down: |
   | unittests1 | `68.15% <78.06%> (-0.08%)` | :arrow_down: |
   | unittests2 | `14.26% <0.00%> (-0.05%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `90.83% <0.00%> (-0.84%)` | :arrow_down: |
   | [...ache/pinot/segment/spi/store/SegmentDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3N0b3JlL1NlZ21lbnREaXJlY3RvcnkuamF2YQ==) | `55.55% <0.00%> (-6.95%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `74.28% <28.57%> (-0.29%)` | :arrow_down: |
   | [...server/starter/helix/HelixInstanceDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zZXJ2ZXIvc3RhcnRlci9oZWxpeC9IZWxpeEluc3RhbmNlRGF0YU1hbmFnZXIuamF2YQ==) | `79.44% <47.05%> (-2.06%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `84.15% <73.52%> (-7.51%)` | :arrow_down: |
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `86.72% <80.24%> (+1.82%)` | :arrow_up: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `80.00% <83.33%> (+0.21%)` | :arrow_up: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `100.00% <100.00%> (ø)` | |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `73.68% <100.00%> (ø)` | |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `63.26% <100.00%> (ø)` | |
   | ... and [32 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [a1f3c30...a65dbaf](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r778346750



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -275,53 +278,23 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
 
   @Override
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
-      SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
+      SegmentMetadata segmentMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
+    // Create backup dir to make segment reloading atomic for local tier backend.
+    LoaderUtils.createBackup(indexDir);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
-
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
-      boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
-      if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
-        if (forceDownload) {
-          LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
-        } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
-              _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
-        }
-        indexDir = downloadSegment(segmentName, zkMetadata);
+      boolean shouldDownloadRawSegment = forceDownload || !hasSameCRC(zkMetadata, segmentMetadata);
+      if (shouldDownloadRawSegment && allowDownloadRawSegment(segmentName, zkMetadata)) {
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata, schema);
       } else {
-        LOGGER.info("Reload the local copy of segment: {} of table: {}", segmentName, _tableNameWithType);
-        FileUtils.copyDirectory(segmentBackupDir, indexDir);
+        downloadTierSegmentAndProcess(segmentName, indexLoadingConfig, schema);
       }
-
-      // Load from index directory and replace the old segment in memory.
-      addSegment(ImmutableSegmentLoader.load(indexDir, indexLoadingConfig, schema));
-
-      // Rename segment backup directory to segment temporary directory (atomic)
-      // The reason to first rename then delete is that, renaming is an atomic operation, but deleting is not. When we
-      // rename the segment backup directory to segment temporary directory, we know the reload already succeeded, so
-      // that we can safely delete the segment temporary directory
-      File segmentTempDir = new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_TEMP_DIR_SUFFIX);
-      Preconditions.checkState(segmentBackupDir.renameTo(segmentTempDir),
-          "Failed to rename segment backup directory: %s to segment temporary directory: %s", segmentBackupDir,
-          segmentTempDir);
-      FileUtils.deleteDirectory(segmentTempDir);
+      // Remove backup dir to mark the completion of reloading for local tier backend.
+      LoaderUtils.removeBackup(indexDir);

Review comment:
       hmm.. do you mean to do renameTo outside the createBackup() and removeBackup() util methods?
   
   I think `rename, then copy or delete` are pretty tightly coupled to finish one operation atomically, and I feel separating them would increase complexity for future maintenance. 
   
   btw, I extracted the two new util methods createBackup() and removeBackup() into LoaderUtils after seeing `reloadFailureRecovery` method already there. I kinda prefer to having those methods in one place, as they are all for the failure handling.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
klsince edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1010314758


   > Please fix the integration test. Thanks!
   
   Yes, I'm investigating the failure after rebasing the latest master. It's running ok on my local. So I'm using a [debug PR](https://github.com/apache/pinot/pull/7991) to dig on this. That debug PR is made off the latest master and with a few prints. But as you can see the failure has moved to another integ test case there. That made me wonder if the Github CI env got some issue. Not sure if you've got permission to help take a quick look at the CI env. Thank you @snleee.  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mcvsubbu commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r784280438



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -371,9 +360,9 @@ public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoad
     // Download segment and replace the local one, either due to failure to recover local segment,
     // or the segment data is updated and has new CRC now.
     if (localMetadata == null) {
-      LOGGER.info("Download segment: {} of table: {} as no good one exists locally", segmentName, _tableNameWithType);
+      LOGGER.info("Download segment: {} of table: {} as it doesn't exist", segmentName, _tableNameWithType);
     } else {
-      LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
+      LOGGER.info("Download segment: {} of table: {} as crc changes from: {} to: {}", segmentName,

Review comment:
       will this log message happen twice when crc changes? is there a way to avoid that? It is already logged in tryLoadExisting() ... method

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -493,12 +445,139 @@ protected File getTmpSegmentDataDir(String segmentName) {
     return new File(_resourceTmpDir, segmentName);
   }
 
-  @VisibleForTesting
-  static boolean isNewSegment(SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata) {
-    return localMetadata == null || !hasSameCRC(zkMetadata, localMetadata);
+  /**
+   * Create a backup directory to handle failure of segment reloading.
+   * First rename index directory to segment backup directory so that original segment have all file
+   * descriptors point to the segment backup directory to ensure original segment serves queries properly.
+   * The original index directory is restored lazily, as depending on the conditions,
+   * it may be restored from the backup directory or segment downloaded from deep store.
+   */
+  private void createBackup(File indexDir) {
+    if (!indexDir.exists()) {
+      return;
+    }
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    // Rename index directory to segment backup directory (atomic).
+    Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
+        "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
+  }
+
+  /**
+   * Remove the backup directory to mark the completion of segment reloading.
+   * First rename then delete is as renaming is an atomic operation, but deleting is not.
+   * When we rename the segment backup directory to segment temporary directory, we know the reload
+   * already succeeded, so that we can safely delete the segment temporary directory.
+   */
+  private void removeBackup(File indexDir)
+      throws IOException {
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    if (!segmentBackupDir.exists()) {
+      return;
+    }
+    // Rename segment backup directory to segment temporary directory (atomic).
+    File segmentTempDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_TEMP_DIR_SUFFIX);
+    Preconditions.checkState(segmentBackupDir.renameTo(segmentTempDir),
+        "Failed to rename segment backup directory: %s to segment temporary directory: %s", segmentBackupDir,
+        segmentTempDir);
+    FileUtils.deleteDirectory(segmentTempDir);
+  }
+
+  private boolean tryLoadExistingSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,

Review comment:
       Can u add a comment here stating the conditions when it returns true or false. As I understand, true means it existed locally and has been loaded into memory. false means it may exist locally with a different CRC or it may not exist at all. Is the SegmentDirectory always closed upon return?

##########
File path: pinot-server/src/main/java/org/apache/pinot/server/starter/helix/HelixInstanceDataManager.java
##########
@@ -307,6 +306,30 @@ private void reloadSegment(String tableNameWithType, SegmentMetadata segmentMeta
     }
   }
 
+  /**
+   * Try to reload a mutable segment.
+   * @return true if the segment is mutable; false if the segment is immutable.

Review comment:
       ```suggestion
      * @return true if the segment is mutable and loaded; false if the segment is immutable.
   ```

##########
File path: pinot-server/src/main/java/org/apache/pinot/server/starter/helix/HelixInstanceDataManager.java
##########
@@ -307,6 +306,30 @@ private void reloadSegment(String tableNameWithType, SegmentMetadata segmentMeta
     }
   }
 
+  /**
+   * Try to reload a mutable segment.
+   * @return true if the segment is mutable; false if the segment is immutable.
+   */
+  @VisibleForTesting
+  boolean reloadMutableSegment(String tableNameWithType, String segmentName,

Review comment:
       Maybe rename to reloadIfMutabelSegment() ? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (ad6d167) into [master](https://codecov.io/gh/apache/pinot/commit/823aa07d7fc61543ed3d91992749d28aecd192ee?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (823aa07) will **decrease** coverage by `1.23%`.
   > The diff coverage is `75.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7969      +/-   ##
   ============================================
   - Coverage     71.35%   70.11%   -1.24%     
   - Complexity     4218     4221       +3     
   ============================================
     Files          1595     1596       +1     
     Lines         82748    82874     +126     
     Branches      12348    12365      +17     
   ============================================
   - Hits          59046    58110     -936     
   - Misses        19715    20778    +1063     
   + Partials       3987     3986       -1     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `27.63% <27.32%> (+0.07%)` | :arrow_up: |
   | unittests1 | `68.13% <78.06%> (-0.05%)` | :arrow_down: |
   | unittests2 | `14.33% <0.00%> (+0.06%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `90.83% <0.00%> (-0.84%)` | :arrow_down: |
   | [...ache/pinot/segment/spi/store/SegmentDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3N0b3JlL1NlZ21lbnREaXJlY3RvcnkuamF2YQ==) | `55.55% <0.00%> (-6.95%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `74.28% <28.57%> (-0.29%)` | :arrow_down: |
   | [...server/starter/helix/HelixInstanceDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zZXJ2ZXIvc3RhcnRlci9oZWxpeC9IZWxpeEluc3RhbmNlRGF0YU1hbmFnZXIuamF2YQ==) | `79.44% <47.05%> (-2.64%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `84.15% <73.52%> (-7.51%)` | :arrow_down: |
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `86.58% <80.24%> (+1.80%)` | :arrow_up: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `80.00% <83.33%> (+0.21%)` | :arrow_up: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `100.00% <100.00%> (ø)` | |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `73.68% <100.00%> (ø)` | |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `63.26% <100.00%> (ø)` | |
   | ... and [125 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [823aa07...ad6d167](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (ac41900) into [master](https://codecov.io/gh/apache/pinot/commit/823aa07d7fc61543ed3d91992749d28aecd192ee?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (823aa07) will **decrease** coverage by `6.52%`.
   > The diff coverage is `78.06%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7969      +/-   ##
   ============================================
   - Coverage     71.35%   64.83%   -6.53%     
   - Complexity     4218     4221       +3     
   ============================================
     Files          1595     1551      -44     
     Lines         82748    80960    -1788     
     Branches      12348    12155     -193     
   ============================================
   - Hits          59046    52491    -6555     
   - Misses        19715    24731    +5016     
   + Partials       3987     3738     -249     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `68.16% <78.06%> (-0.03%)` | :arrow_down: |
   | unittests2 | `14.27% <0.00%> (+0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `90.83% <0.00%> (-0.84%)` | :arrow_down: |
   | [...ache/pinot/segment/spi/store/SegmentDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3N0b3JlL1NlZ21lbnREaXJlY3RvcnkuamF2YQ==) | `55.55% <0.00%> (-6.95%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `74.28% <28.57%> (-0.29%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `84.15% <73.52%> (-7.51%)` | :arrow_down: |
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `86.14% <80.24%> (+1.37%)` | :arrow_up: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `80.00% <83.33%> (+0.21%)` | :arrow_up: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `100.00% <100.00%> (ø)` | |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `73.68% <100.00%> (ø)` | |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `63.26% <100.00%> (ø)` | |
   | [...t/local/segment/store/SegmentLocalFSDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L3N0b3JlL1NlZ21lbnRMb2NhbEZTRGlyZWN0b3J5LmphdmE=) | `71.89% <100.00%> (+2.60%)` | :arrow_up: |
   | ... and [381 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [823aa07...ac41900](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (e801de0) into [master](https://codecov.io/gh/apache/pinot/commit/823aa07d7fc61543ed3d91992749d28aecd192ee?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (823aa07) will **decrease** coverage by `6.53%`.
   > The diff coverage is `78.06%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7969      +/-   ##
   ============================================
   - Coverage     71.35%   64.82%   -6.54%     
   - Complexity     4218     4223       +5     
   ============================================
     Files          1595     1550      -45     
     Lines         82748    80952    -1796     
     Branches      12348    12155     -193     
   ============================================
   - Hits          59046    52475    -6571     
   - Misses        19715    24734    +5019     
   + Partials       3987     3743     -244     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `68.17% <78.06%> (-0.01%)` | :arrow_down: |
   | unittests2 | `14.25% <0.00%> (-0.02%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `90.83% <0.00%> (-0.84%)` | :arrow_down: |
   | [...ache/pinot/segment/spi/store/SegmentDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3N0b3JlL1NlZ21lbnREaXJlY3RvcnkuamF2YQ==) | `55.55% <0.00%> (-6.95%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `74.28% <28.57%> (-0.29%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `84.15% <73.52%> (-7.51%)` | :arrow_down: |
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `86.14% <80.24%> (+1.37%)` | :arrow_up: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `80.00% <83.33%> (+0.21%)` | :arrow_up: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `100.00% <100.00%> (ø)` | |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `73.68% <100.00%> (ø)` | |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `63.26% <100.00%> (ø)` | |
   | [...t/local/segment/store/SegmentLocalFSDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L3N0b3JlL1NlZ21lbnRMb2NhbEZTRGlyZWN0b3J5LmphdmE=) | `71.89% <100.00%> (+2.60%)` | :arrow_up: |
   | ... and [377 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [823aa07...e801de0](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (e801de0) into [master](https://codecov.io/gh/apache/pinot/commit/823aa07d7fc61543ed3d91992749d28aecd192ee?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (823aa07) will **decrease** coverage by `57.10%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #7969       +/-   ##
   =============================================
   - Coverage     71.35%   14.25%   -57.11%     
   + Complexity     4218       81     -4137     
   =============================================
     Files          1595     1550       -45     
     Lines         82748    80952     -1796     
     Branches      12348    12155      -193     
   =============================================
   - Hits          59046    11537    -47509     
   - Misses        19715    68553    +48838     
   + Partials       3987      862     -3125     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `14.25% <0.00%> (-0.02%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `0.00% <0.00%> (-84.78%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `0.00% <0.00%> (-79.79%)` | :arrow_down: |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `0.00% <0.00%> (-73.69%)` | :arrow_down: |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `0.00% <0.00%> (-63.27%)` | :arrow_down: |
   | [...t/local/segment/store/SegmentLocalFSDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L3N0b3JlL1NlZ21lbnRMb2NhbEZTRGlyZWN0b3J5LmphdmE=) | `0.00% <0.00%> (-69.29%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `0.00% <0.00%> (-74.57%)` | :arrow_down: |
   | [...ache/pinot/segment/spi/store/SegmentDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3N0b3JlL1NlZ21lbnREaXJlY3RvcnkuamF2YQ==) | `0.00% <0.00%> (-62.50%)` | :arrow_down: |
   | ... and [1277 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [823aa07...e801de0](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r781622620



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -277,49 +282,36 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
       SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
+      // Create backup directory to handle failure of segment reloading.
+      createBackup(indexDir);
 
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
+      // Download segment from deep store if CRC changes or forced to download;
+      // otherwise, copy backup directory back to the original index directory.
+      // And then continue to load the segment from the index directory.
       boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
       if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
         if (forceDownload) {
           LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
         } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
+          LOGGER.info("Download segment:{} of table: {} as crc changes from: {} to: {}", segmentName,
               _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
         }
         indexDir = downloadSegment(segmentName, zkMetadata);
       } else {
-        LOGGER.info("Reload the local copy of segment: {} of table: {}", segmentName, _tableNameWithType);
-        FileUtils.copyDirectory(segmentBackupDir, indexDir);
+        LOGGER.info("Reload existing segment: {} of table: {}", segmentName, _tableNameWithType);
+        try (SegmentDirectory segmentDirectory = initSegmentDirectory(segmentName, indexLoadingConfig)) {

Review comment:
       Thanks for the feedback! 
   
   A key motivation to use `SegmentDirectory.copyTo(indexDir)` instead of `FileUtils.copyDirectory(segmentBackupDir, indexDir)` is to abstract where the source directory is. So, I'd suggest to read current logic as 
   ```
   1. make a copy of SegmentDirectory at indexDir; (SegmentDirectory knows source dir is kept)
   2. load index dir
   ```
   
   This allows some implementation of SegmentDirectory to copy from places other than the local backup dir.
   
   Following this abstraction, SegmentLocalFSDirectory.copyTo() doesn't assume what the `destination file` is, but it knows where the source is, either the index dir if it is present or the backup dir.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mcvsubbu commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r783294197



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -493,12 +445,139 @@ protected File getTmpSegmentDataDir(String segmentName) {
     return new File(_resourceTmpDir, segmentName);
   }
 
-  @VisibleForTesting
-  static boolean isNewSegment(SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata) {
-    return localMetadata == null || !hasSameCRC(zkMetadata, localMetadata);
+  /**
+   * Create a backup directory to handle failure of segment reloading.
+   * First rename index directory to segment backup directory so that original segment have all file
+   * descriptors point to the segment backup directory to ensure original segment serves queries properly.
+   * The original index directory is restored lazily, as depending on the conditions,
+   * it may be restored from the backup directory or segment downloaded from deep store.
+   */
+  private void createBackup(File indexDir) {
+    if (!indexDir.exists()) {
+      return;
+    }
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    // Rename index directory to segment backup directory (atomic).
+    Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
+        "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
+  }
+
+  /**
+   * Remove the backup directory to mark the completion of segment reloading.
+   * First rename then delete is as renaming is an atomic operation, but deleting is not.
+   * When we rename the segment backup directory to segment temporary directory, we know the reload
+   * already succeeded, so that we can safely delete the segment temporary directory.
+   */
+  private void removeBackup(File indexDir)
+      throws IOException {
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    if (!segmentBackupDir.exists()) {
+      return;
+    }
+    // Rename segment backup directory to segment temporary directory (atomic).
+    File segmentTempDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_TEMP_DIR_SUFFIX);
+    Preconditions.checkState(segmentBackupDir.renameTo(segmentTempDir),
+        "Failed to rename segment backup directory: %s to segment temporary directory: %s", segmentBackupDir,
+        segmentTempDir);
+    FileUtils.deleteDirectory(segmentTempDir);
+  }
+
+  private boolean tryLoadExistingSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      SegmentZKMetadata zkMetadata) {
+    // Try to recover the segment from potential segment reloading failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
+
+    // Creates the SegmentDirectory object to access the segment metadata.
+    // The metadata is null if the segment doesn't exist yet.
+    SegmentDirectory segmentDirectory = tryInitSegmentDirectory(segmentName, indexLoadingConfig);

Review comment:
       Can you verify that we don't see a warning in the normal case of loading a new segment? It seems to me that the following steps will happen.
   
   addOrReplace() -> tryLoadExistingSegment() -> tryInitSegmentDirectory() -> initSegmentDirectory() 
   
   The last method catches and exception and logs a warning.
   
   There should not be any warnings in the normal case.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (ad6d167) into [master](https://codecov.io/gh/apache/pinot/commit/823aa07d7fc61543ed3d91992749d28aecd192ee?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (823aa07) will **decrease** coverage by `0.04%`.
   > The diff coverage is `75.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7969      +/-   ##
   ============================================
   - Coverage     71.35%   71.31%   -0.05%     
   - Complexity     4218     4221       +3     
   ============================================
     Files          1595     1596       +1     
     Lines         82748    82874     +126     
     Branches      12348    12365      +17     
   ============================================
   + Hits          59046    59102      +56     
   - Misses        19715    19766      +51     
   - Partials       3987     4006      +19     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `28.97% <27.90%> (-0.10%)` | :arrow_down: |
   | integration2 | `27.63% <27.32%> (+0.07%)` | :arrow_up: |
   | unittests1 | `68.13% <78.06%> (-0.05%)` | :arrow_down: |
   | unittests2 | `14.33% <0.00%> (+0.06%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `90.83% <0.00%> (-0.84%)` | :arrow_down: |
   | [...ache/pinot/segment/spi/store/SegmentDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3N0b3JlL1NlZ21lbnREaXJlY3RvcnkuamF2YQ==) | `55.55% <0.00%> (-6.95%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `74.28% <28.57%> (-0.29%)` | :arrow_down: |
   | [...server/starter/helix/HelixInstanceDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zZXJ2ZXIvc3RhcnRlci9oZWxpeC9IZWxpeEluc3RhbmNlRGF0YU1hbmFnZXIuamF2YQ==) | `79.44% <47.05%> (-2.64%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `84.15% <73.52%> (-7.51%)` | :arrow_down: |
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `87.44% <80.24%> (+2.67%)` | :arrow_up: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `80.00% <83.33%> (+0.21%)` | :arrow_up: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `100.00% <100.00%> (ø)` | |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `73.68% <100.00%> (ø)` | |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `63.26% <100.00%> (ø)` | |
   | ... and [42 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [823aa07...ad6d167](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] npawar commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
npawar commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r777649669



##########
File path: pinot-server/src/main/java/org/apache/pinot/server/starter/helix/HelixInstanceDataManager.java
##########
@@ -293,6 +292,26 @@ private void reloadSegment(String tableNameWithType, SegmentMetadata segmentMeta
     }
   }
 
+  @VisibleForTesting
+  boolean reloadMutableSegment(String tableNameWithType, String segmentName,
+      SegmentDataManager segmentDataManager, @Nullable Schema schema) {
+    IndexSegment segment = segmentDataManager.getSegment();
+    if (segment instanceof ImmutableSegment) {
+      LOGGER.info("Reloading OFFLINE segment: {} in table: {} not using local tier backend", segmentName,

Review comment:
       this comment is confusing. The segment is not in OFFLINE state right? this is simply a completed segment of a table that could be either realtime or offline, and is on tier backend?
   How about something like "Reloading ImmutableSegment: {} of table: {} on remote tier" ?

##########
File path: pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/loader/SegmentDirectoryLoader.java
##########
@@ -18,19 +18,48 @@
  */
 package org.apache.pinot.segment.spi.loader;
 
+import java.io.File;
 import java.net.URI;
 import org.apache.pinot.segment.spi.store.SegmentDirectory;
 
 
 /**
- * Interface for creating and loading the {@link SegmentDirectory} instance using provided config
+ * Interface for creating and loading the {@link SegmentDirectory} instance using provided config.
+ *
+ * The segment may be kept in local or remote tier backend. When the segment needs reprocessing,
+ * like to add or remote indices, the SegmentDirectoryLoader downloads the segment from tier backend

Review comment:
       typo s/remote/remove

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,92 +315,131 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+            ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      }
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
+    // For local tier backend, try to recover the segment from potential
+    // reload failure. Continue upon any failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
+
+    // Creates the SegmentDirectory object to access the segment metadata that
+    // may be from local tier backend or remote tier backend.
+    SegmentDirectoryLoaderContext segmentLoaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    SegmentDirectory segmentDirectory = null;
+    try {
+      segmentDirectory = segmentLoader.load(indexDir.toURI(), segmentLoaderContext);
+    } catch (Exception e) {
+      LOGGER.warn("Failed to create SegmentDirectory for segment: {} of table: {} due to error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
     }
 
-    Preconditions.checkState(allowDownload(segmentName, zkMetadata), "Segment: %s of table: %s does not allow download",
-        segmentName, _tableNameWithType);
-
-    // Download segment and replace the local one, either due to failure to recover local segment,
-    // or the segment data is updated and has new CRC now.
-    if (localMetadata == null) {
-      LOGGER.info("Download segment: {} of table: {} as no good one exists locally", segmentName, _tableNameWithType);
-    } else {
-      LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
-          _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
+    // Download the raw segment, reprocess and load it if it is never loaded or its CRC has changed.
+    if (segmentDirectory == null || !hasSameCRC(zkMetadata, segmentDirectory.getSegmentMetadata())) {
+      LOGGER.info("Segment: {} of table: {} not exist or its crc: {} differs from new crc: {}", segmentName,
+          _tableNameWithType, (segmentDirectory == null) ? "none" : segmentDirectory.getSegmentMetadata().getCrc(),
+          zkMetadata.getCrc());
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
     }
-    File indexDir = downloadSegment(segmentName, zkMetadata);
-    addSegment(indexDir, indexLoadingConfig);
-    LOGGER.info("Downloaded and loaded segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
-        zkMetadata.getCrc());
-  }
 
-  protected boolean allowDownload(String segmentName, SegmentZKMetadata zkMetadata) {
-    return true;
+    try {
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        // The loaded segment is still consistent with current table config or schema.
+        LOGGER.info("Segment: {} of table: {} is consistent with table config and schema", segmentName,
+            _tableNameWithType);
+        addSegment(ImmutableSegmentLoader.load(segmentDirectory, indexLoadingConfig, schema));
+        return;
+      }
+      // If any discrepancy is found, get the segment from tier backend, reprocess and load it.
+      // Please note that the segment is from tier backed, not deep store, for incremental processing.
+      LOGGER.info("Segment: {} of table: {} needs reprocess with table config and schema", segmentName,

Review comment:
       nit: "Segment {} of table {} needs reprocess to reflect latest table config and schema" ?

##########
File path: pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/metadata/SegmentMetadataImpl.java
##########
@@ -80,6 +80,9 @@
 
   private SegmentVersion _segmentVersion;
   private List<StarTreeV2Metadata> _starTreeV2MetadataList;
+  // Caching properties around can be costly when the number of segments is high according to the

Review comment:
       good catch!

##########
File path: pinot-server/src/main/java/org/apache/pinot/server/starter/helix/HelixInstanceDataManager.java
##########
@@ -293,6 +292,26 @@ private void reloadSegment(String tableNameWithType, SegmentMetadata segmentMeta
     }
   }
 
+  @VisibleForTesting
+  boolean reloadMutableSegment(String tableNameWithType, String segmentName,
+      SegmentDataManager segmentDataManager, @Nullable Schema schema) {
+    IndexSegment segment = segmentDataManager.getSegment();
+    if (segment instanceof ImmutableSegment) {
+      LOGGER.info("Reloading OFFLINE segment: {} in table: {} not using local tier backend", segmentName,

Review comment:
       Can we pull out the check for ImmutableSegment out of this method? Maybe another helper method 
   ```
   boolean reloadSegmentWithNullIndexDir(params..) {
   IndexSegment segment = segmentDataManager.getSegment();
       if (segment instanceof ImmutableSegment) {
         LOGGER.info("Reloading OFFLINE segment: {} in table: {} not using local tier backend", segmentName,
   tableNameWithType);
         return false;
       }
   return reloadMutableSegment(args..);
   ```
   

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/immutable/ImmutableSegmentLoader.java
##########
@@ -95,35 +96,52 @@ public static ImmutableSegment load(File indexDir, IndexLoadingConfig indexLoadi
    * modify the segment like to convert segment format, add or remove indices.
    */
   public static ImmutableSegment load(File indexDir, IndexLoadingConfig indexLoadingConfig, @Nullable Schema schema,

Review comment:
       where is this load method being used now? All calls I saw in BaseDataTableManager were for the new load method

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -275,53 +278,32 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
 
   @Override
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
-      SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
+      SegmentMetadata segmentMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
+    // Create backup dir to make segment reloading atomic for local tier backend.
+    LoaderUtils.createBackup(indexDir);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
-
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
-      boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
-      if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
-        if (forceDownload) {
-          LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
-        } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
-              _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
-        }
-        indexDir = downloadSegment(segmentName, zkMetadata);
+      boolean shouldDownloadRawSegment = forceDownload || !hasSameCRC(zkMetadata, segmentMetadata);
+      if (shouldDownloadRawSegment && allowDownloadRawSegment(segmentName, zkMetadata)) {
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata, schema);
       } else {
-        LOGGER.info("Reload the local copy of segment: {} of table: {}", segmentName, _tableNameWithType);
-        FileUtils.copyDirectory(segmentBackupDir, indexDir);
+        SegmentDirectoryLoaderContext segmentLoaderContext =
+            new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),

Review comment:
       may i suggest adding one more small method, similar to `downloadRawSegmentAndProcess`, as `downloadTierSegmentAndProcess` to fold lines in this else block. It can also be used i the "if not consistent" block of addOrReplace.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,92 +315,131 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+            ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      }
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
+    // For local tier backend, try to recover the segment from potential
+    // reload failure. Continue upon any failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
+
+    // Creates the SegmentDirectory object to access the segment metadata that
+    // may be from local tier backend or remote tier backend.
+    SegmentDirectoryLoaderContext segmentLoaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    SegmentDirectory segmentDirectory = null;
+    try {
+      segmentDirectory = segmentLoader.load(indexDir.toURI(), segmentLoaderContext);
+    } catch (Exception e) {
+      LOGGER.warn("Failed to create SegmentDirectory for segment: {} of table: {} due to error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
     }
 
-    Preconditions.checkState(allowDownload(segmentName, zkMetadata), "Segment: %s of table: %s does not allow download",
-        segmentName, _tableNameWithType);
-
-    // Download segment and replace the local one, either due to failure to recover local segment,
-    // or the segment data is updated and has new CRC now.
-    if (localMetadata == null) {
-      LOGGER.info("Download segment: {} of table: {} as no good one exists locally", segmentName, _tableNameWithType);
-    } else {
-      LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
-          _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
+    // Download the raw segment, reprocess and load it if it is never loaded or its CRC has changed.
+    if (segmentDirectory == null || !hasSameCRC(zkMetadata, segmentDirectory.getSegmentMetadata())) {
+      LOGGER.info("Segment: {} of table: {} not exist or its crc: {} differs from new crc: {}", segmentName,
+          _tableNameWithType, (segmentDirectory == null) ? "none" : segmentDirectory.getSegmentMetadata().getCrc(),
+          zkMetadata.getCrc());
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
     }
-    File indexDir = downloadSegment(segmentName, zkMetadata);
-    addSegment(indexDir, indexLoadingConfig);
-    LOGGER.info("Downloaded and loaded segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
-        zkMetadata.getCrc());
-  }
 
-  protected boolean allowDownload(String segmentName, SegmentZKMetadata zkMetadata) {
-    return true;
+    try {
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        // The loaded segment is still consistent with current table config or schema.
+        LOGGER.info("Segment: {} of table: {} is consistent with table config and schema", segmentName,
+            _tableNameWithType);
+        addSegment(ImmutableSegmentLoader.load(segmentDirectory, indexLoadingConfig, schema));

Review comment:
       (optional) and also another sub method, `loadSegment` to fold in this line. This way this addOrReplace driver method will be easier to read

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,92 +315,131 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+            ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      }
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
+    // For local tier backend, try to recover the segment from potential
+    // reload failure. Continue upon any failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
+
+    // Creates the SegmentDirectory object to access the segment metadata that
+    // may be from local tier backend or remote tier backend.
+    SegmentDirectoryLoaderContext segmentLoaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    SegmentDirectory segmentDirectory = null;
+    try {
+      segmentDirectory = segmentLoader.load(indexDir.toURI(), segmentLoaderContext);
+    } catch (Exception e) {
+      LOGGER.warn("Failed to create SegmentDirectory for segment: {} of table: {} due to error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
     }
 
-    Preconditions.checkState(allowDownload(segmentName, zkMetadata), "Segment: %s of table: %s does not allow download",
-        segmentName, _tableNameWithType);
-
-    // Download segment and replace the local one, either due to failure to recover local segment,
-    // or the segment data is updated and has new CRC now.
-    if (localMetadata == null) {
-      LOGGER.info("Download segment: {} of table: {} as no good one exists locally", segmentName, _tableNameWithType);
-    } else {
-      LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
-          _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
+    // Download the raw segment, reprocess and load it if it is never loaded or its CRC has changed.
+    if (segmentDirectory == null || !hasSameCRC(zkMetadata, segmentDirectory.getSegmentMetadata())) {
+      LOGGER.info("Segment: {} of table: {} not exist or its crc: {} differs from new crc: {}", segmentName,

Review comment:
       can we split these conditions and their respective log statements so it's easier to read and follow in the logs?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (8bc9da6) into [master](https://codecov.io/gh/apache/pinot/commit/a1f3c30e2ac74dd0acb67087a86d8f03b416f5b2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a1f3c30) will **decrease** coverage by `1.01%`.
   > The diff coverage is `80.76%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7969      +/-   ##
   ============================================
   - Coverage     71.38%   70.37%   -1.02%     
   - Complexity     4200     4208       +8     
   ============================================
     Files          1595     1595              
     Lines         82595    82690      +95     
     Branches      12326    12334       +8     
   ============================================
   - Hits          58964    58196     -768     
   - Misses        19645    20514     +869     
   + Partials       3986     3980       -6     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `29.06% <36.81%> (-0.02%)` | :arrow_down: |
   | integration2 | `?` | |
   | unittests1 | `68.25% <82.92%> (+0.02%)` | :arrow_up: |
   | unittests2 | `14.26% <0.00%> (-0.05%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ata/manager/realtime/RealtimeTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvUmVhbHRpbWVUYWJsZURhdGFNYW5hZ2VyLmphdmE=) | `43.49% <ø> (-24.40%)` | :arrow_down: |
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `90.83% <0.00%> (-0.84%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `74.28% <28.57%> (-0.29%)` | :arrow_down: |
   | [...server/starter/helix/HelixInstanceDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zZXJ2ZXIvc3RhcnRlci9oZWxpeC9IZWxpeEluc3RhbmNlRGF0YU1hbmFnZXIuamF2YQ==) | `79.44% <50.00%> (-2.06%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `87.50% <82.75%> (-4.17%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `80.00% <83.33%> (+0.21%)` | :arrow_up: |
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `87.77% <86.45%> (+2.87%)` | :arrow_up: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `100.00% <100.00%> (ø)` | |
   | [...egment/local/segment/index/loader/LoaderUtils.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9Mb2FkZXJVdGlscy5qYXZh) | `95.91% <100.00%> (+1.80%)` | :arrow_up: |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `73.68% <100.00%> (ø)` | |
   | ... and [102 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [a1f3c30...8bc9da6](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r779041798



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,94 +306,162 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+            ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      }
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
-    }
+    // For local tier backend, try to recover the segment from potential
+    // reload failure. Continue upon any failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
 
-    Preconditions.checkState(allowDownload(segmentName, zkMetadata), "Segment: %s of table: %s does not allow download",
-        segmentName, _tableNameWithType);
+    // Creates the SegmentDirectory object to access the segment metadata that
+    // may be from local tier backend or remote tier backend.
+    SegmentDirectory segmentDirectory = null;
+    try {
+      SegmentDirectoryLoaderContext loaderContext =
+          new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+              segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+      SegmentDirectoryLoader segmentDirectoryLoader =
+          SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+      segmentDirectory = segmentDirectoryLoader.load(indexDir.toURI(), loaderContext);
+    } catch (Exception e) {
+      LOGGER.warn("Failed to create SegmentDirectory for segment: {} of table: {} due to error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
+    }
 
-    // Download segment and replace the local one, either due to failure to recover local segment,
-    // or the segment data is updated and has new CRC now.
-    if (localMetadata == null) {
-      LOGGER.info("Download segment: {} of table: {} as no good one exists locally", segmentName, _tableNameWithType);
-    } else {
-      LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
-          _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
+    // Download the raw segment, reprocess and load it if it is never loaded or its CRC has changed.
+    if (segmentDirectory == null) {
+      LOGGER.info("Segment: {} of table: {} does not exist", segmentName, _tableNameWithType);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
+    }
+    if (!hasSameCRC(zkMetadata, segmentDirectory.getSegmentMetadata())) {
+      LOGGER.info("Segment: {} of table: {} has crc: {} different from new crc: {}", segmentName,
+          _tableNameWithType, segmentDirectory.getSegmentMetadata().getCrc(), zkMetadata.getCrc());
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
     }
-    File indexDir = downloadSegment(segmentName, zkMetadata);
-    addSegment(indexDir, indexLoadingConfig);
-    LOGGER.info("Downloaded and loaded segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
-        zkMetadata.getCrc());
-  }
 
-  protected boolean allowDownload(String segmentName, SegmentZKMetadata zkMetadata) {
-    return true;
+    try {
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        // The loaded segment is still consistent with current table config or schema.
+        LOGGER.info("Segment: {} of table: {} is consistent with table config and schema", segmentName,
+            _tableNameWithType);
+        loadSegment(segmentDirectory, indexLoadingConfig, schema);
+        return;
+      }
+      // If any discrepancy is found, get the segment from tier backend, reprocess and load it.
+      LOGGER.info("Segment: {} of table: {} needs reprocess to reflect latest table config and schema", segmentName,
+          _tableNameWithType);
+      // Close the stale SegmentDirectory object before loading the newly processed segment.
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadTierSegmentAndProcess(segmentName, indexLoadingConfig, schema);
+    } catch (Exception e) {
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      LOGGER.error("Failed to reprocess and load segment: {} of table: {}", segmentName, _tableNameWithType, e);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+    }
   }
 
-  protected File downloadSegment(String segmentName, SegmentZKMetadata zkMetadata)
+  /**
+   * Get the segment from the configured tier backend, reprocess it with the latest table
+   * config and schema and then load it. Please note that the segment is from tier backend,
+   * not deep store, for incremental processing.
+   */
+  private void downloadTierSegmentAndProcess(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      @Nullable Schema schema)
       throws Exception {
-    // TODO: may support download from peer servers for RealTime table.
-    return downloadSegmentFromDeepStore(segmentName, zkMetadata);
+    SegmentDirectoryLoaderContext loaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentDirectoryLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    File indexDir = getSegmentDataDir(segmentName);
+    segmentDirectoryLoader.download(indexDir, loaderContext);

Review comment:
       The download and upload methods are introduced to SegmentDirectoryLoader interface is because today the SegmentPreprocessor and pretty much all kinds of IndexCreators used by it requires the `File indexDir` as an input param. If they were able to work with SegmentDirectory interface instead of `File indexDir`, the download/upload methods wouldn't be needed in SegmentDirectoryLoader interface. (This is briefly explained in the comments of SegmentDirectoryLoader interface)
   
   Hope this adds more clarity on why downloading raw segment from deep store is not part of SegmentDirectoryLoader interface. As there can be a path forward to remove download/upload methods from SegmentDirectoryLoader, if segment preprocessing logic could work with SegmentDirectory interface instead of requiring File index explicitly. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mcvsubbu commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r779784146



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,94 +306,162 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+            ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      }
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
-    }
+    // For local tier backend, try to recover the segment from potential
+    // reload failure. Continue upon any failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
 
-    Preconditions.checkState(allowDownload(segmentName, zkMetadata), "Segment: %s of table: %s does not allow download",
-        segmentName, _tableNameWithType);
+    // Creates the SegmentDirectory object to access the segment metadata that
+    // may be from local tier backend or remote tier backend.
+    SegmentDirectory segmentDirectory = null;
+    try {
+      SegmentDirectoryLoaderContext loaderContext =
+          new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+              segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+      SegmentDirectoryLoader segmentDirectoryLoader =
+          SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+      segmentDirectory = segmentDirectoryLoader.load(indexDir.toURI(), loaderContext);
+    } catch (Exception e) {
+      LOGGER.warn("Failed to create SegmentDirectory for segment: {} of table: {} due to error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
+    }
 
-    // Download segment and replace the local one, either due to failure to recover local segment,
-    // or the segment data is updated and has new CRC now.
-    if (localMetadata == null) {
-      LOGGER.info("Download segment: {} of table: {} as no good one exists locally", segmentName, _tableNameWithType);
-    } else {
-      LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
-          _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
+    // Download the raw segment, reprocess and load it if it is never loaded or its CRC has changed.
+    if (segmentDirectory == null) {
+      LOGGER.info("Segment: {} of table: {} does not exist", segmentName, _tableNameWithType);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
+    }
+    if (!hasSameCRC(zkMetadata, segmentDirectory.getSegmentMetadata())) {
+      LOGGER.info("Segment: {} of table: {} has crc: {} different from new crc: {}", segmentName,
+          _tableNameWithType, segmentDirectory.getSegmentMetadata().getCrc(), zkMetadata.getCrc());
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
     }
-    File indexDir = downloadSegment(segmentName, zkMetadata);
-    addSegment(indexDir, indexLoadingConfig);
-    LOGGER.info("Downloaded and loaded segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
-        zkMetadata.getCrc());
-  }
 
-  protected boolean allowDownload(String segmentName, SegmentZKMetadata zkMetadata) {
-    return true;
+    try {
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        // The loaded segment is still consistent with current table config or schema.
+        LOGGER.info("Segment: {} of table: {} is consistent with table config and schema", segmentName,
+            _tableNameWithType);
+        loadSegment(segmentDirectory, indexLoadingConfig, schema);
+        return;
+      }
+      // If any discrepancy is found, get the segment from tier backend, reprocess and load it.
+      LOGGER.info("Segment: {} of table: {} needs reprocess to reflect latest table config and schema", segmentName,
+          _tableNameWithType);
+      // Close the stale SegmentDirectory object before loading the newly processed segment.
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadTierSegmentAndProcess(segmentName, indexLoadingConfig, schema);
+    } catch (Exception e) {
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      LOGGER.error("Failed to reprocess and load segment: {} of table: {}", segmentName, _tableNameWithType, e);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+    }
   }
 
-  protected File downloadSegment(String segmentName, SegmentZKMetadata zkMetadata)
+  /**
+   * Get the segment from the configured tier backend, reprocess it with the latest table
+   * config and schema and then load it. Please note that the segment is from tier backend,
+   * not deep store, for incremental processing.
+   */
+  private void downloadTierSegmentAndProcess(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      @Nullable Schema schema)
       throws Exception {
-    // TODO: may support download from peer servers for RealTime table.
-    return downloadSegmentFromDeepStore(segmentName, zkMetadata);
+    SegmentDirectoryLoaderContext loaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentDirectoryLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    File indexDir = getSegmentDataDir(segmentName);
+    segmentDirectoryLoader.download(indexDir, loaderContext);

Review comment:
       The doc does not convey any design details. It is simply a commentary on this PR. Please, include the text of the doc right here in the PR as a comment.
   
   It is not clear (to me) what this PR is achieving. Is this a PR to just improve code readability? (you say "confusing" in your doc). Or, is this a PR to make Pinot more extensible for some purpose?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (49c3eba) into [master](https://codecov.io/gh/apache/pinot/commit/a1f3c30e2ac74dd0acb67087a86d8f03b416f5b2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a1f3c30) will **decrease** coverage by `6.51%`.
   > The diff coverage is `80.28%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7969      +/-   ##
   ============================================
   - Coverage     71.38%   64.87%   -6.52%     
   - Complexity     4200     4223      +23     
   ============================================
     Files          1595     1550      -45     
     Lines         82595    80923    -1672     
     Branches      12326    12149     -177     
   ============================================
   - Hits          58964    52496    -6468     
   - Misses        19645    24695    +5050     
   + Partials       3986     3732     -254     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `68.22% <80.28%> (-0.01%)` | :arrow_down: |
   | unittests2 | `14.26% <0.00%> (-0.04%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `90.83% <0.00%> (-0.84%)` | :arrow_down: |
   | [...ache/pinot/segment/spi/store/SegmentDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3N0b3JlL1NlZ21lbnREaXJlY3RvcnkuamF2YQ==) | `55.55% <0.00%> (-6.95%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `74.28% <28.57%> (-0.29%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `84.15% <73.52%> (-7.51%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `80.00% <83.33%> (+0.21%)` | :arrow_up: |
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `88.47% <86.11%> (+3.58%)` | :arrow_up: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `100.00% <100.00%> (ø)` | |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `73.68% <100.00%> (ø)` | |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `63.26% <100.00%> (ø)` | |
   | [...t/local/segment/store/SegmentLocalFSDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L3N0b3JlL1NlZ21lbnRMb2NhbEZTRGlyZWN0b3J5LmphdmE=) | `71.52% <100.00%> (+2.23%)` | :arrow_up: |
   | ... and [381 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [a1f3c30...49c3eba](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r777743903



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -275,53 +278,32 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
 
   @Override
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
-      SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
+      SegmentMetadata segmentMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
+    // Create backup dir to make segment reloading atomic for local tier backend.
+    LoaderUtils.createBackup(indexDir);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
-
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
-      boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
-      if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
-        if (forceDownload) {
-          LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
-        } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
-              _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
-        }
-        indexDir = downloadSegment(segmentName, zkMetadata);
+      boolean shouldDownloadRawSegment = forceDownload || !hasSameCRC(zkMetadata, segmentMetadata);
+      if (shouldDownloadRawSegment && allowDownloadRawSegment(segmentName, zkMetadata)) {
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata, schema);
       } else {
-        LOGGER.info("Reload the local copy of segment: {} of table: {}", segmentName, _tableNameWithType);
-        FileUtils.copyDirectory(segmentBackupDir, indexDir);
+        SegmentDirectoryLoaderContext segmentLoaderContext =
+            new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),

Review comment:
       I like the method name. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r777750439



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,92 +315,131 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+            ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      }
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
+    // For local tier backend, try to recover the segment from potential
+    // reload failure. Continue upon any failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
+
+    // Creates the SegmentDirectory object to access the segment metadata that
+    // may be from local tier backend or remote tier backend.
+    SegmentDirectoryLoaderContext segmentLoaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    SegmentDirectory segmentDirectory = null;
+    try {
+      segmentDirectory = segmentLoader.load(indexDir.toURI(), segmentLoaderContext);
+    } catch (Exception e) {
+      LOGGER.warn("Failed to create SegmentDirectory for segment: {} of table: {} due to error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
     }
 
-    Preconditions.checkState(allowDownload(segmentName, zkMetadata), "Segment: %s of table: %s does not allow download",
-        segmentName, _tableNameWithType);
-
-    // Download segment and replace the local one, either due to failure to recover local segment,
-    // or the segment data is updated and has new CRC now.
-    if (localMetadata == null) {
-      LOGGER.info("Download segment: {} of table: {} as no good one exists locally", segmentName, _tableNameWithType);
-    } else {
-      LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
-          _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
+    // Download the raw segment, reprocess and load it if it is never loaded or its CRC has changed.
+    if (segmentDirectory == null || !hasSameCRC(zkMetadata, segmentDirectory.getSegmentMetadata())) {
+      LOGGER.info("Segment: {} of table: {} not exist or its crc: {} differs from new crc: {}", segmentName,
+          _tableNameWithType, (segmentDirectory == null) ? "none" : segmentDirectory.getSegmentMetadata().getCrc(),
+          zkMetadata.getCrc());
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
     }
-    File indexDir = downloadSegment(segmentName, zkMetadata);
-    addSegment(indexDir, indexLoadingConfig);
-    LOGGER.info("Downloaded and loaded segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
-        zkMetadata.getCrc());
-  }
 
-  protected boolean allowDownload(String segmentName, SegmentZKMetadata zkMetadata) {
-    return true;
+    try {
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        // The loaded segment is still consistent with current table config or schema.
+        LOGGER.info("Segment: {} of table: {} is consistent with table config and schema", segmentName,
+            _tableNameWithType);
+        addSegment(ImmutableSegmentLoader.load(segmentDirectory, indexLoadingConfig, schema));
+        return;
+      }
+      // If any discrepancy is found, get the segment from tier backend, reprocess and load it.
+      // Please note that the segment is from tier backed, not deep store, for incremental processing.
+      LOGGER.info("Segment: {} of table: {} needs reprocess with table config and schema", segmentName,
+          _tableNameWithType);
+      // Close the stale SegmentDirectory object before loading the newly processed segment.
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      segmentLoader.download(indexDir, segmentLoaderContext);
+      processAndLoadSegment(segmentName, indexDir, indexLoadingConfig, schema);
+      LOGGER.info("Segment: {} of table: {} is reprocessed and loaded", segmentName, _tableNameWithType);
+    } catch (Exception e) {
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      LOGGER.error("Failed to reprocess and load segment: {} of table: {}", segmentName, _tableNameWithType, e);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+    }
   }
 
-  protected File downloadSegment(String segmentName, SegmentZKMetadata zkMetadata)
+  /**
+   * This method downloads the raw segment from the deep store and process it, mainly
+   * for cases where segment CRC has changed or the existing segment fails to load.
+   */
+  private void downloadRawSegmentAndProcess(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      SegmentZKMetadata zkMetadata, @Nullable Schema schema)
       throws Exception {
-    // TODO: may support download from peer servers for RealTime table.
-    return downloadSegmentFromDeepStore(segmentName, zkMetadata);
+    Preconditions.checkState(allowDownloadRawSegment(segmentName, zkMetadata),
+        "Segment: %s of table: %s does not allow download raw segment", segmentName, _tableNameWithType);
+
+    File indexDir = downloadRawSegment(segmentName, zkMetadata);
+    processAndLoadSegment(segmentName, indexDir, indexLoadingConfig, schema);
+    LOGGER.info("Downloaded raw segment: {} of table: {} with crc: {} and loaded it", segmentName,
+        _tableNameWithType, zkMetadata.getCrc());
   }
 
-  /**
-   * Server restart during segment reload might leave segment directory in inconsistent state, like the index
-   * directory might not exist but segment backup directory existed. This method tries to recover from reload
-   * failure before checking the existence of the index directory and loading segment metadata from it.
-   */
-  private SegmentMetadata recoverSegmentQuietly(String segmentName) {
-    File indexDir = getSegmentDataDir(segmentName);
+  private void processAndLoadSegment(String segmentName, File indexDir, IndexLoadingConfig indexLoadingConfig,
+      @Nullable Schema schema)
+      throws Exception {
+    // Preprocess the segment locally with current table config and schema.
+    ImmutableSegmentLoader.preprocess(indexDir, indexLoadingConfig, schema);
+
+    SegmentDirectoryLoaderContext segmentLoaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    SegmentDirectory segmentDirectory = null;
     try {
-      LoaderUtils.reloadFailureRecovery(indexDir);
-      if (!indexDir.exists()) {
-        LOGGER.info("Segment: {} of table: {} is not found on disk", segmentName, _tableNameWithType);
-        return null;
-      }
-      SegmentMetadataImpl localMetadata = new SegmentMetadataImpl(indexDir);
-      LOGGER.info("Segment: {} of table: {} with crc: {} from disk is ready for loading", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
-      return localMetadata;
+      // Upload the processed segment to server tier backend, which can be local or remote.
+      segmentLoader.upload(indexDir, segmentLoaderContext);
+      // Create the SegmentDirectory object with the newly processed segment from tier backend.
+      segmentDirectory = segmentLoader.load(indexDir.toURI(), segmentLoaderContext);

Review comment:
       sgtm




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r778357692



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,94 +306,156 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+            ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      }
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
-    }
+    // For local tier backend, try to recover the segment from potential
+    // reload failure. Continue upon any failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
 
-    Preconditions.checkState(allowDownload(segmentName, zkMetadata), "Segment: %s of table: %s does not allow download",
-        segmentName, _tableNameWithType);
+    // Creates the SegmentDirectory object to access the segment metadata that

Review comment:
       If I understand the question correctly:
   1) one may implement SegmentDirectory interface to fetch metadata files in remote tier backend to initialize the SegmentMetadata object
   2) SegmentMetadata object is used to check if CRC changes or if the segment is still consistent with the latest table config and schema




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (49c3eba) into [master](https://codecov.io/gh/apache/pinot/commit/a1f3c30e2ac74dd0acb67087a86d8f03b416f5b2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a1f3c30) will **decrease** coverage by `57.12%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #7969       +/-   ##
   =============================================
   - Coverage     71.38%   14.26%   -57.13%     
   + Complexity     4200       81     -4119     
   =============================================
     Files          1595     1550       -45     
     Lines         82595    80923     -1672     
     Branches      12326    12149      -177     
   =============================================
   - Hits          58964    11546    -47418     
   - Misses        19645    68519    +48874     
   + Partials       3986      858     -3128     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `14.26% <0.00%> (-0.04%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `0.00% <0.00%> (-84.90%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `0.00% <0.00%> (-79.79%)` | :arrow_down: |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `0.00% <0.00%> (-73.69%)` | :arrow_down: |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `0.00% <0.00%> (-63.27%)` | :arrow_down: |
   | [...t/local/segment/store/SegmentLocalFSDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L3N0b3JlL1NlZ21lbnRMb2NhbEZTRGlyZWN0b3J5LmphdmE=) | `0.00% <0.00%> (-69.29%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `0.00% <0.00%> (-74.57%)` | :arrow_down: |
   | [...ache/pinot/segment/spi/store/SegmentDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3N0b3JlL1NlZ21lbnREaXJlY3RvcnkuamF2YQ==) | `0.00% <0.00%> (-62.50%)` | :arrow_down: |
   | ... and [1279 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [a1f3c30...49c3eba](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r778365333



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/LoaderUtils.java
##########
@@ -161,4 +160,36 @@ public static void reloadFailureRecovery(File indexDir)
       FileUtils.forceDelete(segmentTempDir);
     }
   }
+
+  public static void createBackup(File indexDir)
+      throws IOException {
+    if (!indexDir.exists()) {

Review comment:
       replied [below](https://github.com/apache/pinot/pull/7969#discussion_r778332657)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r780044650



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/LoaderUtils.java
##########
@@ -161,4 +160,36 @@ public static void reloadFailureRecovery(File indexDir)
       FileUtils.forceDelete(segmentTempDir);
     }
   }
+
+  public static void createBackup(File indexDir)
+      throws IOException {
+    if (!indexDir.exists()) {
+      return;
+    }
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    // Rename index directory to segment backup directory (atomic)
+    Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
+        "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
+    // Copy the backup dir back to proceed.
+    FileUtils.copyDirectory(segmentBackupDir, indexDir);

Review comment:
       The change has been reverted. Just that copying backup directory back is done with SegmentDirectory.copyTo() now to decouple the logic with local segment directory.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mcvsubbu commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r780606167



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -335,26 +327,19 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
       SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
+    if (localMetadata != null && hasSameCRC(zkMetadata, localMetadata)) {
       LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
           _tableNameWithType, localMetadata.getCrc());
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
+    // The segment is not loaded by the server if the metadata object is null. But the segment
+    // may still be kept on the server. For example when server gets restarted, the segment is
+    // still on the server but the metadata object has not been initialized yet. In this case,
+    // we should check if the segment exists on server and try to load it. If the segment does
+    // not exist or fails to get loaded, we download segment from deep store to load it again.
+    if (localMetadata == null && tryLoadExistingSegment(segmentName, indexLoadingConfig, zkMetadata)) {

Review comment:
       tryLoadExistingSegment calls tryInitSegmentDirectory, which ends up loading the segment if it is locally present.
   Yet, in line 510, we call load again. That seems to be a bug.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -277,49 +282,36 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
       SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
+      // Create backup directory to handle failure of segment reloading.
+      createBackup(indexDir);
 
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
+      // Download segment from deep store if CRC changes or forced to download;
+      // otherwise, copy backup directory back to the original index directory.
+      // And then continue to load the segment from the index directory.
       boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
       if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
         if (forceDownload) {
           LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
         } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
+          LOGGER.info("Download segment:{} of table: {} as crc changes from: {} to: {}", segmentName,
               _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
         }
         indexDir = downloadSegment(segmentName, zkMetadata);
       } else {
-        LOGGER.info("Reload the local copy of segment: {} of table: {}", segmentName, _tableNameWithType);
-        FileUtils.copyDirectory(segmentBackupDir, indexDir);
+        LOGGER.info("Reload existing segment: {} of table: {}", segmentName, _tableNameWithType);
+        try (SegmentDirectory segmentDirectory = initSegmentDirectory(segmentName, indexLoadingConfig)) {

Review comment:
       This seems to be buggy. As I understand it, we seem to be copying from an empty dir to an empty dir, whereas we need to copy from the backupdir to the segment dir.  Correct me if I am wrong.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mcvsubbu commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r778443585



##########
File path: pinot-server/src/main/java/org/apache/pinot/server/starter/helix/HelixInstanceDataManager.java
##########
@@ -307,6 +306,32 @@ private void reloadSegment(String tableNameWithType, SegmentMetadata segmentMeta
     }
   }
 
+  /**
+   * Try to reload a segment without a local index directory. The segment can be a consuming
+   * segment from a REALTIME table, or an immutable segment on remote tier backend.
+   * @return true if the segment is loaded.
+   */
+  @VisibleForTesting
+  boolean reloadSegmentWithNullIndexDir(String tableNameWithType, String segmentName,

Review comment:
       You can make it generic when you implement the functionality to do more. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mcvsubbu commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r780490610



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -277,49 +282,36 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
       SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
+      // Create backup directory to handle failure of segment reloading.
+      createBackup(indexDir);
 
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
+      // Download segment from deep store if CRC changes or forced to download;
+      // otherwise, copy backup directory back to the original index directory.
+      // And then continue to load the segment from the index directory.
       boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
       if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
         if (forceDownload) {
           LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
         } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
+          LOGGER.info("Download segment:{} of table: {} as crc changes from: {} to: {}", segmentName,
               _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
         }
         indexDir = downloadSegment(segmentName, zkMetadata);
       } else {
-        LOGGER.info("Reload the local copy of segment: {} of table: {}", segmentName, _tableNameWithType);
-        FileUtils.copyDirectory(segmentBackupDir, indexDir);
+        LOGGER.info("Reload existing segment: {} of table: {}", segmentName, _tableNameWithType);
+        try (SegmentDirectory segmentDirectory = getSegmentDirectory(segmentName, indexLoadingConfig)) {

Review comment:
       Why can't we use the same method as that in line 285 (`getSegmentDataDir(segmentName)`) ?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -277,49 +282,36 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
       SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
+      // Create backup directory to handle failure of segment reloading.
+      createBackup(indexDir);
 
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
+      // Download segment from deep store if CRC changes or forced to download;
+      // otherwise, copy backup directory back to the original index directory.
+      // And then continue to load the segment from the index directory.
       boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
       if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
         if (forceDownload) {
           LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
         } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
+          LOGGER.info("Download segment:{} of table: {} as crc changes from: {} to: {}", segmentName,
               _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
         }
         indexDir = downloadSegment(segmentName, zkMetadata);
       } else {
-        LOGGER.info("Reload the local copy of segment: {} of table: {}", segmentName, _tableNameWithType);
-        FileUtils.copyDirectory(segmentBackupDir, indexDir);
+        LOGGER.info("Reload existing segment: {} of table: {}", segmentName, _tableNameWithType);
+        try (SegmentDirectory segmentDirectory = getSegmentDirectory(segmentName, indexLoadingConfig)) {

Review comment:
       Why can't we use the same method as that in line 285 (`getSegmentDataDir(segmentName)`) ?
   And also the FileUtils to copydir?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -335,26 +327,17 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
       SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
+    if (localMetadata != null && hasSameCRC(zkMetadata, localMetadata)) {
       LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
           _tableNameWithType, localMetadata.getCrc());
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
+    // The segment is not loaded by the server yet, but it may still kept by the server.

Review comment:
       Can you add some comments on how  you see this happening (local segment present but not loaded).

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -480,12 +426,138 @@ File getSegmentDataDir(String segmentName) {
     return new File(_indexDir, segmentName);
   }
 
-  @VisibleForTesting
-  static boolean isNewSegment(SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata) {
-    return localMetadata == null || !hasSameCRC(zkMetadata, localMetadata);
+  /**
+   * Create a backup directory to handle failure of segment reloading.
+   * First rename index directory to segment backup directory so that original segment have all file
+   * descriptors point to the segment backup directory to ensure original segment serves queries properly.
+   * The original index directory is restored lazily, as depending on the conditions,
+   * it may be restored from the backup directory or segment downloaded from deep store.
+   */
+  private void createBackup(File indexDir) {
+    if (!indexDir.exists()) {
+      return;
+    }
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    // Rename index directory to segment backup directory (atomic).
+    Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
+        "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
+  }
+
+  /**
+   * Remove the backup directory to mark the completion of segment reloading.
+   * First rename then delete is as renaming is an atomic operation, but deleting is not.
+   * When we rename the segment backup directory to segment temporary directory, we know the reload
+   * already succeeded, so that we can safely delete the segment temporary directory.
+   */
+  private void removeBackup(File indexDir)
+      throws IOException {
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    if (!segmentBackupDir.exists()) {
+      return;
+    }
+    File segmentTempDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_TEMP_DIR_SUFFIX);
+    // Rename segment backup directory to segment temporary directory (atomic).
+    Preconditions.checkState(segmentBackupDir.renameTo(segmentTempDir),
+        "Failed to rename segment backup directory: %s to segment temporary directory: %s", segmentBackupDir,
+        segmentTempDir);
+    FileUtils.deleteDirectory(segmentTempDir);
+  }
+
+  private boolean tryLoadExistingSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      SegmentZKMetadata zkMetadata) {
+    // Try to recover the segment from potential segment reloading failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
+
+    // Creates the SegmentDirectory object to access the segment metadata.
+    // The metadata is null if the segment doesn't exist yet.
+    SegmentDirectory segmentDirectory = tryGetSegmentDirectory(segmentName, indexLoadingConfig);
+    SegmentMetadataImpl segmentMetadata = (segmentDirectory == null) ? null : segmentDirectory.getSegmentMetadata();
+
+    // If the segment doesn't exist on server or its CRC has changed, then we
+    // need to fall back to download the segment from deep store to load it.
+    if (segmentMetadata == null) {
+      LOGGER.info("Segment: {} of table: {} does not exist", segmentName, _tableNameWithType);
+      return false;
+    }
+    if (!hasSameCRC(zkMetadata, segmentMetadata)) {
+      LOGGER.info("Segment: {} of table: {} has crc change from: {} to: {}", segmentName, _tableNameWithType,
+          segmentMetadata.getCrc(), zkMetadata.getCrc());
+      return false;
+    }
+
+    try {
+      // If the segment is still kept by the server, then we can
+      // either load it directly if it's still consistent with latest table config and schema;
+      // or reprocess it to reflect latest table config and schema before loading.
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        LOGGER.info("Segment: {} of table: {} is consistent with latest table config and schema", segmentName,
+            _tableNameWithType);
+      } else {
+        LOGGER.info("Segment: {} of table: {} needs reprocess to reflect latest table config and schema", segmentName,
+            _tableNameWithType);
+        segmentDirectory.copyTo(indexDir);
+        // Close the stale SegmentDirectory object and recreate it with reprocessed segment.
+        closeSegmentDirectoryQuietly(segmentDirectory);
+        ImmutableSegmentLoader.preprocess(indexDir, indexLoadingConfig, schema);
+        segmentDirectory = getSegmentDirectory(segmentName, indexLoadingConfig);
+      }
+      ImmutableSegment segment = ImmutableSegmentLoader.load(segmentDirectory, indexLoadingConfig, schema);
+      addSegment(segment);
+      LOGGER.info("Loaded existing segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
+          zkMetadata.getCrc());
+      return true;
+    } catch (Exception e) {
+      LOGGER.error("Failed to load existing segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType, e);
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      return false;
+    }
+  }
+
+  private SegmentDirectory tryGetSegmentDirectory(String segmentName, IndexLoadingConfig indexLoadingConfig) {
+    try {
+      return getSegmentDirectory(segmentName, indexLoadingConfig);
+    } catch (Exception e) {
+      LOGGER.warn("Attempt to get SegmentDirectory for segment: {} of table: {} failed with error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
+      return null;
+    }
+  }
+
+  private SegmentDirectory getSegmentDirectory(String segmentName, IndexLoadingConfig indexLoadingConfig)

Review comment:
       this method seems redundant. We already have a getSegmentDataDir method

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -480,12 +426,138 @@ File getSegmentDataDir(String segmentName) {
     return new File(_indexDir, segmentName);
   }
 
-  @VisibleForTesting
-  static boolean isNewSegment(SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata) {
-    return localMetadata == null || !hasSameCRC(zkMetadata, localMetadata);
+  /**
+   * Create a backup directory to handle failure of segment reloading.
+   * First rename index directory to segment backup directory so that original segment have all file
+   * descriptors point to the segment backup directory to ensure original segment serves queries properly.
+   * The original index directory is restored lazily, as depending on the conditions,
+   * it may be restored from the backup directory or segment downloaded from deep store.
+   */
+  private void createBackup(File indexDir) {
+    if (!indexDir.exists()) {
+      return;
+    }
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    // Rename index directory to segment backup directory (atomic).
+    Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
+        "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
+  }
+
+  /**
+   * Remove the backup directory to mark the completion of segment reloading.
+   * First rename then delete is as renaming is an atomic operation, but deleting is not.
+   * When we rename the segment backup directory to segment temporary directory, we know the reload
+   * already succeeded, so that we can safely delete the segment temporary directory.
+   */
+  private void removeBackup(File indexDir)
+      throws IOException {
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    if (!segmentBackupDir.exists()) {
+      return;
+    }
+    File segmentTempDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_TEMP_DIR_SUFFIX);

Review comment:
       Should we add a timestamp here to the tmp segment dir name to be safe? If you agree you can add a TODO or do it in this PR.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -480,12 +426,138 @@ File getSegmentDataDir(String segmentName) {
     return new File(_indexDir, segmentName);
   }
 
-  @VisibleForTesting
-  static boolean isNewSegment(SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata) {
-    return localMetadata == null || !hasSameCRC(zkMetadata, localMetadata);
+  /**
+   * Create a backup directory to handle failure of segment reloading.
+   * First rename index directory to segment backup directory so that original segment have all file
+   * descriptors point to the segment backup directory to ensure original segment serves queries properly.
+   * The original index directory is restored lazily, as depending on the conditions,
+   * it may be restored from the backup directory or segment downloaded from deep store.
+   */
+  private void createBackup(File indexDir) {
+    if (!indexDir.exists()) {
+      return;
+    }
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    // Rename index directory to segment backup directory (atomic).
+    Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
+        "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
+  }
+
+  /**
+   * Remove the backup directory to mark the completion of segment reloading.
+   * First rename then delete is as renaming is an atomic operation, but deleting is not.
+   * When we rename the segment backup directory to segment temporary directory, we know the reload
+   * already succeeded, so that we can safely delete the segment temporary directory.
+   */
+  private void removeBackup(File indexDir)
+      throws IOException {
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    if (!segmentBackupDir.exists()) {
+      return;
+    }
+    File segmentTempDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_TEMP_DIR_SUFFIX);
+    // Rename segment backup directory to segment temporary directory (atomic).
+    Preconditions.checkState(segmentBackupDir.renameTo(segmentTempDir),
+        "Failed to rename segment backup directory: %s to segment temporary directory: %s", segmentBackupDir,
+        segmentTempDir);
+    FileUtils.deleteDirectory(segmentTempDir);
+  }
+
+  private boolean tryLoadExistingSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      SegmentZKMetadata zkMetadata) {
+    // Try to recover the segment from potential segment reloading failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
+
+    // Creates the SegmentDirectory object to access the segment metadata.
+    // The metadata is null if the segment doesn't exist yet.
+    SegmentDirectory segmentDirectory = tryGetSegmentDirectory(segmentName, indexLoadingConfig);
+    SegmentMetadataImpl segmentMetadata = (segmentDirectory == null) ? null : segmentDirectory.getSegmentMetadata();
+
+    // If the segment doesn't exist on server or its CRC has changed, then we
+    // need to fall back to download the segment from deep store to load it.
+    if (segmentMetadata == null) {
+      LOGGER.info("Segment: {} of table: {} does not exist", segmentName, _tableNameWithType);
+      return false;
+    }
+    if (!hasSameCRC(zkMetadata, segmentMetadata)) {
+      LOGGER.info("Segment: {} of table: {} has crc change from: {} to: {}", segmentName, _tableNameWithType,
+          segmentMetadata.getCrc(), zkMetadata.getCrc());
+      return false;
+    }
+
+    try {
+      // If the segment is still kept by the server, then we can
+      // either load it directly if it's still consistent with latest table config and schema;
+      // or reprocess it to reflect latest table config and schema before loading.
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        LOGGER.info("Segment: {} of table: {} is consistent with latest table config and schema", segmentName,
+            _tableNameWithType);
+      } else {
+        LOGGER.info("Segment: {} of table: {} needs reprocess to reflect latest table config and schema", segmentName,
+            _tableNameWithType);
+        segmentDirectory.copyTo(indexDir);
+        // Close the stale SegmentDirectory object and recreate it with reprocessed segment.
+        closeSegmentDirectoryQuietly(segmentDirectory);
+        ImmutableSegmentLoader.preprocess(indexDir, indexLoadingConfig, schema);
+        segmentDirectory = getSegmentDirectory(segmentName, indexLoadingConfig);
+      }
+      ImmutableSegment segment = ImmutableSegmentLoader.load(segmentDirectory, indexLoadingConfig, schema);
+      addSegment(segment);
+      LOGGER.info("Loaded existing segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
+          zkMetadata.getCrc());
+      return true;
+    } catch (Exception e) {
+      LOGGER.error("Failed to load existing segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType, e);
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      return false;
+    }
+  }
+
+  private SegmentDirectory tryGetSegmentDirectory(String segmentName, IndexLoadingConfig indexLoadingConfig) {
+    try {
+      return getSegmentDirectory(segmentName, indexLoadingConfig);

Review comment:
       can we use one call to get segment directory?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -480,12 +426,138 @@ File getSegmentDataDir(String segmentName) {
     return new File(_indexDir, segmentName);
   }
 
-  @VisibleForTesting
-  static boolean isNewSegment(SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata) {
-    return localMetadata == null || !hasSameCRC(zkMetadata, localMetadata);
+  /**
+   * Create a backup directory to handle failure of segment reloading.
+   * First rename index directory to segment backup directory so that original segment have all file
+   * descriptors point to the segment backup directory to ensure original segment serves queries properly.
+   * The original index directory is restored lazily, as depending on the conditions,
+   * it may be restored from the backup directory or segment downloaded from deep store.
+   */
+  private void createBackup(File indexDir) {
+    if (!indexDir.exists()) {
+      return;
+    }
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    // Rename index directory to segment backup directory (atomic).
+    Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
+        "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
+  }
+
+  /**
+   * Remove the backup directory to mark the completion of segment reloading.
+   * First rename then delete is as renaming is an atomic operation, but deleting is not.
+   * When we rename the segment backup directory to segment temporary directory, we know the reload
+   * already succeeded, so that we can safely delete the segment temporary directory.
+   */
+  private void removeBackup(File indexDir)
+      throws IOException {
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    if (!segmentBackupDir.exists()) {
+      return;
+    }
+    File segmentTempDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_TEMP_DIR_SUFFIX);

Review comment:
       OK, agreed.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -335,26 +327,19 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
       SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
+    if (localMetadata != null && hasSameCRC(zkMetadata, localMetadata)) {
       LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
           _tableNameWithType, localMetadata.getCrc());
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
+    // The segment is not loaded by the server if the metadata object is null. But the segment
+    // may still be kept on the server. For example when server gets restarted, the segment is
+    // still on the server but the metadata object has not been initialized yet. In this case,
+    // we should check if the segment exists on server and try to load it. If the segment does
+    // not exist or fails to get loaded, we download segment from deep store to load it again.
+    if (localMetadata == null && tryLoadExistingSegment(segmentName, indexLoadingConfig, zkMetadata)) {

Review comment:
       tryLoadExistingSegment calls tryInitSegmentDirectory, which ends up loading the segment if it is locally present.
   Yet, in line 510, we call load again. That seems to be a bug.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -277,49 +282,36 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
       SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
+      // Create backup directory to handle failure of segment reloading.
+      createBackup(indexDir);
 
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
+      // Download segment from deep store if CRC changes or forced to download;
+      // otherwise, copy backup directory back to the original index directory.
+      // And then continue to load the segment from the index directory.
       boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
       if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
         if (forceDownload) {
           LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
         } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
+          LOGGER.info("Download segment:{} of table: {} as crc changes from: {} to: {}", segmentName,
               _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
         }
         indexDir = downloadSegment(segmentName, zkMetadata);
       } else {
-        LOGGER.info("Reload the local copy of segment: {} of table: {}", segmentName, _tableNameWithType);
-        FileUtils.copyDirectory(segmentBackupDir, indexDir);
+        LOGGER.info("Reload existing segment: {} of table: {}", segmentName, _tableNameWithType);
+        try (SegmentDirectory segmentDirectory = initSegmentDirectory(segmentName, indexLoadingConfig)) {

Review comment:
       This seems to be buggy. As I understand it, we seem to be copying from an empty dir to an empty dir, whereas we need to copy from the backupdir to the segment dir.  Correct me if I am wrong.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (85c07d0) into [master](https://codecov.io/gh/apache/pinot/commit/a1f3c30e2ac74dd0acb67087a86d8f03b416f5b2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a1f3c30) will **decrease** coverage by `8.39%`.
   > The diff coverage is `75.43%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7969      +/-   ##
   ============================================
   - Coverage     71.38%   62.99%   -8.40%     
   + Complexity     4200     4141      -59     
   ============================================
     Files          1595     1586       -9     
     Lines         82595    82454     -141     
     Branches      12326    12315      -11     
   ============================================
   - Hits          58964    51944    -7020     
   - Misses        19645    26747    +7102     
   + Partials       3986     3763     -223     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `29.06% <25.73%> (-0.02%)` | :arrow_down: |
   | integration2 | `?` | |
   | unittests1 | `68.16% <78.57%> (-0.07%)` | :arrow_down: |
   | unittests2 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `90.83% <0.00%> (-0.84%)` | :arrow_down: |
   | [...ache/pinot/segment/spi/store/SegmentDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3N0b3JlL1NlZ21lbnREaXJlY3RvcnkuamF2YQ==) | `55.55% <0.00%> (-6.95%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `74.28% <28.57%> (-0.29%)` | :arrow_down: |
   | [...server/starter/helix/HelixInstanceDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zZXJ2ZXIvc3RhcnRlci9oZWxpeC9IZWxpeEluc3RhbmNlRGF0YU1hbmFnZXIuamF2YQ==) | `78.88% <47.05%> (-2.62%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `84.15% <73.52%> (-7.51%)` | :arrow_down: |
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `87.11% <81.25%> (+2.21%)` | :arrow_up: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `80.00% <83.33%> (+0.21%)` | :arrow_up: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `100.00% <100.00%> (ø)` | |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `73.68% <100.00%> (ø)` | |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `63.26% <100.00%> (ø)` | |
   | ... and [289 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [a1f3c30...85c07d0](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a65dbaf) into [master](https://codecov.io/gh/apache/pinot/commit/a1f3c30e2ac74dd0acb67087a86d8f03b416f5b2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a1f3c30) will **decrease** coverage by `6.57%`.
   > The diff coverage is `78.06%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7969      +/-   ##
   ============================================
   - Coverage     71.38%   64.81%   -6.58%     
   - Complexity     4200     4221      +21     
   ============================================
     Files          1595     1550      -45     
     Lines         82595    80942    -1653     
     Branches      12326    12152     -174     
   ============================================
   - Hits          58964    52465    -6499     
   - Misses        19645    24732    +5087     
   + Partials       3986     3745     -241     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `68.15% <78.06%> (-0.08%)` | :arrow_down: |
   | unittests2 | `14.26% <0.00%> (-0.05%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `90.83% <0.00%> (-0.84%)` | :arrow_down: |
   | [...ache/pinot/segment/spi/store/SegmentDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3N0b3JlL1NlZ21lbnREaXJlY3RvcnkuamF2YQ==) | `55.55% <0.00%> (-6.95%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `74.28% <28.57%> (-0.29%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `84.15% <73.52%> (-7.51%)` | :arrow_down: |
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `86.28% <80.24%> (+1.38%)` | :arrow_up: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `80.00% <83.33%> (+0.21%)` | :arrow_up: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `100.00% <100.00%> (ø)` | |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `73.68% <100.00%> (ø)` | |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `63.26% <100.00%> (ø)` | |
   | [...t/local/segment/store/SegmentLocalFSDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L3N0b3JlL1NlZ21lbnRMb2NhbEZTRGlyZWN0b3J5LmphdmE=) | `71.89% <100.00%> (+2.60%)` | :arrow_up: |
   | ... and [382 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [a1f3c30...a65dbaf](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (b61612b) into [master](https://codecov.io/gh/apache/pinot/commit/823aa07d7fc61543ed3d91992749d28aecd192ee?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (823aa07) will **decrease** coverage by `42.39%`.
   > The diff coverage is `27.90%`.
   
   > :exclamation: Current head b61612b differs from pull request most recent head e801de0. Consider uploading reports for the commit e801de0 to get more accurate results
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #7969       +/-   ##
   =============================================
   - Coverage     71.35%   28.95%   -42.40%     
   =============================================
     Files          1595     1586        -9     
     Lines         82748    82478      -270     
     Branches      12348    12321       -27     
   =============================================
   - Hits          59046    23885    -35161     
   - Misses        19715    56445    +36730     
   + Partials       3987     2148     -1839     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `28.95% <27.90%> (-0.11%)` | :arrow_down: |
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `0.00% <0.00%> (-79.79%)` | :arrow_down: |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `0.00% <0.00%> (-73.69%)` | :arrow_down: |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `0.00% <0.00%> (-63.27%)` | :arrow_down: |
   | [...t/local/segment/store/SegmentLocalFSDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L3N0b3JlL1NlZ21lbnRMb2NhbEZTRGlyZWN0b3J5LmphdmE=) | `0.00% <0.00%> (-69.29%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `0.00% <0.00%> (-74.57%)` | :arrow_down: |
   | [...ache/pinot/segment/spi/store/SegmentDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3N0b3JlL1NlZ21lbnREaXJlY3RvcnkuamF2YQ==) | `0.00% <0.00%> (-62.50%)` | :arrow_down: |
   | [...server/starter/helix/HelixInstanceDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zZXJ2ZXIvc3RhcnRlci9oZWxpeC9IZWxpeEluc3RhbmNlRGF0YU1hbmFnZXIuamF2YQ==) | `78.88% <47.05%> (-3.20%)` | :arrow_down: |
   | ... and [1152 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [823aa07...e801de0](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] npawar commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
npawar commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r777681255



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,92 +315,131 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+            ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      }
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
+    // For local tier backend, try to recover the segment from potential
+    // reload failure. Continue upon any failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
+
+    // Creates the SegmentDirectory object to access the segment metadata that
+    // may be from local tier backend or remote tier backend.
+    SegmentDirectoryLoaderContext segmentLoaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    SegmentDirectory segmentDirectory = null;
+    try {
+      segmentDirectory = segmentLoader.load(indexDir.toURI(), segmentLoaderContext);
+    } catch (Exception e) {
+      LOGGER.warn("Failed to create SegmentDirectory for segment: {} of table: {} due to error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
     }
 
-    Preconditions.checkState(allowDownload(segmentName, zkMetadata), "Segment: %s of table: %s does not allow download",
-        segmentName, _tableNameWithType);
-
-    // Download segment and replace the local one, either due to failure to recover local segment,
-    // or the segment data is updated and has new CRC now.
-    if (localMetadata == null) {
-      LOGGER.info("Download segment: {} of table: {} as no good one exists locally", segmentName, _tableNameWithType);
-    } else {
-      LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
-          _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
+    // Download the raw segment, reprocess and load it if it is never loaded or its CRC has changed.
+    if (segmentDirectory == null || !hasSameCRC(zkMetadata, segmentDirectory.getSegmentMetadata())) {
+      LOGGER.info("Segment: {} of table: {} not exist or its crc: {} differs from new crc: {}", segmentName,
+          _tableNameWithType, (segmentDirectory == null) ? "none" : segmentDirectory.getSegmentMetadata().getCrc(),
+          zkMetadata.getCrc());
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
     }
-    File indexDir = downloadSegment(segmentName, zkMetadata);
-    addSegment(indexDir, indexLoadingConfig);
-    LOGGER.info("Downloaded and loaded segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
-        zkMetadata.getCrc());
-  }
 
-  protected boolean allowDownload(String segmentName, SegmentZKMetadata zkMetadata) {
-    return true;
+    try {
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        // The loaded segment is still consistent with current table config or schema.
+        LOGGER.info("Segment: {} of table: {} is consistent with table config and schema", segmentName,
+            _tableNameWithType);
+        addSegment(ImmutableSegmentLoader.load(segmentDirectory, indexLoadingConfig, schema));
+        return;
+      }
+      // If any discrepancy is found, get the segment from tier backend, reprocess and load it.
+      // Please note that the segment is from tier backed, not deep store, for incremental processing.
+      LOGGER.info("Segment: {} of table: {} needs reprocess with table config and schema", segmentName,
+          _tableNameWithType);
+      // Close the stale SegmentDirectory object before loading the newly processed segment.
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      segmentLoader.download(indexDir, segmentLoaderContext);
+      processAndLoadSegment(segmentName, indexDir, indexLoadingConfig, schema);
+      LOGGER.info("Segment: {} of table: {} is reprocessed and loaded", segmentName, _tableNameWithType);
+    } catch (Exception e) {
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      LOGGER.error("Failed to reprocess and load segment: {} of table: {}", segmentName, _tableNameWithType, e);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+    }
   }
 
-  protected File downloadSegment(String segmentName, SegmentZKMetadata zkMetadata)
+  /**
+   * This method downloads the raw segment from the deep store and process it, mainly
+   * for cases where segment CRC has changed or the existing segment fails to load.
+   */
+  private void downloadRawSegmentAndProcess(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      SegmentZKMetadata zkMetadata, @Nullable Schema schema)
       throws Exception {
-    // TODO: may support download from peer servers for RealTime table.
-    return downloadSegmentFromDeepStore(segmentName, zkMetadata);
+    Preconditions.checkState(allowDownloadRawSegment(segmentName, zkMetadata),
+        "Segment: %s of table: %s does not allow download raw segment", segmentName, _tableNameWithType);
+
+    File indexDir = downloadRawSegment(segmentName, zkMetadata);
+    processAndLoadSegment(segmentName, indexDir, indexLoadingConfig, schema);
+    LOGGER.info("Downloaded raw segment: {} of table: {} with crc: {} and loaded it", segmentName,
+        _tableNameWithType, zkMetadata.getCrc());
   }
 
-  /**
-   * Server restart during segment reload might leave segment directory in inconsistent state, like the index
-   * directory might not exist but segment backup directory existed. This method tries to recover from reload
-   * failure before checking the existence of the index directory and loading segment metadata from it.
-   */
-  private SegmentMetadata recoverSegmentQuietly(String segmentName) {
-    File indexDir = getSegmentDataDir(segmentName);
+  private void processAndLoadSegment(String segmentName, File indexDir, IndexLoadingConfig indexLoadingConfig,
+      @Nullable Schema schema)
+      throws Exception {
+    // Preprocess the segment locally with current table config and schema.
+    ImmutableSegmentLoader.preprocess(indexDir, indexLoadingConfig, schema);
+
+    SegmentDirectoryLoaderContext segmentLoaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    SegmentDirectory segmentDirectory = null;
     try {
-      LoaderUtils.reloadFailureRecovery(indexDir);
-      if (!indexDir.exists()) {
-        LOGGER.info("Segment: {} of table: {} is not found on disk", segmentName, _tableNameWithType);
-        return null;
-      }
-      SegmentMetadataImpl localMetadata = new SegmentMetadataImpl(indexDir);
-      LOGGER.info("Segment: {} of table: {} with crc: {} from disk is ready for loading", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
-      return localMetadata;
+      // Upload the processed segment to server tier backend, which can be local or remote.
+      segmentLoader.upload(indexDir, segmentLoaderContext);
+      // Create the SegmentDirectory object with the newly processed segment from tier backend.
+      segmentDirectory = segmentLoader.load(indexDir.toURI(), segmentLoaderContext);

Review comment:
       nit: rename variable segmentLoader to segmentDirectoryLoader. Otherwise this reads as segmentLoader.load and ImmutableSegmentLoader.load, which can be confused as the same




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mcvsubbu commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r784425679



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -484,6 +484,13 @@ private void removeBackup(File indexDir)
     FileUtils.deleteDirectory(segmentTempDir);
   }
 
+  /**
+   * Try to load the segment potentially still existing on the server.
+   *
+   * @return true if the segment still exists on server, its CRC is still same with the
+   * one in SegmentZKMetadata and is loaded into memory successfully; false if it doesn't
+   * exist on the server, its CRC has changed, or it fails to be loaded.

Review comment:
       please add the segmentdirectory is closed when it is false, open otherwise




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mcvsubbu commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r778442698



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -275,53 +278,23 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
 
   @Override
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
-      SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
+      SegmentMetadata segmentMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
+    // Create backup dir to make segment reloading atomic for local tier backend.
+    LoaderUtils.createBackup(indexDir);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
-
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
-      boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
-      if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
-        if (forceDownload) {
-          LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
-        } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
-              _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
-        }
-        indexDir = downloadSegment(segmentName, zkMetadata);
+      boolean shouldDownloadRawSegment = forceDownload || !hasSameCRC(zkMetadata, segmentMetadata);
+      if (shouldDownloadRawSegment && allowDownloadRawSegment(segmentName, zkMetadata)) {
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata, schema);

Review comment:
       I referred to the design document https://docs.google.com/document/d/1Z4FLg3ezHpqvc6zhy0jR6Wi2OL8wLO_lRC6aLkskFgs/edit#  and pinot documentation https://docs.pinot.apache.org/operators/operating-pinot/tiered-storage
   
   I could not locate the meaning of a "tier backend". In order that the code be maintainable, please provide some clarification 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r780612465



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -335,26 +327,19 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
       SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
+    if (localMetadata != null && hasSameCRC(zkMetadata, localMetadata)) {
       LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
           _tableNameWithType, localMetadata.getCrc());
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
+    // The segment is not loaded by the server if the metadata object is null. But the segment
+    // may still be kept on the server. For example when server gets restarted, the segment is
+    // still on the server but the metadata object has not been initialized yet. In this case,
+    // we should check if the segment exists on server and try to load it. If the segment does
+    // not exist or fails to get loaded, we download segment from deep store to load it again.
+    if (localMetadata == null && tryLoadExistingSegment(segmentName, indexLoadingConfig, zkMetadata)) {

Review comment:
       L510 `ImmutableSegmentLoader.load(segmentDirectory, indexLoadingConfig, schema);` uses the initialized SegmentDirectory object to create the ImmutableSegment object as the last step of the segment loading logic flow.
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mcvsubbu commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r780490610



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -277,49 +282,36 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
       SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
+      // Create backup directory to handle failure of segment reloading.
+      createBackup(indexDir);
 
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
+      // Download segment from deep store if CRC changes or forced to download;
+      // otherwise, copy backup directory back to the original index directory.
+      // And then continue to load the segment from the index directory.
       boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
       if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
         if (forceDownload) {
           LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
         } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
+          LOGGER.info("Download segment:{} of table: {} as crc changes from: {} to: {}", segmentName,
               _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
         }
         indexDir = downloadSegment(segmentName, zkMetadata);
       } else {
-        LOGGER.info("Reload the local copy of segment: {} of table: {}", segmentName, _tableNameWithType);
-        FileUtils.copyDirectory(segmentBackupDir, indexDir);
+        LOGGER.info("Reload existing segment: {} of table: {}", segmentName, _tableNameWithType);
+        try (SegmentDirectory segmentDirectory = getSegmentDirectory(segmentName, indexLoadingConfig)) {

Review comment:
       Why can't we use the same method as that in line 285 (`getSegmentDataDir(segmentName)`) ?
   And also the FileUtils to copydir?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r778332657



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -275,53 +278,23 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
 
   @Override
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
-      SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
+      SegmentMetadata segmentMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
+    // Create backup dir to make segment reloading atomic for local tier backend.
+    LoaderUtils.createBackup(indexDir);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
-
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
-      boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
-      if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
-        if (forceDownload) {
-          LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
-        } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
-              _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
-        }
-        indexDir = downloadSegment(segmentName, zkMetadata);
+      boolean shouldDownloadRawSegment = forceDownload || !hasSameCRC(zkMetadata, segmentMetadata);
+      if (shouldDownloadRawSegment && allowDownloadRawSegment(segmentName, zkMetadata)) {
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata, schema);

Review comment:
       A server's `tier backend` is not peer server or deep store.
   
   The refactoring in this PR would allow a server's tier backend to be local disk (the default impl) or remote fs (like s3). The segment from server's tier backend is not the raw segment from deep store or peer server, but the fully preprocessed segment ready to serve queries or to apply incremental changes like adding index. 
   
   When server uses remote tier backend, the `File indexDir` does not exist for the backup/restore flow, so I added the existence check at L166 ^




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] npawar commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
npawar commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r779253080



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,94 +306,156 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,

Review comment:
       +1 to Subbu's suggestion. "rawSegment" seems like a new terminology, and it would be better to stick to terms we are familiar with. 
   downloadFromDeepStore is fine, or even just downloadDefault or download?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r778349042



##########
File path: pinot-server/src/main/java/org/apache/pinot/server/starter/helix/HelixInstanceDataManager.java
##########
@@ -307,6 +306,32 @@ private void reloadSegment(String tableNameWithType, SegmentMetadata segmentMeta
     }
   }
 
+  /**
+   * Try to reload a segment without a local index directory. The segment can be a consuming
+   * segment from a REALTIME table, or an immutable segment on remote tier backend.
+   * @return true if the segment is loaded.
+   */
+  @VisibleForTesting
+  boolean reloadSegmentWithNullIndexDir(String tableNameWithType, String segmentName,

Review comment:
       actually, the ImmutableSegment type check makes it a bit less specific to mutable segment reloading. I was using reloadMutableSegment but switched to this more generic name.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mcvsubbu commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r778478304



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,94 +306,162 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+            ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      }
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
-    }
+    // For local tier backend, try to recover the segment from potential
+    // reload failure. Continue upon any failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
 
-    Preconditions.checkState(allowDownload(segmentName, zkMetadata), "Segment: %s of table: %s does not allow download",
-        segmentName, _tableNameWithType);
+    // Creates the SegmentDirectory object to access the segment metadata that
+    // may be from local tier backend or remote tier backend.
+    SegmentDirectory segmentDirectory = null;
+    try {
+      SegmentDirectoryLoaderContext loaderContext =
+          new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+              segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+      SegmentDirectoryLoader segmentDirectoryLoader =
+          SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+      segmentDirectory = segmentDirectoryLoader.load(indexDir.toURI(), loaderContext);
+    } catch (Exception e) {
+      LOGGER.warn("Failed to create SegmentDirectory for segment: {} of table: {} due to error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
+    }
 
-    // Download segment and replace the local one, either due to failure to recover local segment,
-    // or the segment data is updated and has new CRC now.
-    if (localMetadata == null) {
-      LOGGER.info("Download segment: {} of table: {} as no good one exists locally", segmentName, _tableNameWithType);
-    } else {
-      LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
-          _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
+    // Download the raw segment, reprocess and load it if it is never loaded or its CRC has changed.
+    if (segmentDirectory == null) {
+      LOGGER.info("Segment: {} of table: {} does not exist", segmentName, _tableNameWithType);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
+    }
+    if (!hasSameCRC(zkMetadata, segmentDirectory.getSegmentMetadata())) {
+      LOGGER.info("Segment: {} of table: {} has crc: {} different from new crc: {}", segmentName,
+          _tableNameWithType, segmentDirectory.getSegmentMetadata().getCrc(), zkMetadata.getCrc());
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
     }
-    File indexDir = downloadSegment(segmentName, zkMetadata);
-    addSegment(indexDir, indexLoadingConfig);
-    LOGGER.info("Downloaded and loaded segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
-        zkMetadata.getCrc());
-  }
 
-  protected boolean allowDownload(String segmentName, SegmentZKMetadata zkMetadata) {
-    return true;
+    try {
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        // The loaded segment is still consistent with current table config or schema.
+        LOGGER.info("Segment: {} of table: {} is consistent with table config and schema", segmentName,
+            _tableNameWithType);
+        loadSegment(segmentDirectory, indexLoadingConfig, schema);
+        return;
+      }
+      // If any discrepancy is found, get the segment from tier backend, reprocess and load it.
+      LOGGER.info("Segment: {} of table: {} needs reprocess to reflect latest table config and schema", segmentName,
+          _tableNameWithType);
+      // Close the stale SegmentDirectory object before loading the newly processed segment.
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadTierSegmentAndProcess(segmentName, indexLoadingConfig, schema);
+    } catch (Exception e) {
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      LOGGER.error("Failed to reprocess and load segment: {} of table: {}", segmentName, _tableNameWithType, e);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+    }
   }
 
-  protected File downloadSegment(String segmentName, SegmentZKMetadata zkMetadata)
+  /**
+   * Get the segment from the configured tier backend, reprocess it with the latest table
+   * config and schema and then load it. Please note that the segment is from tier backend,
+   * not deep store, for incremental processing.
+   */
+  private void downloadTierSegmentAndProcess(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      @Nullable Schema schema)
       throws Exception {
-    // TODO: may support download from peer servers for RealTime table.
-    return downloadSegmentFromDeepStore(segmentName, zkMetadata);
+    SegmentDirectoryLoaderContext loaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentDirectoryLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    File indexDir = getSegmentDataDir(segmentName);
+    segmentDirectoryLoader.download(indexDir, loaderContext);

Review comment:
       It seems to me that the segment directory loader should take care of downloading from tier backend or deepstore, depending on some setting, perhaps in the loadercontext.
   
   I am yet to understand what the difference is between a tier backend segment and the "raw" segment, though. Let me know what I am missing here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r777739783



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,92 +315,131 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+            ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      }
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
+    // For local tier backend, try to recover the segment from potential
+    // reload failure. Continue upon any failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
+
+    // Creates the SegmentDirectory object to access the segment metadata that
+    // may be from local tier backend or remote tier backend.
+    SegmentDirectoryLoaderContext segmentLoaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    SegmentDirectory segmentDirectory = null;
+    try {
+      segmentDirectory = segmentLoader.load(indexDir.toURI(), segmentLoaderContext);
+    } catch (Exception e) {
+      LOGGER.warn("Failed to create SegmentDirectory for segment: {} of table: {} due to error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
     }
 
-    Preconditions.checkState(allowDownload(segmentName, zkMetadata), "Segment: %s of table: %s does not allow download",
-        segmentName, _tableNameWithType);
-
-    // Download segment and replace the local one, either due to failure to recover local segment,
-    // or the segment data is updated and has new CRC now.
-    if (localMetadata == null) {
-      LOGGER.info("Download segment: {} of table: {} as no good one exists locally", segmentName, _tableNameWithType);
-    } else {
-      LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
-          _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
+    // Download the raw segment, reprocess and load it if it is never loaded or its CRC has changed.
+    if (segmentDirectory == null || !hasSameCRC(zkMetadata, segmentDirectory.getSegmentMetadata())) {
+      LOGGER.info("Segment: {} of table: {} not exist or its crc: {} differs from new crc: {}", segmentName,

Review comment:
       no problemo




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] npawar commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
npawar commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r778464303



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -275,53 +278,23 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
 
   @Override
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
-      SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
+      SegmentMetadata segmentMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
+    // Create backup dir to make segment reloading atomic for local tier backend.
+    LoaderUtils.createBackup(indexDir);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
-
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
-      boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
-      if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
-        if (forceDownload) {
-          LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
-        } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
-              _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
-        }
-        indexDir = downloadSegment(segmentName, zkMetadata);
+      boolean shouldDownloadRawSegment = forceDownload || !hasSameCRC(zkMetadata, segmentMetadata);
+      if (shouldDownloadRawSegment && allowDownloadRawSegment(segmentName, zkMetadata)) {
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata, schema);

Review comment:
       @mcvsubbu tierBackend concept was already introduced last year (just updated the design doc to reflect it, sorry for not doing it when the change was added). Explanation regarding the new field added to the design too




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] npawar commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
npawar commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r779254245



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,94 +306,162 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+            ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      }
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
-    }
+    // For local tier backend, try to recover the segment from potential
+    // reload failure. Continue upon any failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
 
-    Preconditions.checkState(allowDownload(segmentName, zkMetadata), "Segment: %s of table: %s does not allow download",
-        segmentName, _tableNameWithType);
+    // Creates the SegmentDirectory object to access the segment metadata that
+    // may be from local tier backend or remote tier backend.
+    SegmentDirectory segmentDirectory = null;
+    try {
+      SegmentDirectoryLoaderContext loaderContext =
+          new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+              segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+      SegmentDirectoryLoader segmentDirectoryLoader =
+          SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+      segmentDirectory = segmentDirectoryLoader.load(indexDir.toURI(), loaderContext);
+    } catch (Exception e) {
+      LOGGER.warn("Failed to create SegmentDirectory for segment: {} of table: {} due to error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
+    }
 
-    // Download segment and replace the local one, either due to failure to recover local segment,
-    // or the segment data is updated and has new CRC now.
-    if (localMetadata == null) {
-      LOGGER.info("Download segment: {} of table: {} as no good one exists locally", segmentName, _tableNameWithType);
-    } else {
-      LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
-          _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
+    // Download the raw segment, reprocess and load it if it is never loaded or its CRC has changed.
+    if (segmentDirectory == null) {
+      LOGGER.info("Segment: {} of table: {} does not exist", segmentName, _tableNameWithType);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
+    }
+    if (!hasSameCRC(zkMetadata, segmentDirectory.getSegmentMetadata())) {
+      LOGGER.info("Segment: {} of table: {} has crc: {} different from new crc: {}", segmentName,
+          _tableNameWithType, segmentDirectory.getSegmentMetadata().getCrc(), zkMetadata.getCrc());
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
     }
-    File indexDir = downloadSegment(segmentName, zkMetadata);
-    addSegment(indexDir, indexLoadingConfig);
-    LOGGER.info("Downloaded and loaded segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
-        zkMetadata.getCrc());
-  }
 
-  protected boolean allowDownload(String segmentName, SegmentZKMetadata zkMetadata) {
-    return true;
+    try {
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        // The loaded segment is still consistent with current table config or schema.
+        LOGGER.info("Segment: {} of table: {} is consistent with table config and schema", segmentName,
+            _tableNameWithType);
+        loadSegment(segmentDirectory, indexLoadingConfig, schema);
+        return;
+      }
+      // If any discrepancy is found, get the segment from tier backend, reprocess and load it.
+      LOGGER.info("Segment: {} of table: {} needs reprocess to reflect latest table config and schema", segmentName,
+          _tableNameWithType);
+      // Close the stale SegmentDirectory object before loading the newly processed segment.
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadTierSegmentAndProcess(segmentName, indexLoadingConfig, schema);
+    } catch (Exception e) {
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      LOGGER.error("Failed to reprocess and load segment: {} of table: {}", segmentName, _tableNameWithType, e);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+    }
   }
 
-  protected File downloadSegment(String segmentName, SegmentZKMetadata zkMetadata)
+  /**
+   * Get the segment from the configured tier backend, reprocess it with the latest table
+   * config and schema and then load it. Please note that the segment is from tier backend,
+   * not deep store, for incremental processing.
+   */
+  private void downloadTierSegmentAndProcess(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      @Nullable Schema schema)
       throws Exception {
-    // TODO: may support download from peer servers for RealTime table.
-    return downloadSegmentFromDeepStore(segmentName, zkMetadata);
+    SegmentDirectoryLoaderContext loaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentDirectoryLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    File indexDir = getSegmentDataDir(segmentName);
+    segmentDirectoryLoader.download(indexDir, loaderContext);

Review comment:
       `the segment directory loader should take care of downloading from tier backend or deepstore, depending on some setting, perhaps in the loadercontext.` -  @klsince wdyt of this suggestion? Could we put download logic entirely behind the impl? Or is that hard to do because of the dependencies of downloadFromDeepstore?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] snleee commented on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
snleee commented on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1010323545


   @klsince I see. I re-triggered the tests https://github.com/apache/pinot/runs/4771611226?check_suite_focus=true
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] npawar merged pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
npawar merged pull request #7969:
URL: https://github.com/apache/pinot/pull/7969


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r784359166



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -371,9 +360,9 @@ public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoad
     // Download segment and replace the local one, either due to failure to recover local segment,
     // or the segment data is updated and has new CRC now.
     if (localMetadata == null) {
-      LOGGER.info("Download segment: {} of table: {} as no good one exists locally", segmentName, _tableNameWithType);
+      LOGGER.info("Download segment: {} of table: {} as it doesn't exist", segmentName, _tableNameWithType);
     } else {
-      LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
+      LOGGER.info("Download segment: {} of table: {} as crc changes from: {} to: {}", segmentName,

Review comment:
       This is a good question. Either of them gets logged in one method call, but not both. Because the two CRC checks have used CRC values from different places when comparing with the CRC in SegmentZKMetadata.
   
   Here the `localMetadata` object is not null, and the CRC from it has been used to compare with the CRC in SegmentZkMetadata (L342). The tryLoadExistingSegment() method is called when `localMetadata` is null. And the CRC is retrieved from segment potentially exists locally (but not loaded yet) to compare with the CRC in SegmentZkMetadata. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (46717cc) into [master](https://codecov.io/gh/apache/pinot/commit/823aa07d7fc61543ed3d91992749d28aecd192ee?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (823aa07) will **decrease** coverage by `40.74%`.
   > The diff coverage is `27.90%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #7969       +/-   ##
   =============================================
   - Coverage     71.35%   30.61%   -40.75%     
   =============================================
     Files          1595     1587        -8     
     Lines         82748    82526      -222     
     Branches      12348    12327       -21     
   =============================================
   - Hits          59046    25265    -33781     
   - Misses        19715    55042    +35327     
   + Partials       3987     2219     -1768     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `28.90% <27.90%> (-0.17%)` | :arrow_down: |
   | integration2 | `27.64% <27.32%> (+0.08%)` | :arrow_up: |
   | unittests1 | `?` | |
   | unittests2 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `0.00% <0.00%> (-79.79%)` | :arrow_down: |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `0.00% <0.00%> (-73.69%)` | :arrow_down: |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `0.00% <0.00%> (-63.27%)` | :arrow_down: |
   | [...t/local/segment/store/SegmentLocalFSDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L3N0b3JlL1NlZ21lbnRMb2NhbEZTRGlyZWN0b3J5LmphdmE=) | `0.00% <0.00%> (-69.29%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `0.00% <0.00%> (-74.57%)` | :arrow_down: |
   | [...ache/pinot/segment/spi/store/SegmentDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3N0b3JlL1NlZ21lbnREaXJlY3RvcnkuamF2YQ==) | `0.00% <0.00%> (-62.50%)` | :arrow_down: |
   | [...server/starter/helix/HelixInstanceDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zZXJ2ZXIvc3RhcnRlci9oZWxpeC9IZWxpeEluc3RhbmNlRGF0YU1hbmFnZXIuamF2YQ==) | `79.44% <47.05%> (-2.64%)` | :arrow_down: |
   | ... and [1099 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [823aa07...46717cc](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r780565301



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -480,12 +426,138 @@ File getSegmentDataDir(String segmentName) {
     return new File(_indexDir, segmentName);
   }
 
-  @VisibleForTesting
-  static boolean isNewSegment(SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata) {
-    return localMetadata == null || !hasSameCRC(zkMetadata, localMetadata);
+  /**
+   * Create a backup directory to handle failure of segment reloading.
+   * First rename index directory to segment backup directory so that original segment have all file
+   * descriptors point to the segment backup directory to ensure original segment serves queries properly.
+   * The original index directory is restored lazily, as depending on the conditions,
+   * it may be restored from the backup directory or segment downloaded from deep store.
+   */
+  private void createBackup(File indexDir) {
+    if (!indexDir.exists()) {
+      return;
+    }
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    // Rename index directory to segment backup directory (atomic).
+    Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
+        "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
+  }
+
+  /**
+   * Remove the backup directory to mark the completion of segment reloading.
+   * First rename then delete is as renaming is an atomic operation, but deleting is not.
+   * When we rename the segment backup directory to segment temporary directory, we know the reload
+   * already succeeded, so that we can safely delete the segment temporary directory.
+   */
+  private void removeBackup(File indexDir)
+      throws IOException {
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    if (!segmentBackupDir.exists()) {
+      return;
+    }
+    File segmentTempDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_TEMP_DIR_SUFFIX);
+    // Rename segment backup directory to segment temporary directory (atomic).
+    Preconditions.checkState(segmentBackupDir.renameTo(segmentTempDir),
+        "Failed to rename segment backup directory: %s to segment temporary directory: %s", segmentBackupDir,
+        segmentTempDir);
+    FileUtils.deleteDirectory(segmentTempDir);
+  }
+
+  private boolean tryLoadExistingSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      SegmentZKMetadata zkMetadata) {
+    // Try to recover the segment from potential segment reloading failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
+
+    // Creates the SegmentDirectory object to access the segment metadata.
+    // The metadata is null if the segment doesn't exist yet.
+    SegmentDirectory segmentDirectory = tryGetSegmentDirectory(segmentName, indexLoadingConfig);
+    SegmentMetadataImpl segmentMetadata = (segmentDirectory == null) ? null : segmentDirectory.getSegmentMetadata();
+
+    // If the segment doesn't exist on server or its CRC has changed, then we
+    // need to fall back to download the segment from deep store to load it.
+    if (segmentMetadata == null) {
+      LOGGER.info("Segment: {} of table: {} does not exist", segmentName, _tableNameWithType);
+      return false;
+    }
+    if (!hasSameCRC(zkMetadata, segmentMetadata)) {
+      LOGGER.info("Segment: {} of table: {} has crc change from: {} to: {}", segmentName, _tableNameWithType,
+          segmentMetadata.getCrc(), zkMetadata.getCrc());
+      return false;
+    }
+
+    try {
+      // If the segment is still kept by the server, then we can
+      // either load it directly if it's still consistent with latest table config and schema;
+      // or reprocess it to reflect latest table config and schema before loading.
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        LOGGER.info("Segment: {} of table: {} is consistent with latest table config and schema", segmentName,
+            _tableNameWithType);
+      } else {
+        LOGGER.info("Segment: {} of table: {} needs reprocess to reflect latest table config and schema", segmentName,
+            _tableNameWithType);
+        segmentDirectory.copyTo(indexDir);
+        // Close the stale SegmentDirectory object and recreate it with reprocessed segment.
+        closeSegmentDirectoryQuietly(segmentDirectory);
+        ImmutableSegmentLoader.preprocess(indexDir, indexLoadingConfig, schema);
+        segmentDirectory = getSegmentDirectory(segmentName, indexLoadingConfig);
+      }
+      ImmutableSegment segment = ImmutableSegmentLoader.load(segmentDirectory, indexLoadingConfig, schema);
+      addSegment(segment);
+      LOGGER.info("Loaded existing segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
+          zkMetadata.getCrc());
+      return true;
+    } catch (Exception e) {
+      LOGGER.error("Failed to load existing segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType, e);
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      return false;
+    }
+  }
+
+  private SegmentDirectory tryGetSegmentDirectory(String segmentName, IndexLoadingConfig indexLoadingConfig) {
+    try {
+      return getSegmentDirectory(segmentName, indexLoadingConfig);

Review comment:
       renamed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r780609866



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -277,49 +282,36 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
       SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
+      // Create backup directory to handle failure of segment reloading.
+      createBackup(indexDir);
 
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
+      // Download segment from deep store if CRC changes or forced to download;
+      // otherwise, copy backup directory back to the original index directory.
+      // And then continue to load the segment from the index directory.
       boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
       if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
         if (forceDownload) {
           LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
         } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
+          LOGGER.info("Download segment:{} of table: {} as crc changes from: {} to: {}", segmentName,
               _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
         }
         indexDir = downloadSegment(segmentName, zkMetadata);
       } else {
-        LOGGER.info("Reload the local copy of segment: {} of table: {}", segmentName, _tableNameWithType);
-        FileUtils.copyDirectory(segmentBackupDir, indexDir);
+        LOGGER.info("Reload existing segment: {} of table: {}", segmentName, _tableNameWithType);
+        try (SegmentDirectory segmentDirectory = initSegmentDirectory(segmentName, indexLoadingConfig)) {

Review comment:
       right, we need to copy from backup dir, and that's implemented in SegmentLocalFSDirectory.copyTo() in this PR. That method is noop if indexDir exists already, or copies from backup dir to indexDir, keeping the original logic flow intact.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r783327022



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -493,12 +445,139 @@ protected File getTmpSegmentDataDir(String segmentName) {
     return new File(_resourceTmpDir, segmentName);
   }
 
-  @VisibleForTesting
-  static boolean isNewSegment(SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata) {
-    return localMetadata == null || !hasSameCRC(zkMetadata, localMetadata);
+  /**
+   * Create a backup directory to handle failure of segment reloading.
+   * First rename index directory to segment backup directory so that original segment have all file
+   * descriptors point to the segment backup directory to ensure original segment serves queries properly.
+   * The original index directory is restored lazily, as depending on the conditions,
+   * it may be restored from the backup directory or segment downloaded from deep store.
+   */
+  private void createBackup(File indexDir) {
+    if (!indexDir.exists()) {
+      return;
+    }
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    // Rename index directory to segment backup directory (atomic).
+    Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
+        "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
+  }
+
+  /**
+   * Remove the backup directory to mark the completion of segment reloading.
+   * First rename then delete is as renaming is an atomic operation, but deleting is not.
+   * When we rename the segment backup directory to segment temporary directory, we know the reload
+   * already succeeded, so that we can safely delete the segment temporary directory.
+   */
+  private void removeBackup(File indexDir)
+      throws IOException {
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    if (!segmentBackupDir.exists()) {
+      return;
+    }
+    // Rename segment backup directory to segment temporary directory (atomic).
+    File segmentTempDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_TEMP_DIR_SUFFIX);
+    Preconditions.checkState(segmentBackupDir.renameTo(segmentTempDir),
+        "Failed to rename segment backup directory: %s to segment temporary directory: %s", segmentBackupDir,
+        segmentTempDir);
+    FileUtils.deleteDirectory(segmentTempDir);
+  }
+
+  private boolean tryLoadExistingSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      SegmentZKMetadata zkMetadata) {
+    // Try to recover the segment from potential segment reloading failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
+
+    // Creates the SegmentDirectory object to access the segment metadata.
+    // The metadata is null if the segment doesn't exist yet.
+    SegmentDirectory segmentDirectory = tryInitSegmentDirectory(segmentName, indexLoadingConfig);

Review comment:
       You're right about the call stack ^ when loading a branch new segment.
   
   When server loads a brand new segment, there is no indexDir on disk yet, so SegmentLocalFSDirectory is initialized with an empty dir (no warning emit). But with an empty SegmentLocalFSDirectory object, we get to L502/L508 to return false from tryLoadExistingSegment() method. Then, the addOrReplaceSegment() method continues to download the new segment from deep store to finish loading. 
   
   This keeps the original logic flow intact. If you check the original code, it does recoverSegmentQuietly and loadSegmentQuietly, and if they fails, then go to download segment from deep store to finish loading.  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (5ae0dff) into [master](https://codecov.io/gh/apache/pinot/commit/d25a48879ea8b7f7b7595249d412705b670204e5?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d25a488) will **decrease** coverage by `57.08%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #7969       +/-   ##
   =============================================
   - Coverage     71.37%   14.28%   -57.09%     
   + Complexity     4194       80     -4114     
   =============================================
     Files          1594     1550       -44     
     Lines         82562    80781     -1781     
     Branches      12321    12123      -198     
   =============================================
   - Hits          58927    11542    -47385     
   - Misses        19652    68381    +48729     
   + Partials       3983      858     -3125     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `14.28% <0.00%> (+<0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `0.00% <0.00%> (-85.94%)` | :arrow_down: |
   | [...ata/manager/realtime/RealtimeTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvUmVhbHRpbWVUYWJsZURhdGFNYW5hZ2VyLmphdmE=) | `0.00% <ø> (-67.89%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...egment/local/segment/index/loader/LoaderUtils.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9Mb2FkZXJVdGlscy5qYXZh) | `0.00% <0.00%> (-94.12%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `0.00% <0.00%> (-79.79%)` | :arrow_down: |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `0.00% <0.00%> (-73.69%)` | :arrow_down: |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `0.00% <0.00%> (-63.27%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `0.00% <0.00%> (-74.57%)` | :arrow_down: |
   | ... and [1277 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [d25a488...5ae0dff](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r779749197



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,94 +306,162 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+            ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      }
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
-    }
+    // For local tier backend, try to recover the segment from potential
+    // reload failure. Continue upon any failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
 
-    Preconditions.checkState(allowDownload(segmentName, zkMetadata), "Segment: %s of table: %s does not allow download",
-        segmentName, _tableNameWithType);
+    // Creates the SegmentDirectory object to access the segment metadata that
+    // may be from local tier backend or remote tier backend.
+    SegmentDirectory segmentDirectory = null;
+    try {
+      SegmentDirectoryLoaderContext loaderContext =
+          new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+              segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+      SegmentDirectoryLoader segmentDirectoryLoader =
+          SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+      segmentDirectory = segmentDirectoryLoader.load(indexDir.toURI(), loaderContext);
+    } catch (Exception e) {
+      LOGGER.warn("Failed to create SegmentDirectory for segment: {} of table: {} due to error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
+    }
 
-    // Download segment and replace the local one, either due to failure to recover local segment,
-    // or the segment data is updated and has new CRC now.
-    if (localMetadata == null) {
-      LOGGER.info("Download segment: {} of table: {} as no good one exists locally", segmentName, _tableNameWithType);
-    } else {
-      LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
-          _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
+    // Download the raw segment, reprocess and load it if it is never loaded or its CRC has changed.
+    if (segmentDirectory == null) {
+      LOGGER.info("Segment: {} of table: {} does not exist", segmentName, _tableNameWithType);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
+    }
+    if (!hasSameCRC(zkMetadata, segmentDirectory.getSegmentMetadata())) {
+      LOGGER.info("Segment: {} of table: {} has crc: {} different from new crc: {}", segmentName,
+          _tableNameWithType, segmentDirectory.getSegmentMetadata().getCrc(), zkMetadata.getCrc());
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
     }
-    File indexDir = downloadSegment(segmentName, zkMetadata);
-    addSegment(indexDir, indexLoadingConfig);
-    LOGGER.info("Downloaded and loaded segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
-        zkMetadata.getCrc());
-  }
 
-  protected boolean allowDownload(String segmentName, SegmentZKMetadata zkMetadata) {
-    return true;
+    try {
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        // The loaded segment is still consistent with current table config or schema.
+        LOGGER.info("Segment: {} of table: {} is consistent with table config and schema", segmentName,
+            _tableNameWithType);
+        loadSegment(segmentDirectory, indexLoadingConfig, schema);
+        return;
+      }
+      // If any discrepancy is found, get the segment from tier backend, reprocess and load it.
+      LOGGER.info("Segment: {} of table: {} needs reprocess to reflect latest table config and schema", segmentName,
+          _tableNameWithType);
+      // Close the stale SegmentDirectory object before loading the newly processed segment.
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadTierSegmentAndProcess(segmentName, indexLoadingConfig, schema);
+    } catch (Exception e) {
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      LOGGER.error("Failed to reprocess and load segment: {} of table: {}", segmentName, _tableNameWithType, e);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+    }
   }
 
-  protected File downloadSegment(String segmentName, SegmentZKMetadata zkMetadata)
+  /**
+   * Get the segment from the configured tier backend, reprocess it with the latest table
+   * config and schema and then load it. Please note that the segment is from tier backend,
+   * not deep store, for incremental processing.
+   */
+  private void downloadTierSegmentAndProcess(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      @Nullable Schema schema)
       throws Exception {
-    // TODO: may support download from peer servers for RealTime table.
-    return downloadSegmentFromDeepStore(segmentName, zkMetadata);
+    SegmentDirectoryLoaderContext loaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentDirectoryLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    File indexDir = getSegmentDataDir(segmentName);
+    segmentDirectoryLoader.download(indexDir, loaderContext);

Review comment:
       Thanks for the feedbacks, @mcvsubbu ! We proposed a new way to refactor the segment loading logic flow in this [short doc](https://docs.google.com/document/d/1oTyqqfkb4ep03alDojCokBrLdwIj7lsqF8Nwjifumsw/edit). Please review!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r780532850



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -277,49 +282,36 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
       SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
+      // Create backup directory to handle failure of segment reloading.
+      createBackup(indexDir);
 
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
+      // Download segment from deep store if CRC changes or forced to download;
+      // otherwise, copy backup directory back to the original index directory.
+      // And then continue to load the segment from the index directory.
       boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
       if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
         if (forceDownload) {
           LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
         } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
+          LOGGER.info("Download segment:{} of table: {} as crc changes from: {} to: {}", segmentName,
               _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
         }
         indexDir = downloadSegment(segmentName, zkMetadata);
       } else {
-        LOGGER.info("Reload the local copy of segment: {} of table: {}", segmentName, _tableNameWithType);
-        FileUtils.copyDirectory(segmentBackupDir, indexDir);
+        LOGGER.info("Reload existing segment: {} of table: {}", segmentName, _tableNameWithType);
+        try (SegmentDirectory segmentDirectory = getSegmentDirectory(segmentName, indexLoadingConfig)) {

Review comment:
       Replacing FileUtils.copyDirectory with SegmentDirectory.copyTo() so that the physical location of the source doesn't have to be a local seg dir. The physical location becomes transparent to the logic flow here.
   
   Think I need to rename getSegmentDirectory() to sth like loadSegmentDirectory() for clarity, as it tries to create the SegmentDirectory object. The util method getSegmentDataDir just composes the file path to seg dir.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (b47697f) into [master](https://codecov.io/gh/apache/pinot/commit/823aa07d7fc61543ed3d91992749d28aecd192ee?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (823aa07) will **decrease** coverage by `57.09%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #7969       +/-   ##
   =============================================
   - Coverage     71.35%   14.25%   -57.10%     
   + Complexity     4218       81     -4137     
   =============================================
     Files          1595     1550       -45     
     Lines         82748    80952     -1796     
     Branches      12348    12155      -193     
   =============================================
   - Hits          59046    11543    -47503     
   - Misses        19715    68550    +48835     
   + Partials       3987      859     -3128     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `14.25% <0.00%> (-0.02%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `0.00% <0.00%> (-84.78%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `0.00% <0.00%> (-79.79%)` | :arrow_down: |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `0.00% <0.00%> (-73.69%)` | :arrow_down: |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `0.00% <0.00%> (-63.27%)` | :arrow_down: |
   | [...t/local/segment/store/SegmentLocalFSDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L3N0b3JlL1NlZ21lbnRMb2NhbEZTRGlyZWN0b3J5LmphdmE=) | `0.00% <0.00%> (-69.29%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `0.00% <0.00%> (-74.57%)` | :arrow_down: |
   | [...ache/pinot/segment/spi/store/SegmentDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3N0b3JlL1NlZ21lbnREaXJlY3RvcnkuamF2YQ==) | `0.00% <0.00%> (-62.50%)` | :arrow_down: |
   | ... and [1277 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [823aa07...b47697f](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a65dbaf) into [master](https://codecov.io/gh/apache/pinot/commit/a1f3c30e2ac74dd0acb67087a86d8f03b416f5b2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a1f3c30) will **decrease** coverage by `1.25%`.
   > The diff coverage is `75.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7969      +/-   ##
   ============================================
   - Coverage     71.38%   70.13%   -1.26%     
   - Complexity     4200     4221      +21     
   ============================================
     Files          1595     1595              
     Lines         82595    82816     +221     
     Branches      12326    12356      +30     
   ============================================
   - Hits          58964    58086     -878     
   - Misses        19645    20740    +1095     
   - Partials       3986     3990       +4     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `27.69% <27.32%> (-0.07%)` | :arrow_down: |
   | unittests1 | `68.15% <78.06%> (-0.08%)` | :arrow_down: |
   | unittests2 | `14.26% <0.00%> (-0.05%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `90.83% <0.00%> (-0.84%)` | :arrow_down: |
   | [...ache/pinot/segment/spi/store/SegmentDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3N0b3JlL1NlZ21lbnREaXJlY3RvcnkuamF2YQ==) | `55.55% <0.00%> (-6.95%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `74.28% <28.57%> (-0.29%)` | :arrow_down: |
   | [...server/starter/helix/HelixInstanceDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zZXJ2ZXIvc3RhcnRlci9oZWxpeC9IZWxpeEluc3RhbmNlRGF0YU1hbmFnZXIuamF2YQ==) | `79.44% <47.05%> (-2.06%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `84.15% <73.52%> (-7.51%)` | :arrow_down: |
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `86.72% <80.24%> (+1.82%)` | :arrow_up: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `80.00% <83.33%> (+0.21%)` | :arrow_up: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `100.00% <100.00%> (ø)` | |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `73.68% <100.00%> (ø)` | |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `63.26% <100.00%> (ø)` | |
   | ... and [121 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [a1f3c30...a65dbaf](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mcvsubbu commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r780493752



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -335,26 +327,17 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
       SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
+    if (localMetadata != null && hasSameCRC(zkMetadata, localMetadata)) {
       LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
           _tableNameWithType, localMetadata.getCrc());
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
+    // The segment is not loaded by the server yet, but it may still kept by the server.

Review comment:
       Can you add some comments on how  you see this happening (local segment present but not loaded).

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -480,12 +426,138 @@ File getSegmentDataDir(String segmentName) {
     return new File(_indexDir, segmentName);
   }
 
-  @VisibleForTesting
-  static boolean isNewSegment(SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata) {
-    return localMetadata == null || !hasSameCRC(zkMetadata, localMetadata);
+  /**
+   * Create a backup directory to handle failure of segment reloading.
+   * First rename index directory to segment backup directory so that original segment have all file
+   * descriptors point to the segment backup directory to ensure original segment serves queries properly.
+   * The original index directory is restored lazily, as depending on the conditions,
+   * it may be restored from the backup directory or segment downloaded from deep store.
+   */
+  private void createBackup(File indexDir) {
+    if (!indexDir.exists()) {
+      return;
+    }
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    // Rename index directory to segment backup directory (atomic).
+    Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
+        "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
+  }
+
+  /**
+   * Remove the backup directory to mark the completion of segment reloading.
+   * First rename then delete is as renaming is an atomic operation, but deleting is not.
+   * When we rename the segment backup directory to segment temporary directory, we know the reload
+   * already succeeded, so that we can safely delete the segment temporary directory.
+   */
+  private void removeBackup(File indexDir)
+      throws IOException {
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    if (!segmentBackupDir.exists()) {
+      return;
+    }
+    File segmentTempDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_TEMP_DIR_SUFFIX);
+    // Rename segment backup directory to segment temporary directory (atomic).
+    Preconditions.checkState(segmentBackupDir.renameTo(segmentTempDir),
+        "Failed to rename segment backup directory: %s to segment temporary directory: %s", segmentBackupDir,
+        segmentTempDir);
+    FileUtils.deleteDirectory(segmentTempDir);
+  }
+
+  private boolean tryLoadExistingSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      SegmentZKMetadata zkMetadata) {
+    // Try to recover the segment from potential segment reloading failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
+
+    // Creates the SegmentDirectory object to access the segment metadata.
+    // The metadata is null if the segment doesn't exist yet.
+    SegmentDirectory segmentDirectory = tryGetSegmentDirectory(segmentName, indexLoadingConfig);
+    SegmentMetadataImpl segmentMetadata = (segmentDirectory == null) ? null : segmentDirectory.getSegmentMetadata();
+
+    // If the segment doesn't exist on server or its CRC has changed, then we
+    // need to fall back to download the segment from deep store to load it.
+    if (segmentMetadata == null) {
+      LOGGER.info("Segment: {} of table: {} does not exist", segmentName, _tableNameWithType);
+      return false;
+    }
+    if (!hasSameCRC(zkMetadata, segmentMetadata)) {
+      LOGGER.info("Segment: {} of table: {} has crc change from: {} to: {}", segmentName, _tableNameWithType,
+          segmentMetadata.getCrc(), zkMetadata.getCrc());
+      return false;
+    }
+
+    try {
+      // If the segment is still kept by the server, then we can
+      // either load it directly if it's still consistent with latest table config and schema;
+      // or reprocess it to reflect latest table config and schema before loading.
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        LOGGER.info("Segment: {} of table: {} is consistent with latest table config and schema", segmentName,
+            _tableNameWithType);
+      } else {
+        LOGGER.info("Segment: {} of table: {} needs reprocess to reflect latest table config and schema", segmentName,
+            _tableNameWithType);
+        segmentDirectory.copyTo(indexDir);
+        // Close the stale SegmentDirectory object and recreate it with reprocessed segment.
+        closeSegmentDirectoryQuietly(segmentDirectory);
+        ImmutableSegmentLoader.preprocess(indexDir, indexLoadingConfig, schema);
+        segmentDirectory = getSegmentDirectory(segmentName, indexLoadingConfig);
+      }
+      ImmutableSegment segment = ImmutableSegmentLoader.load(segmentDirectory, indexLoadingConfig, schema);
+      addSegment(segment);
+      LOGGER.info("Loaded existing segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
+          zkMetadata.getCrc());
+      return true;
+    } catch (Exception e) {
+      LOGGER.error("Failed to load existing segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType, e);
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      return false;
+    }
+  }
+
+  private SegmentDirectory tryGetSegmentDirectory(String segmentName, IndexLoadingConfig indexLoadingConfig) {
+    try {
+      return getSegmentDirectory(segmentName, indexLoadingConfig);
+    } catch (Exception e) {
+      LOGGER.warn("Attempt to get SegmentDirectory for segment: {} of table: {} failed with error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
+      return null;
+    }
+  }
+
+  private SegmentDirectory getSegmentDirectory(String segmentName, IndexLoadingConfig indexLoadingConfig)

Review comment:
       this method seems redundant. We already have a getSegmentDataDir method

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -480,12 +426,138 @@ File getSegmentDataDir(String segmentName) {
     return new File(_indexDir, segmentName);
   }
 
-  @VisibleForTesting
-  static boolean isNewSegment(SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata) {
-    return localMetadata == null || !hasSameCRC(zkMetadata, localMetadata);
+  /**
+   * Create a backup directory to handle failure of segment reloading.
+   * First rename index directory to segment backup directory so that original segment have all file
+   * descriptors point to the segment backup directory to ensure original segment serves queries properly.
+   * The original index directory is restored lazily, as depending on the conditions,
+   * it may be restored from the backup directory or segment downloaded from deep store.
+   */
+  private void createBackup(File indexDir) {
+    if (!indexDir.exists()) {
+      return;
+    }
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    // Rename index directory to segment backup directory (atomic).
+    Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
+        "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
+  }
+
+  /**
+   * Remove the backup directory to mark the completion of segment reloading.
+   * First rename then delete is as renaming is an atomic operation, but deleting is not.
+   * When we rename the segment backup directory to segment temporary directory, we know the reload
+   * already succeeded, so that we can safely delete the segment temporary directory.
+   */
+  private void removeBackup(File indexDir)
+      throws IOException {
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    if (!segmentBackupDir.exists()) {
+      return;
+    }
+    File segmentTempDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_TEMP_DIR_SUFFIX);

Review comment:
       Should we add a timestamp here to the tmp segment dir name to be safe? If you agree you can add a TODO or do it in this PR.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -480,12 +426,138 @@ File getSegmentDataDir(String segmentName) {
     return new File(_indexDir, segmentName);
   }
 
-  @VisibleForTesting
-  static boolean isNewSegment(SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata) {
-    return localMetadata == null || !hasSameCRC(zkMetadata, localMetadata);
+  /**
+   * Create a backup directory to handle failure of segment reloading.
+   * First rename index directory to segment backup directory so that original segment have all file
+   * descriptors point to the segment backup directory to ensure original segment serves queries properly.
+   * The original index directory is restored lazily, as depending on the conditions,
+   * it may be restored from the backup directory or segment downloaded from deep store.
+   */
+  private void createBackup(File indexDir) {
+    if (!indexDir.exists()) {
+      return;
+    }
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    // Rename index directory to segment backup directory (atomic).
+    Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
+        "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
+  }
+
+  /**
+   * Remove the backup directory to mark the completion of segment reloading.
+   * First rename then delete is as renaming is an atomic operation, but deleting is not.
+   * When we rename the segment backup directory to segment temporary directory, we know the reload
+   * already succeeded, so that we can safely delete the segment temporary directory.
+   */
+  private void removeBackup(File indexDir)
+      throws IOException {
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    if (!segmentBackupDir.exists()) {
+      return;
+    }
+    File segmentTempDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_TEMP_DIR_SUFFIX);
+    // Rename segment backup directory to segment temporary directory (atomic).
+    Preconditions.checkState(segmentBackupDir.renameTo(segmentTempDir),
+        "Failed to rename segment backup directory: %s to segment temporary directory: %s", segmentBackupDir,
+        segmentTempDir);
+    FileUtils.deleteDirectory(segmentTempDir);
+  }
+
+  private boolean tryLoadExistingSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      SegmentZKMetadata zkMetadata) {
+    // Try to recover the segment from potential segment reloading failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
+
+    // Creates the SegmentDirectory object to access the segment metadata.
+    // The metadata is null if the segment doesn't exist yet.
+    SegmentDirectory segmentDirectory = tryGetSegmentDirectory(segmentName, indexLoadingConfig);
+    SegmentMetadataImpl segmentMetadata = (segmentDirectory == null) ? null : segmentDirectory.getSegmentMetadata();
+
+    // If the segment doesn't exist on server or its CRC has changed, then we
+    // need to fall back to download the segment from deep store to load it.
+    if (segmentMetadata == null) {
+      LOGGER.info("Segment: {} of table: {} does not exist", segmentName, _tableNameWithType);
+      return false;
+    }
+    if (!hasSameCRC(zkMetadata, segmentMetadata)) {
+      LOGGER.info("Segment: {} of table: {} has crc change from: {} to: {}", segmentName, _tableNameWithType,
+          segmentMetadata.getCrc(), zkMetadata.getCrc());
+      return false;
+    }
+
+    try {
+      // If the segment is still kept by the server, then we can
+      // either load it directly if it's still consistent with latest table config and schema;
+      // or reprocess it to reflect latest table config and schema before loading.
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        LOGGER.info("Segment: {} of table: {} is consistent with latest table config and schema", segmentName,
+            _tableNameWithType);
+      } else {
+        LOGGER.info("Segment: {} of table: {} needs reprocess to reflect latest table config and schema", segmentName,
+            _tableNameWithType);
+        segmentDirectory.copyTo(indexDir);
+        // Close the stale SegmentDirectory object and recreate it with reprocessed segment.
+        closeSegmentDirectoryQuietly(segmentDirectory);
+        ImmutableSegmentLoader.preprocess(indexDir, indexLoadingConfig, schema);
+        segmentDirectory = getSegmentDirectory(segmentName, indexLoadingConfig);
+      }
+      ImmutableSegment segment = ImmutableSegmentLoader.load(segmentDirectory, indexLoadingConfig, schema);
+      addSegment(segment);
+      LOGGER.info("Loaded existing segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
+          zkMetadata.getCrc());
+      return true;
+    } catch (Exception e) {
+      LOGGER.error("Failed to load existing segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType, e);
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      return false;
+    }
+  }
+
+  private SegmentDirectory tryGetSegmentDirectory(String segmentName, IndexLoadingConfig indexLoadingConfig) {
+    try {
+      return getSegmentDirectory(segmentName, indexLoadingConfig);

Review comment:
       can we use one call to get segment directory?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (85c07d0) into [master](https://codecov.io/gh/apache/pinot/commit/a1f3c30e2ac74dd0acb67087a86d8f03b416f5b2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a1f3c30) will **decrease** coverage by `7.25%`.
   > The diff coverage is `75.43%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7969      +/-   ##
   ============================================
   - Coverage     71.38%   64.13%   -7.26%     
   + Complexity     4200     4141      -59     
   ============================================
     Files          1595     1586       -9     
     Lines         82595    82454     -141     
     Branches      12326    12315      -11     
   ============================================
   - Hits          58964    52878    -6086     
   - Misses        19645    25796    +6151     
   + Partials       3986     3780     -206     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `29.06% <25.73%> (-0.02%)` | :arrow_down: |
   | integration2 | `27.60% <25.14%> (-0.15%)` | :arrow_down: |
   | unittests1 | `68.16% <78.57%> (-0.07%)` | :arrow_down: |
   | unittests2 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `90.83% <0.00%> (-0.84%)` | :arrow_down: |
   | [...ache/pinot/segment/spi/store/SegmentDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3N0b3JlL1NlZ21lbnREaXJlY3RvcnkuamF2YQ==) | `55.55% <0.00%> (-6.95%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `74.28% <28.57%> (-0.29%)` | :arrow_down: |
   | [...server/starter/helix/HelixInstanceDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zZXJ2ZXIvc3RhcnRlci9oZWxpeC9IZWxpeEluc3RhbmNlRGF0YU1hbmFnZXIuamF2YQ==) | `79.44% <47.05%> (-2.06%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `84.15% <73.52%> (-7.51%)` | :arrow_down: |
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `87.11% <81.25%> (+2.21%)` | :arrow_up: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `80.00% <83.33%> (+0.21%)` | :arrow_up: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `100.00% <100.00%> (ø)` | |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `73.68% <100.00%> (ø)` | |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `63.26% <100.00%> (ø)` | |
   | ... and [228 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [a1f3c30...85c07d0](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r780526138



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -335,26 +327,17 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
       SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
+    if (localMetadata != null && hasSameCRC(zkMetadata, localMetadata)) {
       LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
           _tableNameWithType, localMetadata.getCrc());
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
+    // The segment is not loaded by the server yet, but it may still kept by the server.

Review comment:
       It's a common case when server gets restarted, the segments are still on their local disk but none of the SegmentMetadata objects are loaded into memory yet. (will update comment)
   
   

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -480,12 +426,138 @@ File getSegmentDataDir(String segmentName) {
     return new File(_indexDir, segmentName);
   }
 
-  @VisibleForTesting
-  static boolean isNewSegment(SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata) {
-    return localMetadata == null || !hasSameCRC(zkMetadata, localMetadata);
+  /**
+   * Create a backup directory to handle failure of segment reloading.
+   * First rename index directory to segment backup directory so that original segment have all file
+   * descriptors point to the segment backup directory to ensure original segment serves queries properly.
+   * The original index directory is restored lazily, as depending on the conditions,
+   * it may be restored from the backup directory or segment downloaded from deep store.
+   */
+  private void createBackup(File indexDir) {
+    if (!indexDir.exists()) {
+      return;
+    }
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    // Rename index directory to segment backup directory (atomic).
+    Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
+        "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
+  }
+
+  /**
+   * Remove the backup directory to mark the completion of segment reloading.
+   * First rename then delete is as renaming is an atomic operation, but deleting is not.
+   * When we rename the segment backup directory to segment temporary directory, we know the reload
+   * already succeeded, so that we can safely delete the segment temporary directory.
+   */
+  private void removeBackup(File indexDir)
+      throws IOException {
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    if (!segmentBackupDir.exists()) {
+      return;
+    }
+    File segmentTempDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_TEMP_DIR_SUFFIX);
+    // Rename segment backup directory to segment temporary directory (atomic).
+    Preconditions.checkState(segmentBackupDir.renameTo(segmentTempDir),
+        "Failed to rename segment backup directory: %s to segment temporary directory: %s", segmentBackupDir,
+        segmentTempDir);
+    FileUtils.deleteDirectory(segmentTempDir);
+  }
+
+  private boolean tryLoadExistingSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      SegmentZKMetadata zkMetadata) {
+    // Try to recover the segment from potential segment reloading failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
+
+    // Creates the SegmentDirectory object to access the segment metadata.
+    // The metadata is null if the segment doesn't exist yet.
+    SegmentDirectory segmentDirectory = tryGetSegmentDirectory(segmentName, indexLoadingConfig);
+    SegmentMetadataImpl segmentMetadata = (segmentDirectory == null) ? null : segmentDirectory.getSegmentMetadata();
+
+    // If the segment doesn't exist on server or its CRC has changed, then we
+    // need to fall back to download the segment from deep store to load it.
+    if (segmentMetadata == null) {
+      LOGGER.info("Segment: {} of table: {} does not exist", segmentName, _tableNameWithType);
+      return false;
+    }
+    if (!hasSameCRC(zkMetadata, segmentMetadata)) {
+      LOGGER.info("Segment: {} of table: {} has crc change from: {} to: {}", segmentName, _tableNameWithType,
+          segmentMetadata.getCrc(), zkMetadata.getCrc());
+      return false;
+    }
+
+    try {
+      // If the segment is still kept by the server, then we can
+      // either load it directly if it's still consistent with latest table config and schema;
+      // or reprocess it to reflect latest table config and schema before loading.
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        LOGGER.info("Segment: {} of table: {} is consistent with latest table config and schema", segmentName,
+            _tableNameWithType);
+      } else {
+        LOGGER.info("Segment: {} of table: {} needs reprocess to reflect latest table config and schema", segmentName,
+            _tableNameWithType);
+        segmentDirectory.copyTo(indexDir);
+        // Close the stale SegmentDirectory object and recreate it with reprocessed segment.
+        closeSegmentDirectoryQuietly(segmentDirectory);
+        ImmutableSegmentLoader.preprocess(indexDir, indexLoadingConfig, schema);
+        segmentDirectory = getSegmentDirectory(segmentName, indexLoadingConfig);
+      }
+      ImmutableSegment segment = ImmutableSegmentLoader.load(segmentDirectory, indexLoadingConfig, schema);
+      addSegment(segment);
+      LOGGER.info("Loaded existing segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
+          zkMetadata.getCrc());
+      return true;
+    } catch (Exception e) {
+      LOGGER.error("Failed to load existing segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType, e);
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      return false;
+    }
+  }
+
+  private SegmentDirectory tryGetSegmentDirectory(String segmentName, IndexLoadingConfig indexLoadingConfig) {
+    try {
+      return getSegmentDirectory(segmentName, indexLoadingConfig);
+    } catch (Exception e) {
+      LOGGER.warn("Attempt to get SegmentDirectory for segment: {} of table: {} failed with error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
+      return null;
+    }
+  }
+
+  private SegmentDirectory getSegmentDirectory(String segmentName, IndexLoadingConfig indexLoadingConfig)

Review comment:
       I'll rethink the method names: this one gets SegmentDirectory object; the getSegmentDataDir composes seg dir path.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -277,49 +282,36 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
       SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
+      // Create backup directory to handle failure of segment reloading.
+      createBackup(indexDir);
 
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
+      // Download segment from deep store if CRC changes or forced to download;
+      // otherwise, copy backup directory back to the original index directory.
+      // And then continue to load the segment from the index directory.
       boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
       if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
         if (forceDownload) {
           LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
         } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
+          LOGGER.info("Download segment:{} of table: {} as crc changes from: {} to: {}", segmentName,
               _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
         }
         indexDir = downloadSegment(segmentName, zkMetadata);
       } else {
-        LOGGER.info("Reload the local copy of segment: {} of table: {}", segmentName, _tableNameWithType);
-        FileUtils.copyDirectory(segmentBackupDir, indexDir);
+        LOGGER.info("Reload existing segment: {} of table: {}", segmentName, _tableNameWithType);
+        try (SegmentDirectory segmentDirectory = getSegmentDirectory(segmentName, indexLoadingConfig)) {

Review comment:
       Replacing FileUtils.copyDirectory with SegmentDirectory.copyTo() so that the physical location of the source doesn't have to be a local seg dir. The physical location becomes transparent to the logic flow here.
   
   Think I need to rename getSegmentDirectory() to sth like loadSegmentDirectory() for clarity, as it tries to create the SegmentDirectory object. The util method getSegmentDataDir just composes the file path to seg dir.
   

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -480,12 +426,138 @@ File getSegmentDataDir(String segmentName) {
     return new File(_indexDir, segmentName);
   }
 
-  @VisibleForTesting
-  static boolean isNewSegment(SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata) {
-    return localMetadata == null || !hasSameCRC(zkMetadata, localMetadata);
+  /**
+   * Create a backup directory to handle failure of segment reloading.
+   * First rename index directory to segment backup directory so that original segment have all file
+   * descriptors point to the segment backup directory to ensure original segment serves queries properly.
+   * The original index directory is restored lazily, as depending on the conditions,
+   * it may be restored from the backup directory or segment downloaded from deep store.
+   */
+  private void createBackup(File indexDir) {
+    if (!indexDir.exists()) {
+      return;
+    }
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    // Rename index directory to segment backup directory (atomic).
+    Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
+        "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
+  }
+
+  /**
+   * Remove the backup directory to mark the completion of segment reloading.
+   * First rename then delete is as renaming is an atomic operation, but deleting is not.
+   * When we rename the segment backup directory to segment temporary directory, we know the reload
+   * already succeeded, so that we can safely delete the segment temporary directory.
+   */
+  private void removeBackup(File indexDir)
+      throws IOException {
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    if (!segmentBackupDir.exists()) {
+      return;
+    }
+    File segmentTempDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_TEMP_DIR_SUFFIX);

Review comment:
       As deletion happens immediately after renaming here, I'd assume naming collision on this temp dir would be no harm.
   
   Besides, deleting this tmp dir may fail or not get chance to happen (e.g. upon JVM hard stop), so the LoaderUtils.reloadFailureRecovery method retries to delete this tmp dir if it still exists. Adding a timestamp in the name would add more complexity for this retry logic.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -480,12 +426,138 @@ File getSegmentDataDir(String segmentName) {
     return new File(_indexDir, segmentName);
   }
 
-  @VisibleForTesting
-  static boolean isNewSegment(SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata) {
-    return localMetadata == null || !hasSameCRC(zkMetadata, localMetadata);
+  /**
+   * Create a backup directory to handle failure of segment reloading.
+   * First rename index directory to segment backup directory so that original segment have all file
+   * descriptors point to the segment backup directory to ensure original segment serves queries properly.
+   * The original index directory is restored lazily, as depending on the conditions,
+   * it may be restored from the backup directory or segment downloaded from deep store.
+   */
+  private void createBackup(File indexDir) {
+    if (!indexDir.exists()) {
+      return;
+    }
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    // Rename index directory to segment backup directory (atomic).
+    Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
+        "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
+  }
+
+  /**
+   * Remove the backup directory to mark the completion of segment reloading.
+   * First rename then delete is as renaming is an atomic operation, but deleting is not.
+   * When we rename the segment backup directory to segment temporary directory, we know the reload
+   * already succeeded, so that we can safely delete the segment temporary directory.
+   */
+  private void removeBackup(File indexDir)
+      throws IOException {
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    if (!segmentBackupDir.exists()) {
+      return;
+    }
+    File segmentTempDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_TEMP_DIR_SUFFIX);
+    // Rename segment backup directory to segment temporary directory (atomic).
+    Preconditions.checkState(segmentBackupDir.renameTo(segmentTempDir),
+        "Failed to rename segment backup directory: %s to segment temporary directory: %s", segmentBackupDir,
+        segmentTempDir);
+    FileUtils.deleteDirectory(segmentTempDir);
+  }
+
+  private boolean tryLoadExistingSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      SegmentZKMetadata zkMetadata) {
+    // Try to recover the segment from potential segment reloading failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
+
+    // Creates the SegmentDirectory object to access the segment metadata.
+    // The metadata is null if the segment doesn't exist yet.
+    SegmentDirectory segmentDirectory = tryGetSegmentDirectory(segmentName, indexLoadingConfig);
+    SegmentMetadataImpl segmentMetadata = (segmentDirectory == null) ? null : segmentDirectory.getSegmentMetadata();
+
+    // If the segment doesn't exist on server or its CRC has changed, then we
+    // need to fall back to download the segment from deep store to load it.
+    if (segmentMetadata == null) {
+      LOGGER.info("Segment: {} of table: {} does not exist", segmentName, _tableNameWithType);
+      return false;
+    }
+    if (!hasSameCRC(zkMetadata, segmentMetadata)) {
+      LOGGER.info("Segment: {} of table: {} has crc change from: {} to: {}", segmentName, _tableNameWithType,
+          segmentMetadata.getCrc(), zkMetadata.getCrc());
+      return false;
+    }
+
+    try {
+      // If the segment is still kept by the server, then we can
+      // either load it directly if it's still consistent with latest table config and schema;
+      // or reprocess it to reflect latest table config and schema before loading.
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        LOGGER.info("Segment: {} of table: {} is consistent with latest table config and schema", segmentName,
+            _tableNameWithType);
+      } else {
+        LOGGER.info("Segment: {} of table: {} needs reprocess to reflect latest table config and schema", segmentName,
+            _tableNameWithType);
+        segmentDirectory.copyTo(indexDir);
+        // Close the stale SegmentDirectory object and recreate it with reprocessed segment.
+        closeSegmentDirectoryQuietly(segmentDirectory);
+        ImmutableSegmentLoader.preprocess(indexDir, indexLoadingConfig, schema);
+        segmentDirectory = getSegmentDirectory(segmentName, indexLoadingConfig);
+      }
+      ImmutableSegment segment = ImmutableSegmentLoader.load(segmentDirectory, indexLoadingConfig, schema);
+      addSegment(segment);
+      LOGGER.info("Loaded existing segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
+          zkMetadata.getCrc());
+      return true;
+    } catch (Exception e) {
+      LOGGER.error("Failed to load existing segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType, e);
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      return false;
+    }
+  }
+
+  private SegmentDirectory tryGetSegmentDirectory(String segmentName, IndexLoadingConfig indexLoadingConfig) {
+    try {
+      return getSegmentDirectory(segmentName, indexLoadingConfig);

Review comment:
       renamed

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -277,49 +282,36 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
       SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
+      // Create backup directory to handle failure of segment reloading.
+      createBackup(indexDir);
 
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
+      // Download segment from deep store if CRC changes or forced to download;
+      // otherwise, copy backup directory back to the original index directory.
+      // And then continue to load the segment from the index directory.
       boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
       if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
         if (forceDownload) {
           LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
         } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
+          LOGGER.info("Download segment:{} of table: {} as crc changes from: {} to: {}", segmentName,
               _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
         }
         indexDir = downloadSegment(segmentName, zkMetadata);
       } else {
-        LOGGER.info("Reload the local copy of segment: {} of table: {}", segmentName, _tableNameWithType);
-        FileUtils.copyDirectory(segmentBackupDir, indexDir);
+        LOGGER.info("Reload existing segment: {} of table: {}", segmentName, _tableNameWithType);
+        try (SegmentDirectory segmentDirectory = initSegmentDirectory(segmentName, indexLoadingConfig)) {

Review comment:
       right, we need to copy from backup dir, and that's implemented in SegmentLocalFSDirectory.copyTo() in this PR. That method is noop if indexDir exists already, or copies from backup dir to indexDir, keeping the original logic flow intact.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -335,26 +327,19 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
       SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
+    if (localMetadata != null && hasSameCRC(zkMetadata, localMetadata)) {
       LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
           _tableNameWithType, localMetadata.getCrc());
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
+    // The segment is not loaded by the server if the metadata object is null. But the segment
+    // may still be kept on the server. For example when server gets restarted, the segment is
+    // still on the server but the metadata object has not been initialized yet. In this case,
+    // we should check if the segment exists on server and try to load it. If the segment does
+    // not exist or fails to get loaded, we download segment from deep store to load it again.
+    if (localMetadata == null && tryLoadExistingSegment(segmentName, indexLoadingConfig, zkMetadata)) {

Review comment:
       L510 `ImmutableSegmentLoader.load(segmentDirectory, indexLoadingConfig, schema);` uses the initialized SegmentDirectory object to create the ImmutableSegment object as the last step of the segment loading logic flow.
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (5ae0dff) into [master](https://codecov.io/gh/apache/pinot/commit/d25a48879ea8b7f7b7595249d412705b670204e5?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d25a488) will **decrease** coverage by `6.46%`.
   > The diff coverage is `83.97%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7969      +/-   ##
   ============================================
   - Coverage     71.37%   64.90%   -6.47%     
   - Complexity     4194     4201       +7     
   ============================================
     Files          1594     1550      -44     
     Lines         82562    80781    -1781     
     Branches      12321    12123     -198     
   ============================================
   - Hits          58927    52433    -6494     
   - Misses        19652    24620    +4968     
   + Partials       3983     3728     -255     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `68.27% <83.97%> (+0.02%)` | :arrow_up: |
   | unittests2 | `14.28% <0.00%> (+<0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ata/manager/realtime/RealtimeTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvUmVhbHRpbWVUYWJsZURhdGFNYW5hZ2VyLmphdmE=) | `11.78% <ø> (-56.10%)` | :arrow_down: |
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `90.83% <0.00%> (-0.84%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `74.28% <28.57%> (-0.29%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `87.50% <82.75%> (-4.17%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `80.00% <83.33%> (+0.21%)` | :arrow_up: |
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `87.78% <86.36%> (+1.84%)` | :arrow_up: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `100.00% <100.00%> (ø)` | |
   | [...egment/local/segment/index/loader/LoaderUtils.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9Mb2FkZXJVdGlscy5qYXZh) | `95.91% <100.00%> (+1.80%)` | :arrow_up: |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `73.68% <100.00%> (ø)` | |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `63.26% <100.00%> (ø)` | |
   | ... and [374 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [d25a488...5ae0dff](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r779749197



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,94 +306,162 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+            ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      }
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
-    }
+    // For local tier backend, try to recover the segment from potential
+    // reload failure. Continue upon any failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
 
-    Preconditions.checkState(allowDownload(segmentName, zkMetadata), "Segment: %s of table: %s does not allow download",
-        segmentName, _tableNameWithType);
+    // Creates the SegmentDirectory object to access the segment metadata that
+    // may be from local tier backend or remote tier backend.
+    SegmentDirectory segmentDirectory = null;
+    try {
+      SegmentDirectoryLoaderContext loaderContext =
+          new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+              segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+      SegmentDirectoryLoader segmentDirectoryLoader =
+          SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+      segmentDirectory = segmentDirectoryLoader.load(indexDir.toURI(), loaderContext);
+    } catch (Exception e) {
+      LOGGER.warn("Failed to create SegmentDirectory for segment: {} of table: {} due to error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
+    }
 
-    // Download segment and replace the local one, either due to failure to recover local segment,
-    // or the segment data is updated and has new CRC now.
-    if (localMetadata == null) {
-      LOGGER.info("Download segment: {} of table: {} as no good one exists locally", segmentName, _tableNameWithType);
-    } else {
-      LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
-          _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
+    // Download the raw segment, reprocess and load it if it is never loaded or its CRC has changed.
+    if (segmentDirectory == null) {
+      LOGGER.info("Segment: {} of table: {} does not exist", segmentName, _tableNameWithType);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
+    }
+    if (!hasSameCRC(zkMetadata, segmentDirectory.getSegmentMetadata())) {
+      LOGGER.info("Segment: {} of table: {} has crc: {} different from new crc: {}", segmentName,
+          _tableNameWithType, segmentDirectory.getSegmentMetadata().getCrc(), zkMetadata.getCrc());
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
     }
-    File indexDir = downloadSegment(segmentName, zkMetadata);
-    addSegment(indexDir, indexLoadingConfig);
-    LOGGER.info("Downloaded and loaded segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
-        zkMetadata.getCrc());
-  }
 
-  protected boolean allowDownload(String segmentName, SegmentZKMetadata zkMetadata) {
-    return true;
+    try {
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        // The loaded segment is still consistent with current table config or schema.
+        LOGGER.info("Segment: {} of table: {} is consistent with table config and schema", segmentName,
+            _tableNameWithType);
+        loadSegment(segmentDirectory, indexLoadingConfig, schema);
+        return;
+      }
+      // If any discrepancy is found, get the segment from tier backend, reprocess and load it.
+      LOGGER.info("Segment: {} of table: {} needs reprocess to reflect latest table config and schema", segmentName,
+          _tableNameWithType);
+      // Close the stale SegmentDirectory object before loading the newly processed segment.
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadTierSegmentAndProcess(segmentName, indexLoadingConfig, schema);
+    } catch (Exception e) {
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      LOGGER.error("Failed to reprocess and load segment: {} of table: {}", segmentName, _tableNameWithType, e);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+    }
   }
 
-  protected File downloadSegment(String segmentName, SegmentZKMetadata zkMetadata)
+  /**
+   * Get the segment from the configured tier backend, reprocess it with the latest table
+   * config and schema and then load it. Please note that the segment is from tier backend,
+   * not deep store, for incremental processing.
+   */
+  private void downloadTierSegmentAndProcess(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      @Nullable Schema schema)
       throws Exception {
-    // TODO: may support download from peer servers for RealTime table.
-    return downloadSegmentFromDeepStore(segmentName, zkMetadata);
+    SegmentDirectoryLoaderContext loaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentDirectoryLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    File indexDir = getSegmentDataDir(segmentName);
+    segmentDirectoryLoader.download(indexDir, loaderContext);

Review comment:
       Thanks for the feedbacks, @mcvsubbu ! We've proposed a new way to refactor the segment loading logic flow like below. pls review.
   ```
   public void addOrReplaceSegment() {
       segdir = loader.load();
       if (segdir == null || crc is diff) {
         download raw from deep store;
         preprocess(indexDir);
       } else if (not consistent with tbl cfg) {
         segdir.copyTo(indexDir);
         preprocess(indexDir);
       }
       if (preprocessed) {
          segdir = loader.load();
       }
       return;
     }
   ```
   below is the one used in PR right now as comparison.
   ```
     public void addOrReplaceSegment() {
       segdir = loader.load();
       if (segdir == null || crc is diff) {
         download raw from deep store;
         preprocess(indexDir);
         loader.upload(index)
       } else if (not consistent with tbl cfg) {
         loader.download(indexDir);
         preprocess(indexDir);
         loader.upload(index)
       }
       if (preprocessed) {
          segdir = loader.load(); 
       }
       return;
     }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mcvsubbu commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r778280965



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/LoaderUtils.java
##########
@@ -161,4 +160,36 @@ public static void reloadFailureRecovery(File indexDir)
       FileUtils.forceDelete(segmentTempDir);
     }
   }
+
+  public static void createBackup(File indexDir)
+      throws IOException {
+    if (!indexDir.exists()) {

Review comment:
       Why has this check been added?

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/LoaderUtils.java
##########
@@ -161,4 +160,36 @@ public static void reloadFailureRecovery(File indexDir)
       FileUtils.forceDelete(segmentTempDir);
     }
   }
+
+  public static void createBackup(File indexDir)
+      throws IOException {
+    if (!indexDir.exists()) {
+      return;
+    }
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    // Rename index directory to segment backup directory (atomic)
+    Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
+        "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
+    // Copy the backup dir back to proceed.
+    FileUtils.copyDirectory(segmentBackupDir, indexDir);

Review comment:
       We used to rename the current directory and reload. Can you clarify why we have changed it to copy the segment instead?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -275,53 +278,23 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
 
   @Override
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
-      SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
+      SegmentMetadata segmentMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
+    // Create backup dir to make segment reloading atomic for local tier backend.
+    LoaderUtils.createBackup(indexDir);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
-
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
-      boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
-      if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
-        if (forceDownload) {
-          LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
-        } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
-              _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
-        }
-        indexDir = downloadSegment(segmentName, zkMetadata);
+      boolean shouldDownloadRawSegment = forceDownload || !hasSameCRC(zkMetadata, segmentMetadata);
+      if (shouldDownloadRawSegment && allowDownloadRawSegment(segmentName, zkMetadata)) {
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata, schema);

Review comment:
       Can you clarify why we download from a "tier" (I assume this to mean a peer storage server that already hosts the segment), as opposed to always downloading from the master and keep things simple?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -275,53 +278,23 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
 
   @Override
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
-      SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
+      SegmentMetadata segmentMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
+    // Create backup dir to make segment reloading atomic for local tier backend.
+    LoaderUtils.createBackup(indexDir);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
-
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
-      boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
-      if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
-        if (forceDownload) {
-          LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
-        } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
-              _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
-        }
-        indexDir = downloadSegment(segmentName, zkMetadata);
+      boolean shouldDownloadRawSegment = forceDownload || !hasSameCRC(zkMetadata, segmentMetadata);
+      if (shouldDownloadRawSegment && allowDownloadRawSegment(segmentName, zkMetadata)) {
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata, schema);
       } else {
-        LOGGER.info("Reload the local copy of segment: {} of table: {}", segmentName, _tableNameWithType);
-        FileUtils.copyDirectory(segmentBackupDir, indexDir);
+        downloadTierSegmentAndProcess(segmentName, indexLoadingConfig, schema);
       }
-
-      // Load from index directory and replace the old segment in memory.
-      addSegment(ImmutableSegmentLoader.load(indexDir, indexLoadingConfig, schema));
-
-      // Rename segment backup directory to segment temporary directory (atomic)
-      // The reason to first rename then delete is that, renaming is an atomic operation, but deleting is not. When we
-      // rename the segment backup directory to segment temporary directory, we know the reload already succeeded, so
-      // that we can safely delete the segment temporary directory
-      File segmentTempDir = new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_TEMP_DIR_SUFFIX);
-      Preconditions.checkState(segmentBackupDir.renameTo(segmentTempDir),
-          "Failed to rename segment backup directory: %s to segment temporary directory: %s", segmentBackupDir,
-          segmentTempDir);
-      FileUtils.deleteDirectory(segmentTempDir);
+      // Remove backup dir to mark the completion of reloading for local tier backend.
+      LoaderUtils.removeBackup(indexDir);

Review comment:
       Can we keep the comments and renaming code here? Moving this to a method called "removeBackup" in a different class may lead to future modifications that may not keep the semantics of rename and remove (yes, you have moved the comments as well, but I think it is better to keep it local).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r779041798



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,94 +306,162 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+            ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      }
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
-    }
+    // For local tier backend, try to recover the segment from potential
+    // reload failure. Continue upon any failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
 
-    Preconditions.checkState(allowDownload(segmentName, zkMetadata), "Segment: %s of table: %s does not allow download",
-        segmentName, _tableNameWithType);
+    // Creates the SegmentDirectory object to access the segment metadata that
+    // may be from local tier backend or remote tier backend.
+    SegmentDirectory segmentDirectory = null;
+    try {
+      SegmentDirectoryLoaderContext loaderContext =
+          new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+              segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+      SegmentDirectoryLoader segmentDirectoryLoader =
+          SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+      segmentDirectory = segmentDirectoryLoader.load(indexDir.toURI(), loaderContext);
+    } catch (Exception e) {
+      LOGGER.warn("Failed to create SegmentDirectory for segment: {} of table: {} due to error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
+    }
 
-    // Download segment and replace the local one, either due to failure to recover local segment,
-    // or the segment data is updated and has new CRC now.
-    if (localMetadata == null) {
-      LOGGER.info("Download segment: {} of table: {} as no good one exists locally", segmentName, _tableNameWithType);
-    } else {
-      LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
-          _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
+    // Download the raw segment, reprocess and load it if it is never loaded or its CRC has changed.
+    if (segmentDirectory == null) {
+      LOGGER.info("Segment: {} of table: {} does not exist", segmentName, _tableNameWithType);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
+    }
+    if (!hasSameCRC(zkMetadata, segmentDirectory.getSegmentMetadata())) {
+      LOGGER.info("Segment: {} of table: {} has crc: {} different from new crc: {}", segmentName,
+          _tableNameWithType, segmentDirectory.getSegmentMetadata().getCrc(), zkMetadata.getCrc());
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
     }
-    File indexDir = downloadSegment(segmentName, zkMetadata);
-    addSegment(indexDir, indexLoadingConfig);
-    LOGGER.info("Downloaded and loaded segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
-        zkMetadata.getCrc());
-  }
 
-  protected boolean allowDownload(String segmentName, SegmentZKMetadata zkMetadata) {
-    return true;
+    try {
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        // The loaded segment is still consistent with current table config or schema.
+        LOGGER.info("Segment: {} of table: {} is consistent with table config and schema", segmentName,
+            _tableNameWithType);
+        loadSegment(segmentDirectory, indexLoadingConfig, schema);
+        return;
+      }
+      // If any discrepancy is found, get the segment from tier backend, reprocess and load it.
+      LOGGER.info("Segment: {} of table: {} needs reprocess to reflect latest table config and schema", segmentName,
+          _tableNameWithType);
+      // Close the stale SegmentDirectory object before loading the newly processed segment.
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadTierSegmentAndProcess(segmentName, indexLoadingConfig, schema);
+    } catch (Exception e) {
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      LOGGER.error("Failed to reprocess and load segment: {} of table: {}", segmentName, _tableNameWithType, e);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+    }
   }
 
-  protected File downloadSegment(String segmentName, SegmentZKMetadata zkMetadata)
+  /**
+   * Get the segment from the configured tier backend, reprocess it with the latest table
+   * config and schema and then load it. Please note that the segment is from tier backend,
+   * not deep store, for incremental processing.
+   */
+  private void downloadTierSegmentAndProcess(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      @Nullable Schema schema)
       throws Exception {
-    // TODO: may support download from peer servers for RealTime table.
-    return downloadSegmentFromDeepStore(segmentName, zkMetadata);
+    SegmentDirectoryLoaderContext loaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentDirectoryLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    File indexDir = getSegmentDataDir(segmentName);
+    segmentDirectoryLoader.download(indexDir, loaderContext);

Review comment:
       The download and upload methods are introduced to SegmentDirectoryLoader interface is because today the SegmentPreprocessor and pretty much all kinds of IndexCreators used by it requires the `File indexDir` as an input param. If they were able to work with SegmentDirectory interface instead of `File indexDir`, the download/upload methods wouldn't be needed in SegmentDirectoryLoader interface. (This is briefly explained in the comments of SegmentDirectoryLoader interface)
   
   Hope this adds more clarity on why downloading raw segment from deep store is not part of SegmentDirectoryLoader interface. As there can be a path forward to remove download/upload methods from SegmentDirectoryLoader, if segment preprocessing logic could work with SegmentDirectory interface instead of requiring File index explicitly. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r782516626



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -285,49 +290,36 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
       SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
+      // Create backup directory to handle failure of segment reloading.
+      createBackup(indexDir);
 
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
+      // Download segment from deep store if CRC changes or forced to download;
+      // otherwise, copy backup directory back to the original index directory.
+      // And then continue to load the segment from the index directory.
       boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
       if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
         if (forceDownload) {
           LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
         } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
+          LOGGER.info("Download segment:{} of table: {} as crc changes from: {} to: {}", segmentName,
               _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
         }
         indexDir = downloadSegment(segmentName, zkMetadata);
       } else {
-        LOGGER.info("Reload the local copy of segment: {} of table: {}", segmentName, _tableNameWithType);
-        FileUtils.copyDirectory(segmentBackupDir, indexDir);
+        LOGGER.info("Reload existing segment: {} of table: {}", segmentName, _tableNameWithType);
+        try (SegmentDirectory segmentDirectory = initSegmentDirectory(segmentName, indexLoadingConfig)) {
+          segmentDirectory.copyTo(indexDir);
+        }
       }
 
       // Load from index directory and replace the old segment in memory.
-      addSegment(ImmutableSegmentLoader.load(indexDir, indexLoadingConfig, schema));
-
-      // Rename segment backup directory to segment temporary directory (atomic)
-      // The reason to first rename then delete is that, renaming is an atomic operation, but deleting is not. When we
-      // rename the segment backup directory to segment temporary directory, we know the reload already succeeded, so
-      // that we can safely delete the segment temporary directory
-      File segmentTempDir = new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_TEMP_DIR_SUFFIX);
-      Preconditions.checkState(segmentBackupDir.renameTo(segmentTempDir),
-          "Failed to rename segment backup directory: %s to segment temporary directory: %s", segmentBackupDir,
-          segmentTempDir);
-      FileUtils.deleteDirectory(segmentTempDir);
+      ImmutableSegment segment = ImmutableSegmentLoader.load(indexDir, indexLoadingConfig, schema);

Review comment:
       hmm.. like what kind of issues? As to what happens at these two places:
   
   The 1st time to initialize SegmentDirectory, the indexDir is empty, so essentially it's an empty SegmentLocalFSDirectory object to be created. But it's able to make a copy at indexDir with knowledge about where the source is. Other SegmentDirectory impls may do copyTo with their source location, like remote fs.
   
   The 2nd time to create SegmentDirectory via ImmutableSegmentLoader.load(indexDir) here, the indexDir is non empty now. It's preprocessed, and then used to initialize the SegmentDirectory object to create the ImmutableSegment object in the end. 
   
   Would love to understand your concerns, so feel free to comment.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r782644142



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -285,49 +290,36 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
       SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
+      // Create backup directory to handle failure of segment reloading.
+      createBackup(indexDir);
 
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
+      // Download segment from deep store if CRC changes or forced to download;
+      // otherwise, copy backup directory back to the original index directory.
+      // And then continue to load the segment from the index directory.
       boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
       if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
         if (forceDownload) {
           LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
         } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
+          LOGGER.info("Download segment:{} of table: {} as crc changes from: {} to: {}", segmentName,
               _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
         }
         indexDir = downloadSegment(segmentName, zkMetadata);
       } else {
-        LOGGER.info("Reload the local copy of segment: {} of table: {}", segmentName, _tableNameWithType);
-        FileUtils.copyDirectory(segmentBackupDir, indexDir);
+        LOGGER.info("Reload existing segment: {} of table: {}", segmentName, _tableNameWithType);
+        try (SegmentDirectory segmentDirectory = initSegmentDirectory(segmentName, indexLoadingConfig)) {
+          segmentDirectory.copyTo(indexDir);
+        }
       }
 
       // Load from index directory and replace the old segment in memory.
-      addSegment(ImmutableSegmentLoader.load(indexDir, indexLoadingConfig, schema));
-
-      // Rename segment backup directory to segment temporary directory (atomic)
-      // The reason to first rename then delete is that, renaming is an atomic operation, but deleting is not. When we
-      // rename the segment backup directory to segment temporary directory, we know the reload already succeeded, so
-      // that we can safely delete the segment temporary directory
-      File segmentTempDir = new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_TEMP_DIR_SUFFIX);
-      Preconditions.checkState(segmentBackupDir.renameTo(segmentTempDir),
-          "Failed to rename segment backup directory: %s to segment temporary directory: %s", segmentBackupDir,
-          segmentTempDir);
-      FileUtils.deleteDirectory(segmentTempDir);
+      ImmutableSegment segment = ImmutableSegmentLoader.load(indexDir, indexLoadingConfig, schema);

Review comment:
       Your understanding is correct. Although I plugged SegmentDirectory into the logic flow for extensibility, I tried to keep the original logic intact for segments kept on local disk. I'll add comments for clarity.
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (46717cc) into [master](https://codecov.io/gh/apache/pinot/commit/823aa07d7fc61543ed3d91992749d28aecd192ee?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (823aa07) will **decrease** coverage by `42.45%`.
   > The diff coverage is `27.90%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #7969       +/-   ##
   =============================================
   - Coverage     71.35%   28.90%   -42.46%     
   =============================================
     Files          1595     1587        -8     
     Lines         82748    82526      -222     
     Branches      12348    12327       -21     
   =============================================
   - Hits          59046    23853    -35193     
   - Misses        19715    56506    +36791     
   + Partials       3987     2167     -1820     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `28.90% <27.90%> (-0.17%)` | :arrow_down: |
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `0.00% <0.00%> (-79.79%)` | :arrow_down: |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `0.00% <0.00%> (-73.69%)` | :arrow_down: |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `0.00% <0.00%> (-63.27%)` | :arrow_down: |
   | [...t/local/segment/store/SegmentLocalFSDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L3N0b3JlL1NlZ21lbnRMb2NhbEZTRGlyZWN0b3J5LmphdmE=) | `0.00% <0.00%> (-69.29%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `0.00% <0.00%> (-74.57%)` | :arrow_down: |
   | [...ache/pinot/segment/spi/store/SegmentDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3N0b3JlL1NlZ21lbnREaXJlY3RvcnkuamF2YQ==) | `0.00% <0.00%> (-62.50%)` | :arrow_down: |
   | [...server/starter/helix/HelixInstanceDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zZXJ2ZXIvc3RhcnRlci9oZWxpeC9IZWxpeEluc3RhbmNlRGF0YU1hbmFnZXIuamF2YQ==) | `78.88% <47.05%> (-3.20%)` | :arrow_down: |
   | ... and [1156 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [823aa07...46717cc](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (cf77dab) into [master](https://codecov.io/gh/apache/pinot/commit/a1f3c30e2ac74dd0acb67087a86d8f03b416f5b2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a1f3c30) will **decrease** coverage by `6.55%`.
   > The diff coverage is `80.82%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7969      +/-   ##
   ============================================
   - Coverage     71.38%   64.83%   -6.56%     
   - Complexity     4200     4223      +23     
   ============================================
     Files          1595     1550      -45     
     Lines         82595    80926    -1669     
     Branches      12326    12150     -176     
   ============================================
   - Hits          58964    52470    -6494     
   - Misses        19645    24720    +5075     
   + Partials       3986     3736     -250     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `68.17% <80.82%> (-0.06%)` | :arrow_down: |
   | unittests2 | `14.26% <0.00%> (-0.04%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `90.83% <0.00%> (-0.84%)` | :arrow_down: |
   | [...ache/pinot/segment/spi/store/SegmentDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3N0b3JlL1NlZ21lbnREaXJlY3RvcnkuamF2YQ==) | `55.55% <0.00%> (-6.95%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `74.28% <28.57%> (-0.29%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `84.15% <73.52%> (-7.51%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `80.00% <83.33%> (+0.21%)` | :arrow_up: |
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `88.47% <86.11%> (+3.58%)` | :arrow_up: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `100.00% <100.00%> (ø)` | |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `73.68% <100.00%> (ø)` | |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `63.26% <100.00%> (ø)` | |
   | [...t/local/segment/store/SegmentLocalFSDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L3N0b3JlL1NlZ21lbnRMb2NhbEZTRGlyZWN0b3J5LmphdmE=) | `71.89% <100.00%> (+2.60%)` | :arrow_up: |
   | ... and [385 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [a1f3c30...cf77dab](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r782517157



##########
File path: pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/store/SegmentDirectory.java
##########
@@ -140,6 +141,14 @@ public void acquire(FetchContext fetchContext) {
   public void release(FetchContext fetchContext) {
   }
 
+  /**
+   * Copy segment directory to a local directory.
+   * @param dest the destination directory
+   */
+  public void copyTo(File dest)

Review comment:
       I'd prefer copyTo as it doesn't imply where the source is. copyToLoca() kinda implies the source is remote.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (ac41900) into [master](https://codecov.io/gh/apache/pinot/commit/823aa07d7fc61543ed3d91992749d28aecd192ee?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (823aa07) will **decrease** coverage by `1.25%`.
   > The diff coverage is `75.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7969      +/-   ##
   ============================================
   - Coverage     71.35%   70.10%   -1.26%     
   - Complexity     4218     4221       +3     
   ============================================
     Files          1595     1596       +1     
     Lines         82748    82834      +86     
     Branches      12348    12359      +11     
   ============================================
   - Hits          59046    58071     -975     
   - Misses        19715    20780    +1065     
   + Partials       3987     3983       -4     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `27.63% <27.32%> (+0.07%)` | :arrow_up: |
   | unittests1 | `68.16% <78.06%> (-0.03%)` | :arrow_down: |
   | unittests2 | `14.27% <0.00%> (+0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `90.83% <0.00%> (-0.84%)` | :arrow_down: |
   | [...ache/pinot/segment/spi/store/SegmentDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3N0b3JlL1NlZ21lbnREaXJlY3RvcnkuamF2YQ==) | `55.55% <0.00%> (-6.95%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `74.28% <28.57%> (-0.29%)` | :arrow_down: |
   | [...server/starter/helix/HelixInstanceDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zZXJ2ZXIvc3RhcnRlci9oZWxpeC9IZWxpeEluc3RhbmNlRGF0YU1hbmFnZXIuamF2YQ==) | `79.44% <47.05%> (-2.64%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `84.15% <73.52%> (-7.51%)` | :arrow_down: |
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `86.58% <80.24%> (+1.80%)` | :arrow_up: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `80.00% <83.33%> (+0.21%)` | :arrow_up: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `100.00% <100.00%> (ø)` | |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `73.68% <100.00%> (ø)` | |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `63.26% <100.00%> (ø)` | |
   | ... and [122 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [823aa07...ac41900](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r778489953



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,94 +306,162 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+            ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      }
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
-    }
+    // For local tier backend, try to recover the segment from potential
+    // reload failure. Continue upon any failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
 
-    Preconditions.checkState(allowDownload(segmentName, zkMetadata), "Segment: %s of table: %s does not allow download",
-        segmentName, _tableNameWithType);
+    // Creates the SegmentDirectory object to access the segment metadata that
+    // may be from local tier backend or remote tier backend.
+    SegmentDirectory segmentDirectory = null;
+    try {
+      SegmentDirectoryLoaderContext loaderContext =
+          new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+              segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+      SegmentDirectoryLoader segmentDirectoryLoader =
+          SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+      segmentDirectory = segmentDirectoryLoader.load(indexDir.toURI(), loaderContext);
+    } catch (Exception e) {
+      LOGGER.warn("Failed to create SegmentDirectory for segment: {} of table: {} due to error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
+    }
 
-    // Download segment and replace the local one, either due to failure to recover local segment,
-    // or the segment data is updated and has new CRC now.
-    if (localMetadata == null) {
-      LOGGER.info("Download segment: {} of table: {} as no good one exists locally", segmentName, _tableNameWithType);
-    } else {
-      LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
-          _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
+    // Download the raw segment, reprocess and load it if it is never loaded or its CRC has changed.
+    if (segmentDirectory == null) {
+      LOGGER.info("Segment: {} of table: {} does not exist", segmentName, _tableNameWithType);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
+    }
+    if (!hasSameCRC(zkMetadata, segmentDirectory.getSegmentMetadata())) {
+      LOGGER.info("Segment: {} of table: {} has crc: {} different from new crc: {}", segmentName,
+          _tableNameWithType, segmentDirectory.getSegmentMetadata().getCrc(), zkMetadata.getCrc());
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
     }
-    File indexDir = downloadSegment(segmentName, zkMetadata);
-    addSegment(indexDir, indexLoadingConfig);
-    LOGGER.info("Downloaded and loaded segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
-        zkMetadata.getCrc());
-  }
 
-  protected boolean allowDownload(String segmentName, SegmentZKMetadata zkMetadata) {
-    return true;
+    try {
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        // The loaded segment is still consistent with current table config or schema.
+        LOGGER.info("Segment: {} of table: {} is consistent with table config and schema", segmentName,
+            _tableNameWithType);
+        loadSegment(segmentDirectory, indexLoadingConfig, schema);
+        return;
+      }
+      // If any discrepancy is found, get the segment from tier backend, reprocess and load it.
+      LOGGER.info("Segment: {} of table: {} needs reprocess to reflect latest table config and schema", segmentName,
+          _tableNameWithType);
+      // Close the stale SegmentDirectory object before loading the newly processed segment.
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadTierSegmentAndProcess(segmentName, indexLoadingConfig, schema);
+    } catch (Exception e) {
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      LOGGER.error("Failed to reprocess and load segment: {} of table: {}", segmentName, _tableNameWithType, e);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+    }
   }
 
-  protected File downloadSegment(String segmentName, SegmentZKMetadata zkMetadata)
+  /**
+   * Get the segment from the configured tier backend, reprocess it with the latest table
+   * config and schema and then load it. Please note that the segment is from tier backend,
+   * not deep store, for incremental processing.
+   */
+  private void downloadTierSegmentAndProcess(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      @Nullable Schema schema)
       throws Exception {
-    // TODO: may support download from peer servers for RealTime table.
-    return downloadSegmentFromDeepStore(segmentName, zkMetadata);
+    SegmentDirectoryLoaderContext loaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentDirectoryLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    File indexDir = getSegmentDataDir(segmentName);
+    segmentDirectoryLoader.download(indexDir, loaderContext);

Review comment:
       Preprocessed segments (or tier backend segment) are put on server's storage backend (not using local disk here on purpose) to serve queries. SegmentDirectoryLoader interface knows how to load those preprocessed segments, essentially to initialize objects like SegmentDirectory and ColumnIndexDirectory to get ready for queries. Those interfaces abstract away the impl details and also where the preprocessed segments physically are.
   
   It's on purpose not putting logic around deep store behind SegmentDirectoryLoader interface. As deep store is not designed as server's storage backend to hold preprocessed segments. Deep store keeps the raw segments, not preprocessed with latest table config and schema for queries. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mcvsubbu commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r778303472



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,94 +306,156 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)

Review comment:
       Please check the comments on TableDataManager.java and re-word if needed.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,94 +306,156 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+            ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      }
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
-    }
+    // For local tier backend, try to recover the segment from potential
+    // reload failure. Continue upon any failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
 
-    Preconditions.checkState(allowDownload(segmentName, zkMetadata), "Segment: %s of table: %s does not allow download",
-        segmentName, _tableNameWithType);
+    // Creates the SegmentDirectory object to access the segment metadata that

Review comment:
       Where are we accessing segment metadata that is in a remote tier backend?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,94 +306,156 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,

Review comment:
       rename to downloadFromDeepStore?

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/LoaderUtils.java
##########
@@ -138,7 +138,6 @@ public static void writeIndexToV3Format(SegmentDirectory.Writer segmentWriter, S
   public static void reloadFailureRecovery(File indexDir)
       throws IOException {
     File parentDir = indexDir.getParentFile();
-

Review comment:
       let us have a blank line here please

##########
File path: pinot-server/src/main/java/org/apache/pinot/server/starter/helix/HelixInstanceDataManager.java
##########
@@ -307,6 +306,32 @@ private void reloadSegment(String tableNameWithType, SegmentMetadata segmentMeta
     }
   }
 
+  /**
+   * Try to reload a segment without a local index directory. The segment can be a consuming
+   * segment from a REALTIME table, or an immutable segment on remote tier backend.
+   * @return true if the segment is loaded.
+   */
+  @VisibleForTesting
+  boolean reloadSegmentWithNullIndexDir(String tableNameWithType, String segmentName,
+      SegmentDataManager segmentDataManager, @Nullable Schema schema) {
+    IndexSegment segment = segmentDataManager.getSegment();
+    if (segment instanceof ImmutableSegment) {
+      LOGGER.info("Found an immutable segment: {} in table: {} on remote tier backend", segmentName,

Review comment:
       ```suggestion
         LOGGER.info("Found an immutable segment: {} in table: {}.", segmentName,
   ```

##########
File path: pinot-server/src/main/java/org/apache/pinot/server/starter/helix/HelixInstanceDataManager.java
##########
@@ -307,6 +306,32 @@ private void reloadSegment(String tableNameWithType, SegmentMetadata segmentMeta
     }
   }
 
+  /**
+   * Try to reload a segment without a local index directory. The segment can be a consuming
+   * segment from a REALTIME table, or an immutable segment on remote tier backend.
+   * @return true if the segment is loaded.
+   */
+  @VisibleForTesting
+  boolean reloadSegmentWithNullIndexDir(String tableNameWithType, String segmentName,

Review comment:
       Please rename this method as reloadMutableSegment() -- that is what it does

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,92 +315,131 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+            ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      }
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
+    // For local tier backend, try to recover the segment from potential
+    // reload failure. Continue upon any failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
+
+    // Creates the SegmentDirectory object to access the segment metadata that
+    // may be from local tier backend or remote tier backend.
+    SegmentDirectoryLoaderContext segmentLoaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    SegmentDirectory segmentDirectory = null;
+    try {
+      segmentDirectory = segmentLoader.load(indexDir.toURI(), segmentLoaderContext);
+    } catch (Exception e) {
+      LOGGER.warn("Failed to create SegmentDirectory for segment: {} of table: {} due to error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
     }
 
-    Preconditions.checkState(allowDownload(segmentName, zkMetadata), "Segment: %s of table: %s does not allow download",
-        segmentName, _tableNameWithType);
-
-    // Download segment and replace the local one, either due to failure to recover local segment,
-    // or the segment data is updated and has new CRC now.
-    if (localMetadata == null) {
-      LOGGER.info("Download segment: {} of table: {} as no good one exists locally", segmentName, _tableNameWithType);
-    } else {
-      LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
-          _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
+    // Download the raw segment, reprocess and load it if it is never loaded or its CRC has changed.
+    if (segmentDirectory == null || !hasSameCRC(zkMetadata, segmentDirectory.getSegmentMetadata())) {
+      LOGGER.info("Segment: {} of table: {} not exist or its crc: {} differs from new crc: {}", segmentName,
+          _tableNameWithType, (segmentDirectory == null) ? "none" : segmentDirectory.getSegmentMetadata().getCrc(),
+          zkMetadata.getCrc());
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
     }
-    File indexDir = downloadSegment(segmentName, zkMetadata);
-    addSegment(indexDir, indexLoadingConfig);
-    LOGGER.info("Downloaded and loaded segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
-        zkMetadata.getCrc());
-  }
 
-  protected boolean allowDownload(String segmentName, SegmentZKMetadata zkMetadata) {
-    return true;
+    try {
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        // The loaded segment is still consistent with current table config or schema.
+        LOGGER.info("Segment: {} of table: {} is consistent with table config and schema", segmentName,
+            _tableNameWithType);
+        addSegment(ImmutableSegmentLoader.load(segmentDirectory, indexLoadingConfig, schema));

Review comment:
       +1 to Neha's comment, please do not use the 1-liner




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] npawar commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
npawar commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r779255713



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -275,53 +278,23 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
 
   @Override
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
-      SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
+      SegmentMetadata segmentMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
+    // Create backup dir to make segment reloading atomic for local tier backend.
+    LoaderUtils.createBackup(indexDir);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
-
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
-      boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
-      if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
-        if (forceDownload) {
-          LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
-        } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
-              _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
-        }
-        indexDir = downloadSegment(segmentName, zkMetadata);
+      boolean shouldDownloadRawSegment = forceDownload || !hasSameCRC(zkMetadata, segmentMetadata);
+      if (shouldDownloadRawSegment && allowDownloadRawSegment(segmentName, zkMetadata)) {
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata, schema);

Review comment:
       To answer this further, `Can you clarify why we download from a "tier" (I assume this to mean a peer storage server that already hosts the segment), as opposed to always downloading from the master and keep things simple?` - there is a distinction being made between the master (raw) segment and the tier (fully preprocessed segment ready to serve queries), as mentioned by @klsince above. This distinction is primarily to help us future proof the implementations of lazy loading, where using the master copy of segment may suffice.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r780526138



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -335,26 +327,17 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
       SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
+    if (localMetadata != null && hasSameCRC(zkMetadata, localMetadata)) {
       LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
           _tableNameWithType, localMetadata.getCrc());
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
+    // The segment is not loaded by the server yet, but it may still kept by the server.

Review comment:
       It's a common case when server gets restarted, the segments are still on their local disk but none of the SegmentMetadata objects are loaded into memory yet. (will update comment)
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mcvsubbu commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r778468032



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/LoaderUtils.java
##########
@@ -161,4 +160,36 @@ public static void reloadFailureRecovery(File indexDir)
       FileUtils.forceDelete(segmentTempDir);
     }
   }
+
+  public static void createBackup(File indexDir)
+      throws IOException {
+    if (!indexDir.exists()) {
+      return;
+    }
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    // Rename index directory to segment backup directory (atomic)
+    Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
+        "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
+    // Copy the backup dir back to proceed.
+    FileUtils.copyDirectory(segmentBackupDir, indexDir);

Review comment:
       Agree that the rename happens the same way, but prior to this PR, we just downloaded the segment. Looks like we are now copying after the rename (and then perhaps overwriting what we copied?)

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -275,53 +278,23 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
 
   @Override
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
-      SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
+      SegmentMetadata segmentMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
+    // Create backup dir to make segment reloading atomic for local tier backend.
+    LoaderUtils.createBackup(indexDir);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
-
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
-      boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
-      if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
-        if (forceDownload) {
-          LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
-        } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
-              _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
-        }
-        indexDir = downloadSegment(segmentName, zkMetadata);
+      boolean shouldDownloadRawSegment = forceDownload || !hasSameCRC(zkMetadata, segmentMetadata);
+      if (shouldDownloadRawSegment && allowDownloadRawSegment(segmentName, zkMetadata)) {
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata, schema);

Review comment:
       What does it mean to have a "local" vs "non-local" tieredBackend? How is a segment expected to be different in a tierBackend? (Or, should we have this discussion in the design doc?) I am fine abstracting things as long as the comments in the code are clear about what the interfaces are supposed to do, and what layout of the cluster are we coding for




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r784333246



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -493,12 +445,139 @@ protected File getTmpSegmentDataDir(String segmentName) {
     return new File(_resourceTmpDir, segmentName);
   }
 
-  @VisibleForTesting
-  static boolean isNewSegment(SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata) {
-    return localMetadata == null || !hasSameCRC(zkMetadata, localMetadata);
+  /**
+   * Create a backup directory to handle failure of segment reloading.
+   * First rename index directory to segment backup directory so that original segment have all file
+   * descriptors point to the segment backup directory to ensure original segment serves queries properly.
+   * The original index directory is restored lazily, as depending on the conditions,
+   * it may be restored from the backup directory or segment downloaded from deep store.
+   */
+  private void createBackup(File indexDir) {
+    if (!indexDir.exists()) {
+      return;
+    }
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    // Rename index directory to segment backup directory (atomic).
+    Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
+        "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
+  }
+
+  /**
+   * Remove the backup directory to mark the completion of segment reloading.
+   * First rename then delete is as renaming is an atomic operation, but deleting is not.
+   * When we rename the segment backup directory to segment temporary directory, we know the reload
+   * already succeeded, so that we can safely delete the segment temporary directory.
+   */
+  private void removeBackup(File indexDir)
+      throws IOException {
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    if (!segmentBackupDir.exists()) {
+      return;
+    }
+    // Rename segment backup directory to segment temporary directory (atomic).
+    File segmentTempDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_TEMP_DIR_SUFFIX);
+    Preconditions.checkState(segmentBackupDir.renameTo(segmentTempDir),
+        "Failed to rename segment backup directory: %s to segment temporary directory: %s", segmentBackupDir,
+        segmentTempDir);
+    FileUtils.deleteDirectory(segmentTempDir);
+  }
+
+  private boolean tryLoadExistingSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,

Review comment:
       Will add comment, and for the questions:
   
   It also returns false if loading fails with exception. As before, upon exception to load locally existed segment, we should fall back to download the segment from deep store to finish the loading.
   
   SegmentDirectory object is closed when the method has to return false.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r784363540



##########
File path: pinot-server/src/main/java/org/apache/pinot/server/starter/helix/HelixInstanceDataManager.java
##########
@@ -307,6 +306,30 @@ private void reloadSegment(String tableNameWithType, SegmentMetadata segmentMeta
     }
   }
 
+  /**
+   * Try to reload a mutable segment.
+   * @return true if the segment is mutable; false if the segment is immutable.
+   */
+  @VisibleForTesting
+  boolean reloadMutableSegment(String tableNameWithType, String segmentName,

Review comment:
       still prefer reloadMutableSegment as you suggested early on.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (ad6d167) into [master](https://codecov.io/gh/apache/pinot/commit/823aa07d7fc61543ed3d91992749d28aecd192ee?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (823aa07) will **decrease** coverage by `57.01%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #7969       +/-   ##
   =============================================
   - Coverage     71.35%   14.33%   -57.02%     
   + Complexity     4218       81     -4137     
   =============================================
     Files          1595     1551       -44     
     Lines         82748    81000     -1748     
     Branches      12348    12161      -187     
   =============================================
   - Hits          59046    11613    -47433     
   - Misses        19715    68531    +48816     
   + Partials       3987      856     -3131     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `14.33% <0.00%> (+0.06%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `0.00% <0.00%> (-84.78%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `0.00% <0.00%> (-79.79%)` | :arrow_down: |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `0.00% <0.00%> (-73.69%)` | :arrow_down: |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `0.00% <0.00%> (-63.27%)` | :arrow_down: |
   | [...t/local/segment/store/SegmentLocalFSDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L3N0b3JlL1NlZ21lbnRMb2NhbEZTRGlyZWN0b3J5LmphdmE=) | `0.00% <0.00%> (-69.29%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `0.00% <0.00%> (-74.57%)` | :arrow_down: |
   | [...ache/pinot/segment/spi/store/SegmentDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3N0b3JlL1NlZ21lbnREaXJlY3RvcnkuamF2YQ==) | `0.00% <0.00%> (-62.50%)` | :arrow_down: |
   | ... and [1281 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [823aa07...ad6d167](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] snleee commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
snleee commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r782453731



##########
File path: pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/store/SegmentDirectory.java
##########
@@ -140,6 +141,14 @@ public void acquire(FetchContext fetchContext) {
   public void release(FetchContext fetchContext) {
   }
 
+  /**
+   * Copy segment directory to a local directory.
+   * @param dest the destination directory
+   */
+  public void copyTo(File dest)

Review comment:
       `copyTo` -> `copyToLocal`?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -285,49 +290,36 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
       SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
+      // Create backup directory to handle failure of segment reloading.
+      createBackup(indexDir);
 
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
+      // Download segment from deep store if CRC changes or forced to download;
+      // otherwise, copy backup directory back to the original index directory.
+      // And then continue to load the segment from the index directory.
       boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
       if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
         if (forceDownload) {
           LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
         } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
+          LOGGER.info("Download segment:{} of table: {} as crc changes from: {} to: {}", segmentName,
               _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
         }
         indexDir = downloadSegment(segmentName, zkMetadata);
       } else {
-        LOGGER.info("Reload the local copy of segment: {} of table: {}", segmentName, _tableNameWithType);
-        FileUtils.copyDirectory(segmentBackupDir, indexDir);
+        LOGGER.info("Reload existing segment: {} of table: {}", segmentName, _tableNameWithType);
+        try (SegmentDirectory segmentDirectory = initSegmentDirectory(segmentName, indexLoadingConfig)) {
+          segmentDirectory.copyTo(indexDir);
+        }
       }
 
       // Load from index directory and replace the old segment in memory.
-      addSegment(ImmutableSegmentLoader.load(indexDir, indexLoadingConfig, schema));
-
-      // Rename segment backup directory to segment temporary directory (atomic)
-      // The reason to first rename then delete is that, renaming is an atomic operation, but deleting is not. When we
-      // rename the segment backup directory to segment temporary directory, we know the reload already succeeded, so
-      // that we can safely delete the segment temporary directory
-      File segmentTempDir = new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_TEMP_DIR_SUFFIX);
-      Preconditions.checkState(segmentBackupDir.renameTo(segmentTempDir),
-          "Failed to rename segment backup directory: %s to segment temporary directory: %s", segmentBackupDir,
-          segmentTempDir);
-      FileUtils.deleteDirectory(segmentTempDir);
+      ImmutableSegment segment = ImmutableSegmentLoader.load(indexDir, indexLoadingConfig, schema);

Review comment:
       `ImmutableSegmentLoader.load(indexDir)` would call the `SegmentDirectory.load(indexDir)` as part of its function calls. We have called `SegmentDirectory.load(indexDir)` once already during `initSegmentDirectory()`. Will this cause any issues?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1005038909


   Thanks for the comments @npawar @jackjlli.  
   
   cc @siddharthteotia @Jackie-Jiang to help take a look as well, as you helped review the [previous PR](https://github.com/apache/pinot/pull/7925), a prep for this one. Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r777757099



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,92 +315,131 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+            ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      }
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
+    // For local tier backend, try to recover the segment from potential
+    // reload failure. Continue upon any failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
+
+    // Creates the SegmentDirectory object to access the segment metadata that
+    // may be from local tier backend or remote tier backend.
+    SegmentDirectoryLoaderContext segmentLoaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    SegmentDirectory segmentDirectory = null;
+    try {
+      segmentDirectory = segmentLoader.load(indexDir.toURI(), segmentLoaderContext);
+    } catch (Exception e) {
+      LOGGER.warn("Failed to create SegmentDirectory for segment: {} of table: {} due to error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
     }
 
-    Preconditions.checkState(allowDownload(segmentName, zkMetadata), "Segment: %s of table: %s does not allow download",
-        segmentName, _tableNameWithType);
-
-    // Download segment and replace the local one, either due to failure to recover local segment,
-    // or the segment data is updated and has new CRC now.
-    if (localMetadata == null) {
-      LOGGER.info("Download segment: {} of table: {} as no good one exists locally", segmentName, _tableNameWithType);
-    } else {
-      LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
-          _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
+    // Download the raw segment, reprocess and load it if it is never loaded or its CRC has changed.
+    if (segmentDirectory == null || !hasSameCRC(zkMetadata, segmentDirectory.getSegmentMetadata())) {
+      LOGGER.info("Segment: {} of table: {} not exist or its crc: {} differs from new crc: {}", segmentName,
+          _tableNameWithType, (segmentDirectory == null) ? "none" : segmentDirectory.getSegmentMetadata().getCrc(),
+          zkMetadata.getCrc());
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
     }
-    File indexDir = downloadSegment(segmentName, zkMetadata);
-    addSegment(indexDir, indexLoadingConfig);
-    LOGGER.info("Downloaded and loaded segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
-        zkMetadata.getCrc());
-  }
 
-  protected boolean allowDownload(String segmentName, SegmentZKMetadata zkMetadata) {
-    return true;
+    try {
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        // The loaded segment is still consistent with current table config or schema.
+        LOGGER.info("Segment: {} of table: {} is consistent with table config and schema", segmentName,
+            _tableNameWithType);
+        addSegment(ImmutableSegmentLoader.load(segmentDirectory, indexLoadingConfig, schema));

Review comment:
       I kinda prefer to this one-liner.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f76a05c) into [master](https://codecov.io/gh/apache/pinot/commit/a1f3c30e2ac74dd0acb67087a86d8f03b416f5b2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a1f3c30) will **decrease** coverage by `1.02%`.
   > The diff coverage is `80.44%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7969      +/-   ##
   ============================================
   - Coverage     71.38%   70.36%   -1.03%     
   - Complexity     4200     4208       +8     
   ============================================
     Files          1595     1595              
     Lines         82595    82669      +74     
     Branches      12326    12330       +4     
   ============================================
   - Hits          58964    58166     -798     
   - Misses        19645    20514     +869     
   - Partials       3986     3989       +3     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `28.98% <35.75%> (-0.10%)` | :arrow_down: |
   | integration2 | `?` | |
   | unittests1 | `68.28% <82.71%> (+0.04%)` | :arrow_up: |
   | unittests2 | `14.29% <0.00%> (-0.02%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ata/manager/realtime/RealtimeTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvUmVhbHRpbWVUYWJsZURhdGFNYW5hZ2VyLmphdmE=) | `43.49% <ø> (-24.40%)` | :arrow_down: |
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `90.83% <0.00%> (-0.84%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `74.28% <28.57%> (-0.29%)` | :arrow_down: |
   | [...server/starter/helix/HelixInstanceDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zZXJ2ZXIvc3RhcnRlci9oZWxpeC9IZWxpeEluc3RhbmNlRGF0YU1hbmFnZXIuamF2YQ==) | `78.88% <47.05%> (-2.62%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `87.50% <82.75%> (-4.17%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `80.00% <83.33%> (+0.21%)` | :arrow_up: |
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `87.66% <86.17%> (+2.76%)` | :arrow_up: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `100.00% <100.00%> (ø)` | |
   | [...egment/local/segment/index/loader/LoaderUtils.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9Mb2FkZXJVdGlscy5qYXZh) | `95.91% <100.00%> (+1.80%)` | :arrow_up: |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `73.68% <100.00%> (ø)` | |
   | ... and [100 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [a1f3c30...f76a05c](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r778361089



##########
File path: pinot-server/src/main/java/org/apache/pinot/server/starter/helix/HelixInstanceDataManager.java
##########
@@ -307,6 +306,32 @@ private void reloadSegment(String tableNameWithType, SegmentMetadata segmentMeta
     }
   }
 
+  /**
+   * Try to reload a segment without a local index directory. The segment can be a consuming
+   * segment from a REALTIME table, or an immutable segment on remote tier backend.
+   * @return true if the segment is loaded.
+   */
+  @VisibleForTesting
+  boolean reloadSegmentWithNullIndexDir(String tableNameWithType, String segmentName,
+      SegmentDataManager segmentDataManager, @Nullable Schema schema) {
+    IndexSegment segment = segmentDataManager.getSegment();
+    if (segment instanceof ImmutableSegment) {
+      LOGGER.info("Found an immutable segment: {} in table: {} on remote tier backend", segmentName,

Review comment:
       👌 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r781623210



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -277,49 +282,36 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
       SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
+      // Create backup directory to handle failure of segment reloading.

Review comment:
       The detailed comments are moved to method createBackup to explain its implementation.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mcvsubbu commented on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1012532159


   This the best I could do as far as suggestions. Please feel free to merge after addressing my minor comments in the latest review. The comment about having a factory is for future, esp. if you are planning more changes in this area due to a non-local segment directory. If we discover issues in production, we may need to undo or re-work some of the changes. Like I mentioned in a previous comment as well, a lot of these are best hidden behind your implementation of SegmentDirectory.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] snleee commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
snleee commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r782589318



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -285,49 +290,36 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
       SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
+      // Create backup directory to handle failure of segment reloading.
+      createBackup(indexDir);
 
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
+      // Download segment from deep store if CRC changes or forced to download;
+      // otherwise, copy backup directory back to the original index directory.
+      // And then continue to load the segment from the index directory.
       boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
       if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
         if (forceDownload) {
           LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
         } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
+          LOGGER.info("Download segment:{} of table: {} as crc changes from: {} to: {}", segmentName,
               _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
         }
         indexDir = downloadSegment(segmentName, zkMetadata);
       } else {
-        LOGGER.info("Reload the local copy of segment: {} of table: {}", segmentName, _tableNameWithType);
-        FileUtils.copyDirectory(segmentBackupDir, indexDir);
+        LOGGER.info("Reload existing segment: {} of table: {}", segmentName, _tableNameWithType);
+        try (SegmentDirectory segmentDirectory = initSegmentDirectory(segmentName, indexLoadingConfig)) {
+          segmentDirectory.copyTo(indexDir);
+        }
       }
 
       // Load from index directory and replace the old segment in memory.
-      addSegment(ImmutableSegmentLoader.load(indexDir, indexLoadingConfig, schema));
-
-      // Rename segment backup directory to segment temporary directory (atomic)
-      // The reason to first rename then delete is that, renaming is an atomic operation, but deleting is not. When we
-      // rename the segment backup directory to segment temporary directory, we know the reload already succeeded, so
-      // that we can safely delete the segment temporary directory
-      File segmentTempDir = new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_TEMP_DIR_SUFFIX);
-      Preconditions.checkState(segmentBackupDir.renameTo(segmentTempDir),
-          "Failed to rename segment backup directory: %s to segment temporary directory: %s", segmentBackupDir,
-          segmentTempDir);
-      FileUtils.deleteDirectory(segmentTempDir);
+      ImmutableSegment segment = ImmutableSegmentLoader.load(indexDir, indexLoadingConfig, schema);

Review comment:
       If I understand it correctly, you're saying that we will never have the case where we call `SegmentDirectory.load()` twice with the valid (or non-emtpy) `indexDir`because we rename in `createBackup()`. I get the point now. I got confused because we should not call `xx.load()` twice in general and found that `segmentDirector.load()` got invoked to the same path. Maybe we can add the comment on this briefly.
   
   i.e. 
   
   on L312, we can mention that ` indexDir is empty, so essentially it's an empty SegmentLocalFSDirectory object to be created `.
   
   on L318, we can say that `indexDir is non empty now`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
klsince commented on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1012633860


   hi @mcvsubbu thanks a lot for the feedbacks! I've replied your latest comments. Please take a look, and if they look good, could you help lift the 'requested changes' and I'll merge it for now. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] snleee commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
snleee commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r781531660



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -277,49 +282,36 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
       SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
+      // Create backup directory to handle failure of segment reloading.

Review comment:
       Add more information to the comment that we are actually `renaming` the `indexDir` to `indexBackupDir`.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -277,49 +282,36 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
       SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
+      // Create backup directory to handle failure of segment reloading.
+      createBackup(indexDir);
 
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
+      // Download segment from deep store if CRC changes or forced to download;
+      // otherwise, copy backup directory back to the original index directory.
+      // And then continue to load the segment from the index directory.
       boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
       if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
         if (forceDownload) {
           LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
         } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
+          LOGGER.info("Download segment:{} of table: {} as crc changes from: {} to: {}", segmentName,
               _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
         }
         indexDir = downloadSegment(segmentName, zkMetadata);
       } else {
-        LOGGER.info("Reload the local copy of segment: {} of table: {}", segmentName, _tableNameWithType);
-        FileUtils.copyDirectory(segmentBackupDir, indexDir);
+        LOGGER.info("Reload existing segment: {} of table: {}", segmentName, _tableNameWithType);
+        try (SegmentDirectory segmentDirectory = initSegmentDirectory(segmentName, indexLoadingConfig)) {

Review comment:
       Let's say 
   
   indexDir = original segments dir (/segments/table/segment_name)
   backupDir = backup dir (/segmetns/table/segment_name.segment.bak)
   
   The previous logic was:
   ```
   1. copy backup dir -> index dir
   2. load index dir
   ```
   
   The current logic looks that:
   
   ```
   1. copy indexDir -> indexDir (but indexDir is empty so fall back to backup in `copyTo()` and this makes `backupDir -> indexDir`)
   2. load index dir
   ```
   
   I get the point the the logic is the same; however, I would prefer to see that you use `indexDir` and `indexBackupDir` variables directly in `reload()` function to have much better readability. I had to read all the sub-function details to understand the existing logic. How do you think?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (5ae0dff) into [master](https://codecov.io/gh/apache/pinot/commit/d25a48879ea8b7f7b7595249d412705b670204e5?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d25a488) will **decrease** coverage by `1.25%`.
   > The diff coverage is `83.43%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7969      +/-   ##
   ============================================
   - Coverage     71.37%   70.11%   -1.26%     
   - Complexity     4194     4201       +7     
   ============================================
     Files          1594     1595       +1     
     Lines         82562    82647      +85     
     Branches      12321    12325       +4     
   ============================================
   - Hits          58927    57947     -980     
   - Misses        19652    20728    +1076     
   + Partials       3983     3972      -11     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `27.57% <36.09%> (-0.16%)` | :arrow_down: |
   | unittests1 | `68.27% <83.97%> (+0.02%)` | :arrow_up: |
   | unittests2 | `14.28% <0.00%> (+<0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ata/manager/realtime/RealtimeTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvUmVhbHRpbWVUYWJsZURhdGFNYW5hZ2VyLmphdmE=) | `66.66% <ø> (-1.22%)` | :arrow_down: |
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `90.83% <0.00%> (-0.84%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `74.28% <28.57%> (-0.29%)` | :arrow_down: |
   | [...server/starter/helix/HelixInstanceDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zZXJ2ZXIvc3RhcnRlci9oZWxpeC9IZWxpeEluc3RhbmNlRGF0YU1hbmFnZXIuamF2YQ==) | `78.33% <69.23%> (-2.15%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `87.50% <82.75%> (-4.17%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `80.00% <83.33%> (+0.21%)` | :arrow_up: |
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `88.23% <87.50%> (+2.29%)` | :arrow_up: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `100.00% <100.00%> (ø)` | |
   | [...egment/local/segment/index/loader/LoaderUtils.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9Mb2FkZXJVdGlscy5qYXZh) | `95.91% <100.00%> (+1.80%)` | :arrow_up: |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `73.68% <100.00%> (ø)` | |
   | ... and [116 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [d25a488...5ae0dff](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r779041798



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,94 +306,162 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+            ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      }
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
-    }
+    // For local tier backend, try to recover the segment from potential
+    // reload failure. Continue upon any failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
 
-    Preconditions.checkState(allowDownload(segmentName, zkMetadata), "Segment: %s of table: %s does not allow download",
-        segmentName, _tableNameWithType);
+    // Creates the SegmentDirectory object to access the segment metadata that
+    // may be from local tier backend or remote tier backend.
+    SegmentDirectory segmentDirectory = null;
+    try {
+      SegmentDirectoryLoaderContext loaderContext =
+          new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+              segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+      SegmentDirectoryLoader segmentDirectoryLoader =
+          SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+      segmentDirectory = segmentDirectoryLoader.load(indexDir.toURI(), loaderContext);
+    } catch (Exception e) {
+      LOGGER.warn("Failed to create SegmentDirectory for segment: {} of table: {} due to error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
+    }
 
-    // Download segment and replace the local one, either due to failure to recover local segment,
-    // or the segment data is updated and has new CRC now.
-    if (localMetadata == null) {
-      LOGGER.info("Download segment: {} of table: {} as no good one exists locally", segmentName, _tableNameWithType);
-    } else {
-      LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
-          _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
+    // Download the raw segment, reprocess and load it if it is never loaded or its CRC has changed.
+    if (segmentDirectory == null) {
+      LOGGER.info("Segment: {} of table: {} does not exist", segmentName, _tableNameWithType);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
+    }
+    if (!hasSameCRC(zkMetadata, segmentDirectory.getSegmentMetadata())) {
+      LOGGER.info("Segment: {} of table: {} has crc: {} different from new crc: {}", segmentName,
+          _tableNameWithType, segmentDirectory.getSegmentMetadata().getCrc(), zkMetadata.getCrc());
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
     }
-    File indexDir = downloadSegment(segmentName, zkMetadata);
-    addSegment(indexDir, indexLoadingConfig);
-    LOGGER.info("Downloaded and loaded segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
-        zkMetadata.getCrc());
-  }
 
-  protected boolean allowDownload(String segmentName, SegmentZKMetadata zkMetadata) {
-    return true;
+    try {
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        // The loaded segment is still consistent with current table config or schema.
+        LOGGER.info("Segment: {} of table: {} is consistent with table config and schema", segmentName,
+            _tableNameWithType);
+        loadSegment(segmentDirectory, indexLoadingConfig, schema);
+        return;
+      }
+      // If any discrepancy is found, get the segment from tier backend, reprocess and load it.
+      LOGGER.info("Segment: {} of table: {} needs reprocess to reflect latest table config and schema", segmentName,
+          _tableNameWithType);
+      // Close the stale SegmentDirectory object before loading the newly processed segment.
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadTierSegmentAndProcess(segmentName, indexLoadingConfig, schema);
+    } catch (Exception e) {
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      LOGGER.error("Failed to reprocess and load segment: {} of table: {}", segmentName, _tableNameWithType, e);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+    }
   }
 
-  protected File downloadSegment(String segmentName, SegmentZKMetadata zkMetadata)
+  /**
+   * Get the segment from the configured tier backend, reprocess it with the latest table
+   * config and schema and then load it. Please note that the segment is from tier backend,
+   * not deep store, for incremental processing.
+   */
+  private void downloadTierSegmentAndProcess(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      @Nullable Schema schema)
       throws Exception {
-    // TODO: may support download from peer servers for RealTime table.
-    return downloadSegmentFromDeepStore(segmentName, zkMetadata);
+    SegmentDirectoryLoaderContext loaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentDirectoryLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    File indexDir = getSegmentDataDir(segmentName);
+    segmentDirectoryLoader.download(indexDir, loaderContext);

Review comment:
       The download and upload methods are introduced to SegmentDirectoryLoader interface is because today the SegmentPreprocessor and pretty much all kinds of IndexCreators used by it requires the `File indexDir` as an input param. If they were able to work with SegmentDirectory interface instead of `File indexDir`, the download/upload methods wouldn't be needed in SegmentDirectoryLoader interface. 
   
   Hope this adds more clarity on why downloading raw segment from deep store is not part of SegmentDirectoryLoader interface. As there can be a path forward to remove download/upload methods from SegmentDirectoryLoader, if segment preprocessing logic could work with SegmentDirectory interface instead of requiring File index explicitly. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (cf77dab) into [master](https://codecov.io/gh/apache/pinot/commit/a1f3c30e2ac74dd0acb67087a86d8f03b416f5b2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a1f3c30) will **decrease** coverage by `0.05%`.
   > The diff coverage is `77.30%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7969      +/-   ##
   ============================================
   - Coverage     71.38%   71.33%   -0.06%     
   - Complexity     4200     4223      +23     
   ============================================
     Files          1595     1595              
     Lines         82595    82792     +197     
     Branches      12326    12352      +26     
   ============================================
   + Hits          58964    59061      +97     
   - Misses        19645    19737      +92     
   - Partials       3986     3994       +8     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `29.02% <26.99%> (-0.06%)` | :arrow_down: |
   | integration2 | `27.60% <26.38%> (-0.16%)` | :arrow_down: |
   | unittests1 | `68.17% <80.82%> (-0.06%)` | :arrow_down: |
   | unittests2 | `14.26% <0.00%> (-0.04%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `90.83% <0.00%> (-0.84%)` | :arrow_down: |
   | [...ache/pinot/segment/spi/store/SegmentDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3N0b3JlL1NlZ21lbnREaXJlY3RvcnkuamF2YQ==) | `55.55% <0.00%> (-6.95%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `74.28% <28.57%> (-0.29%)` | :arrow_down: |
   | [...server/starter/helix/HelixInstanceDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zZXJ2ZXIvc3RhcnRlci9oZWxpeC9IZWxpeEluc3RhbmNlRGF0YU1hbmFnZXIuamF2YQ==) | `79.44% <47.05%> (-2.06%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `84.15% <73.52%> (-7.51%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `80.00% <83.33%> (+0.21%)` | :arrow_up: |
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `89.86% <86.11%> (+4.96%)` | :arrow_up: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `100.00% <100.00%> (ø)` | |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `73.68% <100.00%> (ø)` | |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `63.26% <100.00%> (ø)` | |
   | ... and [36 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [a1f3c30...cf77dab](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mcvsubbu commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r780490610



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -277,49 +282,36 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
       SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
+      // Create backup directory to handle failure of segment reloading.
+      createBackup(indexDir);
 
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
+      // Download segment from deep store if CRC changes or forced to download;
+      // otherwise, copy backup directory back to the original index directory.
+      // And then continue to load the segment from the index directory.
       boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
       if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
         if (forceDownload) {
           LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
         } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
+          LOGGER.info("Download segment:{} of table: {} as crc changes from: {} to: {}", segmentName,
               _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
         }
         indexDir = downloadSegment(segmentName, zkMetadata);
       } else {
-        LOGGER.info("Reload the local copy of segment: {} of table: {}", segmentName, _tableNameWithType);
-        FileUtils.copyDirectory(segmentBackupDir, indexDir);
+        LOGGER.info("Reload existing segment: {} of table: {}", segmentName, _tableNameWithType);
+        try (SegmentDirectory segmentDirectory = getSegmentDirectory(segmentName, indexLoadingConfig)) {

Review comment:
       Why can't we use the same method as that in line 285 (`getSegmentDataDir(segmentName)`) ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (cf77dab) into [master](https://codecov.io/gh/apache/pinot/commit/a1f3c30e2ac74dd0acb67087a86d8f03b416f5b2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a1f3c30) will **decrease** coverage by `57.12%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #7969       +/-   ##
   =============================================
   - Coverage     71.38%   14.26%   -57.13%     
   + Complexity     4200       81     -4119     
   =============================================
     Files          1595     1550       -45     
     Lines         82595    80926     -1669     
     Branches      12326    12150      -176     
   =============================================
   - Hits          58964    11546    -47418     
   - Misses        19645    68522    +48877     
   + Partials       3986      858     -3128     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `14.26% <0.00%> (-0.04%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `0.00% <0.00%> (-84.90%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `0.00% <0.00%> (-79.79%)` | :arrow_down: |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `0.00% <0.00%> (-73.69%)` | :arrow_down: |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `0.00% <0.00%> (-63.27%)` | :arrow_down: |
   | [...t/local/segment/store/SegmentLocalFSDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L3N0b3JlL1NlZ21lbnRMb2NhbEZTRGlyZWN0b3J5LmphdmE=) | `0.00% <0.00%> (-69.29%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `0.00% <0.00%> (-74.57%)` | :arrow_down: |
   | [...ache/pinot/segment/spi/store/SegmentDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3N0b3JlL1NlZ21lbnREaXJlY3RvcnkuamF2YQ==) | `0.00% <0.00%> (-62.50%)` | :arrow_down: |
   | ... and [1279 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [a1f3c30...cf77dab](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f76a05c) into [master](https://codecov.io/gh/apache/pinot/commit/a1f3c30e2ac74dd0acb67087a86d8f03b416f5b2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a1f3c30) will **decrease** coverage by `0.06%`.
   > The diff coverage is `80.44%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7969      +/-   ##
   ============================================
   - Coverage     71.38%   71.32%   -0.07%     
   - Complexity     4200     4208       +8     
   ============================================
     Files          1595     1595              
     Lines         82595    82669      +74     
     Branches      12326    12330       +4     
   ============================================
   + Hits          58964    58967       +3     
   - Misses        19645    19711      +66     
   - Partials       3986     3991       +5     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `28.98% <35.75%> (-0.10%)` | :arrow_down: |
   | integration2 | `27.62% <32.96%> (-0.14%)` | :arrow_down: |
   | unittests1 | `68.28% <82.71%> (+0.04%)` | :arrow_up: |
   | unittests2 | `14.29% <0.00%> (-0.02%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ata/manager/realtime/RealtimeTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvUmVhbHRpbWVUYWJsZURhdGFNYW5hZ2VyLmphdmE=) | `67.88% <ø> (ø)` | |
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `90.83% <0.00%> (-0.84%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `74.28% <28.57%> (-0.29%)` | :arrow_down: |
   | [...server/starter/helix/HelixInstanceDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zZXJ2ZXIvc3RhcnRlci9oZWxpeC9IZWxpeEluc3RhbmNlRGF0YU1hbmFnZXIuamF2YQ==) | `79.44% <47.05%> (-2.06%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `87.50% <82.75%> (-4.17%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `80.00% <83.33%> (+0.21%)` | :arrow_up: |
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `87.66% <86.17%> (+2.76%)` | :arrow_up: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `100.00% <100.00%> (ø)` | |
   | [...egment/local/segment/index/loader/LoaderUtils.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9Mb2FkZXJVdGlscy5qYXZh) | `95.91% <100.00%> (+1.80%)` | :arrow_up: |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `73.68% <100.00%> (ø)` | |
   | ... and [28 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [a1f3c30...f76a05c](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f76a05c) into [master](https://codecov.io/gh/apache/pinot/commit/a1f3c30e2ac74dd0acb67087a86d8f03b416f5b2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a1f3c30) will **decrease** coverage by `57.09%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #7969       +/-   ##
   =============================================
   - Coverage     71.38%   14.29%   -57.10%     
   + Complexity     4200       80     -4120     
   =============================================
     Files          1595     1550       -45     
     Lines         82595    80803     -1792     
     Branches      12326    12128      -198     
   =============================================
   - Hits          58964    11549    -47415     
   - Misses        19645    68393    +48748     
   + Partials       3986      861     -3125     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `14.29% <0.00%> (-0.02%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `0.00% <0.00%> (-84.90%)` | :arrow_down: |
   | [...ata/manager/realtime/RealtimeTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvUmVhbHRpbWVUYWJsZURhdGFNYW5hZ2VyLmphdmE=) | `0.00% <ø> (-67.89%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...egment/local/segment/index/loader/LoaderUtils.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9Mb2FkZXJVdGlscy5qYXZh) | `0.00% <0.00%> (-94.12%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `0.00% <0.00%> (-79.79%)` | :arrow_down: |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `0.00% <0.00%> (-73.69%)` | :arrow_down: |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `0.00% <0.00%> (-63.27%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `0.00% <0.00%> (-74.57%)` | :arrow_down: |
   | ... and [1278 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [a1f3c30...f76a05c](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r778376537



##########
File path: pinot-server/src/main/java/org/apache/pinot/server/starter/helix/HelixInstanceDataManager.java
##########
@@ -307,6 +306,32 @@ private void reloadSegment(String tableNameWithType, SegmentMetadata segmentMeta
     }
   }
 
+  /**
+   * Try to reload a segment without a local index directory. The segment can be a consuming
+   * segment from a REALTIME table, or an immutable segment on remote tier backend.
+   * @return true if the segment is loaded.
+   */
+  @VisibleForTesting
+  boolean reloadSegmentWithNullIndexDir(String tableNameWithType, String segmentName,
+      SegmentDataManager segmentDataManager, @Nullable Schema schema) {
+    IndexSegment segment = segmentDataManager.getSegment();
+    if (segment instanceof ImmutableSegment) {
+      LOGGER.info("Found an immutable segment: {} in table: {} on remote tier backend", segmentName,

Review comment:
       I'll save the period in the end, as the convention of log msg in this class seems like not ending with period.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r778505597



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -275,53 +278,23 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
 
   @Override
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
-      SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
+      SegmentMetadata segmentMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
+    // Create backup dir to make segment reloading atomic for local tier backend.
+    LoaderUtils.createBackup(indexDir);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
-
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
-      boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
-      if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
-        if (forceDownload) {
-          LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
-        } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
-              _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
-        }
-        indexDir = downloadSegment(segmentName, zkMetadata);
+      boolean shouldDownloadRawSegment = forceDownload || !hasSameCRC(zkMetadata, segmentMetadata);
+      if (shouldDownloadRawSegment && allowDownloadRawSegment(segmentName, zkMetadata)) {
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata, schema);

Review comment:
       If I understand the questions correctly, lemme use an example to add some clarity: 
   
   The segments can move between tiers depends selection criteria like segment age (as said in the docs/wiki ^). The most recent segments may be kept in a tier where the servers in that tier use local disk; older segments may move to a tier where the servers use remote store to keep them. But in those tiers, the segments are preprocessed and ready for queries, just that some tier has low latency and some higher depending on servers' storage backend in those tiers. 
   
   As in a [reply below](https://github.com/apache/pinot/pull/7969#discussion_r778489953), the segments in server's backend (or tier backend) are preprocessed with latest table config and schema and ready for queries.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (03c234b) into [master](https://codecov.io/gh/apache/pinot/commit/a1f3c30e2ac74dd0acb67087a86d8f03b416f5b2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a1f3c30) will **decrease** coverage by `57.10%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #7969       +/-   ##
   =============================================
   - Coverage     71.38%   14.28%   -57.11%     
   + Complexity     4200       80     -4120     
   =============================================
     Files          1595     1550       -45     
     Lines         82595    80824     -1771     
     Branches      12326    12132      -194     
   =============================================
   - Hits          58964    11547    -47417     
   - Misses        19645    68419    +48774     
   + Partials       3986      858     -3128     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `14.28% <0.00%> (-0.02%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `0.00% <0.00%> (-84.90%)` | :arrow_down: |
   | [...ata/manager/realtime/RealtimeTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvUmVhbHRpbWVUYWJsZURhdGFNYW5hZ2VyLmphdmE=) | `0.00% <ø> (-67.89%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...egment/local/segment/index/loader/LoaderUtils.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9Mb2FkZXJVdGlscy5qYXZh) | `0.00% <0.00%> (-94.12%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `0.00% <0.00%> (-79.79%)` | :arrow_down: |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `0.00% <0.00%> (-73.69%)` | :arrow_down: |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `0.00% <0.00%> (-63.27%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `0.00% <0.00%> (-74.57%)` | :arrow_down: |
   | ... and [1278 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [a1f3c30...03c234b](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r777753085



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/immutable/ImmutableSegmentLoader.java
##########
@@ -95,35 +96,52 @@ public static ImmutableSegment load(File indexDir, IndexLoadingConfig indexLoadi
    * modify the segment like to convert segment format, add or remove indices.
    */
   public static ImmutableSegment load(File indexDir, IndexLoadingConfig indexLoadingConfig, @Nullable Schema schema,

Review comment:
       In few test cases and RawIndexConverter. 
   
   There are other load() methods above for test or non-test classes. Those load() variants can be refactored to use the new load(SegmentDirectory), but there are still some boilerplate codes which I kept in this load() method.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r778360221



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,94 +306,156 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,

Review comment:
       per my understanding, raw segment can be downloaded from deep store (default) or peer server (only for RealTime table today), so I keeps this method name generic. Underlying, it calls a method downloadSegmentFromDeepStore.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r777745294



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,92 +315,131 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+            ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      }
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
+    // For local tier backend, try to recover the segment from potential
+    // reload failure. Continue upon any failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
+
+    // Creates the SegmentDirectory object to access the segment metadata that
+    // may be from local tier backend or remote tier backend.
+    SegmentDirectoryLoaderContext segmentLoaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    SegmentDirectory segmentDirectory = null;
+    try {
+      segmentDirectory = segmentLoader.load(indexDir.toURI(), segmentLoaderContext);
+    } catch (Exception e) {
+      LOGGER.warn("Failed to create SegmentDirectory for segment: {} of table: {} due to error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
     }
 
-    Preconditions.checkState(allowDownload(segmentName, zkMetadata), "Segment: %s of table: %s does not allow download",
-        segmentName, _tableNameWithType);
-
-    // Download segment and replace the local one, either due to failure to recover local segment,
-    // or the segment data is updated and has new CRC now.
-    if (localMetadata == null) {
-      LOGGER.info("Download segment: {} of table: {} as no good one exists locally", segmentName, _tableNameWithType);
-    } else {
-      LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
-          _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
+    // Download the raw segment, reprocess and load it if it is never loaded or its CRC has changed.
+    if (segmentDirectory == null || !hasSameCRC(zkMetadata, segmentDirectory.getSegmentMetadata())) {
+      LOGGER.info("Segment: {} of table: {} not exist or its crc: {} differs from new crc: {}", segmentName,
+          _tableNameWithType, (segmentDirectory == null) ? "none" : segmentDirectory.getSegmentMetadata().getCrc(),
+          zkMetadata.getCrc());
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
     }
-    File indexDir = downloadSegment(segmentName, zkMetadata);
-    addSegment(indexDir, indexLoadingConfig);
-    LOGGER.info("Downloaded and loaded segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
-        zkMetadata.getCrc());
-  }
 
-  protected boolean allowDownload(String segmentName, SegmentZKMetadata zkMetadata) {
-    return true;
+    try {
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        // The loaded segment is still consistent with current table config or schema.
+        LOGGER.info("Segment: {} of table: {} is consistent with table config and schema", segmentName,
+            _tableNameWithType);
+        addSegment(ImmutableSegmentLoader.load(segmentDirectory, indexLoadingConfig, schema));
+        return;
+      }
+      // If any discrepancy is found, get the segment from tier backend, reprocess and load it.
+      // Please note that the segment is from tier backed, not deep store, for incremental processing.
+      LOGGER.info("Segment: {} of table: {} needs reprocess with table config and schema", segmentName,
+          _tableNameWithType);
+      // Close the stale SegmentDirectory object before loading the newly processed segment.
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      segmentLoader.download(indexDir, segmentLoaderContext);
+      processAndLoadSegment(segmentName, indexDir, indexLoadingConfig, schema);
+      LOGGER.info("Segment: {} of table: {} is reprocessed and loaded", segmentName, _tableNameWithType);
+    } catch (Exception e) {
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      LOGGER.error("Failed to reprocess and load segment: {} of table: {}", segmentName, _tableNameWithType, e);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+    }
   }
 
-  protected File downloadSegment(String segmentName, SegmentZKMetadata zkMetadata)
+  /**
+   * This method downloads the raw segment from the deep store and process it, mainly
+   * for cases where segment CRC has changed or the existing segment fails to load.
+   */
+  private void downloadRawSegmentAndProcess(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      SegmentZKMetadata zkMetadata, @Nullable Schema schema)
       throws Exception {
-    // TODO: may support download from peer servers for RealTime table.
-    return downloadSegmentFromDeepStore(segmentName, zkMetadata);
+    Preconditions.checkState(allowDownloadRawSegment(segmentName, zkMetadata),
+        "Segment: %s of table: %s does not allow download raw segment", segmentName, _tableNameWithType);
+
+    File indexDir = downloadRawSegment(segmentName, zkMetadata);
+    processAndLoadSegment(segmentName, indexDir, indexLoadingConfig, schema);
+    LOGGER.info("Downloaded raw segment: {} of table: {} with crc: {} and loaded it", segmentName,
+        _tableNameWithType, zkMetadata.getCrc());
   }
 
-  /**
-   * Server restart during segment reload might leave segment directory in inconsistent state, like the index
-   * directory might not exist but segment backup directory existed. This method tries to recover from reload
-   * failure before checking the existence of the index directory and loading segment metadata from it.
-   */
-  private SegmentMetadata recoverSegmentQuietly(String segmentName) {
-    File indexDir = getSegmentDataDir(segmentName);
+  private void processAndLoadSegment(String segmentName, File indexDir, IndexLoadingConfig indexLoadingConfig,
+      @Nullable Schema schema)
+      throws Exception {
+    // Preprocess the segment locally with current table config and schema.
+    ImmutableSegmentLoader.preprocess(indexDir, indexLoadingConfig, schema);
+
+    SegmentDirectoryLoaderContext segmentLoaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    SegmentDirectory segmentDirectory = null;
     try {
-      LoaderUtils.reloadFailureRecovery(indexDir);
-      if (!indexDir.exists()) {
-        LOGGER.info("Segment: {} of table: {} is not found on disk", segmentName, _tableNameWithType);
-        return null;
-      }
-      SegmentMetadataImpl localMetadata = new SegmentMetadataImpl(indexDir);
-      LOGGER.info("Segment: {} of table: {} with crc: {} from disk is ready for loading", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
-      return localMetadata;
+      // Upload the processed segment to server tier backend, which can be local or remote.
+      segmentLoader.upload(indexDir, segmentLoaderContext);
+      // Create the SegmentDirectory object with the newly processed segment from tier backend.
+      segmentDirectory = segmentLoader.load(indexDir.toURI(), segmentLoaderContext);
+      addSegment(ImmutableSegmentLoader.load(segmentDirectory, indexLoadingConfig, schema));
     } catch (Exception e) {
-      LOGGER.error("Failed to recover segment: {} of table: {} from disk", segmentName, _tableNameWithType, e);
-      FileUtils.deleteQuietly(indexDir);
-      return null;
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      LOGGER.warn("Failed to load newly processed segment: {} of table: {}", segmentName, _tableNameWithType, e);

Review comment:
       good catch




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: extend SegmentDirectoryLoader interface and refactor BaseTableDataManager

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r777739218



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -333,92 +315,131 @@ public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingCon
 
   @Override
   public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
-      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata)
+      SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata segmentMetadata)
       throws Exception {
-    if (!isNewSegment(zkMetadata, localMetadata)) {
-      LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
-          _tableNameWithType, localMetadata.getCrc());
+    // Non-null segment metadata means the segment has already been loaded.
+    if (segmentMetadata != null) {
+      if (hasSameCRC(zkMetadata, segmentMetadata)) {
+        // Simply returns if the CRC hasn't changed. The table config may have changed
+        // since segment is loaded, but that is handled by reloadSegment() method.
+        LOGGER.info("Segment: {} of table: {} has crc: {} same as before, already loaded, do nothing", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc());
+      } else {
+        // Download the raw segment, reprocess and load it if the CRC has changed.
+        LOGGER.info("Segment: {} of table: {} already loaded but its crc: {} differs from new crc: {}", segmentName,
+            _tableNameWithType, segmentMetadata.getCrc(), zkMetadata.getCrc());
+        downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+            ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      }
       return;
     }
 
-    // Try to recover if no local metadata is provided.
-    if (localMetadata == null) {
-      LOGGER.info("Segment: {} of table: {} is not loaded, checking disk", segmentName, _tableNameWithType);
-      localMetadata = recoverSegmentQuietly(segmentName);
-      if (!isNewSegment(zkMetadata, localMetadata)) {
-        LOGGER.info("Segment: {} of table {} has crc: {} same as before, loading", segmentName, _tableNameWithType,
-            localMetadata.getCrc());
-        if (loadSegmentQuietly(segmentName, indexLoadingConfig)) {
-          return;
-        }
-        // Set local metadata to null to indicate that the local segment fails to load,
-        // although it exists and has same crc with the remote one.
-        localMetadata = null;
-      }
+    // For local tier backend, try to recover the segment from potential
+    // reload failure. Continue upon any failure.
+    File indexDir = getSegmentDataDir(segmentName);
+    recoverReloadFailureQuietly(_tableNameWithType, segmentName, indexDir);
+
+    // Creates the SegmentDirectory object to access the segment metadata that
+    // may be from local tier backend or remote tier backend.
+    SegmentDirectoryLoaderContext segmentLoaderContext =
+        new SegmentDirectoryLoaderContext(indexLoadingConfig.getTableConfig(), indexLoadingConfig.getInstanceId(),
+            segmentName, indexLoadingConfig.getSegmentDirectoryConfigs());
+    SegmentDirectoryLoader segmentLoader =
+        SegmentDirectoryLoaderRegistry.getSegmentDirectoryLoader(indexLoadingConfig.getSegmentDirectoryLoader());
+    SegmentDirectory segmentDirectory = null;
+    try {
+      segmentDirectory = segmentLoader.load(indexDir.toURI(), segmentLoaderContext);
+    } catch (Exception e) {
+      LOGGER.warn("Failed to create SegmentDirectory for segment: {} of table: {} due to error: {}", segmentName,
+          _tableNameWithType, e.getMessage());
     }
 
-    Preconditions.checkState(allowDownload(segmentName, zkMetadata), "Segment: %s of table: %s does not allow download",
-        segmentName, _tableNameWithType);
-
-    // Download segment and replace the local one, either due to failure to recover local segment,
-    // or the segment data is updated and has new CRC now.
-    if (localMetadata == null) {
-      LOGGER.info("Download segment: {} of table: {} as no good one exists locally", segmentName, _tableNameWithType);
-    } else {
-      LOGGER.info("Download segment: {} of table: {} as local crc: {} mismatches remote crc: {}.", segmentName,
-          _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
+    // Download the raw segment, reprocess and load it if it is never loaded or its CRC has changed.
+    if (segmentDirectory == null || !hasSameCRC(zkMetadata, segmentDirectory.getSegmentMetadata())) {
+      LOGGER.info("Segment: {} of table: {} not exist or its crc: {} differs from new crc: {}", segmentName,
+          _tableNameWithType, (segmentDirectory == null) ? "none" : segmentDirectory.getSegmentMetadata().getCrc(),
+          zkMetadata.getCrc());
+      closeSegmentDirectoryQuietly(segmentDirectory);
+      downloadRawSegmentAndProcess(segmentName, indexLoadingConfig, zkMetadata,
+          ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType));
+      return;
     }
-    File indexDir = downloadSegment(segmentName, zkMetadata);
-    addSegment(indexDir, indexLoadingConfig);
-    LOGGER.info("Downloaded and loaded segment: {} of table: {} with crc: {}", segmentName, _tableNameWithType,
-        zkMetadata.getCrc());
-  }
 
-  protected boolean allowDownload(String segmentName, SegmentZKMetadata zkMetadata) {
-    return true;
+    try {
+      Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
+      if (!ImmutableSegmentLoader.needPreprocess(segmentDirectory, indexLoadingConfig, schema)) {
+        // The loaded segment is still consistent with current table config or schema.
+        LOGGER.info("Segment: {} of table: {} is consistent with table config and schema", segmentName,
+            _tableNameWithType);
+        addSegment(ImmutableSegmentLoader.load(segmentDirectory, indexLoadingConfig, schema));
+        return;
+      }
+      // If any discrepancy is found, get the segment from tier backend, reprocess and load it.
+      // Please note that the segment is from tier backed, not deep store, for incremental processing.
+      LOGGER.info("Segment: {} of table: {} needs reprocess with table config and schema", segmentName,

Review comment:
       lgtm, but will keep the `:` as it seems like the convention for the log msgs in this class.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mcvsubbu commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r780595425



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -480,12 +426,138 @@ File getSegmentDataDir(String segmentName) {
     return new File(_indexDir, segmentName);
   }
 
-  @VisibleForTesting
-  static boolean isNewSegment(SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata) {
-    return localMetadata == null || !hasSameCRC(zkMetadata, localMetadata);
+  /**
+   * Create a backup directory to handle failure of segment reloading.
+   * First rename index directory to segment backup directory so that original segment have all file
+   * descriptors point to the segment backup directory to ensure original segment serves queries properly.
+   * The original index directory is restored lazily, as depending on the conditions,
+   * it may be restored from the backup directory or segment downloaded from deep store.
+   */
+  private void createBackup(File indexDir) {
+    if (!indexDir.exists()) {
+      return;
+    }
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    // Rename index directory to segment backup directory (atomic).
+    Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
+        "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
+  }
+
+  /**
+   * Remove the backup directory to mark the completion of segment reloading.
+   * First rename then delete is as renaming is an atomic operation, but deleting is not.
+   * When we rename the segment backup directory to segment temporary directory, we know the reload
+   * already succeeded, so that we can safely delete the segment temporary directory.
+   */
+  private void removeBackup(File indexDir)
+      throws IOException {
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    if (!segmentBackupDir.exists()) {
+      return;
+    }
+    File segmentTempDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_TEMP_DIR_SUFFIX);

Review comment:
       OK, agreed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (cf77dab) into [master](https://codecov.io/gh/apache/pinot/commit/a1f3c30e2ac74dd0acb67087a86d8f03b416f5b2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a1f3c30) will **decrease** coverage by `1.00%`.
   > The diff coverage is `77.30%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7969      +/-   ##
   ============================================
   - Coverage     71.38%   70.37%   -1.01%     
   - Complexity     4200     4223      +23     
   ============================================
     Files          1595     1595              
     Lines         82595    82792     +197     
     Branches      12326    12352      +26     
   ============================================
   - Hits          58964    58269     -695     
   - Misses        19645    20536     +891     
   - Partials       3986     3987       +1     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `29.02% <26.99%> (-0.06%)` | :arrow_down: |
   | integration2 | `?` | |
   | unittests1 | `68.17% <80.82%> (-0.06%)` | :arrow_down: |
   | unittests2 | `14.26% <0.00%> (-0.04%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `90.83% <0.00%> (-0.84%)` | :arrow_down: |
   | [...ache/pinot/segment/spi/store/SegmentDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3N0b3JlL1NlZ21lbnREaXJlY3RvcnkuamF2YQ==) | `55.55% <0.00%> (-6.95%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `74.28% <28.57%> (-0.29%)` | :arrow_down: |
   | [...server/starter/helix/HelixInstanceDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zZXJ2ZXIvc3RhcnRlci9oZWxpeC9IZWxpeEluc3RhbmNlRGF0YU1hbmFnZXIuamF2YQ==) | `78.88% <47.05%> (-2.62%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `84.15% <73.52%> (-7.51%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `80.00% <83.33%> (+0.21%)` | :arrow_up: |
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `89.86% <86.11%> (+4.96%)` | :arrow_up: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `100.00% <100.00%> (ø)` | |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `73.68% <100.00%> (ø)` | |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `63.26% <100.00%> (ø)` | |
   | ... and [109 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [a1f3c30...cf77dab](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r780564787



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -480,12 +426,138 @@ File getSegmentDataDir(String segmentName) {
     return new File(_indexDir, segmentName);
   }
 
-  @VisibleForTesting
-  static boolean isNewSegment(SegmentZKMetadata zkMetadata, @Nullable SegmentMetadata localMetadata) {
-    return localMetadata == null || !hasSameCRC(zkMetadata, localMetadata);
+  /**
+   * Create a backup directory to handle failure of segment reloading.
+   * First rename index directory to segment backup directory so that original segment have all file
+   * descriptors point to the segment backup directory to ensure original segment serves queries properly.
+   * The original index directory is restored lazily, as depending on the conditions,
+   * it may be restored from the backup directory or segment downloaded from deep store.
+   */
+  private void createBackup(File indexDir) {
+    if (!indexDir.exists()) {
+      return;
+    }
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    // Rename index directory to segment backup directory (atomic).
+    Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
+        "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
+  }
+
+  /**
+   * Remove the backup directory to mark the completion of segment reloading.
+   * First rename then delete is as renaming is an atomic operation, but deleting is not.
+   * When we rename the segment backup directory to segment temporary directory, we know the reload
+   * already succeeded, so that we can safely delete the segment temporary directory.
+   */
+  private void removeBackup(File indexDir)
+      throws IOException {
+    File parentDir = indexDir.getParentFile();
+    File segmentBackupDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
+    if (!segmentBackupDir.exists()) {
+      return;
+    }
+    File segmentTempDir = new File(parentDir, indexDir.getName() + CommonConstants.Segment.SEGMENT_TEMP_DIR_SUFFIX);

Review comment:
       As deletion happens immediately after renaming here, I'd assume naming collision on this temp dir would be no harm.
   
   Besides, deleting this tmp dir may fail or not get chance to happen (e.g. upon JVM hard stop), so the LoaderUtils.reloadFailureRecovery method retries to delete this tmp dir if it still exists. Adding a timestamp in the name would add more complexity for this retry logic.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a65dbaf) into [master](https://codecov.io/gh/apache/pinot/commit/a1f3c30e2ac74dd0acb67087a86d8f03b416f5b2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a1f3c30) will **decrease** coverage by `57.12%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #7969       +/-   ##
   =============================================
   - Coverage     71.38%   14.26%   -57.13%     
   + Complexity     4200       81     -4119     
   =============================================
     Files          1595     1550       -45     
     Lines         82595    80942     -1653     
     Branches      12326    12152      -174     
   =============================================
   - Hits          58964    11545    -47419     
   - Misses        19645    68537    +48892     
   + Partials       3986      860     -3126     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `14.26% <0.00%> (-0.05%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `0.00% <0.00%> (-84.90%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `0.00% <0.00%> (-79.79%)` | :arrow_down: |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `0.00% <0.00%> (-73.69%)` | :arrow_down: |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `0.00% <0.00%> (-63.27%)` | :arrow_down: |
   | [...t/local/segment/store/SegmentLocalFSDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L3N0b3JlL1NlZ21lbnRMb2NhbEZTRGlyZWN0b3J5LmphdmE=) | `0.00% <0.00%> (-69.29%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `0.00% <0.00%> (-74.57%)` | :arrow_down: |
   | [...ache/pinot/segment/spi/store/SegmentDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3N0b3JlL1NlZ21lbnREaXJlY3RvcnkuamF2YQ==) | `0.00% <0.00%> (-62.50%)` | :arrow_down: |
   | ... and [1279 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [a1f3c30...a65dbaf](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1002909897


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7969](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (ac41900) into [master](https://codecov.io/gh/apache/pinot/commit/823aa07d7fc61543ed3d91992749d28aecd192ee?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (823aa07) will **decrease** coverage by `57.07%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7969/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #7969       +/-   ##
   =============================================
   - Coverage     71.35%   14.27%   -57.08%     
   + Complexity     4218       81     -4137     
   =============================================
     Files          1595     1551       -44     
     Lines         82748    80960     -1788     
     Branches      12348    12155      -193     
   =============================================
   - Hits          59046    11561    -47485     
   - Misses        19715    68546    +48831     
   + Partials       3987      853     -3134     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `14.27% <0.00%> (+0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `0.00% <0.00%> (-84.78%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...nt/local/loader/DefaultSegmentDirectoryLoader.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9sb2FkZXIvRGVmYXVsdFNlZ21lbnREaXJlY3RvcnlMb2FkZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ent/index/column/PhysicalColumnIndexContainer.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2NvbHVtbi9QaHlzaWNhbENvbHVtbkluZGV4Q29udGFpbmVyLmphdmE=) | `0.00% <0.00%> (-91.67%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `0.00% <0.00%> (-79.79%)` | :arrow_down: |
   | [.../columnminmaxvalue/ColumnMinMaxValueGenerator.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9jb2x1bW5taW5tYXh2YWx1ZS9Db2x1bW5NaW5NYXhWYWx1ZUdlbmVyYXRvci5qYXZh) | `0.00% <0.00%> (-73.69%)` | :arrow_down: |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `0.00% <0.00%> (-63.27%)` | :arrow_down: |
   | [...t/local/segment/store/SegmentLocalFSDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L3N0b3JlL1NlZ21lbnRMb2NhbEZTRGlyZWN0b3J5LmphdmE=) | `0.00% <0.00%> (-69.29%)` | :arrow_down: |
   | [...egment/spi/index/metadata/SegmentMetadataImpl.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL1NlZ21lbnRNZXRhZGF0YUltcGwuamF2YQ==) | `0.00% <0.00%> (-74.57%)` | :arrow_down: |
   | [...ache/pinot/segment/spi/store/SegmentDirectory.java](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3N0b3JlL1NlZ21lbnREaXJlY3RvcnkuamF2YQ==) | `0.00% <0.00%> (-62.50%)` | :arrow_down: |
   | ... and [1279 more](https://codecov.io/gh/apache/pinot/pull/7969/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [823aa07...ac41900](https://codecov.io/gh/apache/pinot/pull/7969?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] npawar commented on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
npawar commented on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1010238175


   it looks like all major feedback has been addressed. if it’s only comments and naming that’s needing to change, can we merge this and add followups as needed? @snleee @mcvsubbu 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
klsince commented on pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#issuecomment-1010314758


   > Please fix the integration test. Thanks!
   
   Yes, I'm investigating the failure. It's running ok on my local. So I'm using a [debug PR](https://github.com/apache/pinot/pull/7991) to dig on this. That debug PR is made off the latest master and with a few prints. But as you can see the failure has moved to another integ test case there. That made me wonder if the Github CI env got some issue. Not sure if you've got permission to help take a quick look at the CI env. Thank you @snleee.  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a change in pull request #7969: refactor segment loading logic in BaseTableDataManager to decouple it with local segment directory

Posted by GitBox <gi...@apache.org>.
klsince commented on a change in pull request #7969:
URL: https://github.com/apache/pinot/pull/7969#discussion_r781622620



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -277,49 +282,36 @@ public void addSegmentError(String segmentName, SegmentErrorInfo segmentErrorInf
   public void reloadSegment(String segmentName, IndexLoadingConfig indexLoadingConfig, SegmentZKMetadata zkMetadata,
       SegmentMetadata localMetadata, @Nullable Schema schema, boolean forceDownload)
       throws Exception {
-    File indexDir = localMetadata.getIndexDir();
-    Preconditions.checkState(indexDir.isDirectory(), "Index directory: %s is not a directory", indexDir);
-
-    File parentFile = indexDir.getParentFile();
-    File segmentBackupDir =
-        new File(parentFile, indexDir.getName() + CommonConstants.Segment.SEGMENT_BACKUP_DIR_SUFFIX);
-
+    File indexDir = getSegmentDataDir(segmentName);
     try {
-      // First rename index directory to segment backup directory so that original segment have all file descriptors
-      // point to the segment backup directory to ensure original segment serves queries properly
+      // Create backup directory to handle failure of segment reloading.
+      createBackup(indexDir);
 
-      // Rename index directory to segment backup directory (atomic)
-      Preconditions.checkState(indexDir.renameTo(segmentBackupDir),
-          "Failed to rename index directory: %s to segment backup directory: %s", indexDir, segmentBackupDir);
-
-      // Download from remote or copy from local backup directory into index directory,
-      // and then continue to load the segment from index directory.
+      // Download segment from deep store if CRC changes or forced to download;
+      // otherwise, copy backup directory back to the original index directory.
+      // And then continue to load the segment from the index directory.
       boolean shouldDownload = forceDownload || !hasSameCRC(zkMetadata, localMetadata);
       if (shouldDownload && allowDownload(segmentName, zkMetadata)) {
         if (forceDownload) {
           LOGGER.info("Segment: {} of table: {} is forced to download", segmentName, _tableNameWithType);
         } else {
-          LOGGER.info("Download segment:{} of table: {} as local crc: {} mismatches remote crc: {}", segmentName,
+          LOGGER.info("Download segment:{} of table: {} as crc changes from: {} to: {}", segmentName,
               _tableNameWithType, localMetadata.getCrc(), zkMetadata.getCrc());
         }
         indexDir = downloadSegment(segmentName, zkMetadata);
       } else {
-        LOGGER.info("Reload the local copy of segment: {} of table: {}", segmentName, _tableNameWithType);
-        FileUtils.copyDirectory(segmentBackupDir, indexDir);
+        LOGGER.info("Reload existing segment: {} of table: {}", segmentName, _tableNameWithType);
+        try (SegmentDirectory segmentDirectory = initSegmentDirectory(segmentName, indexLoadingConfig)) {

Review comment:
       Thanks for the feedback! 
   
   A key motivation to use `SegmentDirectory.copyTo(indexDir)` instead of `FileUtils.copyDirectory(segmentBackupDir, indexDir)` is to abstract where the source directory is. So, I'd suggest to read current logic as 
   ```
   1. make a copy of SegmentDirectory at indexDir; (SegmentDirectory knows source dir is kept)
   2. load index dir
   ```
   
   This allows some implementation of SegmentDirectory to copy from places other than the local backup dir.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org