Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/12/15 01:01:23 UTC

[GitHub] [pinot] Jackie-Jiang commented on a change in pull request #7898: Add help methods to check if segment needs reprocessing

Jackie-Jiang commented on a change in pull request #7898:
URL: https://github.com/apache/pinot/pull/7898#discussion_r769160946



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/immutable/ImmutableSegmentLoader.java
##########
@@ -173,6 +174,34 @@ public static ImmutableSegment load(File indexDir, IndexLoadingConfig indexLoadi
     return segment;
   }
 
+  /**
+   * Check segment directory against the table config and schema to see if any preprocessing is needed,
+   * like changing segment format, adding new indices or updating default columns.
+   */
+  public static boolean needPreprocess(SegmentDirectory segmentDirectory, IndexLoadingConfig indexLoadingConfig,
+      @Nullable Schema schema)
+      throws Exception {
+    if (needConvertSegmentFormat(indexLoadingConfig, segmentDirectory.getSegmentMetadata())) {
+      return true;
+    }
+    SegmentPreProcessor preProcessor = new SegmentPreProcessor(segmentDirectory, indexLoadingConfig, schema);
+    return preProcessor.needProcess();
+  }
+
+  @VisibleForTesting
+  static boolean needConvertSegmentFormat(IndexLoadingConfig indexLoadingConfig,
+      SegmentMetadataImpl segmentMetadata) {

Review comment:
       Since we already have the loaded `SegmentDirectory`, we should pass it in directly; the version info can then be derived from the type of the segment directory.
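
       The suggestion above could be sketched roughly as follows. This is only an illustration under stated assumptions: `LoadedSegmentDirectory`, `V1Directory` and `V3Directory` are hypothetical stand-ins defined here, not Pinot classes, and the on-disk directory shortcut from the original method is left out.

```java
// Hypothetical sketch: the format check takes the loaded directory itself
// and asks it for its version, instead of taking separate segment metadata.
class FormatCheckSketch {
  enum SegmentVersion { v1, v2, v3 }

  /** Stand-in for a loaded segment directory (assumption, not the Pinot type). */
  interface LoadedSegmentDirectory {
    SegmentVersion version();
  }

  /** Stand-in directory type whose layout implies the v1 format. */
  static final class V1Directory implements LoadedSegmentDirectory {
    @Override
    public SegmentVersion version() {
      return SegmentVersion.v1;
    }
  }

  /** Stand-in directory type whose layout implies the v3 format. */
  static final class V3Directory implements LoadedSegmentDirectory {
    @Override
    public SegmentVersion version() {
      return SegmentVersion.v3;
    }
  }

  /** Returns true when a target version is configured and differs from the loaded one. */
  static boolean needConvertSegmentFormat(SegmentVersion versionToLoad, LoadedSegmentDirectory directory) {
    if (versionToLoad == null) {
      return false; // no target version configured, nothing to convert
    }
    return versionToLoad != directory.version();
  }

  public static void main(String[] args) {
    System.out.println(needConvertSegmentFormat(SegmentVersion.v3, new V1Directory()));
    System.out.println(needConvertSegmentFormat(SegmentVersion.v3, new V3Directory()));
  }
}
```

       With this shape the caller no longer needs to hand over the metadata separately, since everything the check needs comes from the directory it already holds.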

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/immutable/ImmutableSegmentLoader.java
##########
@@ -173,6 +174,34 @@ public static ImmutableSegment load(File indexDir, IndexLoadingConfig indexLoadi
     return segment;
   }
 
+  /**
+   * Check segment directory against the table config and schema to see if any preprocessing is needed,
+   * like changing segment format, adding new indices or updating default columns.
+   */
+  public static boolean needPreprocess(SegmentDirectory segmentDirectory, IndexLoadingConfig indexLoadingConfig,
+      @Nullable Schema schema)
+      throws Exception {
+    if (needConvertSegmentFormat(indexLoadingConfig, segmentDirectory.getSegmentMetadata())) {
+      return true;
+    }
+    SegmentPreProcessor preProcessor = new SegmentPreProcessor(segmentDirectory, indexLoadingConfig, schema);
+    return preProcessor.needProcess();
+  }
+
+  @VisibleForTesting
+  static boolean needConvertSegmentFormat(IndexLoadingConfig indexLoadingConfig,
+      SegmentMetadataImpl segmentMetadata) {
+    SegmentVersion segmentVersionToLoad = indexLoadingConfig.getSegmentVersion();
+    if (segmentVersionToLoad == null) {
+      return false;
+    }
+    File indexDir = segmentMetadata.getIndexDir();
+    if (indexDir != null && SegmentDirectoryPaths.segmentDirectoryFor(indexDir, segmentVersionToLoad).isDirectory()) {
+      return false;
+    }
+    return segmentVersionToLoad != segmentMetadata.getVersion();

Review comment:
       @siddharthteotia I think the version check should neither throw an exception nor ignore a version downgrade, as either would change the existing behavior.
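
       To make the point concrete, here is a minimal sketch that looks only at the final comparison in the hunk above. The class and method names are hypothetical, the enum is a local stand-in, and the earlier directory-existence shortcut is deliberately omitted, so this is not a full model of the method.

```java
// Hypothetical sketch: the final comparison is symmetric, so a configured
// version lower than the loaded one is reported as a mismatch like any other.
class VersionCheckSketch {
  enum SegmentVersion { v1, v2, v3 }

  /** Mirrors only the last comparison: any difference counts, in either direction. */
  static boolean versionMismatch(SegmentVersion configured, SegmentVersion loaded) {
    if (configured == null) {
      return false; // no target version configured
    }
    return configured != loaded;
  }

  /** Helper for illustration: true when the configured version is older than the loaded one. */
  static boolean isDowngrade(SegmentVersion configured, SegmentVersion loaded) {
    return configured != null && configured.compareTo(loaded) < 0;
  }

  public static void main(String[] args) {
    SegmentVersion configured = SegmentVersion.v1;
    SegmentVersion loaded = SegmentVersion.v3;
    System.out.println("downgrade: " + isDowngrade(configured, loaded));
    System.out.println("mismatch reported: " + versionMismatch(configured, loaded));
  }
}
```

       In this sketch a downgrade is neither rejected with an exception nor silently skipped; it is simply reported as a difference, and the caller decides what to do with it.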

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/IndexHandlerFactory.java
##########
@@ -60,4 +72,31 @@ public static IndexHandler getIndexHandler(ColumnIndexType type, File indexDir,
         return NO_OP_HANDLER;
     }
   }
+
+  /**
+   * This method creates handlers to check different types of indices. A segment reader is required,
+   * as only read-only operations are allowed during checks.

Review comment:
       Having two different APIs to get the index handler seems redundant and hard to use. Can we always pass in the `SegmentDirectory` and read the `indexDir` from it?
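
       One possible shape for a single entry point is sketched below. All types here (`SegmentDir`, `IndexHandler`, the trimmed `ColumnIndexType` enum) are simplified stand-ins defined for the example, and the handlers only return a description string; the real factory and handler classes are not reproduced.

```java
import java.io.File;

// Hypothetical sketch: one factory method that always takes the segment
// directory and derives the index directory from it.
class IndexHandlerFactorySketch {
  /** Trimmed stand-in enum for the example, not the full Pinot enum. */
  enum ColumnIndexType { INVERTED_INDEX, RANGE_INDEX, OTHER }

  /** Stand-in for a segment directory that knows where it lives on disk. */
  interface SegmentDir {
    File indexDir();
  }

  /** Stand-in handler; a real handler would check or update an index. */
  interface IndexHandler {
    String describe();
  }

  static final IndexHandler NO_OP_HANDLER = () -> "no-op";

  /** Single entry point: callers never pass indexDir separately. */
  static IndexHandler getIndexHandler(ColumnIndexType type, SegmentDir segmentDirectory) {
    File indexDir = segmentDirectory.indexDir();
    switch (type) {
      case INVERTED_INDEX:
        return () -> "inverted@" + indexDir.getName();
      case RANGE_INDEX:
        return () -> "range@" + indexDir.getName();
      default:
        return NO_OP_HANDLER;
    }
  }

  public static void main(String[] args) {
    SegmentDir dir = () -> new File("seg_0");
    System.out.println(getIndexHandler(ColumnIndexType.INVERTED_INDEX, dir).describe());
  }
}
```

       Keeping one signature means the read-only check path and the update path would go through the same factory call, with any difference in behavior decided by what the handler does with the directory rather than by which overload was used.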




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
