You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/01/18 19:50:24 UTC

[GitHub] [pinot] npawar commented on a change in pull request #8035: refine segment consistency check

npawar commented on a change in pull request #8035:
URL: https://github.com/apache/pinot/pull/8035#discussion_r787093993



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/invertedindex/FSTIndexHandler.java
##########
@@ -79,8 +79,24 @@ public FSTIndexHandler(SegmentMetadata segmentMetadata, IndexLoadingConfig index
 
   @Override
   public boolean needUpdateIndices(SegmentDirectory.Reader segmentReader) {
+    String segmentName = _segmentMetadata.getName();
     Set<String> existingColumns = segmentReader.toSegmentDirectory().getColumnsWithIndex(ColumnIndexType.FST_INDEX);
-    return !existingColumns.equals(_columnsToAddIdx);
+    // Check if any existing index need to be removed.
+    for (String column : existingColumns) {
+      if (!_columnsToAddIdx.remove(column)) {
+        LOGGER.debug("Need to remove existing FST index from segment: {}, column: {}", segmentName, column);
+        return true;
+      }
+    }
+    // Check if any new index need to be added.
+    for (String column : _columnsToAddIdx) {
+      ColumnMetadata columnMetadata = _segmentMetadata.getColumnMetadataFor(column);
+      if (columnMetadata != null) {

Review comment:
       if any of the checks in `checkUnsupportedOperationsForFSTIndex` fail, we would have simply downloaded the segment, only to throw exception. How about checking those here as well (by sharing the method) and failing here itself?

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/bloomfilter/BloomFilterHandler.java
##########
@@ -65,9 +65,25 @@ public BloomFilterHandler(SegmentMetadata segmentMetadata, IndexLoadingConfig in
 
   @Override
   public boolean needUpdateIndices(SegmentDirectory.Reader segmentReader) {
+    String segmentName = _segmentMetadata.getName();
     Set<String> columnsToAddBF = new HashSet<>(_bloomFilterConfigs.keySet());
     Set<String> existingColumns = segmentReader.toSegmentDirectory().getColumnsWithIndex(ColumnIndexType.BLOOM_FILTER);
-    return !existingColumns.equals(columnsToAddBF);
+    // Check if any existing bloomfilter need to be removed.
+    for (String column : existingColumns) {
+      if (!columnsToAddBF.remove(column)) {
+        LOGGER.debug("Need to remove existing bloom filter from segment: {}, column: {}", segmentName, column);
+        return true;
+      }
+    }
+    // Check if any new bloomfilter need to be added.
+    for (String column : columnsToAddBF) {
+      ColumnMetadata columnMetadata = _segmentMetadata.getColumnMetadataFor(column);
+      if (columnMetadata != null && columnMetadata.hasDictionary()) {

Review comment:
       should we make a short util method like so?
   ```
   boolean shouldCreateBloomFilter(ColumnMetadata columMetadata) {
     return columnMetadata != null && columnMetadata.hasDictionary();
   }
   ```
   It can be shared by `needUpdate` and `update`, so we reduce chances of them becoming inconsistent over time. 

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/invertedindex/InvertedIndexHandler.java
##########
@@ -56,9 +56,26 @@ public InvertedIndexHandler(SegmentMetadata segmentMetadata, IndexLoadingConfig
 
   @Override
   public boolean needUpdateIndices(SegmentDirectory.Reader segmentReader) {
+    String segmentName = _segmentMetadata.getName();
     Set<String> existingColumns =
         segmentReader.toSegmentDirectory().getColumnsWithIndex(ColumnIndexType.INVERTED_INDEX);
-    return !existingColumns.equals(_columnsToAddIdx);
+    // Check if any existing index need to be removed.
+    for (String column : existingColumns) {
+      if (!_columnsToAddIdx.remove(column)) {
+        LOGGER.debug("Need to remove existing inverted index from segment: {}, column: {}", segmentName, column);
+        return true;
+      }
+    }
+    // Check if any new index need to be added.
+    for (String column : _columnsToAddIdx) {
+      ColumnMetadata columnMetadata = _segmentMetadata.getColumnMetadataFor(column);
+      // Only create inverted index on dictionary-encoded unsorted columns.
+      if (columnMetadata != null && !columnMetadata.isSorted() && columnMetadata.hasDictionary()) {

Review comment:
       same comment here about the local util method to share between methods need/update (and for other indexes too, even if it's just a null check). Future developers and reviewers may not always catch changes to this if in separate places.

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/invertedindex/RangeIndexHandler.java
##########
@@ -58,8 +58,25 @@ public RangeIndexHandler(SegmentMetadata segmentMetadata, IndexLoadingConfig ind
 
   @Override
   public boolean needUpdateIndices(SegmentDirectory.Reader segmentReader) {
+    String segmentName = _segmentMetadata.getName();
     Set<String> existingColumns = segmentReader.toSegmentDirectory().getColumnsWithIndex(ColumnIndexType.RANGE_INDEX);
-    return !existingColumns.equals(_columnsToAddIdx);
+    // Check if any existing index need to be removed.
+    for (String column : existingColumns) {
+      if (!_columnsToAddIdx.remove(column)) {
+        LOGGER.debug("Need to remove existing range index from segment: {}, column: {}", segmentName, column);

Review comment:
       i feel there should be at least 1 info log from these methods (even both becoming info is fine imo). Will give us more visibility into the decisions made in the `need` methods.

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/SegmentPreProcessor.java
##########
@@ -143,22 +143,26 @@ public boolean needProcess()
         DefaultColumnHandler defaultColumnHandler = DefaultColumnHandlerFactory
             .getDefaultColumnHandler(null, _segmentMetadata, _indexLoadingConfig, _schema, null);
         if (defaultColumnHandler.needUpdateDefaultColumns()) {
+          LOGGER.debug("Found default columns need updates");
           return true;
         }
       }
       // Check if there is need to update single-column indices, like inverted index, json index etc.
       for (ColumnIndexType type : ColumnIndexType.values()) {
         if (IndexHandlerFactory.getIndexHandler(type, _segmentMetadata, _indexLoadingConfig)
             .needUpdateIndices(segmentReader)) {
+          LOGGER.debug("Found index type: {} needs updates", type);
           return true;
         }
       }
       // Check if there is need to create/modify/remove star-trees.
       if (needProcessStarTrees()) {
+        LOGGER.debug("Found startree index needs updates");
         return true;
       }
       // Check if there is need to update column min max value.
       if (needUpdateColumnMinMaxValue()) {

Review comment:
       make these info? only 0-1 of these will be printed anyway

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/invertedindex/TextIndexHandler.java
##########
@@ -93,8 +93,24 @@ public TextIndexHandler(SegmentMetadata segmentMetadata, IndexLoadingConfig inde
 
   @Override
   public boolean needUpdateIndices(SegmentDirectory.Reader segmentReader) {
+    String segmentName = _segmentMetadata.getName();
     Set<String> existingColumns = segmentReader.toSegmentDirectory().getColumnsWithIndex(ColumnIndexType.TEXT_INDEX);
-    return !existingColumns.equals(_columnsToAddIdx);
+    // Check if any existing index need to be removed.
+    for (String column : existingColumns) {
+      if (!_columnsToAddIdx.remove(column)) {
+        LOGGER.debug("Need to remove existing text index from segment: {}, column: {}", segmentName, column);
+        return true;
+      }
+    }
+    // Check if any new index need to be added.
+    for (String column : _columnsToAddIdx) {
+      ColumnMetadata columnMetadata = _segmentMetadata.getColumnMetadataFor(column);
+      if (columnMetadata != null) {

Review comment:
       same comment as FST index regarding the `checkUnsupportedOperationsForTextIndex`. Text indexes will be large, and we'd rather fail sooner and skip download. wdyt?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org