You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/09/19 18:40:32 UTC

[GitHub] [pinot] jackjlli commented on a diff in pull request #9423: Cache Deleted Segment Names in Server to Avoid SegmentMissingError

jackjlli commented on code in PR #9423:
URL: https://github.com/apache/pinot/pull/9423#discussion_r974547839


##########
pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java:
##########
@@ -191,8 +192,13 @@ private DataTable processQueryInternal(ServerQueryRequest queryRequest, Executor
     }
 
     List<String> segmentsToQuery = queryRequest.getSegmentsToQuery();
-    List<String> missingSegments = new ArrayList<>();
-    List<SegmentDataManager> segmentDataManagers = tableDataManager.acquireSegments(segmentsToQuery, missingSegments);
+    List<String> unAcquiredSegments = new ArrayList<>();
+    List<SegmentDataManager> segmentDataManagers = tableDataManager.acquireSegments(
+        segmentsToQuery, unAcquiredSegments);
+    List<String> missingSegments =
+        unAcquiredSegments.stream()
+            .filter(segmentName -> !tableDataManager.isSegmentDeletedRecently(segmentName))

Review Comment:
   Suggest moving this logic inside the if block of Line 301 so that pinot-server doesn't have to do it for all the queries all the time.



##########
pinot-server/src/main/java/org/apache/pinot/server/starter/helix/HelixInstanceDataManagerConfig.java:
##########
@@ -121,8 +121,13 @@ public class HelixInstanceDataManagerConfig implements InstanceDataManagerConfig
   // Size of cache that holds errors.
   private static final String ERROR_CACHE_SIZE = "error.cache.size";
 
+  private static final String DELETED_SEGMENTS_CACHE_SIZE = "table.deleted.segments.cache.size";
+  private static final String DELETED_SEGMENTS_CACHE_TTL_MINUTES = "table.deleted.segments.cache.ttl.minutes";
+
   private final static String[] REQUIRED_KEYS = {INSTANCE_ID, INSTANCE_DATA_DIR, READ_MODE};
   private static final long DEFAULT_ERROR_CACHE_SIZE = 100L;
+  private static final int DEFAULT_DELETED_SEGMENTS_CACHE_SIZE = 10_000;
+  private static final int DEFAULT_DELETED_SEGMENTS_CACHE_TTL_MINUTES = 10;

Review Comment:
   Usually the state transition doesn't take too much time (at most sub-seconds) on spreading the helix messages and notifications. If it really needs 10 minutes to converge, it probably means there is something wrong in your cluster and needs to be fixed.



##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -411,6 +429,31 @@ public void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoad
         zkMetadata.getCrc());
   }
 
+  /**
+   * _segmentDataManagerMap is used for fetching segments that need to be queried. If a new segment is created,
+   * calling this method ensures that all queries in the future can use the new segment. This method may replace an
+   * existing segment with the same name.
+   */
+  @Nullable
+  protected SegmentDataManager registerSegment(String segmentName, SegmentDataManager segmentDataManager) {
+    SegmentDataManager oldSegmentDataManager = _segmentDataManagerMap.put(segmentName, segmentDataManager);
+    _deletedSegments.invalidate(segmentName);
+    return oldSegmentDataManager;
+  }
+
+  /**
+   * De-registering a segment ensures that no query uses the given segment until a segment with that name is
+   * re-registered. There may be scenarios where the broker thinks that a segment is available even though it has
+   * been de-registered in the servers (either due to manual deletion or retention). In such cases, acquireSegments
+   * will mark those segments as missingSegments. The caller can use {@link #isSegmentDeletedRecently(String)} to
+   * identify this scenario.
+   */
+  @Nullable
+  protected SegmentDataManager deRegisterSegment(String segmentName) {

Review Comment:
   nit: unregisterSegment



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org