You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "Jackie-Jiang (via GitHub)" <gi...@apache.org> on 2023/07/01 07:47:15 UTC

[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #11020: allow to preload segments with upsert snapshots to speedup table loading

Jackie-Jiang commented on code in PR #11020:
URL: https://github.com/apache/pinot/pull/11020#discussion_r1248613256


##########
pinot-server/src/main/java/org/apache/pinot/server/starter/helix/HelixInstanceDataManager.java:
##########
@@ -135,6 +136,11 @@ public synchronized void init(PinotConfiguration config, HelixManager helixManag
     // used to initialize a segment refresh semaphore to limit the parallelism, so create a pool of same size.
     _segmentRefreshExecutor = Executors.newFixedThreadPool(getMaxParallelRefreshThreads(),
         new ThreadFactoryBuilder().setNameFormat("segment-refresh-thread-%d").build());
+    LOGGER.info("Initialized segment refresh executor thread pool: {}", getMaxParallelRefreshThreads());

Review Comment:
   (minor) Put `getMaxParallelRefreshThreads()` and `getMaxSegmentPreloadThreads()` into local variable



##########
pinot-server/src/main/java/org/apache/pinot/server/starter/helix/HelixInstanceDataManager.java:
##########
@@ -135,6 +136,11 @@ public synchronized void init(PinotConfiguration config, HelixManager helixManag
     // used to initialize a segment refresh semaphore to limit the parallelism, so create a pool of same size.
     _segmentRefreshExecutor = Executors.newFixedThreadPool(getMaxParallelRefreshThreads(),
         new ThreadFactoryBuilder().setNameFormat("segment-refresh-thread-%d").build());
+    LOGGER.info("Initialized segment refresh executor thread pool: {}", getMaxParallelRefreshThreads());
+    _segmentPreloadExecutor = Executors.newFixedThreadPool(_instanceDataManagerConfig.getMaxSegmentPreloadThreads(),

Review Comment:
   Do we always want to create this executor? Or by default not creating it (0 preload thread)? Currently the table data manager can take nullable executor



##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -99,6 +100,7 @@ public abstract class BaseTableDataManager implements TableDataManager {
   protected File _resourceTmpDir;
   protected Logger _logger;
   protected HelixManager _helixManager;
+  protected ExecutorService _preloadExecutor;

Review Comment:
   (nit) `_segmentPreloadExecutor` to be more specific



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/data/manager/TableDataManager.java:
##########
@@ -126,6 +130,15 @@ void addOrReplaceSegment(String segmentName, IndexLoadingConfig indexLoadingConf
    */
   void removeSegment(String segmentName);
 
+  default boolean tryLoadExistingSegment(String segmentName, IndexLoadingConfig indexLoadingConfig,
+      SegmentZKMetadata zkMetadata) {
+    return false;
+  }
+
+  default File getSegmentDataDir(String segmentName, @Nullable String segmentTier, TableConfig tableConfig) {
+    return null;
+  }

Review Comment:
   (minor) I don't think we need to provide default implementation for them. This interface is internal, and base class has them implemented



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java:
##########
@@ -190,7 +192,11 @@ public void addSegment(ImmutableSegmentImpl segment, @Nullable ThreadSafeMutable
       if (queryableDocIds == null && _deleteRecordColumn != null) {
         queryableDocIds = new ThreadSafeMutableRoaringBitmap();
       }
-      addOrReplaceSegment(segment, validDocIds, queryableDocIds, recordInfoIterator, null, null);
+      if (_tableUpsertMetadataManager.isPreloadingSegment(segmentName)) {

Review Comment:
   I feel it is cleaner if we add `preloadSegment(ImmutableSegment segment)` into `PartitionUpsertMetadataManager` and directly call it in `RealtimeTableDataManager.handleUpsert()`. This way we don't need to pass `TableUpsertMetadataManager` into `PartitionUpsertMetadataManager`, and during preload, we can perform some extra checks such as snapshot exist etc. 



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/TableUpsertMetadataManager.java:
##########
@@ -43,4 +46,9 @@ public interface TableUpsertMetadataManager extends Closeable {
    * Stops the metadata manager. After invoking this method, no access to the metadata will be accepted.
    */
   void stop();
+
+  void waitTillReady()

Review Comment:
   Add some javadoc for the new added method



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/BaseTableUpsertMetadataManager.java:
##########
@@ -42,10 +73,19 @@ public abstract class BaseTableUpsertMetadataManager implements TableUpsertMetad
   protected PartialUpsertHandler _partialUpsertHandler;
   protected boolean _enableSnapshot;
   protected ServerMetrics _serverMetrics;
+  private HelixManager _helixManager;
+  private ExecutorService _preloadExecutor;
+  private volatile boolean _isReady = false;
+  private final Lock _isReadyLock = new ReentrantLock();
+  private final Condition _isReadyCon = _isReadyLock.newCondition();
+  private final Set<String> _preloadingSegmentsSet = ConcurrentHashMap.newKeySet();

Review Comment:
   Do we need to track this set? Before the manager returns preload finishes, all the add segment is preload



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/BaseTableUpsertMetadataManager.java:
##########
@@ -75,6 +115,145 @@ public void init(TableConfig tableConfig, Schema schema, TableDataManager tableD
 
     _enableSnapshot = upsertConfig.isEnableSnapshot();
     _serverMetrics = serverMetrics;
+    _helixManager = helixManager;
+    _preloadExecutor = preloadExecutor;
+    try {
+      if (_enableSnapshot && upsertConfig.isEnablePreload()) {
+        preloadSegments();

Review Comment:
   (MAJOR) Currently we are blocking the `init()` to preload all the segments, and this will prevent table data manager from being created. I feel it might work, but it is not intended. With the current code, we don't even need the lock because before preloading is done, table data manager won't be created.



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/TableUpsertMetadataManager.java:
##########
@@ -43,4 +46,9 @@ public interface TableUpsertMetadataManager extends Closeable {
    * Stops the metadata manager. After invoking this method, no access to the metadata will be accepted.
    */
   void stop();
+
+  void waitTillReady()

Review Comment:
   Do we want to make it more specific? e.g. `waitTillReadyToAddData` or `waitTillPreloadIsDone`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org