Posted to commits@pinot.apache.org by "sylph-eu (via GitHub)" <gi...@apache.org> on 2023/07/31 11:20:51 UTC

[GitHub] [pinot] sylph-eu opened a new pull request, #11226: [bugfix] Do not move real-time segments to working dir on restart

sylph-eu opened a new pull request, #11226:
URL: https://github.com/apache/pinot/pull/11226

   **Context**
   We're running Pinot on K8S in one of the public clouds, with `pinot.server.instance.segment.directory.loader=tierBased` and multiple tiers/volumes and data directories.
   
   We've noticed that Pinot's working directory (located on ephemeral storage) accumulates many real-time segments that, per configuration, should reside in the data directory of their respective tier (persistent storage). Further investigation revealed that `RealtimeTableDataManager` doesn't initialize `IndexLoadingConfig` with the table data dir during segment loading, which forces the segment to be moved to `<empty>/<segment_dir>`, i.e. a folder in the working directory. These stray folders contribute to Pinot's instability and to longer restart times.
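A minimal sketch of how an unset table data dir can turn into a folder in the working directory. It assumes (not confirmed against Pinot's source) that the loader derives the segment's target path as `new File(tableDataDir, segmentName)`. Java's `File(String parent, String child)` treats a `null` parent as if only the child were given, so the result is a relative path that resolves against the JVM working directory (`user.dir`), which matches the `<empty>/<segment_dir>` relocation described above. `NullTableDirDemo` and `targetDir` are hypothetical names.

```java
import java.io.File;

public class NullTableDirDemo {
  // Hypothetical helper: mimics a loader that computes a segment's target
  // directory from a table data dir that may never have been assigned.
  static File targetDir(String tableDataDir, String segmentName) {
    // With a null parent, File(String, String) behaves like new File(segmentName),
    // i.e. it yields a relative path.
    return new File(tableDataDir, segmentName);
  }

  public static void main(String[] args) {
    // Table data dir never assigned: relative path, resolved against user.dir.
    File unset = targetDir(null, "seg_tiered_01");
    System.out.println("unset:    " + unset.getPath() + " -> " + unset.getAbsolutePath());

    // Table data dir assigned (tmpdir is used here purely for illustration).
    String tableDataDir =
        new File(System.getProperty("java.io.tmpdir"), "myTable_REALTIME").getAbsolutePath();
    File assigned = targetDir(tableDataDir, "seg_tiered_01");
    System.out.println("assigned: " + assigned.getPath());
  }
}
```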
   
   **How to reproduce**:
   1. Set `pinot.server.instance.segment.directory.loader=tierBased`.
   2. Create a completed real-time segment.
   3. Restart pinot-server and observe that the segment is relocated to the working dir.
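To check step 3 on a running server, one option is to list folders in the server's working directory whose names match known segment names. The sketch below is a hypothetical helper (`StraySegmentCheck`), not a Pinot tool; it assumes relocated segments show up as directories named after the segment directly under the working directory, as described above, and that you already know the segment names to look for.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class StraySegmentCheck {
  // Returns the names of subdirectories of workingDir that match a known segment name.
  // Such folders suggest segments were relocated into the working directory.
  static List<String> findStraySegmentDirs(File workingDir, Set<String> segmentNames) {
    List<String> stray = new ArrayList<>();
    File[] children = workingDir.listFiles(File::isDirectory);
    if (children == null) {
      return stray; // workingDir is not a directory, or an I/O error occurred
    }
    for (File child : children) {
      if (segmentNames.contains(child.getName())) {
        stray.add(child.getName());
      }
    }
    stray.sort(null); // natural ordering, for deterministic output
    return stray;
  }

  public static void main(String[] args) {
    File workingDir = new File(System.getProperty("user.dir"));
    System.out.println(findStraySegmentDirs(workingDir, Set.of("seg_tiered_01")));
  }
}
```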
   
   **Changes**:
   - Initialize `indexLoadingConfig` with the table data dir, in the same way as `BaseTableDataManager.addSegment` does.
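The change can be sketched with hypothetical stand-ins (`LoadingConfig` and `TableManager`, not Pinot's actual classes): the table data manager assigns its own table data dir to the loading config before the segment is loaded, mirroring the `indexLoadingConfig.setTableDataDir(_tableDataDir)` call in the diff. Only the table data dir is modeled here.

```java
import java.io.File;

public class TableDirFixSketch {
  // Hypothetical stand-in for IndexLoadingConfig; only the table data dir is modeled.
  static class LoadingConfig {
    private String tableDataDir;

    void setTableDataDir(String dir) {
      tableDataDir = dir;
    }

    String getTableDataDir() {
      return tableDataDir;
    }
  }

  // Hypothetical stand-in for a table data manager that owns the table data dir.
  static class TableManager {
    private final String _tableDataDir;

    TableManager(String tableDataDir) {
      _tableDataDir = tableDataDir;
    }

    // Returns the directory the segment would be loaded from.
    File addSegment(String segmentName, LoadingConfig config) {
      // Assign the table directory before loading/preprocessing, so the target
      // is never derived from an unset (null) parent directory.
      config.setTableDataDir(_tableDataDir);
      return new File(config.getTableDataDir(), segmentName);
    }
  }

  public static void main(String[] args) {
    String tableDir = new File(System.getProperty("java.io.tmpdir"), "myTable_REALTIME").getPath();
    TableManager manager = new TableManager(tableDir);
    System.out.println(manager.addSegment("seg_tiered_01", new LoadingConfig()));
  }
}
```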


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #11226: [bugfix] Do not move real-time segments to working dir on restart

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11226:
URL: https://github.com/apache/pinot/pull/11226#discussion_r1281388735


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -232,7 +232,9 @@ public void addSegment(ImmutableSegment immutableSegment) {
   @Override
   public void addSegment(File indexDir, IndexLoadingConfig indexLoadingConfig)
       throws Exception {
+

Review Comment:
   ```suggestion
   ```



##########
pinot-core/src/test/java/org/apache/pinot/core/data/manager/realtime/RealtimeTableDataManagerTest.java:
##########
@@ -157,6 +157,60 @@ public void testAddSegmentNoBackupCopy()
     assertEquals(llmd.getTotalDocs(), 5);
   }
 
+  @Test
+  public void testAddSegmentDefaultTierByTierBasedDirLoader()
+      throws Exception {
+    RealtimeTableDataManager tmgr1 = new RealtimeTableDataManager(null);
+    TableDataManagerConfig tableDataManagerConfig = createTableDataManagerConfig();
+    ZkHelixPropertyStore propertyStore = mock(ZkHelixPropertyStore.class);
+    TableConfig tableConfig = setupTableConfig(propertyStore);
+    Schema schema = setupSchema(propertyStore);
+    tmgr1.init(tableDataManagerConfig, "server01", propertyStore,
+        new ServerMetrics(PinotMetricUtils.getPinotMetricsRegistry()), mock(HelixManager.class), null, null,
+        new TableDataManagerParams(0, false, -1));
+
+    // Create a raw segment and put it in deep store backed by local fs.
+    String segName = "seg_tiered_01";
+    SegmentZKMetadata segmentZKMetadata =
+        TableDataManagerTestUtils.makeRawSegment(segName, createSegment(tableConfig, schema, segName),
+            new File(TEMP_DIR, segName + TarGzCompressionUtils.TAR_GZ_FILE_EXTENSION), true);
+    segmentZKMetadata.setStatus(Status.DONE);
+
+    // Local segment dir doesn't exist, thus downloading from deep store.
+    File localSegDir = new File(TABLE_DATA_DIR, segName);
+    assertFalse(localSegDir.exists());
+
+    // Add segment
+    IndexLoadingConfig indexLoadingConfig =
+        TableDataManagerTestUtils.createIndexLoadingConfig("tierBased", tableConfig, schema);
+    tmgr1.addSegment(segName, indexLoadingConfig, segmentZKMetadata);
+    assertTrue(localSegDir.exists());
+    SegmentMetadataImpl llmd = new SegmentMetadataImpl(new File(TABLE_DATA_DIR, segName));
+    assertEquals(llmd.getTotalDocs(), 5);
+
+    // Now, repeat initialization of the table data manager
+    tmgr1.shutDown();
+    RealtimeTableDataManager tmgr2 = new RealtimeTableDataManager(null);
+    tableDataManagerConfig = createTableDataManagerConfig();
+    propertyStore = mock(ZkHelixPropertyStore.class);
+    tableConfig = setupTableConfig(propertyStore);
+    schema = setupSchema(propertyStore);
+    tmgr2.init(tableDataManagerConfig, "server01", propertyStore,
+        new ServerMetrics(PinotMetricUtils.getPinotMetricsRegistry()), mock(HelixManager.class), null, null,
+        new TableDataManagerParams(0, false, -1));
+
+    // Reinitialize index loading config and try adding the segment
+    indexLoadingConfig =
+        TableDataManagerTestUtils.createIndexLoadingConfig("tierBased", tableConfig, schema);
+    tmgr2.addSegment(segName, indexLoadingConfig, segmentZKMetadata);
+
+    // Make sure that the segment hasn't been moved
+    assertTrue(localSegDir.exists());
+    llmd = new SegmentMetadataImpl(new File(TABLE_DATA_DIR, segName));
+    assertEquals(llmd.getTotalDocs(), 5);
+  }
+

Review Comment:
   ```suggestion
   ```




[GitHub] [pinot] Jackie-Jiang merged pull request #11226: [bugfix] Do not move real-time segments to working dir on restart

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang merged PR #11226:
URL: https://github.com/apache/pinot/pull/11226



[GitHub] [pinot] codecov-commenter commented on pull request #11226: [bugfix] Do not move real-time segments to working dir on restart

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #11226:
URL: https://github.com/apache/pinot/pull/11226#issuecomment-1658238842

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/11226?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#11226](https://app.codecov.io/gh/apache/pinot/pull/11226?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (84f4e0f) into [master](https://app.codecov.io/gh/apache/pinot/commit/834c9707e81dc6b40660f6eb0737f5ca3293a2e2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (834c970) will **not change** coverage.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@           Coverage Diff           @@
   ##           master   #11226   +/-   ##
   =======================================
     Coverage    0.11%    0.11%           
   =======================================
     Files        2227     2227           
     Lines      119628   119629    +1     
     Branches    18102    18102           
   =======================================
     Hits          137      137           
   - Misses     119471   119472    +1     
     Partials       20       20           
   ```
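The report's figures are internally consistent: hits + misses + partials equals the line count on both sides (137 + 119471 + 20 = 119628 and 137 + 119472 + 20 = 119629), and, assuming Codecov's usual definition of coverage as hits divided by all tracked lines, both ratios round to 0.11%. A quick check, with `CoverageCheck` as a hypothetical name:

```java
import java.util.Locale;

public class CoverageCheck {
  // Coverage ratio as a percentage: hits / (hits + misses + partials).
  // Partial lines are counted as not fully covered (assumed Codecov convention).
  static double coveragePercent(long hits, long misses, long partials) {
    return 100.0 * hits / (hits + misses + partials);
  }

  // Formats with two decimals, independent of the default locale.
  static String format(double percent) {
    return String.format(Locale.ROOT, "%.2f%%", percent);
  }

  public static void main(String[] args) {
    // master: 137 hits, 119471 misses, 20 partials (119628 lines)
    System.out.println("master: " + format(coveragePercent(137, 119471, 20)));
    // #11226: one more missed line (119629 lines)
    System.out.println("#11226: " + format(coveragePercent(137, 119472, 20)));
  }
}
```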
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1temurin11 | `?` | |
   | integration1temurin17 | `0.00% <0.00%> (ø)` | |
   | integration1temurin20 | `?` | |
   | integration2temurin11 | `?` | |
   | integration2temurin17 | `?` | |
   | integration2temurin20 | `?` | |
   | unittests1temurin17 | `?` | |
   | unittests1temurin20 | `?` | |
   | unittests2temurin11 | `0.11% <0.00%> (ø)` | |
   | unittests2temurin17 | `0.11% <0.00%> (ø)` | |
   | unittests2temurin20 | `0.11% <0.00%> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Files Changed](https://app.codecov.io/gh/apache/pinot/pull/11226?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [...ata/manager/realtime/RealtimeTableDataManager.java](https://app.codecov.io/gh/apache/pinot/pull/11226?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvUmVhbHRpbWVUYWJsZURhdGFNYW5hZ2VyLmphdmE=) | `0.00% <0.00%> (ø)` | |
   



[GitHub] [pinot] Jackie-Jiang commented on pull request #11226: [bugfix] Do not move real-time segments to working dir on restart

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on PR #11226:
URL: https://github.com/apache/pinot/pull/11226#issuecomment-1658721920

   @klsince Can you also take a look?



[GitHub] [pinot] klsince commented on a diff in pull request #11226: [bugfix] Do not move real-time segments to working dir on restart

Posted by "klsince (via GitHub)" <gi...@apache.org>.
klsince commented on code in PR #11226:
URL: https://github.com/apache/pinot/pull/11226#discussion_r1279693383


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeTableDataManager.java:
##########
@@ -390,6 +390,9 @@ public void addSegment(String segmentName, IndexLoadingConfig indexLoadingConfig
       return;
     }
 
+    // Assign table directory to not let the segment be moved during loading/preprocessing
+    indexLoadingConfig.setTableDataDir(_tableDataDir);

Review Comment:
   good catch!
   
   As in `BaseTableDataManager`, we also need to set the segment tier and tier configs as below, so that segments (particularly immutable ones) stay on the tiers they were previously moved to. Otherwise, they'd be put back on the default tier when the server restarts, and then get moved back to the expected tiers by `SegmentRelocator`.
   ```
   indexLoadingConfig.setTableDataDir(_tableDataDir);
   indexLoadingConfig.setSegmentTier(segmentTier);
   indexLoadingConfig.setInstanceTierConfigs(_tableDataManagerConfig.getInstanceTierConfigs());
   ```
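A sketch of the tier fallback described in this comment, using a hypothetical resolver: if a segment's tier is known and this server has a data dir for that tier, the segment stays there; otherwise it lands on the default tier (the table data dir), from which `SegmentRelocator` would later move it back. The `tierDataDirs` map is an assumption standing in for the instance tier configs, and the `<tierDir>/<table>/<segment>` layout is illustrative only.

```java
import java.io.File;
import java.util.Map;

public class TierDirSketch {
  // Hypothetical resolver: picks the directory for a segment based on its tier.
  // tierDataDirs maps tier name -> base data dir on this server (assumed shape).
  static File resolveSegmentDir(String tableDataDir, String tableName, String segmentName,
      String segmentTier, Map<String, String> tierDataDirs) {
    if (segmentTier != null && tierDataDirs != null && tierDataDirs.containsKey(segmentTier)) {
      // Tier info available: the segment stays on the tier it was moved to.
      return new File(new File(tierDataDirs.get(segmentTier), tableName), segmentName);
    }
    // No usable tier info: fall back to the default tier, i.e. the table data dir.
    return new File(tableDataDir, segmentName);
  }

  public static void main(String[] args) {
    Map<String, String> tiers = Map.of("coldTier", "/mnt/cold");
    String tableDir = "/mnt/default/myTable_REALTIME";
    // With tier info: stays on the cold tier.
    System.out.println(resolveSegmentDir(tableDir, "myTable_REALTIME", "seg_01", "coldTier", tiers));
    // Without tier info: put back on the default tier.
    System.out.println(resolveSegmentDir(tableDir, "myTable_REALTIME", "seg_01", null, tiers));
  }
}
```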




[GitHub] [pinot] sylph-eu commented on pull request #11226: [bugfix] Do not move real-time segments to working dir on restart

Posted by "sylph-eu (via GitHub)" <gi...@apache.org>.
sylph-eu commented on PR #11226:
URL: https://github.com/apache/pinot/pull/11226#issuecomment-1659771349

   @klsince, I've updated the code. Tier information for segments that belong to offline tables is provided at the level of `HelixInstanceDataManager`, since otherwise it would interfere with the various overloaded `addSegment` methods. Those overloads seem to follow a certain convention, so I left them intact.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] sylph-eu commented on pull request #11226: [bugfix] Do not move real-time segments to working dir on restart

Posted by "sylph-eu (via GitHub)" <gi...@apache.org>.
sylph-eu commented on PR #11226:
URL: https://github.com/apache/pinot/pull/11226#issuecomment-1660752656

   Rebased

