Posted to commits@pinot.apache.org by "ankitsultana (via GitHub)" <gi...@apache.org> on 2023/01/27 18:29:40 UTC

[GitHub] [pinot] ankitsultana commented on issue #10185: Restarting a Server with Partial Upsert Tables Leads to Tables in Bad State

ankitsultana commented on issue #10185:
URL: https://github.com/apache/pinot/issues/10185#issuecomment-1406922018

   @Jackie-Jiang : Saw this issue again today, and the `allSegmentsLoaded` lock may be the cause. The server had been restarted around 10 hours before I started debugging. This time most tables have recovered, and only 1 partial upsert table is still having an issue.
   
   For that table, 6 segments that should be in CONSUMING state per the IdealState (IS) are in OFFLINE state. There's also 1 segment for the same table that should be in ONLINE state but is in ERROR state.
   
   In the thread dump, I see 5 threads blocked, waiting to acquire the `allSegmentsLoaded` lock, which a sixth thread holds.
   
   ```
   ❯❯❯ cat 1.thdump | grep "0x00007f30e3b2ae68"
           - waiting to lock <0x00007f30e3b2ae68> (a java.util.concurrent.atomic.AtomicBoolean)
           - waiting to lock <0x00007f30e3b2ae68> (a java.util.concurrent.atomic.AtomicBoolean)
           - waiting to lock <0x00007f30e3b2ae68> (a java.util.concurrent.atomic.AtomicBoolean)
           - waiting to lock <0x00007f30e3b2ae68> (a java.util.concurrent.atomic.AtomicBoolean)
           - waiting to lock <0x00007f30e3b2ae68> (a java.util.concurrent.atomic.AtomicBoolean)
           - locked <0x00007f30e3b2ae68> (a java.util.concurrent.atomic.AtomicBoolean)
   ```
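   
   For anyone repeating this check on a larger dump, the grep above can be generalized to count waiters per monitor address. This is only a rough sketch: the class and method names (`MonitorContentionSummary`, `countWaiters`) are made up for illustration, and the regex assumes the `- waiting to lock <0x...>` line format shown in the output above.
   
   ```java
   import java.util.Map;
   import java.util.TreeMap;
   import java.util.regex.Matcher;
   import java.util.regex.Pattern;
   
   // Hypothetical helper, not part of Pinot or the JDK tooling.
   class MonitorContentionSummary {
     // Matches thread-dump lines such as:
     //   - waiting to lock <0x00007f30e3b2ae68> (a java.util.concurrent.atomic.AtomicBoolean)
     private static final Pattern WAITING =
         Pattern.compile("- waiting to lock <(0x[0-9a-fA-F]+)>");
   
     /** Returns a map from monitor address to the number of threads waiting to lock it. */
     public static Map<String, Integer> countWaiters(String threadDump) {
       Map<String, Integer> counts = new TreeMap<>();
       Matcher matcher = WAITING.matcher(threadDump);
       while (matcher.find()) {
         // group(1) is the monitor address; "- locked <...>" lines are not counted
         counts.merge(matcher.group(1), 1, Integer::sum);
       }
       return counts;
     }
   }
   ```
   
   Feeding it the contents of `1.thdump` as a string should surface the same monitor that the grep was run against, along with any other contended monitors in the dump.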
   
   ```
   "HelixTaskExecutor-message_handle_thread_36" #117 daemon prio=5 os_prio=0 cpu=4243721.44ms elapsed=37170.30s tid=0x00007f2d1804d800 nid=0xda waiting on condition  [0x00007f2c97dfb000]
      java.lang.Thread.State: TIMED_WAITING (sleeping)
           at java.lang.Thread.sleep(java.base@11.0.15/Native Method)
           at org.apache.pinot.segment.local.utils.tablestate.TableStateUtils.waitForAllSegmentsLoaded(TableStateUtils.java:133)
           at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.addSegment(RealtimeTableDataManager.java:416)
           - locked <0x00007f30e3b2ae68> (a java.util.concurrent.atomic.AtomicBoolean)
           at org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addRealtimeSegment(HelixInstanceDataManager.java:189)
           at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:168)
           at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeConsumingFromOffline(SegmentOnlineOfflineStateModelFactory.java:83)
   ```
   
   I do see that a bunch of Helix message handler threads are sitting idle (we have 50+, and only 6 are involved with the lock above, all presumably for the same table).
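   
   To illustrate the pattern the stack trace suggests, here is a minimal standalone sketch. It is not Pinot's actual code. One thread sleeps in a polling loop while holding a monitor on an `AtomicBoolean`, and every other thread that needs the same monitor stays BLOCKED until the poll finishes. The class name, the `conditionMet` flag, and the timings are all assumptions made for the demo.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.concurrent.CountDownLatch;
   import java.util.concurrent.atomic.AtomicBoolean;
   
   // Hypothetical demo class; it only mimics the shape seen in the thread dump.
   class LockHeldWhileWaitingDemo {
   
     /**
      * Starts one holder thread that sleep-polls while holding the monitor, plus
      * numWaiters threads that need the same monitor. Returns how many waiters were
      * observed in the BLOCKED state while the holder was still polling.
      */
     public static int countBlockedWaiters(int numWaiters) {
       AtomicBoolean allSegmentsLoaded = new AtomicBoolean(false); // used only as the monitor
       AtomicBoolean conditionMet = new AtomicBoolean(false);      // stand-in for the awaited condition
       CountDownLatch holderInside = new CountDownLatch(1);
       try {
         Thread holder = new Thread(() -> {
           synchronized (allSegmentsLoaded) {
             holderInside.countDown();
             try {
               while (!conditionMet.get()) {
                 Thread.sleep(10); // TIMED_WAITING (sleeping); sleep does NOT release the monitor
               }
             } catch (InterruptedException e) {
               Thread.currentThread().interrupt();
             }
           }
         }, "holder");
         holder.start();
         holderInside.await(); // make sure the holder owns the monitor first
   
         List<Thread> waiters = new ArrayList<>();
         for (int i = 0; i < numWaiters; i++) {
           Thread waiter = new Thread(() -> {
             synchronized (allSegmentsLoaded) {
               // nothing to do; entering the block is the point
             }
           }, "waiter-" + i);
           waiter.start();
           waiters.add(waiter);
         }
   
         // Poll (for up to 5 seconds) until every waiter is stuck on the monitor.
         long deadline = System.currentTimeMillis() + 5000;
         int blocked = 0;
         while (System.currentTimeMillis() < deadline) {
           blocked = 0;
           for (Thread waiter : waiters) {
             if (waiter.getState() == Thread.State.BLOCKED) {
               blocked++;
             }
           }
           if (blocked == numWaiters) {
             break;
           }
           Thread.sleep(10);
         }
   
         conditionMet.set(true); // let the holder finish so everything can drain
         holder.join();
         for (Thread waiter : waiters) {
           waiter.join();
         }
         return blocked;
       } catch (InterruptedException e) {
         Thread.currentThread().interrupt();
         throw new IllegalStateException(e);
       }
     }
   
     public static void main(String[] args) {
       System.out.println("blocked waiters: " + countBlockedWaiters(5));
     }
   }
   ```
   
   In this sketch the waiters only get unstuck because the demo flips `conditionMet`. If the awaited condition is never reached, the holder keeps the monitor indefinitely and every other thread needing it stays blocked. That would be consistent with the dump above, but I have not confirmed it is the root cause.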


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

