You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/10/07 20:05:17 UTC

[GitHub] [pinot] dang-stripe commented on issue #8948: Stuck unassigned segments after metadata push while controller is shutting down

dang-stripe commented on issue #8948:
URL: https://github.com/apache/pinot/issues/8948#issuecomment-1272044875

   This issue happened again, but we were able to get more logs this time. It seems like:
   
   1. Job sends segment upload request controller
   2. Controller processes upload, adds segment to property store
   3. Controller starts shutting down including ZK conn
   4. Controller fails to update ideal state due to ZK conn closed
   5. Upload request fails
   
   The batch job that performed the push seemed to succeed surprisingly. When we try to perform the upload again by re-running the push job, it does a no-op because it finds the segment already, but nothing updates the ideal state.
   
   ```
   [2022-10-07 01:49:04.335934] INFO [PinotSegmentUploadDownloadRestletResource:324] [jersey-server-managed-async-executor-6:224] HTTP Header: DOWNLOAD_URI is: s3://...
   [2022-10-07 01:49:04.338695] INFO [PinotSegmentUploadDownloadRestletResource:246] [jersey-server-managed-async-executor-6:224] Uploading a segment table1_1_2022-09-24-21_2022-10-06-23_0 to table: table1, push type METADATA, (Derived from API parameter)
   [2022-10-07 01:49:04.338807] INFO [PinotSegmentUploadDownloadRestletResource:267] [jersey-server-managed-async-executor-6:224] Processing upload request for segment: table1_1_2022-09-24-21_2022-10-06-23_0 of table: table1_OFFLINE from client: localhost, ingestion descriptor: null
   [2022-10-07 01:49:04.338867] INFO [PinotSegmentUploadDownloadRestletResource:295] [jersey-server-managed-async-executor-6:224] Setting zkDownloadUri: to s3://... for segment: table1_1_2022-09-24-21_2022-10-06-23_0 of table: table1_OFFLINE, skipping move
   [2022-10-07 01:49:04.340471] INFO [ZKOperator:76] [jersey-server-managed-async-executor-6:224] Adding new segment table1_1_2022-09-24-21_2022-10-06-23_0 from table table1_OFFLINE
   [2022-10-07 01:49:04.340565] INFO [ZKOperator:240] [jersey-server-managed-async-executor-6:224] Skipping segment move, keeping segment table1_1_2022-09-24-21_2022-10-06-23_0 from table table1_OFFLINE at s3://...
   [2022-10-07 01:49:04.646401] INFO [PinotHelixResourceManager:1799] [jersey-server-managed-async-executor-6:224] Added segment: table1_1_2022-09-24-21_2022-10-06-23_0 of table: table1_OFFLINE to property store
   [2022-10-07 01:51:52.018327] ERROR [PinotHelixResourceManager:1834] [jersey-server-managed-async-executor-6:224] Caught exception while adding segment: table1_1_2022-09-24-21_2022-10-06-23_0 to IdealState for table: table1_OFFLINE, deleting segment ZK metadata
   [2022-10-07 01:51:52.018343] java.lang.RuntimeException: Caught exception while updating ideal state for resource: table1_OFFLINE
   [2022-10-07 01:51:52.018368] 	at org.apache.pinot.common.utils.helix.HelixHelper.updateIdealState(HelixHelper.java:160) ~[pinot-all-0.9.0-2022-09-15-148e6c38e-SNAPSHOT-jar-with-dependencies.jar:0.9.0-2022-09-15-148e6c38e-SNAPSHOT-148e6c38ec4e797319fa1b66f0f212a4c2b2f171]
   [2022-10-07 01:51:52.018392] 	at org.apache.pinot.common.utils.helix.HelixHelper.updateIdealState(HelixHelper.java:180) ~[pinot-all-0.9.0-2022-09-15-148e6c38e-SNAPSHOT-jar-with-dependencies.jar:0.9.0-2022-09-15-148e6c38e-SNAPSHOT-148e6c38ec4e797319fa1b66f0f212a4c2b2f171]
   [2022-10-07 01:51:52.018418] 	at org.apache.pinot.controller.helix.core.PinotHelixResourceManager.assignTableSegment(PinotHelixResourceManager.java:1814) ~[pinot-all-0.9.0-2022-09-15-148e6c38e-SNAPSHOT-jar-with-dependencies.jar:0.9.0-2022-09-15-148e6c38e-SNAPSHOT-148e6c38ec4e797319fa1b66f0f212a4c2b2f171]
   Show all 46 lines
   [2022-10-07 01:51:52.025769] WARN [ZkClient:1273] [jersey-server-managed-async-executor-6:224] Failed to delete path /pinot_cluster/PROPERTYSTORE/SEGMENTS/table1_OFFLINE/table1_1_2022-09-24-21_2022-10-06-23_0! java.lang.IllegalStateException: ZkClient already closed!
   [2022-10-07 01:51:52.028874] ERROR [PinotSegmentUploadDownloadRestletResource:53] [jersey-server-managed-async-executor-6:224] Caught internal server exception while uploading segment [2022-10-07 01:51:52.028892] java.lang.IllegalStateException: ZkClient already closed!
   ```
   
   Shutdown logs on the same host:
   ```
   [2022-10-07 01:51:46.631185] INFO [PinotServiceManager:233] [Thread-1:30] Shutting down Pinot Service Manager with all running Pinot instances...
   --
     | [2022-10-07 01:51:51.679919] INFO [HelixTaskExecutor:1148] [Thread-1:30] Shutting down HelixTaskExecutor
     | [2022-10-07 01:51:51.680457] INFO [HelixTaskExecutor:535] [Thread-1:30] Shutting down pool: java.util.concurrent.ThreadPoolExecutor@6aed4a23[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
     | [2022-10-07 01:51:51.680633] INFO [HelixTaskExecutor:535] [Thread-1:30] Shutting down pool: java.util.concurrent.ThreadPoolExecutor@404566e7[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
     | [2022-10-07 01:51:51.680777] INFO [HelixTaskExecutor:535] [Thread-1:30] Shutting down pool: java.util.concurrent.ThreadPoolExecutor@2559f65c[Running, pool size = 40, active threads = 0, queued tasks = 0, completed tasks = 56]
     | [2022-10-07 01:51:51.692292] INFO [HelixTaskExecutor:535] [Thread-1:30] Shutting down pool: java.util.concurrent.ThreadPoolExecutor@7d5cbcc9[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
     | [2022-10-07 01:51:51.821168] INFO [BaseControllerStarter:718] [Thread-1:30] Shutting down executor service
     | [2022-10-07 01:51:51.821300] INFO [HelixTaskExecutor:1148] [Thread-1:30] Shutting down HelixTaskExecutor
     | [2022-10-07 01:51:51.821982] INFO [HelixTaskExecutor:535] [Thread-1:30] Shutting down pool: java.util.concurrent.ThreadPoolExecutor@4f482051[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
     | [2022-10-07 01:51:51.822040] INFO [HelixTaskExecutor:535] [Thread-1:30] Shutting down pool: java.util.concurrent.ThreadPoolExecutor@294804bd[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
     | [2022-10-07 01:51:51.822095] INFO [HelixTaskExecutor:535] [Thread-1:30] Shutting down pool: java.util.concurrent.ThreadPoolExecutor@51573738[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
     | [2022-10-07 01:51:51.822145] INFO [HelixTaskExecutor:535] [Thread-1:30] Shutting down pool: java.util.concurrent.ThreadPoolExecutor@2474df51[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
     | [2022-10-07 01:52:04.331357] INFO [GenericHelixController:1082] [Thread-1:30] Shutting down DEFAULT pipeline
     | [2022-10-07 01:52:05.013192] INFO [GenericHelixController:1085] [Thread-1:30] Shutting down TASK pipeline
     | [2022-10-07 01:52:05.127604] INFO [BaseControllerStarter:679] [Thread-1:30] Finish shutting down Pinot controller for Controller_pinot-controller-fw-3.service.consul_9000
     | [2022-10-07 01:52:05.128153] INFO [PinotServiceManager:224] [Thread-1:30] Shutting down Pinot Service Manager admin application...
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org