Posted to commits@pinot.apache.org by "dang-stripe (via GitHub)" <gi...@apache.org> on 2023/05/19 21:02:03 UTC

[GitHub] [pinot] dang-stripe opened a new issue, #10787: Failed queries due to servers shutting down before brokers finish routing table changes

dang-stripe opened a new issue, #10787:
URL: https://github.com/apache/pinot/issues/10787

   We've observed some query failures with error code 425 during rolling restarts on a relatively low-QPS cluster. The logs show that the server shut down before the broker finished processing the routing table update. The server does not appear to wait the full `pinot.server.shutdown.noQueryThresholdMs` (15000 ms here) before shutting down the process.
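The drain step can be sketched as a single calculation. This is a hypothetical reconstruction inferred only from the `BaseServerStarter` log messages in this issue, not Pinot's actual implementation, and the function name `drain_sleep_ms` is made up for illustration. It assumes the server sleeps only for the remainder of the threshold, measured from the last query it received.

```python
# Hypothetical reconstruction of the server's "drain queries" step, inferred
# only from the BaseServerStarter log messages in this issue; not Pinot's code.
# Assumption: the server sleeps just long enough for the time since its last
# query to reach the threshold, then marks itself as having no incoming queries.


def drain_sleep_ms(no_query_time_ms: int, threshold_ms: int) -> int:
    """Return how long the server sleeps before treating queries as drained."""
    # Already idle for at least the threshold: nothing left to wait for.
    if no_query_time_ms >= threshold_ms:
        return 0
    # Otherwise wait only for the remainder of the threshold window.
    return threshold_ms - no_query_time_ms


# Numbers taken from the "Sleep for 4608ms" server log line.
print(drain_sleep_ms(10392, 15000))
```

With the numbers from the server log, `drain_sleep_ms(10392, 15000)` returns 4608, matching the 4608 ms sleep that was logged. Under this assumption the wait is anchored to the last query the server saw rather than to the moment brokers stop routing to it, so a server that has already been idle for a while waits much less than the configured threshold.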
   
   ```
   # server begins shutdown
   [2023-05-18 05:44:52.728337] INFO [BaseServerStarter] [Thread-41:17] Shutting down Pinot server
   [2023-05-18 05:44:52.747490] INFO [BaseServerStarter] [Thread-41:17] Sleep for 4608ms as there are still incoming queries (no query time: 10392ms is smaller than the threshold: 15000ms)
   
   # broker receives signal to remove server from routing table
   [2023-05-18 05:44:52.817685] INFO [BrokerRoutingManager] [ClusterChangeHandlingThread:25] Removing entry for server=Server1, table=Table1 
   
   # server stops quiescing after 4.6s
   [2023-05-18 05:44:57.355546] INFO [BaseServerStarter] [Thread-41:17] No query received within 15000ms (larger than the threshold: 15000ms), mark it as no incoming queries 
   [2023-05-18 05:44:57.355592] INFO [BaseServerStarter] [Thread-41:17] Finished draining queries after 4608ms
   
   # roughly when the broker sends the query to Server1
   [2023-05-18 05:45:00.671645] Caused by: java.net.ConnectException: Connection refused
   [2023-05-18 05:45:00.671634] org.apache.pinot.shaded.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: Server1/10.20.30.40:8098
   [2023-05-18 05:45:00.671597] ERROR [QueryRouter] [jersey-server-managed-async-executor-788:25] Caught exception while sending request 55024 to server: Server1, marking query failed
   [2023-05-18 05:45:00.723279] INFO [QueryLogger] [jersey-server-managed-async-executor-788:25] requestId=55024,table=Table1,timeMs=490
   
   # broker finishes processing routing table change
   [2023-05-18 05:45:00.944494] INFO [BrokerRoutingManager] [ClusterChangeHandlingThread:25] Processed instance config change in 191ms (fetch 1040 instance configs: 68ms, calculate changed servers: 2ms, update 4 routing entries: 121ms), new enabled servers: [], new disabled servers: [Server1], excluded servers: [Server1]
   ```
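The race window can be measured directly from the timestamps in the log above. The sketch below only does arithmetic on those copied timestamps; the event names are labels chosen for illustration and do not appear in Pinot.

```python
from datetime import datetime

# Timestamps copied from the server and broker log lines in this issue.
# The dictionary keys are illustrative labels, not Pinot identifiers.
EVENTS = {
    "server_shutdown_start": "2023-05-18 05:44:52.728337",
    "broker_remove_entry": "2023-05-18 05:44:52.817685",
    "server_drain_done": "2023-05-18 05:44:57.355592",
    "broker_query_failed": "2023-05-18 05:45:00.671597",
    "broker_routing_done": "2023-05-18 05:45:00.944494",
}

FMT = "%Y-%m-%d %H:%M:%S.%f"


def ts(name: str) -> datetime:
    """Parse the timestamp recorded for a named event."""
    return datetime.strptime(EVENTS[name], FMT)


def gap_ms(start: str, end: str) -> float:
    """Milliseconds elapsed between two named events."""
    return (ts(end) - ts(start)).total_seconds() * 1000


# Window in which the server had finished draining but the broker had not yet
# finished removing it from the routing table.
race_window_ms = gap_ms("server_drain_done", "broker_routing_done")

# How long before the routing change completed the failed query was sent.
failed_to_routing_ms = gap_ms("broker_query_failed", "broker_routing_done")

# Whether the failed query falls inside the race window.
query_in_window = (
    ts("server_drain_done") <= ts("broker_query_failed") <= ts("broker_routing_done")
)

print(round(race_window_ms), round(failed_to_routing_ms), query_in_window)
```

This gives a window of about 3589 ms between the server finishing its drain (05:44:57.356) and the broker finishing the routing update (05:45:00.944). The failed request 55024 (05:45:00.672) falls inside that window, about 273 ms before the routing change completed.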


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang closed issue #10787: Failed queries due to servers shutting down before brokers finish routing table changes

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang closed issue #10787: Failed queries due to servers shutting down before brokers finish routing table changes
URL: https://github.com/apache/pinot/issues/10787
