You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/09/14 14:01:25 UTC

[GitHub] [druid] gianm commented on a diff in pull request #13087: Don't wait for graceful shutdown on failed start

gianm commented on code in PR #13087:
URL: https://github.com/apache/druid/pull/13087#discussion_r970847285


##########
server/src/main/java/org/apache/druid/curator/CuratorModule.java:
##########
@@ -97,6 +99,21 @@ public CuratorFramework makeCurator(ZkEnablementConfig zkEnablementConfig, Curat
 
     framework.getUnhandledErrorListenable().addListener((message, e) -> {
       log.error(e, "Unhandled error in Curator, stopping server.");
+      final Thread halter = new Thread(
+          () -> {
+            try {
+              Threads.sleepFor(10, TimeUnit.SECONDS);
+            }
+            catch (InterruptedException ex) {
+
+            }
+            log.info("Normal shutdown didn't work, halting things NOW!!!!");

Review Comment:
   About the log message:
   
   1) Seems like more of a WARN than an INFO, especially given the original was an ERROR
   2) I'd write the message less colloquially, since people will see this message when they are debugging some sad thing, and a somber tone is called for
   
   Putting those together, how about:
   
   ```
   log.warn("Could not stop server within %d seconds after unhandled Curator error. Halting immediately.");
   ```



##########
server/src/main/java/org/apache/druid/curator/CuratorModule.java:
##########
@@ -97,6 +99,21 @@ public CuratorFramework makeCurator(ZkEnablementConfig zkEnablementConfig, Curat
 
     framework.getUnhandledErrorListenable().addListener((message, e) -> {
       log.error(e, "Unhandled error in Curator, stopping server.");
+      final Thread halter = new Thread(
+          () -> {
+            try {
+              Threads.sleepFor(10, TimeUnit.SECONDS);

Review Comment:
   At first I thought 10s seemed short. But now I'm not so sure. I suppose it's fine. We're trying to shut down the server quickly; the only reason we're waiting any time at all is to give shutdown hooks a chance to run, clean up temporary files, close upstream connections cleanly, finish currently-executing queries, etc.
   
   Due to the "finish currently-executing queries" thing, maybe we should go with 30s?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org