Posted to reviews@spark.apache.org by "yaooqinn (via GitHub)" <gi...@apache.org> on 2023/12/19 10:30:39 UTC

[PR] [SPARK-46456][CORE] Set Jetty server stop timeout to 5 seconds to reduce the risk of interrupting shutdown hooks [spark]

yaooqinn opened a new pull request, #44413:
URL: https://github.com/apache/spark/pull/44413

   
   ### What changes were proposed in this pull request?
   
   This pull request addresses shutdown hooks being interrupted during the shutdown process. Setting Jetty's `_stopTimeout` to 5 seconds reduces the risk that SparkContext modules such as MapOutputTracker and BlockManager are not stopped properly, which leaves resources uncleaned.
   
   This pull request reduces the value from 30 seconds to 5 seconds, taking into account the value used by [QueuedThreadPool](https://git.eclipse.org/c/jetty/org.eclipse.jetty.project.git/tree/jetty-util/src/main/java/org/eclipse/jetty/util/thread/QueuedThreadPool.java#n96).
   
   ### Why are the changes needed?
   
   In Jetty, the ContainerLifeCycle implementation manages a collection of contained beans. It stops managed beans one by one and waits up to a specified time (30s) for each of them to stop. A single slow bean can therefore cause the shutdown hook to time out, e.g.,
   
   
   ```
   23/12/19 17:07:40 DEBUG QueuedThreadPool: Waiting for Thread[MasterUI-81,5,main] for 14999
   23/12/19 17:07:55 DEBUG QueuedThreadPool: Waiting for Thread[MasterUI-81,5,main] for 14999
   ```
   ```
   23/12/19 17:08:09 WARN ShutdownHookManager: ShutdownHook '' timeout, java.util.concurrent.TimeoutException
   java.util.concurrent.TimeoutException
   	at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:204)
   	at org.apache.hadoop.util.ShutdownHookManager.executeShutdown(ShutdownHookManager.java:124)
   	at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:95)
   ```
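   
   As a rough illustration of the arithmetic behind this, the sketch below compares the worst-case sequential wait with the time available to the shutdown hook. It is only a back-of-the-envelope model: the object and method names are hypothetical, and the 30-second hook budget is an assumption that matches the timestamps in the log above rather than a value taken from this patch.
   
   ```scala
   object StopBudgetSketch {
     // Worst case: each stuck managed bean is waited on for the full stop timeout,
     // one after another.
     def worstCaseWaitMs(stuckBeans: Int, stopTimeoutMs: Long): Long =
       stuckBeans.toLong * stopTimeoutMs
   
     // Time left for the rest of the shutdown hook once the stuck beans have been
     // waited on; zero or less means nothing is left for the remaining cleanup.
     def remainingBudgetMs(hookBudgetMs: Long, stuckBeans: Int, stopTimeoutMs: Long): Long =
       hookBudgetMs - worstCaseWaitMs(stuckBeans, stopTimeoutMs)
   }
   ```
   
   Under these assumptions, `StopBudgetSketch.remainingBudgetMs(30000L, 1, 30000L)` leaves nothing for the rest of the hook, while the same call with a 5000 ms stop timeout leaves 25000 ms.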
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   This can be reproduced easily in `local-cluster` mode with a proxied Spark UI.
   
   #### Before 
   ```
   23/12/19 17:08:09 WARN ShutdownHookManager: ShutdownHook '' timeout, java.util.concurrent.TimeoutException
   java.util.concurrent.TimeoutException
   	at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:204)
   	at org.apache.hadoop.util.ShutdownHookManager.executeShutdown(ShutdownHookManager.java:124)
   	at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:95)
   23/12/19 17:08:09 ERROR Utils: Uncaught exception in thread shutdown-hook-0
   java.lang.InterruptedException
   	at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1679)
   	at java.base/java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1464)
   	at org.apache.spark.rpc.netty.MessageLoop.stop(MessageLoop.scala:60)
   	at org.apache.spark.rpc.netty.Dispatcher.stop(Dispatcher.scala:205)
   	at org.apache.spark.rpc.netty.NettyRpcEnv.cleanup(NettyRpcEnv.scala:333)
   	at org.apache.spark.rpc.netty.NettyRpcEnv.shutdown(NettyRpcEnv.scala:311)
   	at org.apache.spark.deploy.LocalSparkCluster.$anonfun$stop$4(LocalSparkCluster.scala:97)
   	at org.apache.spark.deploy.LocalSparkCluster.$anonfun$stop$4$adapted(LocalSparkCluster.scala:97)
   	at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
   	at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
   	at scala.collection.AbstractIterable.foreach(Iterable.scala:933)
   	at org.apache.spark.deploy.LocalSparkCluster.stop(LocalSparkCluster.scala:97)
   	at org.apache.spark.SparkContext$.$anonfun$createTaskScheduler$2(SparkContext.scala:3233)
   	at org.apache.spark.SparkContext$.$anonfun$createTaskScheduler$2$adapted(SparkContext.scala:3232)
   	at org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend.org$apache$spark$scheduler$cluster$StandaloneSchedulerBackend$$stop(StandaloneSchedulerBackend.scala:280)
   	at org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend.stop(StandaloneSchedulerBackend.scala:143)
   	at org.apache.spark.scheduler.SchedulerBackend.stop(SchedulerBackend.scala:34)
   	at org.apache.spark.scheduler.SchedulerBackend.stop$(SchedulerBackend.scala:34)
   	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.stop(CoarseGrainedSchedulerBackend.scala:55)
   	at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$stop$2(TaskSchedulerImpl.scala:992)
   	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1288)
   	at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:992)
   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$stop$4(DAGScheduler.scala:3005)
   	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1288)
   	at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:3005)
   	at org.apache.spark.SparkContext.$anonfun$stop$12(SparkContext.scala:2293)
   	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1288)
   	at org.apache.spark.SparkContext.stop(SparkContext.scala:2293)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.stop(SparkSQLEnv.scala:88)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.$anonfun$main$2(SparkSQLCLIDriver.scala:151)
   	at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
   	at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
   	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
   	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1842)
   	at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
   	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
   	at scala.util.Try$.apply(Try.scala:210)
   	at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
   	at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
   	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
   	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
   	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
   	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
   	at java.base/java.lang.Thread.run(Thread.java:840)
   ```
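   
   The pair of exceptions in the log above (a `TimeoutException` from `FutureTask.get`, followed by an `InterruptedException` inside the hook) is what one would expect when a hook runner waits for a bounded time and then cancels the task. The following is a minimal sketch of that pattern using only `java.util.concurrent`, with scaled-down durations. The object and method names are hypothetical, and it assumes the runner cancels the task with interruption after the wait times out; it is not the Hadoop or Spark implementation.
   
   ```scala
   import java.util.concurrent.{Executors, TimeUnit, TimeoutException}
   import java.util.concurrent.atomic.AtomicBoolean
   
   object HookTimeoutSketch {
     // Runs a simulated hook that needs `hookWorkMs` to finish, waiting at most `budgetMs`.
     // Returns (timedOut, interrupted).
     def runWithBudget(hookWorkMs: Long, budgetMs: Long): (Boolean, Boolean) = {
       val interrupted = new AtomicBoolean(false)
       val executor = Executors.newSingleThreadExecutor()
       val hook = new Runnable {
         override def run(): Unit = {
           try {
             Thread.sleep(hookWorkMs) // stands in for a stop() call blocked on a slow component
           } catch {
             case _: InterruptedException => interrupted.set(true)
           }
         }
       }
       val future = executor.submit(hook)
       val timedOut =
         try {
           future.get(budgetMs, TimeUnit.MILLISECONDS)
           false
         } catch {
           case _: TimeoutException =>
             future.cancel(true) // interrupts the thread that is still running the hook
             true
         }
       executor.shutdown()
       executor.awaitTermination(5, TimeUnit.SECONDS)
       (timedOut, interrupted.get())
     }
   }
   ```
   
   A hook that needs longer than its budget, e.g. `HookTimeoutSketch.runWithBudget(2000L, 100L)`, is both timed out and interrupted, so any cleanup that would have run after the blocking call is skipped. A hook that finishes within its budget is left alone.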
   
   #### After
   
   ```
   23/12/19 17:38:09 DEBUG QueuedThreadPool: Waiting for Thread[MasterUI-81,5,main] for -3
   23/12/19 17:38:09 WARN QueuedThreadPool: Couldn't stop Thread[MasterUI-78,5,main]
   	at java.base@17.0.9/sun.nio.ch.KQueue.poll(Native Method)
   	at java.base@17.0.9/sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:122)
   	at java.base@17.0.9/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:129)
   	at java.base@17.0.9/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:146)
   	at app//org.sparkproject.jetty.io.ManagedSelector.nioSelect(ManagedSelector.java:183)
   	at app//org.sparkproject.jetty.io.ManagedSelector.select(ManagedSelector.java:190)
   	at app//org.sparkproject.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:606)
   	at app//org.sparkproject.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:543)
   	at app//org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:362)
   	at app//org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:186)
   	at app//org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
   	at app//org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:137)
   	at app//org.sparkproject.jetty.io.ManagedSelector$$Lambda$775/0x000000c801527460.run(Unknown Source)
   	at app//org.sparkproject.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
   	at app//org.sparkproject.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
   	at java.base@17.0.9/java.lang.Thread.run(Thread.java:840)
   23/12/19 17:38:09 WARN QueuedThreadPool: Couldn't stop Thread[MasterUI-79,5,main]
   	at java.base@17.0.9/sun.nio.ch.KQueue.poll(Native Method)
   	at java.base@17.0.9/sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:122)
   	at java.base@17.0.9/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:129)
   	at java.base@17.0.9/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:146)
   	at app//org.sparkproject.jetty.io.ManagedSelector.nioSelect(ManagedSelector.java:183)
   	at app//org.sparkproject.jetty.io.ManagedSelector.select(ManagedSelector.java:190)
   	at app//org.sparkproject.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:606)
   	at app//org.sparkproject.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:543)
   	at app//org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:362)
   	at app//org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:186)
   	at app//org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
   	at app//org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:137)
   	at app//org.sparkproject.jetty.io.ManagedSelector$$Lambda$775/0x000000c801527460.run(Unknown Source)
   	at app//org.sparkproject.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
   	at app//org.sparkproject.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
   	at java.base@17.0.9/java.lang.Thread.run(Thread.java:840)
   23/12/19 17:38:09 WARN QueuedThreadPool: Couldn't stop Thread[MasterUI-77,5,main]
   	at java.base@17.0.9/sun.nio.ch.KQueue.poll(Native Method)
   	at java.base@17.0.9/sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:122)
   	at java.base@17.0.9/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:129)
   	at java.base@17.0.9/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:146)
   	at app//org.sparkproject.jetty.io.ManagedSelector.nioSelect(ManagedSelector.java:183)
   	at app//org.sparkproject.jetty.io.ManagedSelector.select(ManagedSelector.java:190)
   	at app//org.sparkproject.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:606)
   	at app//org.sparkproject.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:543)
   	at app//org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:362)
   	at app//org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:186)
   	at app//org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
   	at app//org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:137)
   	at app//org.sparkproject.jetty.io.ManagedSelector$$Lambda$775/0x000000c801527460.run(Unknown Source)
   	at app//org.sparkproject.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
   	at app//org.sparkproject.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
   	at java.base@17.0.9/java.lang.Thread.run(Thread.java:840)
   23/12/19 17:38:09 WARN QueuedThreadPool: Couldn't stop Thread[MasterUI-82,5,main]
   	at java.base@17.0.9/sun.nio.ch.KQueue.poll(Native Method)
   	at java.base@17.0.9/sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:122)
   	at java.base@17.0.9/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:129)
   	at java.base@17.0.9/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:146)
   	at app//org.sparkproject.jetty.io.ManagedSelector.nioSelect(ManagedSelector.java:183)
   	at app//org.sparkproject.jetty.io.ManagedSelector.select(ManagedSelector.java:190)
   	at app//org.sparkproject.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:606)
   	at app//org.sparkproject.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:543)
   	at app//org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:362)
   	at app//org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:186)
   	at app//org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
   	at app//org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:137)
   	at app//org.sparkproject.jetty.io.ManagedSelector$$Lambda$775/0x000000c801527460.run(Unknown Source)
   	at app//org.sparkproject.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
   	at app//org.sparkproject.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
   	at java.base@17.0.9/java.lang.Thread.run(Thread.java:840)
   23/12/19 17:38:09 WARN QueuedThreadPool: Couldn't stop Thread[MasterUI-80,5,main]
   	at java.base@17.0.9/sun.nio.ch.KQueue.poll(Native Method)
   	at java.base@17.0.9/sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:122)
   	at java.base@17.0.9/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:129)
   	at java.base@17.0.9/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:146)
   	at app//org.sparkproject.jetty.io.ManagedSelector.nioSelect(ManagedSelector.java:183)
   	at app//org.sparkproject.jetty.io.ManagedSelector.select(ManagedSelector.java:190)
   	at app//org.sparkproject.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:606)
   	at app//org.sparkproject.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:543)
   	at app//org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:362)
   	at app//org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:186)
   	at app//org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
   	at app//org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:137)
   	at app//org.sparkproject.jetty.io.ManagedSelector$$Lambda$775/0x000000c801527460.run(Unknown Source)
   	at app//org.sparkproject.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
   	at app//org.sparkproject.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
   	at java.base@17.0.9/java.lang.Thread.run(Thread.java:840)
   23/12/19 17:38:09 WARN QueuedThreadPool: Couldn't stop Thread[MasterUI-81,5,main]
   	at java.base@17.0.9/sun.nio.ch.KQueue.poll(Native Method)
   	at java.base@17.0.9/sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:122)
   	at java.base@17.0.9/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:129)
   	at java.base@17.0.9/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:146)
   	at app//org.sparkproject.jetty.io.ManagedSelector.nioSelect(ManagedSelector.java:183)
   	at app//org.sparkproject.jetty.io.ManagedSelector.select(ManagedSelector.java:190)
   	at app//org.sparkproject.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:606)
   	at app//org.sparkproject.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:543)
   	at app//org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:362)
   	at app//org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:186)
   	at app//org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
   	at app//org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:137)
   	at app//org.sparkproject.jetty.io.ManagedSelector$$Lambda$775/0x000000c801527460.run(Unknown Source)
   	at app//org.sparkproject.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
   	at app//org.sparkproject.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
   	at java.base@17.0.9/java.lang.Thread.run(Thread.java:840)
   23/12/19 17:38:09 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
   23/12/19 17:38:09 INFO MemoryStore: MemoryStore cleared
   23/12/19 17:38:09 INFO BlockManager: BlockManager stopped
   23/12/19 17:38:09 INFO BlockManagerMaster: BlockManagerMaster stopped
   23/12/19 17:38:09 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
   23/12/19 17:38:09 INFO SparkContext: Successfully stopped SparkContext
   23/12/19 17:38:09 INFO ShutdownHookManager: Shutdown hook called
   23/12/19 17:38:09 INFO ShutdownHookManager: Deleting directory /private/var/folders/84/dgr9ykwn6yndcmq1kjxqvk200000gn/T/spark-8eabc592-87f7-4a3c-8884-594076b25df1
   23/12/19 17:38:09 INFO ShutdownHookManager: Deleting directory /private/var/folders/84/dgr9ykwn6yndcmq1kjxqvk200000gn/T/spark-04ca9e0a-819f-41bb-b67a-80356c4dcdd7
   ```
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46456][CORE] Set Jetty server stop timeout to 5 seconds to reduce the risk of interrupting shutdown hooks [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on PR #44413:
URL: https://github.com/apache/spark/pull/44413#issuecomment-1862520276

   cc @dongjoon-hyun @HyukjinKwon @sarutak, your reviews are highly appreciated.




Re: [PR] [SPARK-46456][CORE] Add `spark.ui.jettyStopTimeout` to set Jetty server stop timeout to unblock SparkContext shutdown [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun closed pull request #44413: [SPARK-46456][CORE] Add `spark.ui.jettyStopTimeout` to set Jetty server stop timeout  to unblock SparkContext shutdown
URL: https://github.com/apache/spark/pull/44413




Re: [PR] [SPARK-46456][CORE] Set Jetty server stop timeout to 5 seconds to reduce the risk of interrupting shutdown hooks [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on PR #44413:
URL: https://github.com/apache/spark/pull/44413#issuecomment-1863730517

   Adding a configuration sounds good to me. How about calling it `spark.ui.jettyStopTimeout`, since I did not find any uses of the Jetty server outside of the UI module?




Re: [PR] [SPARK-46456][CORE] Set Jetty server stop timeout to 5 seconds to reduce the risk of interrupting shutdown hooks [spark]

Posted by "sarutak (via GitHub)" <gi...@apache.org>.
sarutak commented on PR #44413:
URL: https://github.com/apache/spark/pull/44413#issuecomment-1863391995

   @yaooqinn 
   This issue is almost specific to `local-cluster` mode, where some `MasterUI` threads somehow don't shut down within the default timeout (30s), isn't it?
   If so, I don't think it's good to change the `stopTimeout`.
   With `local-cluster` mode, `MasterUI` (not SparkUI) is embedded in the SparkContext, so the shutdown hook might need much more waiting time than in other modes.
   So, how about making the timeout of the shutdown hook configurable?
   Or, if this issue is very rare, a workaround is to configure the timeout as follows:
   
   1. Prepare `core-site.xml` and edit it as follows:
   ```
   <configuration>
     <property>
       <name>hadoop.service.shutdown.timeout</name>
       <value>60s</value>
     </property>
   </configuration>
   ```
   2. Run Spark using the `core-site.xml`:
   ```
   HADOOP_CONF_DIR=/dir/of/core-site.xml/ bin/spark-submit ...
   ```
   




Re: [PR] [SPARK-46456][CORE] Set Jetty server stop timeout to 5 seconds to reduce the risk of interrupting shutdown hooks [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #44413:
URL: https://github.com/apache/spark/pull/44413#issuecomment-1863735495

   Ya, actually, I also looked at the `spark.ui` namespace. However, the Apache Spark Standalone REST API also uses Jetty.
   
   https://github.com/apache/spark/blob/aa1ff3789e492545b07d84ac095fc4c39f7446c6/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala#L36-L48
   




Re: [PR] [SPARK-46456][CORE] Set Jetty server stop timeout to 5 seconds to reduce the risk of interrupting shutdown hooks [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on PR #44413:
URL: https://github.com/apache/spark/pull/44413#issuecomment-1863740525

   Oh, I missed that. I only counted the callers of `JettyUtils.startJettyServer`. Thanks for the reminder, @dongjoon-hyun.




Re: [PR] [SPARK-46456][CORE] Set Jetty server stop timeout to 5 seconds to reduce the risk of interrupting shutdown hooks [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on code in PR #44413:
URL: https://github.com/apache/spark/pull/44413#discussion_r1431216707


##########
core/src/main/scala/org/apache/spark/ui/JettyUtils.scala:
##########
@@ -276,6 +276,7 @@ private[spark] object JettyUtils extends Logging {
     val serverExecutor = new ScheduledExecutorScheduler(s"$serverName-JettyScheduler", true)
 
     try {
+      server.setStopTimeout(5000)

Review Comment:
   Set this before the server starts so that it applies recursively to all managed beans added later.





Re: [PR] [SPARK-46456][CORE] Add `spark.ui.jettyStopTimeout` to set Jetty server stop timeout to unblock SparkContext shutdown [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on PR #44413:
URL: https://github.com/apache/spark/pull/44413#issuecomment-1865377472

   Thank you, @dongjoon-hyun  and @sarutak !




Re: [PR] [SPARK-46456][CORE] Add `spark.ui.jettyStopTimeout` to set Jetty server stop timeout to unblock SparkContext shutdown [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on PR #44413:
URL: https://github.com/apache/spark/pull/44413#issuecomment-1863829008

   Thank you again, @sarutak.




Re: [PR] [SPARK-46456][CORE] Set Jetty server stop timeout to 5 seconds to reduce the risk of interrupting shutdown hooks [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on code in PR #44413:
URL: https://github.com/apache/spark/pull/44413#discussion_r1431216707


##########
core/src/main/scala/org/apache/spark/ui/JettyUtils.scala:
##########
@@ -276,6 +276,7 @@ private[spark] object JettyUtils extends Logging {
     val serverExecutor = new ScheduledExecutorScheduler(s"$serverName-JettyScheduler", true)
 
     try {
+      server.setStopTimeout(5000)

Review Comment:
   Set this before the server starts so that the timeout applies to all managed beans added later.





Re: [PR] [SPARK-46456][CORE] Add `spark.ui.jettyStopTimeout` to set Jetty server stop timeout to unblock SparkContext shutdown [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #44413:
URL: https://github.com/apache/spark/pull/44413#issuecomment-1865256788

   Thank you, @yaooqinn and @sarutak !




Re: [PR] [SPARK-46456][CORE] Set Jetty server stop timeout to 5 seconds to reduce the risk of interrupting shutdown hooks [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on PR #44413:
URL: https://github.com/apache/spark/pull/44413#issuecomment-1863723892

   `local-cluster` mode is only a showcase for this issue, as it makes the issue easy to reproduce.
   
   Besides, 5 seconds is also the value recommended by Jetty.
   
   https://git.eclipse.org/c/jetty/org.eclipse.jetty.project.git/tree/jetty-server/src/main/config/etc/jetty.xml?h=jetty-9.3.x#n124
    
   https://git.eclipse.org/c/jetty/org.eclipse.jetty.project.git/tree/jetty-server/src/main/config/etc/jetty.xml?h=jetty-9.1.x#n126




Re: [PR] [SPARK-46456][CORE] Add `spark.ui.jettyStopTimeout` to set Jetty server stop timeout to unblock SparkContext shutdown [spark]

Posted by "sarutak (via GitHub)" <gi...@apache.org>.
sarutak commented on PR #44413:
URL: https://github.com/apache/spark/pull/44413#issuecomment-1863822822

   Adding a new internal config sounds reasonable to me.




Re: [PR] [SPARK-46456][CORE] Set Jetty server stop timeout to 5 seconds to reduce the risk of interrupting shutdown hooks [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #44413:
URL: https://github.com/apache/spark/pull/44413#issuecomment-1863736675

   Up to now, it has been unclear. However, the REST API should be considered a non-UI component.

