Posted to reviews@spark.apache.org by "robreeves (via GitHub)" <gi...@apache.org> on 2024/03/13 21:56:21 UTC

[PR] [SPARK-47383][CORE] Make the shutdown hook timeout configurable [spark]

robreeves opened a new pull request, #45504:
URL: https://github.com/apache/spark/pull/45504

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'common/utils/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   Make the shutdown hook timeout configurable. If it is not defined, Spark falls back to the existing behavior: a default timeout of 30 seconds, or whatever core-site.xml defines for the hadoop.service.shutdown.timeout property.
   
   ### Why are the changes needed?
   Spark sometimes times out during the shutdown process. When that happens, any data still in the queues is dropped, causing metadata loss (e.g. event logs, anything written by custom listeners).
   
   Before this change the timeout is not easily configurable. The underlying `org.apache.hadoop.util.ShutdownHookManager` has a default timeout of 30 seconds. It can be configured by setting hadoop.service.shutdown.timeout, but this must be done in core-site.xml/core-default.xml because a new Hadoop conf object is created internally and there is no opportunity to modify it.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, a new config `spark.shutdown.timeout` is added.
   
   
   ### How was this patch tested?
   Manual testing in spark-shell. This is not easily unit testable.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   




Re: [PR] [SPARK-47383][CORE] Make the shutdown hook timeout configurable [spark]

Posted by "robreeves (via GitHub)" <gi...@apache.org>.
robreeves commented on code in PR #45504:
URL: https://github.com/apache/spark/pull/45504#discussion_r1525546878


##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -2666,4 +2666,13 @@ package object config {
       .version("4.0.0")
       .booleanConf
       .createWithDefault(false)
+
+  private[spark] val SPARK_SHUTDOWN_TIMEOUT_MS =
+    ConfigBuilder("spark.shutdown.timeout")
+      .doc("Defines the timeout period to wait for all shutdown hooks to be executed. " +
+        "This must be passed as a system property argument in the Java options, for example " +
+        "spark.driver.extraJavaOptions=\"-Dspark.shutdown.timeout=60s\".")

Review Comment:
   Sorry, but I'm not sure what you mean. I will try to clarify. Before this change there is no easy way to set the timeout other than in the `core-site.xml` file using `hadoop.service.shutdown.timeout`. For my use case this is problematic because that file is centrally managed and not Spark specific.
   
   During initialization Spark registers all `spark.*` properties as system properties, and SparkConf reads them each time an object is created. The problem here is that `ShutdownHookManager` is initialized so early during startup that the Spark properties have not yet been registered as system properties. The workaround is to pass the config explicitly to the JVM as a system property using `-D`, as in the example.
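
   For illustration, the driver setting could look like the following (the surrounding `spark-submit` invocation is just a placeholder; the `-Dspark.shutdown.timeout` system property is what this change actually reads):
   ```
   spark-submit \
     --conf spark.driver.extraJavaOptions="-Dspark.shutdown.timeout=60s" \
     ...
   ```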
   
   





Re: [PR] [SPARK-47383][CORE] Make the shutdown hook timeout configurable [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45504:
URL: https://github.com/apache/spark/pull/45504#discussion_r1527619917


##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -2666,4 +2666,13 @@ package object config {
       .version("4.0.0")
       .booleanConf
       .createWithDefault(false)
+
+  private[spark] val SPARK_SHUTDOWN_TIMEOUT_MS =
+    ConfigBuilder("spark.shutdown.timeout")

Review Comment:
   Please add the following to make this an internal configuration, @robreeves.
   ```
   .internal()
   ```





Re: [PR] [SPARK-47383][CORE] Make the shutdown hook timeout configurable [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #45504:
URL: https://github.com/apache/spark/pull/45504#issuecomment-2000729614

   cc @mridulm 




Re: [PR] [SPARK-47383][CORE] Make the shutdown hook timeout configurable [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45504:
URL: https://github.com/apache/spark/pull/45504#discussion_r1526488963


##########
core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala:
##########
@@ -177,8 +181,19 @@ private [util] class SparkShutdownHookManager {
     val hookTask = new Runnable() {
       override def run(): Unit = runAll()
     }
-    org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook(
-      hookTask, FileSystem.SHUTDOWN_HOOK_PRIORITY + 30)
+    val priority = FileSystem.SHUTDOWN_HOOK_PRIORITY + 30
+    // The timeout property must be passed as a Java system property because this
+    // is initialized before Spark configurations are registered as system
+    // properties later in initialization.
+    val timeout = new SparkConf().get(SPARK_SHUTDOWN_TIMEOUT_MS)
+
+    timeout.fold {
+      org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook(
+        hookTask, priority)
+    } { t =>
+      org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook(

Review Comment:
   Ah, you're right. My bad. I mistakenly counted `line 180`, which is a deleted line.





Re: [PR] [SPARK-47383][CORE] Make the shutdown hook timeout configurable [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45504:
URL: https://github.com/apache/spark/pull/45504#discussion_r1525485675


##########
core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala:
##########
@@ -177,8 +181,19 @@ private [util] class SparkShutdownHookManager {
     val hookTask = new Runnable() {
       override def run(): Unit = runAll()
     }
-    org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook(
-      hookTask, FileSystem.SHUTDOWN_HOOK_PRIORITY + 30)
+    val priority = FileSystem.SHUTDOWN_HOOK_PRIORITY + 30
+    // The timeout property must be passed as a Java system property because this

Review Comment:
   According to this claim, why don't we simply use `System.getProperty` instead of creating `SparkConf`? Then, we don't need to introduce this `SparkConf` dependency to this `ShutdownHookManager` class.
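
   Purely as a sketch of that alternative (not code from this PR, and `Utils.timeStringAsMs` is only one possible helper for the unit parsing), it could look roughly like:
   ```scala
   // Hypothetical: read -Dspark.shutdown.timeout directly, bypassing SparkConf.
   val timeoutMs: Option[Long] =
     Option(System.getProperty("spark.shutdown.timeout"))
       .map(Utils.timeStringAsMs) // parse strings like "60s" into milliseconds
   ```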





Re: [PR] [SPARK-47383][CORE] Support `spark.shutdown.timeout` config [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun closed pull request #45504: [SPARK-47383][CORE] Support `spark.shutdown.timeout` config
URL: https://github.com/apache/spark/pull/45504




Re: [PR] [SPARK-47383][CORE] Make the shutdown hook timeout configurable [spark]

Posted by "robreeves (via GitHub)" <gi...@apache.org>.
robreeves commented on PR #45504:
URL: https://github.com/apache/spark/pull/45504#issuecomment-2004355962

   > +1, LGTM with one comment to make this configuration as `internal` one.
   > 
   > * https://github.com/apache/spark/pull/45504/files#r1527619917
   
   I made the change. Thanks for the reviews!




Re: [PR] [SPARK-47383][CORE] Make the shutdown hook timeout configurable [spark]

Posted by "robreeves (via GitHub)" <gi...@apache.org>.
robreeves commented on code in PR #45504:
URL: https://github.com/apache/spark/pull/45504#discussion_r1528862544


##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -2666,4 +2666,13 @@ package object config {
       .version("4.0.0")
       .booleanConf
       .createWithDefault(false)
+
+  private[spark] val SPARK_SHUTDOWN_TIMEOUT_MS =
+    ConfigBuilder("spark.shutdown.timeout")

Review Comment:
   Updated





Re: [PR] [SPARK-47383][CORE] Make the shutdown hook timeout configurable [spark]

Posted by "robreeves (via GitHub)" <gi...@apache.org>.
robreeves commented on code in PR #45504:
URL: https://github.com/apache/spark/pull/45504#discussion_r1525539942


##########
core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala:
##########
@@ -177,8 +181,19 @@ private [util] class SparkShutdownHookManager {
     val hookTask = new Runnable() {
       override def run(): Unit = runAll()
     }
-    org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook(
-      hookTask, FileSystem.SHUTDOWN_HOOK_PRIORITY + 30)
+    val priority = FileSystem.SHUTDOWN_HOOK_PRIORITY + 30
+    // The timeout property must be passed as a Java system property because this
+    // is initialized before Spark configurations are registered as system
+    // properties later in initialization.
+    val timeout = new SparkConf().get(SPARK_SHUTDOWN_TIMEOUT_MS)
+
+    timeout.fold {
+      org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook(
+        hookTask, priority)
+    } { t =>
+      org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook(

Review Comment:
   I only see two places where `org.apache.hadoop.util.ShutdownHookManager.get()` is called. Can you clarify where the third location is? Only one case will be invoked: the call on line 191 runs if the option is empty, and the one on line 194 runs if the option is not empty.





Re: [PR] [SPARK-47383][CORE] Make the shutdown hook timeout configurable [spark]

Posted by "robreeves (via GitHub)" <gi...@apache.org>.
robreeves commented on code in PR #45504:
URL: https://github.com/apache/spark/pull/45504#discussion_r1525538333


##########
core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala:
##########
@@ -177,8 +181,19 @@ private [util] class SparkShutdownHookManager {
     val hookTask = new Runnable() {
       override def run(): Unit = runAll()
     }
-    org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook(
-      hookTask, FileSystem.SHUTDOWN_HOOK_PRIORITY + 30)
+    val priority = FileSystem.SHUTDOWN_HOOK_PRIORITY + 30
+    // The timeout property must be passed as a Java system property because this

Review Comment:
   I chose to use SparkConf because it parses the value without extra code. That includes support for different time units, conversion to milliseconds, and handling the optional case.
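
   To make that concrete, the parsing comes from the config entry definition itself. Roughly (a sketch only; the exact doc string and chain are in the diff above, and `.internal()` reflects the later review suggestion):
   ```scala
   private[spark] val SPARK_SHUTDOWN_TIMEOUT_MS =
     ConfigBuilder("spark.shutdown.timeout")
       .internal()
       .doc("Defines the timeout period to wait for all shutdown hooks to be executed.")
       .version("4.0.0")
       .timeConf(TimeUnit.MILLISECONDS) // accepts values like "60s", converts to ms
       .createOptional                  // absent => fall back to Hadoop's default
   ```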





Re: [PR] [SPARK-47383][CORE] Make the shutdown hook timeout configurable [spark]

Posted by "robreeves (via GitHub)" <gi...@apache.org>.
robreeves commented on PR #45504:
URL: https://github.com/apache/spark/pull/45504#issuecomment-1998427198

   @HyukjinKwon can you take a look please? I'm tagging you since you reviewed the last change in this file.




Re: [PR] [SPARK-47383][CORE] Make the shutdown hook timeout configurable [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45504:
URL: https://github.com/apache/spark/pull/45504#discussion_r1525486768


##########
core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala:
##########
@@ -177,8 +181,19 @@ private [util] class SparkShutdownHookManager {
     val hookTask = new Runnable() {
       override def run(): Unit = runAll()
     }
-    org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook(
-      hookTask, FileSystem.SHUTDOWN_HOOK_PRIORITY + 30)
+    val priority = FileSystem.SHUTDOWN_HOOK_PRIORITY + 30
+    // The timeout property must be passed as a Java system property because this
+    // is initialized before Spark configurations are registered as system
+    // properties later in initialization.
+    val timeout = new SparkConf().get(SPARK_SHUTDOWN_TIMEOUT_MS)
+
+    timeout.fold {
+      org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook(
+        hookTask, priority)
+    } { t =>
+      org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook(

Review Comment:
   Just a question. It seems that we invoke `org.apache.hadoop.util.ShutdownHookManager.get()` three times repeatedly. Is this inevitable?
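
   For example, could the `get()` call be hoisted into a local value, something like the following sketch (illustrative only, not code from the PR; it assumes the timeout overload takes a value plus a `TimeUnit`)?
   ```scala
   val hookManager = org.apache.hadoop.util.ShutdownHookManager.get()
   timeout.fold {
     hookManager.addShutdownHook(hookTask, priority)
   } { t =>
     // t is the configured timeout in milliseconds
     hookManager.addShutdownHook(hookTask, priority, t, TimeUnit.MILLISECONDS)
   }
   ```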





Re: [PR] [SPARK-47383][CORE] Make the shutdown hook timeout configurable [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45504:
URL: https://github.com/apache/spark/pull/45504#discussion_r1525483000


##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -2666,4 +2666,13 @@ package object config {
       .version("4.0.0")
       .booleanConf
       .createWithDefault(false)
+
+  private[spark] val SPARK_SHUTDOWN_TIMEOUT_MS =
+    ConfigBuilder("spark.shutdown.timeout")
+      .doc("Defines the timeout period to wait for all shutdown hooks to be executed. " +
+        "This must be passed as a system property argument in the Java options, for example " +
+        "spark.driver.extraJavaOptions=\"-Dspark.shutdown.timeout=60s\".")

Review Comment:
   According to the PR description's claim, you cannot achieve this with `spark.driver.extraJavaOptions`, @robreeves ?
   
   > This is not easily configurable before this change. The underlying org.apache.hadoop.util.ShutdownHookManager has a default timeout of 30 seconds. It can be configured by setting hadoop.service.shutdown.timeout, but this must be done in the core-site.xml/core-default.xml because a new hadoop conf object is created and there is no opportunity to modify it.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org