Posted to commits@spark.apache.org by wu...@apache.org on 2022/01/05 02:52:27 UTC
[spark] branch branch-3.0 updated: [SPARK-35714][FOLLOW-UP][CORE] WorkerWatcher should run System.exit in a thread out of RpcEnv
This is an automated email from the ASF dual-hosted git repository.
wuyi pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.0 by this push:
new be441e8 [SPARK-35714][FOLLOW-UP][CORE] WorkerWatcher should run System.exit in a thread out of RpcEnv
be441e8 is described below
commit be441e84069acc711ea848c69ae5bd55a7c93531
Author: yi.wu <yi...@databricks.com>
AuthorDate: Wed Jan 5 10:48:16 2022 +0800
[SPARK-35714][FOLLOW-UP][CORE] WorkerWatcher should run System.exit in a thread out of RpcEnv
### What changes were proposed in this pull request?
This PR proposes to let `WorkerWatcher` run `System.exit` in a separate thread instead of on a thread owned by `RpcEnv`.
### Why are the changes needed?
`System.exit` will trigger the shutdown hook to run `executor.stop`, which will result in the same deadlock issue as SPARK-14180. Note, however, that since Spark recently upgraded to Hadoop 3, each hook now has a [timeout threshold](https://github.com/apache/hadoop/blob/d4794dd3b2ba365a9d95ad6aafcf43a1ea40f777/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ShutdownHookManager.java#L205-L209) that forcibly interrupts the hook execution once it exceeds the timeout. [...]
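The hook-timeout behaviour referenced above can be sketched roughly as follows. This is a hypothetical distillation, not Hadoop's actual `ShutdownHookManager` code: a hook is run on a helper thread and forcibly interrupted if it overruns its time budget.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of a hook-with-timeout runner. Names and structure are
// illustrative only; see the linked ShutdownHookManager source for the real thing.
public class TimedHookRunner {
    public static boolean runWithTimeout(Runnable hook, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> future = executor.submit(hook);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;                 // hook finished within the threshold
        } catch (TimeoutException e) {
            future.cancel(true);         // forcibly interrupt the stuck hook
            return false;
        } catch (Exception e) {
            return false;                // hook threw or the wait was interrupted
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A hook that blocks past the threshold (for example, one stuck in the SPARK-14180-style deadlock) is interrupted rather than hanging the JVM forever.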
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Tested manually.
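The core pattern in the patch below can be distilled into a small sketch. This is hypothetical code, not Spark's API: an `AtomicBoolean.compareAndSet` guard ensures the exit action runs at most once, and the action itself is started on a dedicated thread so the blocking shutdown hook never runs on a caller's (e.g. `RpcEnv`) thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical distillation of the WorkerWatcher change: a one-shot guard plus
// a dedicated exit thread. `exitAction` stands in for `System.exit(-1)`.
public class OneShotExit {
    private final AtomicBoolean stopping = new AtomicBoolean(false);
    private final Runnable exitAction;

    public OneShotExit(Runnable exitAction) {
        this.exitAction = exitAction;
    }

    /** Starts the exit thread for the first caller only; duplicates get null. */
    public Thread exitNonZero() {
        if (stopping.compareAndSet(false, true)) {
            Thread t = new Thread(exitAction, "WorkerWatcher-exit-executor");
            t.start();
            return t;
        }
        return null; // duplicate call: the exit action must not run twice
    }
}
```

Because `compareAndSet` succeeds for exactly one caller, concurrent disconnect notifications cannot trigger the exit action twice, and the caller's thread returns immediately instead of blocking inside the shutdown hook.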
Closes #35069 from Ngone51/fix-workerwatcher-exit.
Authored-by: yi.wu <yi...@databricks.com>
Signed-off-by: yi.wu <yi...@databricks.com>
(cherry picked from commit 639d6f40e597d79c680084376ece87e40f4d2366)
Signed-off-by: yi.wu <yi...@databricks.com>
---
.../main/scala/org/apache/spark/deploy/worker/WorkerWatcher.scala | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/core/src/main/scala/org/apache/spark/deploy/worker/WorkerWatcher.scala b/core/src/main/scala/org/apache/spark/deploy/worker/WorkerWatcher.scala
index efffc9f..b7a5728 100644
--- a/core/src/main/scala/org/apache/spark/deploy/worker/WorkerWatcher.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/worker/WorkerWatcher.scala
@@ -54,8 +54,12 @@ private[spark] class WorkerWatcher(
     if (isTesting) {
       isShutDown = true
     } else if (isChildProcessStopping.compareAndSet(false, true)) {
-      // SPARK-35714: avoid the duplicate call of `System.exit` to avoid the dead lock
-      System.exit(-1)
+      // SPARK-35714: avoid the duplicate call of `System.exit` to avoid the dead lock.
+      // Same as SPARK-14180, we should run `System.exit` in a separate thread to avoid
+      // dead lock since `System.exit` will trigger the shutdown hook of `executor.stop`.
+      new Thread("WorkerWatcher-exit-executor") {
+        override def run(): Unit = System.exit(-1)
+      }.start()
     }

   override def receive: PartialFunction[Any, Unit] = {
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org