Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/03 22:26:06 UTC

[GitHub] [spark] kevin85421 opened a new pull request, #37400: [SPARK-39957][CORE] Delay onDisconnected to enable Driver receives ExecutorExitCode

kevin85421 opened a new pull request, #37400:
URL: https://github.com/apache/spark/pull/37400

   ### What changes were proposed in this pull request?
   When `onDisconnected` is triggered, this PR:
   
   (1) Delays `RemoveExecutor` by 5 seconds so that the driver can receive the ExecutorExitCode from the slow path.
   (2) Prevents the task scheduler from assigning tasks to the lost executor, by adding the executor to `executorsPendingLossReason`.
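   
   The two changes above can be modeled roughly as follows. This is a minimal sketch, not the PR's implementation. The class and method names (`DelayedExecutorRemoval`, `onExitCode`, `isSchedulable`) are hypothetical, the delay is given in milliseconds, and a `java.util.concurrent` scheduler stands in for Spark's RPC and scheduler machinery.
   
   ```scala
   import java.util.concurrent.{ConcurrentHashMap, Executors, ScheduledFuture, ThreadFactory, TimeUnit}
   
   // Hypothetical model of the driver-side bookkeeping; not Spark's StandaloneSchedulerBackend.
   final class DelayedExecutorRemoval(delayMs: Long) {
     // Daemon thread so a forgotten shutdown() does not keep the JVM alive.
     private val scheduler = Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
       def newThread(r: Runnable): Thread = { val t = new Thread(r); t.setDaemon(true); t }
     })
     // Executors that disconnected but whose loss reason is not known yet.
     private val pendingLossReason = ConcurrentHashMap.newKeySet[String]()
     private val pendingRemovals = new ConcurrentHashMap[String, ScheduledFuture[_]]()
     // Final loss reason per removed executor; the first path to report wins.
     val removed = new ConcurrentHashMap[String, String]()
   
     /** Fast path: the RPC channel closed, so stop scheduling on the executor and wait. */
     def onDisconnected(executorId: String): Unit = {
       pendingLossReason.add(executorId)
       val fallback: Runnable = () => remove(executorId, "disconnected (no exit code)")
       pendingRemovals.put(executorId, scheduler.schedule(fallback, delayMs, TimeUnit.MILLISECONDS))
     }
   
     /** Slow path: the exit code arrived, so cancel the fallback and remove with the real reason. */
     def onExitCode(executorId: String, exitCode: Int): Unit = {
       Option(pendingRemovals.remove(executorId)).foreach(_.cancel(false))
       remove(executorId, s"exited with code $exitCode")
     }
   
     /** Executors pending a loss reason, or already removed, receive no new tasks. */
     def isSchedulable(executorId: String): Boolean =
       !pendingLossReason.contains(executorId) && !removed.containsKey(executorId)
   
     private def remove(executorId: String, reason: String): Unit = {
       removed.putIfAbsent(executorId, reason)
       pendingLossReason.remove(executorId)
     }
   
     def shutdown(): Unit = scheduler.shutdownNow()
   }
   ```
   
   In this model, whichever path reports first decides the loss reason. The delay only gives the slow path a window to arrive before the fallback fires.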
   
   ### Why are the changes needed?
   There are two paths by which the driver detects executor loss:
   
   (1) Fast path, `onDisconnected` (Executor -> Driver):
   When the executor's JVM exits, its socket (a Netty channel) is closed. `onDisconnected` is triggered as soon as the driver observes that the channel is closed.
   
   (2) Slow path, ExecutorRunner -> Worker -> Master -> Driver (see #37385 for details):
   When an executor exits with an ExecutorExitCode, the exit code is relayed from the ExecutorRunner through the Worker and Master to the driver.
   
   The fast path classifies the executor loss without knowing the ExecutorExitCode, so the two paths can reach different conclusions about the same event. For example, when an executor exits with ExecutorExitCode HEARTBEAT_FAILURE, `onDisconnected` treats the loss as a task failure, whereas the slow path treats it as a network failure. HEARTBEAT_FAILURE is a network failure, so the slow path's classification is the correct one.
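   
   As a rough illustration of why the two paths disagree, the following sketch contrasts the verdict each path can reach. It is a simplified model, not Spark code. `LossClassifier` and its methods are hypothetical. The numeric exit codes are assumed to mirror `org.apache.spark.executor.ExecutorExitCode`, so check them against your Spark version.
   
   ```scala
   object LossClassifier {
     // Assumed to mirror ExecutorExitCode.DISK_STORE_FAILED_TO_CREATE_DIR and HEARTBEAT_FAILURE.
     val DiskStoreFailedToCreateDir = 53
     val HeartbeatFailure = 56
   
     // Fast path (onDisconnected): the channel closed but no exit code is known,
     // so the loss is attributed to the application and counted as a task failure.
     def fastPathCausedByApp: Boolean = true
   
     // Slow path (ExecutorRunner -> Worker -> Master -> Driver): the exit code is known,
     // so codes that signal an infrastructure problem are not blamed on the application.
     def slowPathCausedByApp(exitCode: Int): Boolean = exitCode match {
       case HeartbeatFailure | DiskStoreFailedToCreateDir => false
       case _                                             => true
     }
   
     // The two paths disagree exactly when the slow path would not blame the application.
     def pathsDisagree(exitCode: Int): Boolean =
       fastPathCausedByApp != slowPathCausedByApp(exitCode)
   }
   ```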
   
   [Notice]
   For more details about ExecutorExitCode, see #37385.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   ```bash
   bazel run //core:org.apache.spark.SparkContextSuite -- -z "ExitCode"
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kevin85421 commented on pull request #37400: [SPARK-39957][CORE] Delay onDisconnected to enable Driver receives ExecutorExitCode

Posted by GitBox <gi...@apache.org>.
kevin85421 commented on PR #37400:
URL: https://github.com/apache/spark/pull/37400#issuecomment-1223561499

   Gentle ping @Ngone51 @mridulm


[GitHub] [spark] AmplabJenkins commented on pull request #37400: [SPARK-39957][CORE] Delay onDisconnected to enable Driver receives ExecutorExitCode

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #37400:
URL: https://github.com/apache/spark/pull/37400#issuecomment-1205884388

   Can one of the admins verify this patch?


[GitHub] [spark] Ngone51 closed pull request #37400: [SPARK-39957][CORE] Delay onDisconnected to enable Driver receives ExecutorExitCode

Posted by GitBox <gi...@apache.org>.
Ngone51 closed pull request #37400: [SPARK-39957][CORE] Delay onDisconnected to enable Driver receives ExecutorExitCode
URL: https://github.com/apache/spark/pull/37400


[GitHub] [spark] Ngone51 commented on pull request #37400: [SPARK-39957][CORE] Delay onDisconnected to enable Driver receives ExecutorExitCode

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on PR #37400:
URL: https://github.com/apache/spark/pull/37400#issuecomment-1225679087

   Thanks @kevin85421 @mridulm, merged to master!


[GitHub] [spark] kevin85421 commented on a diff in pull request #37400: [SPARK-39957][CORE] Delay onDisconnected to enable Driver receives ExecutorExitCode

Posted by GitBox <gi...@apache.org>.
kevin85421 commented on code in PR #37400:
URL: https://github.com/apache/spark/pull/37400#discussion_r947325944


##########
core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala:
##########
@@ -175,6 +182,11 @@ private[spark] class StandaloneSchedulerBackend(
       exitStatus: Option[Int],
       workerHost: Option[String]): Unit = {
     val reason: ExecutorLossReason = exitStatus match {
+      case Some(ExecutorExitCode.HEARTBEAT_FAILURE) =>
+        ExecutorExited(ExecutorExitCode.HEARTBEAT_FAILURE, exitCausedByApp = false, message)
+      case Some(ExecutorExitCode.DISK_STORE_FAILED_TO_CREATE_DIR) =>
+        ExecutorExited(ExecutorExitCode.DISK_STORE_FAILED_TO_CREATE_DIR,
+          exitCausedByApp = false, message)

Review Comment:
   Good point. Updated https://github.com/apache/spark/pull/37400/commits/55080195dc6326d3fa3886da83dd0cbaec04a26c.



[GitHub] [spark] kevin85421 commented on pull request #37400: [SPARK-39957][CORE] Delay onDisconnected to enable Driver receives ExecutorExitCode

Posted by GitBox <gi...@apache.org>.
kevin85421 commented on PR #37400:
URL: https://github.com/apache/spark/pull/37400#issuecomment-1204546868

   cc. @Ngone51 @jiangxb1987 


[GitHub] [spark] Ngone51 commented on a diff in pull request #37400: [SPARK-39957][CORE] Delay onDisconnected to enable Driver receives ExecutorExitCode

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on code in PR #37400:
URL: https://github.com/apache/spark/pull/37400#discussion_r947494762


##########
core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala:
##########
@@ -175,8 +182,13 @@ private[spark] class StandaloneSchedulerBackend(
       exitStatus: Option[Int],
       workerHost: Option[String]): Unit = {
     val reason: ExecutorLossReason = exitStatus match {
+      case Some(ExecutorExitCode.HEARTBEAT_FAILURE) =>
+        ExecutorExited(ExecutorExitCode.HEARTBEAT_FAILURE, exitCausedByApp = false, message)
+      case Some(ExecutorExitCode.DISK_STORE_FAILED_TO_CREATE_DIR) =>
+        ExecutorExited(ExecutorExitCode.DISK_STORE_FAILED_TO_CREATE_DIR,
+          exitCausedByApp = false, message)
       case Some(code) => ExecutorExited(code, exitCausedByApp = true, message)
-      case None => ExecutorProcessLost(message, workerHost)
+      case None => ExecutorProcessLost(message, workerHost, causedByApp = workerHost == None)

Review Comment:
   ```suggestion
         case None => ExecutorProcessLost(message, workerHost, causedByApp = workerHost.isEmpty)
   ```
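   
   For context, here is a minimal sketch of what the suggested `None` branch computes. It uses a hypothetical stand-in case class rather than Spark's `ExecutorProcessLost`. When no exit code is reported but a worker host is known, the loss is attributed to the lost worker rather than the application. `workerHost.isEmpty` expresses the same test as `workerHost == None`, but it is the idiomatic `Option` check and avoids comparing against the `None` singleton.
   
   ```scala
   object ProcessLostSketch {
     // Hypothetical stand-in for ExecutorProcessLost(message, workerHost, causedByApp).
     final case class ProcessLost(message: String, workerHost: Option[String], causedByApp: Boolean)
   
     // No exit code was reported. If a worker host is known, the worker itself was lost,
     // so the executor loss is not blamed on the application.
     def lostWithoutExitCode(message: String, workerHost: Option[String]): ProcessLost =
       ProcessLost(message, workerHost, causedByApp = workerHost.isEmpty)
   }
   ```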



[GitHub] [spark] kevin85421 commented on pull request #37400: [SPARK-39957][CORE] Delay onDisconnected to enable Driver receives ExecutorExitCode

Posted by GitBox <gi...@apache.org>.
kevin85421 commented on PR #37400:
URL: https://github.com/apache/spark/pull/37400#issuecomment-1225928545

   Thanks @mridulm @Ngone51 for the review!


[GitHub] [spark] kevin85421 commented on a diff in pull request #37400: [SPARK-39957][CORE] Delay onDisconnected to enable Driver receives ExecutorExitCode

Posted by GitBox <gi...@apache.org>.
kevin85421 commented on code in PR #37400:
URL: https://github.com/apache/spark/pull/37400#discussion_r948240206


##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -2398,4 +2398,13 @@ package object config {
       .version("3.3.0")
       .intConf
       .createWithDefault(5)
+
+  private[spark] val EXECUTOR_REMOVE_DELAY =
+    ConfigBuilder("spark.executor.removeDelayOnDisconnection")

Review Comment:
   Updated.



[GitHub] [spark] mridulm commented on a diff in pull request #37400: [SPARK-39957][CORE] Delay onDisconnected to enable Driver receives ExecutorExitCode

Posted by GitBox <gi...@apache.org>.
mridulm commented on code in PR #37400:
URL: https://github.com/apache/spark/pull/37400#discussion_r945364636


##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -2398,4 +2398,13 @@ package object config {
       .version("3.3.0")
       .intConf
       .createWithDefault(5)
+
+  private[spark] val EXECUTOR_REMOVE_DELAY =
+    ConfigBuilder("spark.executor.removeDelayOnDisconnection")

Review Comment:
   This is specific to standalone, right? Or is it relevant to other resource managers? (That would affect the naming of the config.)



[GitHub] [spark] kevin85421 commented on a diff in pull request #37400: [SPARK-39957][CORE] Delay onDisconnected to enable Driver receives ExecutorExitCode

Posted by GitBox <gi...@apache.org>.
kevin85421 commented on code in PR #37400:
URL: https://github.com/apache/spark/pull/37400#discussion_r947216569


##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -2398,4 +2398,13 @@ package object config {
       .version("3.3.0")
       .intConf
       .createWithDefault(5)
+
+  private[spark] val EXECUTOR_REMOVE_DELAY =
+    ConfigBuilder("spark.executor.removeDelayOnDisconnection")

Review Comment:
   Yes, this PR only focuses on standalone. Do you have any suggestion about the config name? Maybe `spark.standalone.removeExecutorDelay`? Thank you!



[GitHub] [spark] Ngone51 commented on a diff in pull request #37400: [SPARK-39957][CORE] Delay onDisconnected to enable Driver receives ExecutorExitCode

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on code in PR #37400:
URL: https://github.com/apache/spark/pull/37400#discussion_r947492731


##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -2398,4 +2398,13 @@ package object config {
       .version("3.3.0")
       .intConf
       .createWithDefault(5)
+
+  private[spark] val EXECUTOR_REMOVE_DELAY =
+    ConfigBuilder("spark.executor.removeDelayOnDisconnection")

Review Comment:
   How about `spark.standalone.executorRemoveDelayOnDisconnection`?



[GitHub] [spark] kevin85421 commented on a diff in pull request #37400: [SPARK-39957][CORE] Delay onDisconnected to enable Driver receives ExecutorExitCode

Posted by GitBox <gi...@apache.org>.
kevin85421 commented on code in PR #37400:
URL: https://github.com/apache/spark/pull/37400#discussion_r948224828


##########
core/src/test/scala/org/apache/spark/SparkContextSuite.scala:
##########
@@ -1354,6 +1355,50 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
     }.getMessage
     assert(msg.contains("Cannot use the keyword 'proxy' or 'history' in reverse proxy URL"))
   }
+
+  test("ExitCode HEARTBEAT_FAILURE should be counted as network failure") {
+    // This test verifies that the driver receives the ExecutorExitCode before onDisconnected
+    // removes the executor. If onDisconnected removed the executor, the loss would be treated
+    // as a task failure, and Spark would throw a SparkException because TASK_MAX_FAILURES is 1.
+    // If, instead, the driver removes the executor with exit code HEARTBEAT_FAILURE, the loss
+    // is counted as a network failure, so the job should not throw a SparkException.
+
+    val conf = new SparkConf().set(TASK_MAX_FAILURES, 1)
+    val sc = new SparkContext("local-cluster[1, 1, 1024]", "test-exit-code-heartbeat", conf)
+    val result = sc.parallelize(1 to 10, 1).map { x =>
+      val context = org.apache.spark.TaskContext.get()
+      if (context.taskAttemptId() == 0) {
+        System.exit(ExecutorExitCode.HEARTBEAT_FAILURE)
+      } else {
+        x
+      }
+    }.count()
+    assert(result == 10L)
+    sc.stop()
+  }
+
+  test("ExitCode HEARTBEAT_FAILURE will be counted as task failure when EXECUTOR_REMOVE_DELAY is" +

Review Comment:
   Updated https://github.com/apache/spark/pull/37400/commits/80b86910a59f5b409caa17d46305027ed89c3dd1
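   
   The reasoning in the test comment can be modeled in a few lines. This is a hypothetical sketch, not Spark's `TaskSetManager` logic. It assumes that a task set is aborted once a task's counted failures reach `maxTaskFailures`, which is the role `TASK_MAX_FAILURES` (`spark.task.maxFailures`) plays in the test. It also assumes that executor losses not caused by the app are retried without being counted.
   
   ```scala
   object RetryBudgetSketch {
     sealed trait Attempt
     case object Succeeded extends Attempt
     // causedByApp = true means the executor loss counts as a task failure.
     final case class ExecutorLost(causedByApp: Boolean) extends Attempt
   
     /** Replays one task's attempts: Right(attemptsUsed) on success, Left(reason) if aborted. */
     def run(maxTaskFailures: Int, attempts: List[Attempt]): Either[String, Int] = {
       @annotation.tailrec
       def loop(rest: List[Attempt], failures: Int, used: Int): Either[String, Int] = rest match {
         case Succeeded :: _ => Right(used + 1)
         case ExecutorLost(true) :: tail =>
           val counted = failures + 1
           // Abort once counted failures reach the limit.
           if (counted >= maxTaskFailures) Left(s"aborted after $counted counted failure(s)")
           else loop(tail, counted, used + 1)
         // Not the app's fault: retry without touching the failure count.
         case ExecutorLost(false) :: tail => loop(tail, failures, used + 1)
         case Nil => Left("no successful attempt")
       }
       loop(attempts, 0, 0)
     }
   }
   ```
   
   With `maxTaskFailures = 1`, a loss with `causedByApp = false` followed by a success mirrors the expected outcome of the first test: the HEARTBEAT_FAILURE loss is retried and the job finishes. A loss with `causedByApp = true` mirrors what would happen if `onDisconnected` removed the executor first.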



[GitHub] [spark] Ngone51 commented on a diff in pull request #37400: [SPARK-39957][CORE] Delay onDisconnected to enable Driver receives ExecutorExitCode

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on code in PR #37400:
URL: https://github.com/apache/spark/pull/37400#discussion_r947495278


##########
core/src/test/scala/org/apache/spark/SparkContextSuite.scala:
##########
@@ -1354,6 +1355,50 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
     }.getMessage
     assert(msg.contains("Cannot use the keyword 'proxy' or 'history' in reverse proxy URL"))
   }
+
+  test("ExitCode HEARTBEAT_FAILURE should be counted as network failure") {

Review Comment:
   ```suggestion
     test("SPARK-39957: ExitCode HEARTBEAT_FAILURE should be counted as network failure") {
   ```



##########
core/src/test/scala/org/apache/spark/SparkContextSuite.scala:
##########
@@ -1354,6 +1355,50 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
     }.getMessage
     assert(msg.contains("Cannot use the keyword 'proxy' or 'history' in reverse proxy URL"))
   }
+
+  test("ExitCode HEARTBEAT_FAILURE should be counted as network failure") {
+    // This test verifies that the driver receives the ExecutorExitCode before onDisconnected
+    // removes the executor. If onDisconnected removed the executor, the loss would be treated
+    // as a task failure, and Spark would throw a SparkException because TASK_MAX_FAILURES is 1.
+    // If, instead, the driver removes the executor with exit code HEARTBEAT_FAILURE, the loss
+    // is counted as a network failure, so the job should not throw a SparkException.
+
+    val conf = new SparkConf().set(TASK_MAX_FAILURES, 1)
+    val sc = new SparkContext("local-cluster[1, 1, 1024]", "test-exit-code-heartbeat", conf)
+    val result = sc.parallelize(1 to 10, 1).map { x =>
+      val context = org.apache.spark.TaskContext.get()
+      if (context.taskAttemptId() == 0) {
+        System.exit(ExecutorExitCode.HEARTBEAT_FAILURE)
+      } else {
+        x
+      }
+    }.count()
+    assert(result == 10L)
+    sc.stop()
+  }
+
+  test("ExitCode HEARTBEAT_FAILURE will be counted as task failure when EXECUTOR_REMOVE_DELAY is" +

Review Comment:
   ```suggestion
     test("SPARK-39957: ExitCode HEARTBEAT_FAILURE will be counted as task failure when EXECUTOR_REMOVE_DELAY is" +
   ```



[GitHub] [spark] Ngone51 commented on a diff in pull request #37400: [SPARK-39957][CORE] Delay onDisconnected to enable Driver receives ExecutorExitCode

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on code in PR #37400:
URL: https://github.com/apache/spark/pull/37400#discussion_r945361470


##########
core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala:
##########
@@ -175,6 +182,11 @@ private[spark] class StandaloneSchedulerBackend(
       exitStatus: Option[Int],
       workerHost: Option[String]): Unit = {
     val reason: ExecutorLossReason = exitStatus match {
+      case Some(ExecutorExitCode.HEARTBEAT_FAILURE) =>
+        ExecutorExited(ExecutorExitCode.HEARTBEAT_FAILURE, exitCausedByApp = false, message)
+      case Some(ExecutorExitCode.DISK_STORE_FAILED_TO_CREATE_DIR) =>
+        ExecutorExited(ExecutorExitCode.DISK_STORE_FAILED_TO_CREATE_DIR,
+          exitCausedByApp = false, message)

Review Comment:
   Could you also include the case where `workerHost` is non-empty? In that case, executor loss is caused by worker loss, so `exitCausedByApp` should be `false` too.



[GitHub] [spark] mridulm commented on pull request #37400: [SPARK-39957][CORE] Delay onDisconnected to enable Driver receives ExecutorExitCode

Posted by GitBox <gi...@apache.org>.
mridulm commented on PR #37400:
URL: https://github.com/apache/spark/pull/37400#issuecomment-1225259787

   I will let @Ngone51 review/merge - I do not have a lot of context on standalone :-)


[GitHub] [spark] LuciferYang commented on a diff in pull request #37400: [SPARK-39957][CORE] Delay onDisconnected to enable Driver receives ExecutorExitCode

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #37400:
URL: https://github.com/apache/spark/pull/37400#discussion_r1148842933


##########
core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala:
##########
@@ -272,4 +285,66 @@ private[spark] class StandaloneSchedulerBackend(
     }
   }
 
+  override def createDriverEndpoint(): DriverEndpoint = {
+    new StandaloneDriverEndpoint()
+  }
+
+  private class StandaloneDriverEndpoint extends DriverEndpoint {
+    // [SC-104659]: There are two paths to detect executor loss.

Review Comment:
   @kevin85421 Sorry to bother you, but can you explain what `SC-104659` and `SC-104335` mean? They don't seem to be Spark JIRA tickets. Thanks ~
   


