Posted to commits@spark.apache.org by an...@apache.org on 2016/02/27 00:11:59 UTC

spark git commit: [SPARK-13519][CORE] Driver should tell Executor to stop itself when cleaning executor's state

Repository: spark
Updated Branches:
  refs/heads/master 1e5fcdf96 -> ad615291f


[SPARK-13519][CORE] Driver should tell Executor to stop itself when cleaning executor's state

## What changes were proposed in this pull request?

When the driver removes an executor's state, the connection between the driver and the executor may still be alive, so the executor does not exit on its own (e.g., the Master sends `RemoveExecutor` when a worker is lost even though the executor is still alive). The driver should therefore try to tell the executor to stop itself; otherwise, the executor is leaked.

This PR modifies the driver to send `StopExecutor` to the executor when it is removed.
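
The ordering matters: the stop message has to go out while the executor's endpoint is still known, before its state is dropped. Below is a minimal, Spark-free sketch of that pattern. `RecordingEndpoint`, `ExecutorData`, `DriverState`, and `register` are hypothetical stand-ins written for illustration only; they are not Spark's classes, and the real change is the one-line addition in the diff.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for illustration; these are not Spark's actual classes.
sealed trait ExecutorMessage
case object StopExecutor extends ExecutorMessage

// Records every message it is sent, in place of a real RPC endpoint.
final class RecordingEndpoint {
  val received = mutable.ArrayBuffer.empty[ExecutorMessage]
  def send(msg: ExecutorMessage): Unit = { received += msg }
}

final case class ExecutorData(executorEndpoint: RecordingEndpoint)

final class DriverState {
  val executorDataMap = mutable.HashMap.empty[String, ExecutorData]

  def register(executorId: String, endpoint: RecordingEndpoint): Unit =
    executorDataMap(executorId) = ExecutorData(endpoint)

  // Ask the executor to stop while its endpoint is still known,
  // then drop its state. Returns true if state existed for the executor.
  def removeExecutor(executorId: String): Boolean = {
    val known = executorDataMap.get(executorId)
    known.foreach(_.executorEndpoint.send(StopExecutor))
    executorDataMap.remove(executorId)
    known.isDefined
  }
}
```

Using `Option.foreach` makes the send a no-op when the executor has already been removed, so a repeated removal request does not fail or resend the message.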

## How was this patch tested?

Manual test: increased the worker heartbeat interval so that the worker always times out, and confirmed that the leaked executors are gone.

Author: Shixiong Zhu <sh...@databricks.com>

Closes #11399 from zsxwing/SPARK-13519.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ad615291
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ad615291
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ad615291

Branch: refs/heads/master
Commit: ad615291fe76580ee59e3f48f4efe4627a01409d
Parents: 1e5fcdf
Author: Shixiong Zhu <sh...@databricks.com>
Authored: Fri Feb 26 15:11:57 2016 -0800
Committer: Andrew Or <an...@databricks.com>
Committed: Fri Feb 26 15:11:57 2016 -0800

----------------------------------------------------------------------
 .../spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala  | 4 ++++
 1 file changed, 4 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/ad615291/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala b/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
index 0a5b09d..d151de5 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
@@ -179,6 +179,10 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
         context.reply(true)
 
       case RemoveExecutor(executorId, reason) =>
+        // We will remove the executor's state and cannot restore it. However, the connection
+        // between the driver and the executor may be still alive so that the executor won't exit
+        // automatically, so try to tell the executor to stop itself. See SPARK-13519.
+        executorDataMap.get(executorId).foreach(_.executorEndpoint.send(StopExecutor))
         removeExecutor(executorId, reason)
         context.reply(true)
 

