You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Weizhong (JIRA)" <ji...@apache.org> on 2016/10/14 02:57:20 UTC
[jira] [Created] (SPARK-17929) Deadlock when AM restart send
RemoveExecutor
Weizhong created SPARK-17929:
--------------------------------
Summary: Deadlock when AM restart send RemoveExecutor
Key: SPARK-17929
URL: https://issues.apache.org/jira/browse/SPARK-17929
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.0.0
Reporter: Weizhong
Priority: Minor
We fix SPARK-10582, and add reset in CoarseGrainedSchedulerBackend.scala
{code}
protected def reset(): Unit = synchronized {
numPendingExecutors = 0
executorsPendingToRemove.clear()
// Remove all the lingering executors that should be removed but not yet. The reason might be
// because (1) disconnected event is not yet received; (2) executors die silently.
executorDataMap.toMap.foreach { case (eid, _) =>
driverEndpoint.askWithRetry[Boolean](
RemoveExecutor(eid, SlaveLost("Stale executor after cluster manager re-registered.")))
}
}
{code}
but on removeExecutor also need the lock "CoarseGrainedSchedulerBackend.this.synchronized", this will cause deadlock, and send RPC will failed, and reset failed
{code}
private def removeExecutor(executorId: String, reason: ExecutorLossReason): Unit = {
logDebug(s"Asked to remove executor $executorId with reason $reason")
executorDataMap.get(executorId) match {
case Some(executorInfo) =>
// This must be synchronized because variables mutated
// in this block are read when requesting executors
val killed = CoarseGrainedSchedulerBackend.this.synchronized {
addressToExecutorId -= executorInfo.executorAddress
executorDataMap -= executorId
executorsPendingLossReason -= executorId
executorsPendingToRemove.remove(executorId).getOrElse(false)
}
...
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org