Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/03 06:48:20 UTC

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37821: [SPARK-40379][K8S] Propagate decommission executor loss reason in K8s

dongjoon-hyun commented on code in PR #37821:
URL: https://github.com/apache/spark/pull/37821#discussion_r1038740032


##########
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala:
##########
@@ -324,11 +324,24 @@ private[spark] class KubernetesClusterSchedulerBackend(
           super.receiveAndReply(context)))
 
     override def onDisconnected(rpcAddress: RpcAddress): Unit = {
-      // Don't do anything besides disabling the executor - allow the Kubernetes API events to
-      // drive the rest of the lifecycle decisions
-      // TODO what if we disconnect from a networking issue? Probably want to mark the executor
-      // to be deleted eventually.
-      addressToExecutorId.get(rpcAddress).foreach(disableExecutor)
+      val execId = addressToExecutorId.get(rpcAddress)
+      execId match {
+        case Some(id) =>
+          executorsPendingDecommission.get(id) match {
+            case Some(host) =>
+              // We don't pass through the host because by convention the
+              // host is only populated if the entire host is going away
+              // and we don't know if that's the case or just one container.
+              removeExecutor(id, ExecutorDecommission(None))
+            case _ =>
+            // Don't do anything besides disabling the executor - allow the Kubernetes API events to
+            // drive the rest of the lifecycle decisions.
+            // If it's disconnected due to network issues eventually heartbeat will clear it up.
+            disableExecutor(id)

Review Comment:
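   nit. Could you fix the indentation of the `case _ =>` body, @holdenk? The comment lines and `disableExecutor(id)` should sit one level deeper than the `case` keyword, as in the `case Some(host) =>` branch.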
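For illustration only, here is a minimal self-contained sketch of the same `onDisconnected` branching with the `case _ =>` body indented under its `case`. `RpcAddress`, `ExecutorDecommission`, the two maps and the `removeExecutor`/`disableExecutor` helpers below are hypothetical, simplified stand-ins, not Spark's real classes. The `actions` buffer is likewise hypothetical and only records which path ran. As in the removed `foreach(disableExecutor)` line, an unknown address does nothing.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for Spark internals (not the real classes).
final case class RpcAddress(host: String, port: Int)
final case class ExecutorDecommission(workerHost: Option[String])

class DisconnectSketch {
  val addressToExecutorId = mutable.Map.empty[RpcAddress, String]
  // Executor id -> optional worker host, mirroring the shape used in the diff.
  val executorsPendingDecommission = mutable.Map.empty[String, Option[String]]
  // Hypothetical log of the actions taken, so the branching can be inspected.
  val actions = mutable.ArrayBuffer.empty[String]

  def removeExecutor(id: String, reason: ExecutorDecommission): Unit =
    actions += s"remove $id $reason"

  def disableExecutor(id: String): Unit =
    actions += s"disable $id"

  def onDisconnected(rpcAddress: RpcAddress): Unit = {
    addressToExecutorId.get(rpcAddress) match {
      case Some(id) =>
        executorsPendingDecommission.get(id) match {
          case Some(_) =>
            // The host is not passed through: by convention it is only set when
            // the whole host is going away, which is unknown here.
            removeExecutor(id, ExecutorDecommission(None))
          case _ =>
            // Only disable the executor; Kubernetes API events (or heartbeats,
            // after a network issue) drive the rest of the lifecycle.
            disableExecutor(id)
        }
      case None =>
        // Unknown address: nothing to do.
    }
  }
}
```

With this sketch, a disconnect from an executor listed in `executorsPendingDecommission` records a removal with `ExecutorDecommission(None)`, while any other known executor is only disabled.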



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

For additional commands, e-mail: reviews-help@spark.apache.org