Posted to issues@spark.apache.org by "David Vogelbacher (JIRA)" <ji...@apache.org> on 2018/12/20 21:33:00 UTC

[jira] [Created] (SPARK-26423) [K8s] Make sure that disconnected executors eventually get deleted

David Vogelbacher created SPARK-26423:
-----------------------------------------

             Summary: [K8s] Make sure that disconnected executors eventually get deleted
                 Key: SPARK-26423
                 URL: https://issues.apache.org/jira/browse/SPARK-26423
             Project: Spark
          Issue Type: Bug
          Components: Kubernetes
    Affects Versions: 2.4.0
            Reporter: David Vogelbacher


If an executor disconnects, we currently only disable it in the {{KubernetesClusterSchedulerBackend}} but take no further action, in the expectation that all other necessary actions (removing it from Spark, requesting a replacement executor, ...) will be driven by Kubernetes lifecycle events.
However, this only works if the executor disconnected because its pod is dying or shutting down.
It does not work if there is merely a network issue between the driver and the executor while the executor pod keeps running in Kubernetes.
Thus (as indicated by the TODO comment in https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala#L158), we should make sure that a disconnected executor eventually does get killed in Kubernetes. A sketch of one possible approach follows below.
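A minimal sketch of one possible approach, not the actual Spark implementation: have the scheduler backend remember when each executor disconnected and, after a configurable timeout with no reconnect, delete the corresponding pod. The names below ({{DisconnectedExecutorReaper}}, {{deletePod}}, {{disconnectTimeoutMs}}) are illustrative assumptions, not existing Spark APIs:

{code:scala}
import java.util.concurrent.{ConcurrentHashMap, Executors, TimeUnit}

// Illustrative sketch only - names and structure are assumptions, not Spark APIs.
// Tracks when each executor disconnected and deletes its pod if it has not
// reconnected within a timeout, so pods that are still running but unreachable
// from the driver eventually get cleaned up.
class DisconnectedExecutorReaper(
    deletePod: String => Unit,           // callback that deletes the executor's pod in k8s
    disconnectTimeoutMs: Long = 120000L  // how long to wait for a reconnect before deleting
  ) {

  // executor id -> timestamp (ms) at which the disconnect was first observed
  private val disconnectedAt = new ConcurrentHashMap[String, Long]()
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  def onExecutorDisconnected(executorId: String): Unit =
    disconnectedAt.putIfAbsent(executorId, System.currentTimeMillis())

  def onExecutorReconnected(executorId: String): Unit =
    disconnectedAt.remove(executorId)

  // Periodically delete pods of executors that stayed disconnected for too long.
  def start(): Unit =
    scheduler.scheduleWithFixedDelay(new Runnable {
      override def run(): Unit = reapExpired()
    }, 30, 30, TimeUnit.SECONDS)

  def stop(): Unit = scheduler.shutdownNow()

  private def reapExpired(): Unit = {
    val now = System.currentTimeMillis()
    val it = disconnectedAt.entrySet().iterator()
    while (it.hasNext) {
      val entry = it.next()
      if (now - entry.getValue > disconnectTimeoutMs) {
        deletePod(entry.getKey)  // ask k8s to remove the (possibly still running) pod
        it.remove()
      }
    }
  }
}
{code}

In a real fix, the {{deletePod}} callback would presumably be wired to the Kubernetes client already used by {{KubernetesClusterSchedulerBackend}}, and the disconnect/reconnect hooks would be invoked from the backend's existing connection handling, so that a pod left running after a pure network failure is eventually removed and a replacement executor requested.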





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org