Posted to issues@spark.apache.org by "Dmitro Valentiev (Jira)" <ji...@apache.org> on 2021/02/17 13:01:00 UTC
[jira] [Updated] (SPARK-34453) ExecutorPodsLifecycleManager fails to remove executors in Kubernetes, SPARK 3.0.1
[ https://issues.apache.org/jira/browse/SPARK-34453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitro Valentiev updated SPARK-34453:
-------------------------------------
Attachment: driver.log
executors.png
> ExecutorPodsLifecycleManager fails to remove executors in Kubernetes, SPARK 3.0.1
> ---------------------------------------------------------------------------------
>
> Key: SPARK-34453
> URL: https://issues.apache.org/jira/browse/SPARK-34453
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 3.0.1
> Environment: SPARK 3.0.1
> EKS 1.15
> Spark cluster runs in a Kubernetes cluster through spark-submit.
> Reporter: Dmitro Valentiev
> Priority: Minor
> Attachments: driver.log, executors.png
>
>
> Happens when the driver fails to register the reason behind an executor deletion, e.g.:
> {code:java}
> 2021-02-17 12:07:56,953 DEBUG KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint:61 - Asked to remove executor 1 with reason The executor with id 1 was deleted by a user or the framework.
> {code}
> ExecutorPodsLifecycleManager then fails to remove the missing executor and gets stuck in this loop:
> {code:java}
> 2021-02-17 12:13:39,023 DEBUG ExecutorPodsLifecycleManager:61 - Removed executors with ids 3 from Spark that were either found to be deleted or non-existent in the cluster.
> 2021-02-17 12:15:09,042 DEBUG ExecutorPodsLifecycleManager:61 - The executor with ID 3 was not found in the cluster but we didn't get a reason why. Marking the executor as failed. The executor may have been deleted but the driver missed the deletion event.
> {code}
>
> Steps to reproduce:
> # Deploy a Spark cluster into Kubernetes
> # Delete an executor pod through kubectl
>
> Possibly related to, or a duplicate of, SPARK-28488.
>
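One way to check whether a driver is caught in the loop quoted in the issue is to count how often each executor ID is reported as "not found in the cluster" in driver.log. The Java class below is a hypothetical diagnostic helper and not part of Spark: StuckExecutorScan, countMissingReports, suspectedStuck and the repeat threshold of 2 are illustrative names and assumptions, and the regular expression matches only the DEBUG message text quoted above. The assumption is that one such report can be a single missed deletion event, while repeated reports for the same ID point to the removal loop.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Spark): scans a driver log for the
// "not found in the cluster" message quoted in SPARK-34453.
public class StuckExecutorScan {
    // Matches the quoted DEBUG message and captures the executor ID.
    private static final Pattern NOT_FOUND = Pattern.compile(
            "The executor with ID (\\d+) was not found in the cluster");

    // Counts how many times each executor ID is reported missing, in first-seen order.
    public static Map<String, Integer> countMissingReports(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = NOT_FOUND.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the IDs reported missing at least `threshold` times.
    // Assumption: repeated reports for one ID indicate the removal loop.
    public static List<String> suspectedStuck(List<String> lines, int threshold) {
        return countMissingReports(lines).entrySet().stream()
                .filter(e -> e.getValue() >= threshold)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StuckExecutorScan <driver.log>");
            System.exit(2);
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        // Hypothetical threshold of 2: a single report may just be a missed deletion event.
        for (String id : suspectedStuck(lines, 2)) {
            System.out.println("executor " + id + " reported missing repeatedly");
        }
    }
}
```

With Java 11 or later the file can be run directly, e.g. java StuckExecutorScan.java driver.log, which prints one line per executor ID that appears to be stuck.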
--
This message was sent by Atlassian Jira
(v8.3.4#803005)