Posted to issues@spark.apache.org by "Stavros Kontopoulos (JIRA)" <ji...@apache.org> on 2018/10/12 10:37:00 UTC
[jira] [Comment Edited] (SPARK-24751) [k8s] Relaunch failed executor pods
[ https://issues.apache.org/jira/browse/SPARK-24751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647757#comment-16647757 ]
Stavros Kontopoulos edited comment on SPARK-24751 at 10/12/18 10:36 AM:
------------------------------------------------------------------------
If an executor dies (e.g. because it is killed), the backend will relaunch another one, trying to reach the requested number of executors.
$ kubectl get pods -n spark
NAME                             READY   STATUS    RESTARTS   AGE
test-cpus-1539340474549-driver   1/1     Running   0          13s
test-cpus-1539340474549-exec-1   1/1     Running   0          5s
test-cpus-1539340474549-exec-2   1/1     Running   0          4s
test-cpus-1539340474549-exec-3   1/1     Running   0          4s
test-cpus-1539340474549-exec-4   1/1     Running   0          4s
$ kubectl delete pods test-cpus-1539340474549-exec-4 -n spark
pod "test-cpus-1539340474549-exec-4" deleted
$ kubectl get pods -n spark
NAME                             READY   STATUS    RESTARTS   AGE
test-cpus-1539340474549-driver   1/1     Running   0          32s
test-cpus-1539340474549-exec-1   1/1     Running   0          24s
test-cpus-1539340474549-exec-2   1/1     Running   0          23s
test-cpus-1539340474549-exec-3   1/1     Running   0          23s
test-cpus-1539340474549-exec-5   1/1     Running   0          8s
Do you mean something else?
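To make the behaviour in the pod listings concrete, here is a toy model of "top the pool back up to the requested count". It is only a sketch and not Spark's actual scheduler backend code: the class ExecutorPool and its methods are hypothetical names made up for illustration, and the assumption that executor IDs are never reused is inferred from exec-5 (not exec-4) appearing in the second listing.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutorPool:
    """Toy model (hypothetical) of a backend that keeps a target number of executors alive."""
    app_prefix: str
    target: int
    next_id: int = 1  # assumption: IDs only ever increase and are never reused
    running: list = field(default_factory=list)

    def reconcile(self):
        """Launch new executors until the running count matches the target."""
        launched = []
        while len(self.running) < self.target:
            name = f"{self.app_prefix}-exec-{self.next_id}"
            self.next_id += 1
            self.running.append(name)
            launched.append(name)
        return launched

    def kill(self, name):
        """Stand-in for `kubectl delete pods <name>`."""
        self.running.remove(name)


pool = ExecutorPool(app_prefix="test-cpus-1539340474549", target=4)
pool.reconcile()                                # starts exec-1 .. exec-4
pool.kill("test-cpus-1539340474549-exec-4")     # one executor dies
replacements = pool.reconcile()                 # pool is topped back up to 4
print(replacements)  # ['test-cpus-1539340474549-exec-5']
```

In this model the replacement gets a fresh name rather than restarting the old pod, which matches RESTARTS staying at 0 in the listing.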
> [k8s] Relaunch failed executor pods
> ------------------------------------
>
> Key: SPARK-24751
> URL: https://issues.apache.org/jira/browse/SPARK-24751
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 2.3.1
> Reporter: Dharmesh Kakadia
> Priority: Major
> Labels: kubernetes
>
> Currently, we don't create new executor pods to replace the failed ones. Doing so would provide very useful resiliency.
>
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)