Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2022/01/19 19:50:00 UTC

[jira] [Resolved] (SPARK-37910) Spark executor self-exiting due to driver disassociated in Kubernetes with client deploy-mode

     [ https://issues.apache.org/jira/browse/SPARK-37910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-37910.
-----------------------------------
    Resolution: Invalid

> Spark executor self-exiting due to driver disassociated in Kubernetes with client deploy-mode
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-37910
>                 URL: https://issues.apache.org/jira/browse/SPARK-37910
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.2.0
>            Reporter: Petri
>            Priority: Major
>
> I have a Spark driver running in a Kubernetes pod in client deploy mode, and it tries to start an executor.
> The executor fails with this error:
>     {"type":"log", "level":"ERROR", "name":"STREAMING_OTHERS", "time":"2022-01-14T12:29:38.318Z", "timezone":"UTC", "class":"dispatcher-Executor", "method":"spark.executor.CoarseGrainedExecutorBackend.logError(73)", "log":"Executor self-exiting due to : Driver 192-168-39-71.mni-system.pod.cluster.local:40752 disassociated! Shutting down.\n"}
> The driver then attempts to start another executor, which fails with the same error, and the cycle keeps repeating.
> In the driver pod, I see only the following errors:
>     22/01/14 12:26:32 ERROR TaskSchedulerImpl: Lost executor 1 on 192.168.43.250:
>     22/01/14 12:27:16 ERROR TaskSchedulerImpl: Lost executor 2 on 192.168.43.233:
>     22/01/14 12:27:59 ERROR TaskSchedulerImpl: Lost executor 3 on 192.168.43.221:
>     22/01/14 12:28:43 ERROR TaskSchedulerImpl: Lost executor 4 on 192.168.43.217:
>     22/01/14 12:29:27 ERROR TaskSchedulerImpl: Lost executor 5 on 192.168.43.197:
>     22/01/14 12:30:10 ERROR TaskSchedulerImpl: Lost executor 6 on 192.168.43.237:
>     22/01/14 12:30:53 ERROR TaskSchedulerImpl: Lost executor 7 on 192.168.43.196:
>     22/01/14 12:31:42 ERROR TaskSchedulerImpl: Lost executor 8 on 192.168.43.228:
>     22/01/14 12:32:31 ERROR TaskSchedulerImpl: Lost executor 9 on 192.168.43.254:
>     22/01/14 12:33:14 ERROR TaskSchedulerImpl: Lost executor 10 on 192.168.43.204:
>     22/01/14 12:33:57 ERROR TaskSchedulerImpl: Lost executor 11 on 192.168.43.231:
> What is wrong, and how can I get the executors running correctly?
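
The issue was resolved as Invalid and records no root cause, so the following is only a sketch of one thing commonly checked in this situation, not a diagnosis. In client mode on Kubernetes, executors must be able to connect back to the driver over a hostname and port that are routable from the executor pods. The helper below assembles the standard Spark properties involved (spark.driver.host, spark.driver.port, spark.driver.bindAddress, spark.kubernetes.driver.pod.name). The service name "spark-driver-svc", the pod name "spark-driver", and port 7078 are hypothetical placeholders; the namespace "mni-system" and the "cluster.local" domain are taken from the executor log above. It also assumes a headless service with that name already exposes the driver pod.

```python
def client_mode_driver_conf(service, namespace, port, pod_name):
    """Build Spark properties that tell executors how to reach a client-mode driver.

    Assumes a headless Kubernetes service named `service` in `namespace`
    selects the driver pod and exposes `port` (all hypothetical values).
    """
    # Stable DNS name of the service; assumes the default "cluster.local" domain.
    host = f"{service}.{namespace}.svc.cluster.local"
    return {
        "spark.driver.host": host,                     # address executors connect to
        "spark.driver.port": str(port),                # fixed port instead of a random one
        "spark.driver.bindAddress": "0.0.0.0",         # listen on all pod interfaces
        "spark.kubernetes.driver.pod.name": pod_name,  # lets executors be owned by the driver pod
    }


def to_submit_args(conf):
    """Render the properties as `--conf key=value` arguments, sorted by key."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical example values; replace with the real service, port, and pod name.
conf = client_mode_driver_conf("spark-driver-svc", "mni-system", 7078, "spark-driver")
submit_args = to_submit_args(conf)
```

The resulting arguments can be appended to a spark-submit command, or the dictionary entries can be set on the application's Spark configuration before the session is created. Whether this resolves the behaviour reported here is not established by the issue.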



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org