Posted to issues@spark.apache.org by "Attila Zsolt Piros (Jira)" <ji...@apache.org> on 2021/02/04 16:08:00 UTC
[jira] [Commented] (SPARK-34361) Dynamic allocation on K8s kills executors with running tasks
[ https://issues.apache.org/jira/browse/SPARK-34361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17278932#comment-17278932 ]
Attila Zsolt Piros commented on SPARK-34361:
--------------------------------------------
I am working on this.
> Dynamic allocation on K8s kills executors with running tasks
> ------------------------------------------------------------
>
> Key: SPARK-34361
> URL: https://issues.apache.org/jira/browse/SPARK-34361
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.0, 3.2.0, 3.1.1, 3.1.2
> Reporter: Attila Zsolt Piros
> Priority: Major
>
> There is a race between the executor POD allocator and the cluster scheduler backend.
> During downscaling (in dynamic allocation) we experienced a lot of newly started executors being killed with running tasks on them.
> The pattern in the log is the following:
> {noformat}
> 21/02/01 15:12:03 INFO ExecutorMonitor: New executor 312 has registered (new total is 138)
> ...
> 21/02/01 15:12:03 INFO TaskSetManager: Starting task 247.0 in stage 4.0 (TID 2079, 100.100.18.138, executor 312, partition 247, PROCESS_LOCAL, 8777 bytes)
> 21/02/01 15:12:03 INFO ExecutorPodsAllocator: Deleting 3 excess pod requests (408,312,307).
> ...
> 21/02/01 15:12:04 ERROR TaskSchedulerImpl: Lost executor 312 on 100.100.18.138: The executor with id 312 was deleted by a user or the framework.
> {noformat}
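> The race in the log above can be sketched as follows. This is a hypothetical Python illustration, not Spark's actual ExecutorPodsAllocator code: the allocator snapshots its outstanding pod requests, an executor from that snapshot registers and receives a task, and the allocator then deletes the "excess" requests using the stale snapshot, killing a live executor. The function name and parameters are invented for the sketch.
> {noformat}
```python
def delete_excess_requests(snapshot, target, registered, check_registration):
    """Delete pod requests beyond `target`; return the ids deleted.

    check_registration=False mimics the buggy behavior: ids that
    registered after the snapshot was taken are still deleted.
    check_registration=True re-checks registration before deleting.
    """
    excess = len(snapshot) - target
    deleted = []
    for exec_id in snapshot:
        if excess <= 0:
            break
        if check_registration and exec_id in registered:
            continue  # executor came up in the meantime; leave it alone
        deleted.append(exec_id)
        excess -= 1
    return deleted

# Allocator snapshots three outstanding pod requests (as in the log)...
snapshot = [408, 312, 307]
# ...but executor 312 registers (and is assigned a task) before deletion runs.
registered = {312}

# Buggy path: the live executor 312 is deleted along with the others.
assert delete_excess_requests(snapshot, 0, registered, False) == [408, 312, 307]
# Fixed path: re-checking registration spares executor 312.
assert delete_excess_requests(snapshot, 0, registered, True) == [408, 307]
```
> {noformat}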
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org