Posted to issues@spark.apache.org by "Attila Zsolt Piros (Jira)" <ji...@apache.org> on 2021/02/04 16:08:00 UTC

[jira] [Commented] (SPARK-34361) Dynamic allocation on K8s kills executors with running tasks

    [ https://issues.apache.org/jira/browse/SPARK-34361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17278932#comment-17278932 ] 

Attila Zsolt Piros commented on SPARK-34361:
--------------------------------------------

I am working on this.

> Dynamic allocation on K8s kills executors with running tasks
> ------------------------------------------------------------
>
>                 Key: SPARK-34361
>                 URL: https://issues.apache.org/jira/browse/SPARK-34361
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.0, 3.2.0, 3.1.1, 3.1.2
>            Reporter: Attila Zsolt Piros
>            Priority: Major
>
> There is a race between the executor pod allocator and the cluster scheduler backend.
> During downscaling (with dynamic allocation) we saw many newly registered executors killed while they still had running tasks.
> The pattern in the log is the following:
> {noformat}
> 21/02/01 15:12:03 INFO ExecutorMonitor: New executor 312 has registered (new total is 138)
> ...
> 21/02/01 15:12:03 INFO TaskSetManager: Starting task 247.0 in stage 4.0 (TID 2079, 100.100.18.138, executor 312, partition 247, PROCESS_LOCAL, 8777 bytes)
> 21/02/01 15:12:03 INFO ExecutorPodsAllocator: Deleting 3 excess pod requests (408,312,307).
> ...
> 21/02/01 15:12:04 ERROR TaskSchedulerImpl: Lost executor 312 on 100.100.18.138: The executor with id 312 was deleted by a user or the framework.
> {noformat}
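
To illustrate the race only (this is a hypothetical, simplified sketch, not the actual ExecutorPodsAllocator or scheduler backend code, and all names below are made up): the allocator chooses "excess" pending executors from its own bookkeeping, while the scheduler backend may have already registered some of those executors and started tasks on them, as happened to executor 312 above. Re-checking registration right before deletion would avoid killing such executors.

{noformat}
import scala.collection.mutable

// Hypothetical sketch of the downscaling race (names invented for illustration).
object ExcessPodDeletionSketch {
  // Executor ids the allocator still considers "newly requested" pods.
  val newlyCreatedExecutors = mutable.LinkedHashSet(408L, 312L, 307L)

  // Executor ids the scheduler backend has registered (updated on another thread;
  // 312 has already registered and received task 247.0 in the log above).
  val registeredExecutors = mutable.Set(312L)

  def deleteExcess(excessCount: Int): Unit = {
    // Racy behaviour matching the log: pick excess pods purely from allocator state.
    val toDelete = newlyCreatedExecutors.take(excessCount)

    // A safer guard would skip executors the scheduler already knows about.
    val safeToDelete = toDelete.filterNot(registeredExecutors.contains)

    println(s"Deleting ${safeToDelete.size} excess pod requests (${safeToDelete.mkString(",")}).")
  }

  def main(args: Array[String]): Unit = deleteExcess(3)
}
{noformat}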



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org