Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2016/12/07 06:39:58 UTC
[jira] [Commented] (SPARK-18761) Uncancellable / unkillable tasks
may starve jobs of resources
[ https://issues.apache.org/jira/browse/SPARK-18761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15727896#comment-15727896 ]
Apache Spark commented on SPARK-18761:
--------------------------------------
User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/16189
> Uncancellable / unkillable tasks may starve jobs of resources
> -------------------------------------------------------------
>
> Key: SPARK-18761
> URL: https://issues.apache.org/jira/browse/SPARK-18761
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Reporter: Josh Rosen
> Assignee: Josh Rosen
>
> Spark's current task cancellation / task killing mechanism is "best effort" in the sense that some tasks may not be interruptible and may not respond to their "killed" flags being set. If a significant fraction of a cluster's task slots is occupied by tasks that have been marked as killed but are still running, new jobs and tasks can be starved of resources because these zombie tasks continue to hold them.
> I propose to address this problem by introducing a "task reaper" mechanism in executors. The reaper would monitor tasks after they are marked for killing, periodically re-attempt the task kill, capture and log stack traces / warnings if tasks do not exit in a timely manner, and, optionally, kill the entire executor JVM if cancelled tasks cannot be killed within some timeout.
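
The reaper loop described in the issue can be sketched roughly as follows. This is only an illustration in Python, not the code from the pull request: the class name TaskReaper, its parameters (task_done, retry_kill, on_timeout, poll_interval, kill_timeout) and the log callback are all hypothetical. The on_timeout callback stands in for "kill the entire executor JVM", and the log call stands in for capturing and logging a stack trace.

```python
import threading
import time


class TaskReaper:
    """Hypothetical monitor for a single task that has been marked as killed."""

    def __init__(self, task_done, retry_kill, on_timeout,
                 poll_interval=0.01, kill_timeout=0.05, log=print):
        self.task_done = task_done          # threading.Event, set when the task exits
        self.retry_kill = retry_kill        # callable that re-attempts the kill
        self.on_timeout = on_timeout        # stand-in for killing the executor JVM
        self.poll_interval = poll_interval  # seconds between kill attempts
        self.kill_timeout = kill_timeout    # seconds; None disables the escalation
        self.log = log                      # stand-in for stack trace / warning logging
        self.attempts = 0

    def run(self):
        """Poll until the task exits. Return True if it exited before the timeout."""
        start = time.monotonic()
        while not self.task_done.is_set():
            self.attempts += 1
            self.retry_kill()
            # wait() returns True as soon as the task signals that it has exited.
            if self.task_done.wait(self.poll_interval):
                break
            elapsed = time.monotonic() - start
            self.log("killed task still running after %.3f s (attempt %d)"
                     % (elapsed, self.attempts))
            if self.kill_timeout is not None and elapsed >= self.kill_timeout:
                # Optional escalation: the task could not be killed in time.
                self.on_timeout()
                return False
        return True
```

In this sketch, setting kill_timeout to None corresponds to the "optional" part of the proposal: the reaper keeps retrying and logging for as long as the task is alive but never escalates.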
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)