Posted to issues@spark.apache.org by "Lianhui Wang (JIRA)" <ji...@apache.org> on 2015/09/22 14:10:04 UTC

[jira] [Commented] (SPARK-2666) Always try to cancel running tasks when a stage is marked as zombie

    [ https://issues.apache.org/jira/browse/SPARK-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902459#comment-14902459 ] 

Lianhui Wang commented on SPARK-2666:
-------------------------------------

[~imranr] Thanks, I have taken a look at https://github.com/squito/spark/pull/4, and I think the logic is right. It is OK except for the unit test.

> Always try to cancel running tasks when a stage is marked as zombie
> -------------------------------------------------------------------
>
>                 Key: SPARK-2666
>                 URL: https://issues.apache.org/jira/browse/SPARK-2666
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler, Spark Core
>            Reporter: Lianhui Wang
>
> There are some situations in which the scheduler can mark a task set as a "zombie" before the task set has completed all of its tasks.  For example:
> (a) When a task fails b/c of a {{FetchFailed}}
> (b) When a stage completes because two different attempts create all the ShuffleMapOutput, though no attempt has completed all its tasks (at least, this *should* result in the task set being marked as zombie, see SPARK-10370)
> (there may be others, I'm not sure if this list is exhaustive.)
> Marking a taskset as zombie prevents any *additional* tasks from getting scheduled; however, it does not cancel the tasks that are currently running.  We should cancel all running tasks to avoid wasting resources (and also to make the behavior a little clearer to the end user).  Rather than canceling tasks in each case piecemeal, we should refactor the scheduler so that these two actions are always taken together -- canceling tasks should go hand-in-hand with marking the taskset as zombie.
> Some implementation notes:
> * We should change {{taskSetManager.isZombie}} to be private and put it behind a method like {{markZombie}} or something.
> * Marking a stage as zombie before all of its tasks have completed does *not* necessarily mean the stage attempt has failed.  In case (a), the stage attempt has failed, but in case (b) we are not canceling b/c of a failure, rather just b/c no more tasks are needed.
> * {{taskScheduler.cancelTasks}} always marks the task set as zombie.  However, it also has some side effects, like logging that the stage has failed and creating a {{TaskSetFailed}} event, which we don't want, e.g., in case (b) when nothing has failed.  So it may need some additional refactoring to go along w/ {{markZombie}}.
> * {{SchedulerBackend}}s are free to not implement {{killTask}}, so we need to be sure to catch the resulting {{UnsupportedOperationException}}s.
> * Testing this *might* benefit from SPARK-10372
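
The implementation notes above could be modeled roughly as follows. This is only a sketch of the idea, written in Python as a language-neutral model; it is not Spark code and not the change in the linked pull request. Every name in it (ZombieReason, KillUnsupported, Backend, TaskSetManager, mark_zombie) is hypothetical, and KillUnsupported merely stands in for the UnsupportedOperationException mentioned in the notes.

```python
from enum import Enum


class ZombieReason(Enum):
    """Why a task set stops scheduling new tasks (hypothetical)."""
    FAILED = "failed"                      # case (a), e.g. a fetch failure
    NO_MORE_TASKS_NEEDED = "not needed"    # case (b), nothing has failed


class KillUnsupported(Exception):
    """Stand-in for a backend that does not implement task killing."""


class Backend:
    """Hypothetical scheduler backend that records kill requests."""

    def __init__(self, supports_kill=True):
        self.supports_kill = supports_kill
        self.killed = []

    def kill_task(self, task_id):
        if not self.supports_kill:
            raise KillUnsupported(task_id)
        self.killed.append(task_id)


class TaskSetManager:
    """Toy model: the zombie flag is private and only set via mark_zombie."""

    def __init__(self, backend, running_task_ids):
        self._backend = backend
        self._is_zombie = False
        self.running = set(running_task_ids)
        self.events = []

    @property
    def is_zombie(self):
        return self._is_zombie

    def mark_zombie(self, reason):
        """Stop scheduling and request cancellation of running tasks."""
        if self._is_zombie:
            return  # already marked; do not repeat side effects
        self._is_zombie = True
        # Only a real failure produces the failure event.
        if reason is ZombieReason.FAILED:
            self.events.append("TaskSetFailed")
        for task_id in sorted(self.running):
            try:
                self._backend.kill_task(task_id)
            except KillUnsupported:
                # Backend cannot kill tasks; let them run to completion.
                break
```

In this model, marking a task set as zombie and requesting cancellation always happen together, the failure event is emitted only for case (a), and a backend that cannot kill tasks does not cause the call to fail.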
