Posted to issues@spark.apache.org by "Imran Rashid (JIRA)" <ji...@apache.org> on 2016/06/30 18:37:10 UTC

[jira] [Resolved] (SPARK-15865) Blacklist should not result in job hanging with less than 4 executors

     [ https://issues.apache.org/jira/browse/SPARK-15865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Imran Rashid resolved SPARK-15865.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 2.1.0

Issue resolved by pull request 13603
[https://github.com/apache/spark/pull/13603]

> Blacklist should not result in job hanging with less than 4 executors
> ---------------------------------------------------------------------
>
>                 Key: SPARK-15865
>                 URL: https://issues.apache.org/jira/browse/SPARK-15865
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 2.0.0
>            Reporter: Imran Rashid
>            Assignee: Imran Rashid
>             Fix For: 2.1.0
>
>
> Currently when you turn on blacklisting with {{spark.scheduler.executorTaskBlacklistTime}} but have fewer than {{spark.task.maxFailures}} executors, you can end up with a "hung" job after some task failures.
> If some task fails repeatedly (say, due to an error in user code), the task will be blacklisted from the given executor.  It will then try another executor, and fail there as well.  However, after it has tried all available executors, the scheduler will simply stop trying to schedule the task anywhere.  The job doesn't fail, nor does it succeed -- it simply waits.  Eventually, when the blacklist expires, the task will be scheduled again.  But that can be quite far in the future, and in the meantime the user just observes a stuck job; a minimal sketch of the scenario follows below.
> Instead we should abort the stage (and fail any dependent jobs) as soon as we detect tasks that cannot be scheduled.
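
For illustration only (this sketch is not from the ticket or the pull request), here is a minimal Scala reproduction of the hang, assuming a local-cluster test master; the object name is hypothetical, and the config keys are the ones quoted above:

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical repro: 2 executors, fewer than spark.task.maxFailures (4).
object BlacklistHangRepro {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("SPARK-15865-repro")                              // hypothetical app name
      .setMaster("local-cluster[2,1,1024]")                         // 2 workers, 1 core, 1024 MB each
      .set("spark.scheduler.executorTaskBlacklistTime", "3600000")  // blacklist for 1 hour (ms)
      .set("spark.task.maxFailures", "4")                           // the default, shown explicitly
    val sc = new SparkContext(conf)

    // Every attempt of this task throws, so each executor blacklists it
    // after one failure there.  Once both executors have blacklisted the
    // task, the pre-fix scheduler has no executor left to try and silently
    // stops scheduling it: the job neither fails nor succeeds until the
    // one-hour blacklist expires.
    sc.parallelize(1 to 10, 1).map { i =>
      if (i >= 0) throw new RuntimeException("always fails")
      i
    }.count()

    sc.stop()
  }
}
{code}

With only 2 executors and the default {{spark.task.maxFailures}} of 4, the always-failing task is blacklisted everywhere after just two attempts; this is exactly the situation the fix detects and turns into a stage abort instead of an indefinite wait.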



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org