Posted to issues@spark.apache.org by "Matt Cheah (JIRA)" <ji...@apache.org> on 2015/06/08 21:48:00 UTC

[jira] [Commented] (SPARK-8167) Tasks that fail due to YARN preemption can cause job failure

    [ https://issues.apache.org/jira/browse/SPARK-8167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577727#comment-14577727 ] 

Matt Cheah commented on SPARK-8167:
-----------------------------------

To be clear, this is independent of SPARK-7451. SPARK-7451 helps for the case where executors die too many times from preemption, but it does not help if the exact same task gets preempted many times.

> Tasks that fail due to YARN preemption can cause job failure
> ------------------------------------------------------------
>
>                 Key: SPARK-8167
>                 URL: https://issues.apache.org/jira/browse/SPARK-8167
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler, YARN
>    Affects Versions: 1.3.1
>            Reporter: Patrick Woody
>
> Tasks that are running on preempted executors will count as FAILED with an ExecutorLostFailure. Unfortunately, this can quickly spiral out of control if a large resource shift is occurring, and the tasks get scheduled to executors that immediately get preempted as well.
> The current workaround is to set spark.task.maxFailures very high, but that delays surfacing genuine failures. We should ideally differentiate these task statuses so that preemptions don't count towards the failure limit.
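
As a stopgap until preemptions are counted separately, the failure threshold from the description above can be raised at submit time. A minimal sketch (the value 20, class name, and jar are illustrative; the default for spark.task.maxFailures is 4):

    spark-submit \
      --master yarn \
      --conf spark.task.maxFailures=20 \
      --class com.example.MyJob \
      my-job.jar

The tradeoff noted in the description applies: a higher threshold tolerates repeated preemption of the same task, but a job with a genuine deterministic task failure will retry longer before failing.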



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org