Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2015/10/19 07:43:05 UTC

[jira] [Commented] (SPARK-11178) Improve naming around task failures in scheduler code

    [ https://issues.apache.org/jira/browse/SPARK-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14962872#comment-14962872 ] 

Apache Spark commented on SPARK-11178:
--------------------------------------

User 'kayousterhout' has created a pull request for this issue:
https://github.com/apache/spark/pull/9164

> Improve naming around task failures in scheduler code
> -----------------------------------------------------
>
>                 Key: SPARK-11178
>                 URL: https://issues.apache.org/jira/browse/SPARK-11178
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler
>    Affects Versions: 1.5.1
>            Reporter: Kay Ousterhout
>            Assignee: Kay Ousterhout
>            Priority: Trivial
>
> Commit af3bc59d1f5d9d952c2d7ad1af599c49f1dbdaf0 introduced new functionality so that if an executor dies for a reason not caused by one of the tasks running on it (e.g., due to preemption), Spark doesn't count the failure towards the maximum number of failures allowed for the task.  That commit also introduced some vague naming that I think we should fix; in particular:
>     
> (1) The variable "isNormalExit" is used to refer to cases where the executor died for a reason unrelated to the tasks running on the machine.  The problem with the existing name is that it's not clear (at least to me!) what it means for an exit to be "normal".
>     
> (2) The variable "shouldEventuallyFailJob" is used to determine whether a task's failure should be counted towards the maximum number of failures allowed for a task before the associated Stage is aborted.  The problem with the existing name is that it can be misread as implying that the task's failure should immediately cause the stage to fail because the failure is somehow fatal (this is the case for a fetch failure, for example: if a task fails because of a fetch failure, there's no point in retrying, and the whole stage should be failed immediately).  The sketch below illustrates the intended semantics of both flags.
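>
> To make the distinction concrete, here is a minimal, self-contained Scala sketch (not Spark's actual scheduler code); the names "exitCausedByApp" and "countTowardsTaskFailures" are illustrative stand-ins for the two vague names above, not necessarily what the linked pull request adopts:
>
>     object NamingSketch {
>       // Why an executor was lost. exitCausedByApp = false covers cases like
>       // preemption, where no task running on the executor is at fault.
>       final case class ExecutorExited(exitCode: Int, exitCausedByApp: Boolean)
>
>       // Why a task failed. countTowardsTaskFailures = false means the failure
>       // should not count against the task's maximum allowed failures.
>       final case class TaskFailed(taskId: Long, countTowardsTaskFailures: Boolean)
>
>       val maxTaskFailures = 4
>
>       // Abort the stage only once enough *countable* failures have accumulated.
>       def shouldAbortStage(failures: Seq[TaskFailed]): Boolean =
>         failures.count(_.countTowardsTaskFailures) >= maxTaskFailures
>
>       def main(args: Array[String]): Unit = {
>         // An executor preempted by the cluster manager: not the app's fault,
>         // so the failures of its tasks are recorded as not counting.
>         val exit = ExecutorExited(exitCode = 137, exitCausedByApp = false)
>         val preempted = TaskFailed(1L, countTowardsTaskFailures = exit.exitCausedByApp)
>
>         // Three genuine failures plus two preemption-induced ones: only three
>         // count, so the stage survives (maxTaskFailures = 4).
>         val failures = Seq(
>           TaskFailed(1L, countTowardsTaskFailures = true),
>           TaskFailed(1L, countTowardsTaskFailures = true),
>           TaskFailed(1L, countTowardsTaskFailures = true),
>           preempted, preempted)
>         println(shouldAbortStage(failures))  // prints: false
>       }
>     }
>
> Note that neither flag means "fail the stage right now"; an immediate abort (as for a fetch failure) is a separate decision from whether a failure counts towards the per-task limit.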



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org