Posted to issues@spark.apache.org by "Thomas Graves (JIRA)" <ji...@apache.org> on 2016/07/19 19:50:20 UTC

[jira] [Commented] (SPARK-16630) Blacklist a node if executors won't launch on it.

    [ https://issues.apache.org/jira/browse/SPARK-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384744#comment-15384744 ] 

Thomas Graves commented on SPARK-16630:
---------------------------------------

Note that this is more at the resource manager level: in something like the ApplicationMaster/YarnAllocator we would look at the failure reason, since the Spark scheduler itself wouldn't even know about this attempt to launch the executor.
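
A rough sketch of the bookkeeping this would need on the allocator side, in Python for brevity. The class, method names, and threshold here are hypothetical and are not existing Spark or YARN APIs. The idea is only to count launch failures per host and report the hosts that cross a threshold, so the allocator could stop requesting containers there.

```python
from collections import defaultdict


class NodeLaunchFailureTracker:
    """Hypothetical helper that counts executor launch failures per host.

    Not a Spark or YARN API: the names and the default threshold are
    illustrative assumptions only.
    """

    def __init__(self, max_failures_per_node=2):
        self.max_failures_per_node = max_failures_per_node
        self._failures = defaultdict(int)

    def record_launch_failure(self, host):
        """Call when a container on `host` fails before its executor starts."""
        self._failures[host] += 1

    def is_blacklisted(self, host):
        """True once `host` has reached the failure threshold."""
        return self._failures.get(host, 0) >= self.max_failures_per_node

    def blacklisted_nodes(self):
        """Sorted list of hosts the allocator should stop asking for."""
        return sorted(
            host
            for host, count in self._failures.items()
            if count >= self.max_failures_per_node
        )


if __name__ == "__main__":
    tracker = NodeLaunchFailureTracker(max_failures_per_node=2)
    tracker.record_launch_failure("node-a")  # assumed host names
    tracker.record_launch_failure("node-a")
    tracker.record_launch_failure("node-b")
    print(tracker.blacklisted_nodes())
```

Keeping the count in the allocator rather than the scheduler matches the point above: only the code that requested the container sees that the launch failed.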

> Blacklist a node if executors won't launch on it.
> -------------------------------------------------
>
>                 Key: SPARK-16630
>                 URL: https://issues.apache.org/jira/browse/SPARK-16630
>             Project: Spark
>          Issue Type: Improvement
>          Components: YARN
>    Affects Versions: 1.6.2
>            Reporter: Thomas Graves
>
> On YARN, it's possible that a node is messed up or misconfigured such that a container won't launch on it. For instance, the Spark external shuffle handler may not have been loaded on it, or maybe it's just some other transient error.
> It would be nice if we could recognize this happening and stop trying to launch executors on that node, since the repeated failures could cause us to hit our max number of executor failures and then kill the job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
