Posted to issues@spark.apache.org by "Andrew Or (JIRA)" <ji...@apache.org> on 2015/10/20 01:34:27 UTC

[jira] [Resolved] (SPARK-11120) maxNumExecutorFailures defaults to 3 under dynamic allocation

     [ https://issues.apache.org/jira/browse/SPARK-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-11120.
-------------------------------
       Resolution: Fixed
         Assignee: Ryan Williams
    Fix Version/s: 1.6.0

> maxNumExecutorFailures defaults to 3 under dynamic allocation
> -------------------------------------------------------------
>
>                 Key: SPARK-11120
>                 URL: https://issues.apache.org/jira/browse/SPARK-11120
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.5.1
>            Reporter: Ryan Williams
>            Assignee: Ryan Williams
>            Priority: Minor
>             Fix For: 1.6.0
>
>
> With dynamic allocation, the {{spark.executor.instances}} config is 0, which means [this line|https://github.com/apache/spark/blob/4ace4f8a9c91beb21a0077e12b75637a4560a542/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L66-L68] computes {{maxNumExecutorFailures}} as {{3}}. For me this has caused large dynamic-allocation jobs with hundreds of executors to die because one bad node serially failed every executor allocated on it.
> I think using {{spark.dynamicAllocation.maxExecutors}} would make the most sense in this case; I frequently run shells that scale between 1 and 1000 executors, so basing it on {{spark.dynamicAllocation.minExecutors}} or {{spark.dynamicAllocation.initialExecutors}} would still leave me with a threshold that is lower than makes sense. A sketch of the current computation and the proposed change is below.
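> For reference, here is a minimal, self-contained sketch of both computations. It paraphrases the linked line rather than quoting it verbatim, and the {{effectiveExecutors}} helper and the dynamic-allocation branch are only an illustration of the proposal, not an actual patch:
> {code:scala}
> import org.apache.spark.SparkConf
>
> object MaxExecutorFailuresSketch {
>   def main(args: Array[String]): Unit = {
>     // Simulate a dynamic-allocation job: spark.executor.instances is left unset (defaults to 0).
>     val sparkConf = new SparkConf(false)
>       .set("spark.dynamicAllocation.enabled", "true")
>       .set("spark.dynamicAllocation.maxExecutors", "1000")
>
>     // Roughly the current default (paraphrased from the linked ApplicationMaster line):
>     // with spark.executor.instances == 0 this evaluates to max(0 * 2, 3) = 3.
>     val currentDefault = sparkConf.getInt("spark.yarn.max.executor.failures",
>       math.max(sparkConf.getInt("spark.executor.instances", 0) * 2, 3))
>
>     // Illustrative shape of the proposal: when dynamic allocation is enabled,
>     // scale the failure threshold off the allowed maximum number of executors instead.
>     val effectiveExecutors =
>       if (sparkConf.getBoolean("spark.dynamicAllocation.enabled", false)) {
>         sparkConf.getInt("spark.dynamicAllocation.maxExecutors", 0)
>       } else {
>         sparkConf.getInt("spark.executor.instances", 0)
>       }
>     val proposedDefault = sparkConf.getInt("spark.yarn.max.executor.failures",
>       math.max(effectiveExecutors * 2, 3))
>
>     println(s"current default: $currentDefault, proposed default: $proposedDefault")
>     // prints: current default: 3, proposed default: 2000
>   }
> }
> {code}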



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org