Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2023/03/13 06:21:00 UTC

[jira] [Assigned] (SPARK-42766) YarnAllocator should filter excluded nodes when launching allocated containers

     [ https://issues.apache.org/jira/browse/SPARK-42766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42766:
------------------------------------

    Assignee: Apache Spark

> YarnAllocator should filter excluded nodes when launching allocated containers
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-42766
>                 URL: https://issues.apache.org/jira/browse/SPARK-42766
>             Project: Spark
>          Issue Type: Improvement
>          Components: YARN
>    Affects Versions: 3.3.2
>            Reporter: wangshengjie
>            Assignee: Apache Spark
>            Priority: Major
>
> In a production environment, we hit the following issue:
> Suppose we request 10 containers from nodeA and nodeB. The first response from YARN returns 5 containers from nodeA and nodeB, and nodeA is then blacklisted (excluded). The second response from YARN may still return some containers from nodeA, and those containers are launched anyway. When these containers (executors) start up and send a register request to the driver, the request is rejected, and the failure is counted toward
> {code:java}
> spark.yarn.max.executor.failures {code}
> , and once that limit is reached the application fails with:
> {code:java}
> Max number of executor failures ($maxNumExecutorFailures) reached{code}
>  
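
The filtering the issue title asks for can be sketched roughly as follows. This is a minimal, self-contained illustration in Java, not Spark's actual YarnAllocator code: the class `ExcludedNodeFilter`, the types `AllocatedContainer` and `Split`, and the method `splitByExcludedNodes` are all hypothetical names introduced here. The sketch assumes the allocator knows the current set of excluded hosts when an allocation response arrives, and that containers on excluded hosts would be released back to YARN rather than launched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ExcludedNodeFilter {

    /** Hypothetical stand-in for a YARN container: an id plus the host it was allocated on. */
    static final class AllocatedContainer {
        final String id;
        final String host;

        AllocatedContainer(String id, String host) {
            this.id = id;
            this.host = host;
        }
    }

    /** Hypothetical result type: containers to launch and containers to hand back. */
    static final class Split {
        final List<AllocatedContainer> toLaunch = new ArrayList<>();
        final List<AllocatedContainer> toRelease = new ArrayList<>();
    }

    /**
     * Partitions freshly allocated containers by whether their host is excluded.
     * Containers on excluded hosts are never launched, so their executors cannot
     * be rejected by the driver and counted as executor failures.
     */
    static Split splitByExcludedNodes(List<AllocatedContainer> allocated,
                                      Set<String> excludedHosts) {
        Split split = new Split();
        for (AllocatedContainer c : allocated) {
            if (excludedHosts.contains(c.host)) {
                split.toRelease.add(c);
            } else {
                split.toLaunch.add(c);
            }
        }
        return split;
    }

    public static void main(String[] args) {
        // Second allocation response from the scenario above: nodeA is already excluded.
        List<AllocatedContainer> allocated = List.of(
                new AllocatedContainer("c1", "nodeA"),
                new AllocatedContainer("c2", "nodeB"),
                new AllocatedContainer("c3", "nodeA"));
        Split split = splitByExcludedNodes(allocated, Set.of("nodeA"));
        System.out.println("to launch:  " + split.toLaunch.size());
        System.out.println("to release: " + split.toRelease.size());
    }
}
```

Applied to the scenario in the description, only the nodeB container would be launched, and the two nodeA containers would be handed back instead of producing rejected executor registrations.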



--
This message was sent by Atlassian Jira
(v8.20.10#820010)