Posted to issues@spark.apache.org by "wangshengjie (Jira)" <ji...@apache.org> on 2023/03/13 03:39:00 UTC

[jira] [Created] (SPARK-42766) YarnAllocator should filter excluded nodes when launching allocated containers

wangshengjie created SPARK-42766:
------------------------------------

             Summary: YarnAllocator should filter excluded nodes when launching allocated containers
                 Key: SPARK-42766
                 URL: https://issues.apache.org/jira/browse/SPARK-42766
             Project: Spark
          Issue Type: Improvement
          Components: YARN
    Affects Versions: 3.3.2
            Reporter: wangshengjie


In our production environment, we hit an issue like this:

If we request 10 containers from nodeA and nodeB, the first response from YARN may return 5 containers on nodeA and nodeB. If nodeA is then excluded (blacklisted), the second response from YARN may still include containers allocated on nodeA, and YarnAllocator launches them anyway. When those containers (executors) start up and send a register request to the driver, they are rejected, and each rejection is counted against
{code:java}
spark.yarn.max.executor.failures {code}
which can eventually cause the application to fail with:
{code:java}
Max number of executor failures ($maxNumExecutorFailures) reached{code}
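
A minimal sketch of the filtering the title proposes: before launching newly allocated containers, drop any whose host is currently excluded and release them back to the ResourceManager, instead of launching an executor that the driver will reject. The Container case class and the names below are simplified stand-ins for illustration, not the actual Spark/YARN API:
{code:scala}
// Sketch only: simplified stand-in for a YARN container allocation,
// not the real org.apache.hadoop.yarn.api.records.Container.
case class Container(id: String, host: String)

object ExcludedNodeFilterSketch {

  /** Split an allocation into (containers to launch, containers to release). */
  def filterExcluded(
      allocated: Seq[Container],
      excludedNodes: Set[String]): (Seq[Container], Seq[Container]) =
    allocated.partition(c => !excludedNodes.contains(c.host))

  def main(args: Array[String]): Unit = {
    val allocated = Seq(
      Container("container_01", "nodeA"),
      Container("container_02", "nodeB"),
      Container("container_03", "nodeA"))

    // nodeA was excluded (blacklisted) after the original request was submitted,
    // but the ResourceManager still handed back containers on it.
    val excluded = Set("nodeA")

    val (toLaunch, toRelease) = filterExcluded(allocated, excluded)
    println(s"launch:  ${toLaunch.map(_.id).mkString(", ")}")  // container_02
    println(s"release: ${toRelease.map(_.id).mkString(", ")}") // container_01, container_03
  }
}
{code}
Releasing the excluded containers rather than launching them means their executors never register with the driver, so the rejections no longer count toward spark.yarn.max.executor.failures.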
 


