Posted to issues@spark.apache.org by "Thomas Graves (JIRA)" <ji...@apache.org> on 2017/07/12 02:16:00 UTC
[jira] [Created] (SPARK-21383) YARN: can allocate too many containers
Thomas Graves created SPARK-21383:
-------------------------------------
Summary: YARN: can allocate too many containers
Key: SPARK-21383
URL: https://issues.apache.org/jira/browse/SPARK-21383
Project: Spark
Issue Type: Bug
Components: YARN
Affects Versions: 2.0.0
Reporter: Thomas Graves
The YarnAllocator doesn't properly track containers that are being launched but are not yet running. If it takes time to launch the containers on the NodeManager (NM), they don't count toward numExecutorsRunning, but they have already been removed from the pending list. If allocateResources is called again in that window, it can conclude that executors are missing when they are not (they just haven't finished launching).
This was introduced by SPARK-12447.
Where it checks for missing executors:
https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L297
numExecutorsRunning is only updated after the NM has started the container:
https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L524
Thus if the NM is slow or the network is slow, it can miscount and start additional executors.
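The miscount can be reproduced with a small model. The sketch below is not Spark's code: AllocatorModel, its counters, and the trackStarting flag are hypothetical names chosen for illustration. It assumes a container moves through three states (pending, starting, running) and compares an allocator that ignores the starting state with one that counts it.

```scala
// Hypothetical, simplified model of the allocator's bookkeeping; not Spark's actual code.
final class AllocatorModel(target: Int, trackStarting: Boolean) {
  private var pending = 0   // requested from the RM, not yet granted
  private var starting = 0  // granted, launch on the NM still in progress
  private var running = 0   // NM has confirmed the launch

  /** One allocateResources-style pass: request whatever appears to be missing. */
  def requestMissing(): Int = {
    val accounted = pending + running + (if (trackStarting) starting else 0)
    val missing = math.max(0, target - accounted)
    pending += missing
    missing
  }

  /** The RM grants `n` pending requests; those containers begin launching. */
  def grant(n: Int): Unit = {
    require(n <= pending, "cannot grant more than is pending")
    pending -= n
    starting += n
  }

  /** The NM finishes launching `n` containers. */
  def launched(n: Int): Unit = {
    require(n <= starting, "cannot launch more than is starting")
    starting -= n
    running += n
  }

  def totalContainers: Int = pending + starting + running
}

object Demo {
  /** Runs a second allocation pass while all granted containers are still launching. */
  def simulate(trackStarting: Boolean): Int = {
    val a = new AllocatorModel(target = 4, trackStarting = trackStarting)
    a.requestMissing() // first pass: nothing accounted for, so 4 are requested
    a.grant(4)         // granted, but the NM is slow to launch them
    a.requestMissing() // second pass happens before the NM reports back
    a.launched(4)
    a.totalContainers
  }

  def main(args: Array[String]): Unit = {
    println(s"without tracking starting containers: ${simulate(trackStarting = false)}")
    println(s"with tracking starting containers: ${simulate(trackStarting = true)}")
  }
}
```

With a target of 4, the model that ignores starting containers ends up holding or requesting 8, while the one that counts them stays at 4. Counting in-flight launches is only one possible remedy, shown here under the assumptions stated above.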