Posted to issues@spark.apache.org by "Marcelo Vanzin (JIRA)" <ji...@apache.org> on 2019/02/07 18:13:00 UTC

[jira] [Resolved] (SPARK-23974) Do not allocate more containers as expected in dynamic allocation

     [ https://issues.apache.org/jira/browse/SPARK-23974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Vanzin resolved SPARK-23974.
------------------------------------
    Resolution: Not A Problem

Closing based on the above comment.

> Do not allocate more containers as expected in dynamic allocation
> -----------------------------------------------------------------
>
>                 Key: SPARK-23974
>                 URL: https://issues.apache.org/jira/browse/SPARK-23974
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.1
>            Reporter: Darcy Shen
>            Priority: Major
>
> Using YARN with dynamic allocation enabled, Spark does not allocate more containers even when the current number of containers (executors) is well below the configured maximum.
> For example, only 7 executors are working although our cluster is not busy, and I have set
> {{spark.dynamicAllocation.maxExecutors = 600}}
> As a result, the current jobs of the context execute slowly.
>  
> A live case with online logs:
> ```
> $ grep "Not adding executors because our current target total" spark-job-server.log.9 | tail
> [2018-04-12 16:07:19,070] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:20,071] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:21,072] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:22,073] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:23,074] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:24,075] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:25,076] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:26,077] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:27,078] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:28,079] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> $ grep "Not adding executors because our current target total" spark-job-server.log.9 | head
> [2018-04-12 13:52:18,067] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:19,071] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:20,072] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:21,073] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:22,074] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:23,075] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:24,076] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:25,077] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:26,078] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:27,079] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> $ grep "Not adding executors because our current target total" spark-job-server.log.9 | wc -l
> 8111
> ```
> The logs show that `numExecutorsTarget == maxNumExecutors == 600` is held constant without requesting new executors, while at that time only 7 executors were actually available to our users.
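The mismatch described in the report can be illustrated with a minimal sketch (plain Python, not Spark's actual Scala code; the function and parameter names here are simplified stand-ins for fields of `ExecutorAllocationManager`): the manager compares its internal *target* against the maximum, so once the target has been driven to the cap it stops requesting, regardless of how many executors YARN has actually granted.

```python
# Simplified illustration of the guard that produces the log line above.
# This is a sketch for explanation only, not Spark's real
# ExecutorAllocationManager implementation.

def maybe_add_executors(num_executors_target: int,
                        max_num_executors: int,
                        executors_wanted: int) -> tuple[int, str]:
    """Return (number of executors requested, debug message)."""
    if num_executors_target >= max_num_executors:
        # The *target* -- not the number of live executors -- is at the cap,
        # so no new request is made even if far fewer executors are running.
        return 0, (f"Not adding executors because our current target total "
                   f"is already {num_executors_target} "
                   f"(limit {max_num_executors})")
    to_add = min(executors_wanted,
                 max_num_executors - num_executors_target)
    return to_add, f"Requesting {to_add} new executors"

# With the reported values: target already at the 600 cap,
# even though only 7 executors are actually running.
added, msg = maybe_add_executors(600, 600, 10)
```

Under this reading, the DEBUG lines are expected behavior: the target total was already 600, so the allocator had nothing more to request, and the gap between 600 requested and 7 running was a cluster/YARN-side matter rather than an allocator bug, consistent with the "Not A Problem" resolution.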



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org