Posted to issues@spark.apache.org by "Marcelo Vanzin (JIRA)" <ji...@apache.org> on 2017/05/01 21:50:04 UTC

[jira] [Resolved] (SPARK-20540) Dynamic allocation constantly requests and kills executors

     [ https://issues.apache.org/jira/browse/SPARK-20540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Vanzin resolved SPARK-20540.
------------------------------------
       Resolution: Fixed
         Assignee: Ryan Blue
    Fix Version/s: 2.2.0
                   2.1.2

> Dynamic allocation constantly requests and kills executors
> ----------------------------------------------------------
>
>                 Key: SPARK-20540
>                 URL: https://issues.apache.org/jira/browse/SPARK-20540
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, YARN
>    Affects Versions: 2.0.2, 2.1.0, 2.2.0
>            Reporter: Ryan Blue
>            Assignee: Ryan Blue
>             Fix For: 2.1.2, 2.2.0
>
>
> We are seeing strange behavior with dynamic allocation: in some cases the driver gets into a state where it constantly kills idle executors while requesting new ones. This happens at the end of a stage, when all tasks have been assigned, and it never stops even when there are no tasks left to run.
> From the YarnAllocator logs, it looks like the allocator is receiving many requests from the driver within the same second, even though the timeout between requests should be 5s:
> {code:title=Yarn allocator logs}
> 17/04/20 19:52:05 INFO dispatcher-event-loop-49 YarnAllocator: Driver requested a total number of 227 executor(s).
> 17/04/20 19:52:05 INFO dispatcher-event-loop-30 YarnAllocator: Driver requested a total number of 213 executor(s).
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Will request 1 executor containers, each with 2 cores and 7168 MB memory including 2048 MB overhead
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Canceled 0 container requests (locality no longer needed)
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Submitted container request (host: Any, capability: <memory:7168, vCores:2>)
> spark://CoarseGrainedScheduler@100.74.39.143:10895,  executorHostname: ip-100-74-34-230.ec2.internal
> spark://CoarseGrainedScheduler@100.74.39.143:10895,  executorHostname: ip-100-74-47-57.ec2.internal
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Received 2 containers from YARN, launching executors on 2 of them.
> 17/04/20 19:52:05 INFO dispatcher-event-loop-11 YarnAllocator: Driver requested a total number of 195 executor(s).
> 17/04/20 19:52:05 INFO dispatcher-event-loop-55 YarnAllocator: Driver requested a total number of 174 executor(s).
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Will request 2 executor containers, each with 2 cores and 7168 MB memory including 2048 MB overhead
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Canceled 0 container requests (locality no longer needed)
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Submitted container request (host: Any, capability: <memory:7168, vCores:2>)
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Submitted container request (host: Any, capability: <memory:7168, vCores:2>)
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Received 4 containers from YARN, launching executors on 4 of them.
> {code}
> I think the allocator cancels what requests it can, but it still receives containers that had already been requested, so the number of executors keeps growing because of the requests from the driver. Here are 5 seconds from the log:
> {code}
> 17/04/20 19:52:30 INFO dispatcher-event-loop-22 YarnAllocator: Driver requested a total number of 185 executor(s).
> 17/04/20 19:52:30 INFO dispatcher-event-loop-48 YarnAllocator: Driver requested a total number of 193 executor(s).
> 17/04/20 19:52:30 INFO dispatcher-event-loop-24 YarnAllocator: Driver requested a total number of 192 executor(s).
> 17/04/20 19:52:30 INFO dispatcher-event-loop-60 YarnAllocator: Driver requested a total number of 195 executor(s).
> 17/04/20 19:52:30 INFO dispatcher-event-loop-53 YarnAllocator: Driver requested a total number of 205 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-19 YarnAllocator: Driver requested a total number of 202 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-17 YarnAllocator: Driver requested a total number of 232 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-45 YarnAllocator: Driver requested a total number of 243 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-19 YarnAllocator: Driver requested a total number of 254 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-42 YarnAllocator: Driver requested a total number of 263 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-20 YarnAllocator: Driver requested a total number of 271 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-35 YarnAllocator: Driver requested a total number of 280 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-61 YarnAllocator: Driver requested a total number of 289 executor(s).
> 17/04/20 19:52:32 INFO dispatcher-event-loop-22 YarnAllocator: Driver requested a total number of 305 executor(s).
> 17/04/20 19:52:32 INFO dispatcher-event-loop-28 YarnAllocator: Driver requested a total number of 310 executor(s).
> 17/04/20 19:52:32 INFO dispatcher-event-loop-0 YarnAllocator: Driver requested a total number of 313 executor(s).
> 17/04/20 19:52:32 INFO dispatcher-event-loop-28 YarnAllocator: Driver requested a total number of 315 executor(s).
> 17/04/20 19:52:32 INFO dispatcher-event-loop-40 YarnAllocator: Driver requested a total number of 316 executor(s).
> 17/04/20 19:52:32 INFO dispatcher-event-loop-13 YarnAllocator: Driver requested a total number of 317 executor(s).
> 17/04/20 19:52:32 INFO dispatcher-event-loop-35 YarnAllocator: Driver requested a total number of 311 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-40 YarnAllocator: Driver requested a total number of 308 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-4 YarnAllocator: Driver requested a total number of 301 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-23 YarnAllocator: Driver requested a total number of 294 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-46 YarnAllocator: Driver requested a total number of 287 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-8 YarnAllocator: Driver requested a total number of 285 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-63 YarnAllocator: Driver requested a total number of 283 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-35 YarnAllocator: Driver requested a total number of 281 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-63 YarnAllocator: Driver requested a total number of 278 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-3 YarnAllocator: Driver requested a total number of 277 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-38 YarnAllocator: Driver requested a total number of 276 executor(s).
> 17/04/20 19:52:34 INFO dispatcher-event-loop-51 YarnAllocator: Driver requested a total number of 273 executor(s).
> 17/04/20 19:52:34 INFO dispatcher-event-loop-31 YarnAllocator: Driver requested a total number of 271 executor(s).
> 17/04/20 19:52:34 INFO dispatcher-event-loop-44 YarnAllocator: Driver requested a total number of 270 executor(s).
> {code}
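
To quantify the request churn visible in the logs above, the "Driver requested a total number of N executor(s)" lines can be grouped by timestamp. The following is only a rough sketch and not part of Spark: the helper name summarize_requests and the regular expression are assumptions made for illustration, and the sample lines are copied from the excerpts above.

```python
import re

# Matches YarnAllocator lines such as:
# 17/04/20 19:52:30 INFO dispatcher-event-loop-22 YarnAllocator: Driver requested a total number of 185 executor(s).
# An optional leading "> " (email quoting) is tolerated.
REQUEST_RE = re.compile(
    r"^(?:>\s*)?(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO \S+ YarnAllocator: "
    r"Driver requested a total number of (\d+) executor\(s\)\."
)


def summarize_requests(lines):
    """Hypothetical helper: map each timestamp to (count, min_total, max_total)."""
    totals = {}
    for line in lines:
        match = REQUEST_RE.match(line.strip())
        if match is None:
            continue  # not a driver request line
        timestamp, total = match.group(1), int(match.group(2))
        totals.setdefault(timestamp, []).append(total)
    return {ts: (len(v), min(v), max(v)) for ts, v in totals.items()}


# Sample lines taken verbatim from the log excerpts above.
sample = [
    "17/04/20 19:52:30 INFO dispatcher-event-loop-22 YarnAllocator: Driver requested a total number of 185 executor(s).",
    "17/04/20 19:52:30 INFO dispatcher-event-loop-48 YarnAllocator: Driver requested a total number of 193 executor(s).",
    "17/04/20 19:52:31 INFO dispatcher-event-loop-19 YarnAllocator: Driver requested a total number of 202 executor(s).",
    "17/04/20 19:52:05 INFO Reporter YarnAllocator: Canceled 0 container requests (locality no longer needed)",
]

summary = summarize_requests(sample)
for ts, (count, low, high) in sorted(summary.items()):
    print(f"{ts}: {count} request(s), target between {low} and {high}")
```

Run against the full allocator log, a summary like this would make it easy to see how many target updates arrive per second and how far the requested total swings, compared with the expected one request per 5s.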


