Posted to issues@spark.apache.org by "Marcelo Vanzin (JIRA)" <ji...@apache.org> on 2017/05/01 21:50:04 UTC
[jira] [Resolved] (SPARK-20540) Dynamic allocation constantly requests and kills executors
[ https://issues.apache.org/jira/browse/SPARK-20540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marcelo Vanzin resolved SPARK-20540.
------------------------------------
Resolution: Fixed
Assignee: Ryan Blue
Fix Version/s: 2.2.0, 2.1.2
> Dynamic allocation constantly requests and kills executors
> ----------------------------------------------------------
>
> Key: SPARK-20540
> URL: https://issues.apache.org/jira/browse/SPARK-20540
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, YARN
> Affects Versions: 2.0.2, 2.1.0, 2.2.0
> Reporter: Ryan Blue
> Assignee: Ryan Blue
> Fix For: 2.1.2, 2.2.0
>
>
> With dynamic allocation, the driver sometimes gets into a state where it constantly kills idle executors while requesting new ones. This starts at the end of a stage, once all tasks have been assigned, and does not stop even when there are no tasks left to run.
> The YarnAllocator logs show the allocator receiving many requests from the driver within the same second, even though the timeout between requests should be 5s:
> {code:title=Yarn allocator logs}
> 17/04/20 19:52:05 INFO dispatcher-event-loop-49 YarnAllocator: Driver requested a total number of 227 executor(s).
> 17/04/20 19:52:05 INFO dispatcher-event-loop-30 YarnAllocator: Driver requested a total number of 213 executor(s).
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Will request 1 executor containers, each with 2 cores and 7168 MB memory including 2048 MB overhead
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Canceled 0 container requests (locality no longer needed)
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Submitted container request (host: Any, capability: <memory:7168, vCores:2>)
> spark://CoarseGrainedScheduler@100.74.39.143:10895, executorHostname: ip-100-74-34-230.ec2.internal
> spark://CoarseGrainedScheduler@100.74.39.143:10895, executorHostname: ip-100-74-47-57.ec2.internal
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Received 2 containers from YARN, launching executors on 2 of them.
> 17/04/20 19:52:05 INFO dispatcher-event-loop-11 YarnAllocator: Driver requested a total number of 195 executor(s).
> 17/04/20 19:52:05 INFO dispatcher-event-loop-55 YarnAllocator: Driver requested a total number of 174 executor(s).
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Will request 2 executor containers, each with 2 cores and 7168 MB memory including 2048 MB overhead
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Canceled 0 container requests (locality no longer needed)
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Submitted container request (host: Any, capability: <memory:7168, vCores:2>)
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Submitted container request (host: Any, capability: <memory:7168, vCores:2>)
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Received 4 containers from YARN, launching executors on 4 of them.
> {code}
> I think the allocator cancels the requests it can, but it still receives containers that were already requested, and the number of executors keeps growing because of the driver's requests. Here are 5 seconds from the log:
> {code}
> 17/04/20 19:52:30 INFO dispatcher-event-loop-22 YarnAllocator: Driver requested a total number of 185 executor(s).
> 17/04/20 19:52:30 INFO dispatcher-event-loop-48 YarnAllocator: Driver requested a total number of 193 executor(s).
> 17/04/20 19:52:30 INFO dispatcher-event-loop-24 YarnAllocator: Driver requested a total number of 192 executor(s).
> 17/04/20 19:52:30 INFO dispatcher-event-loop-60 YarnAllocator: Driver requested a total number of 195 executor(s).
> 17/04/20 19:52:30 INFO dispatcher-event-loop-53 YarnAllocator: Driver requested a total number of 205 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-19 YarnAllocator: Driver requested a total number of 202 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-17 YarnAllocator: Driver requested a total number of 232 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-45 YarnAllocator: Driver requested a total number of 243 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-19 YarnAllocator: Driver requested a total number of 254 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-42 YarnAllocator: Driver requested a total number of 263 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-20 YarnAllocator: Driver requested a total number of 271 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-35 YarnAllocator: Driver requested a total number of 280 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-61 YarnAllocator: Driver requested a total number of 289 executor(s).
> 17/04/20 19:52:32 INFO dispatcher-event-loop-22 YarnAllocator: Driver requested a total number of 305 executor(s).
> 17/04/20 19:52:32 INFO dispatcher-event-loop-28 YarnAllocator: Driver requested a total number of 310 executor(s).
> 17/04/20 19:52:32 INFO dispatcher-event-loop-0 YarnAllocator: Driver requested a total number of 313 executor(s).
> 17/04/20 19:52:32 INFO dispatcher-event-loop-28 YarnAllocator: Driver requested a total number of 315 executor(s).
> 17/04/20 19:52:32 INFO dispatcher-event-loop-40 YarnAllocator: Driver requested a total number of 316 executor(s).
> 17/04/20 19:52:32 INFO dispatcher-event-loop-13 YarnAllocator: Driver requested a total number of 317 executor(s).
> 17/04/20 19:52:32 INFO dispatcher-event-loop-35 YarnAllocator: Driver requested a total number of 311 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-40 YarnAllocator: Driver requested a total number of 308 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-4 YarnAllocator: Driver requested a total number of 301 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-23 YarnAllocator: Driver requested a total number of 294 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-46 YarnAllocator: Driver requested a total number of 287 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-8 YarnAllocator: Driver requested a total number of 285 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-63 YarnAllocator: Driver requested a total number of 283 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-35 YarnAllocator: Driver requested a total number of 281 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-63 YarnAllocator: Driver requested a total number of 278 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-3 YarnAllocator: Driver requested a total number of 277 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-38 YarnAllocator: Driver requested a total number of 276 executor(s).
> 17/04/20 19:52:34 INFO dispatcher-event-loop-51 YarnAllocator: Driver requested a total number of 273 executor(s).
> 17/04/20 19:52:34 INFO dispatcher-event-loop-31 YarnAllocator: Driver requested a total number of 271 executor(s).
> 17/04/20 19:52:34 INFO dispatcher-event-loop-44 YarnAllocator: Driver requested a total number of 270 executor(s).
> {code}
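One rough way to quantify the request rate in a log like the one quoted above is to count the "Driver requested a total number of N executor(s)" lines per second. The following is only a minimal sketch in Python, not part of Spark: the function name summarize_requests and the SAMPLE list are made up for illustration, and the regular expression assumes the line layout shown in the excerpt.

```python
import re
from collections import Counter

# Assumed line layout, taken from the excerpt above:
#   <yy/MM/dd HH:mm:ss> INFO <thread> YarnAllocator: Driver requested a total number of N executor(s).
# search() is used so a leading "> " quote prefix is tolerated.
REQUEST_RE = re.compile(
    r"(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO \S+ YarnAllocator: "
    r"Driver requested a total number of (\d+) executor\(s\)\."
)


def summarize_requests(lines):
    """Return (Counter of requests per timestamp, list of requested totals in order)."""
    per_second = Counter()
    totals = []
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            per_second[match.group(1)] += 1
            totals.append(int(match.group(2)))
    return per_second, totals


# Hypothetical sample: a few lines copied from the excerpt above.
SAMPLE = [
    "> 17/04/20 19:52:30 INFO dispatcher-event-loop-22 YarnAllocator: Driver requested a total number of 185 executor(s).",
    "> 17/04/20 19:52:30 INFO dispatcher-event-loop-48 YarnAllocator: Driver requested a total number of 193 executor(s).",
    "> 17/04/20 19:52:30 INFO dispatcher-event-loop-24 YarnAllocator: Driver requested a total number of 192 executor(s).",
    "> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Canceled 0 container requests (locality no longer needed)",
]

per_second, totals = summarize_requests(SAMPLE)
for timestamp, count in sorted(per_second.items()):
    print(f"{timestamp}: {count} request(s)")
print(f"requested totals ranged from {min(totals)} to {max(totals)}")
```

Run against the full allocator log, more than one request per 5-second window would be consistent with the behavior described in the issue. Lines that are not driver requests, such as the Reporter lines, are ignored.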