Posted to user@spark.apache.org by Artem P <yo...@gmail.com> on 2019/02/22 10:40:39 UTC

Occasional broadcast timeout when dynamic allocation is on

Hi!

We have dynamic allocation enabled for our regular jobs, and sometimes they fail with java.util.concurrent.TimeoutException: Futures timed out after [300 seconds]. It seems the Spark driver starts the broadcast before the job has received any executors from YARN, and if acquiring them takes more than 5 minutes, the broadcast fails with a TimeoutException. Is there any way to force Spark to start the broadcast only after executors are in place (at least the minExecutors count)?
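
For reference, these are the knobs we are aware of; a minimal sketch of setting them at session creation (the values are illustrative, and we have not confirmed they remove the race):
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("regular-job")
  // How long Spark SQL waits on the broadcast future before throwing
  // java.util.concurrent.TimeoutException (default: 300 seconds).
  .config("spark.sql.broadcastTimeout", "1800")
  // Make the scheduler wait until this fraction of the requested executors
  // has registered before submitting the first job (default on YARN: 0.8).
  .config("spark.scheduler.minRegisteredResourcesRatio", "1.0")
  // Upper bound on that registration wait (default: 30s); raise it so it
  // covers slow YARN container allocation.
  .config("spark.scheduler.maxRegisteredResourcesWaitingTime", "600s")
  .getOrCreate()
```
Raising the timeout only papers over the wait, while the two scheduler settings attack the race itself, but neither ties the broadcast start to minExecutors directly.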

Relevant logs (note that the exception occurred 5 minutes after the broadcast start message):
```
2019-02-22 06:11:48,047 [broadcast-exchange-0] INFO  org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator  - Code generated in 361.70265 ms
2019-02-22 06:11:48,237 [broadcast-exchange-0] INFO  org.apache.spark.storage.memory.MemoryStore  - Block broadcast_0 stored as values in memory (estimated size 485.3 KB, free 365.8 MB)
2019-02-22 06:11:48,297 [main] INFO  org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator  - Code generated in 611.624073 ms
2019-02-22 06:11:48,522 [broadcast-exchange-0] INFO  org.apache.spark.storage.memory.MemoryStore  - Block broadcast_0_piece0 stored as bytes in memory (estimated size 63.5 KB, free 365.8 MB)
2019-02-22 06:11:48,524 [dispatcher-event-loop-4] INFO  org.apache.spark.storage.BlockManagerInfo  - Added broadcast_0_piece0 in memory on ip-10-4-3-123.eu-central-1.compute.internal:42935 (size: 63.5 KB, free: 366.2 MB)
2019-02-22 06:11:48,531 [broadcast-exchange-0] INFO  org.apache.spark.SparkContext  - Created broadcast 0 from run at ThreadPoolExecutor.java:1149
2019-02-22 06:11:48,545 [broadcast-exchange-0] INFO  org.apache.spark.sql.execution.FileSourceScanExec  - Planning scan with bin packing, max size: 4194304 bytes, open cost is considered as scanning 4194304 bytes.
2019-02-22 06:11:48,859 [broadcast-exchange-0] INFO  org.apache.spark.SparkContext  - Starting job: run at ThreadPoolExecutor.java:1149
2019-02-22 06:11:48,885 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.DAGScheduler  - Got job 0 (run at ThreadPoolExecutor.java:1149) with 1 output partitions
2019-02-22 06:11:48,886 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.DAGScheduler  - Final stage: ResultStage 0 (run at ThreadPoolExecutor.java:1149)
2019-02-22 06:11:48,893 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.DAGScheduler  - Parents of final stage: List()
2019-02-22 06:11:48,895 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.DAGScheduler  - Missing parents: List()
2019-02-22 06:11:48,940 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.DAGScheduler  - Submitting ResultStage 0 (MapPartitionsRDD[2] at run at ThreadPoolExecutor.java:1149), which has no missing parents
2019-02-22 06:11:49,004 [dag-scheduler-event-loop] INFO  org.apache.spark.storage.memory.MemoryStore  - Block broadcast_1 stored as values in memory (estimated size 11.7 KB, free 365.8 MB)
2019-02-22 06:11:49,024 [main] INFO  org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator  - Code generated in 362.596124 ms
2019-02-22 06:11:49,054 [dag-scheduler-event-loop] INFO  org.apache.spark.storage.memory.MemoryStore  - Block broadcast_1_piece0 stored as bytes in memory (estimated size 5.3 KB, free 365.7 MB)
2019-02-22 06:11:49,055 [dispatcher-event-loop-1] INFO  org.apache.spark.storage.BlockManagerInfo  - Added broadcast_1_piece0 in memory on ip-10-4-3-123.eu-central-1.compute.internal:42935 (size: 5.3 KB, free: 366.2 MB)
2019-02-22 06:11:49,066 [dag-scheduler-event-loop] INFO  org.apache.spark.SparkContext  - Created broadcast 1 from broadcast at DAGScheduler.scala:1079
2019-02-22 06:11:49,101 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.DAGScheduler  - Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at run at ThreadPoolExecutor.java:1149) (first 15 tasks are for partitions Vector(0))
2019-02-22 06:11:49,103 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.cluster.YarnScheduler  - Adding task set 0.0 with 1 tasks
2019-02-22 06:11:49,116 [main] INFO  org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator  - Code generated in 56.99095 ms
2019-02-22 06:11:49,188 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.FairSchedulableBuilder  - Added task set TaskSet_0.0 tasks to pool default
2019-02-22 06:12:04,190 [Timer-0] WARN  org.apache.spark.scheduler.cluster.YarnScheduler  - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-22 06:12:19,189 [Timer-0] WARN  org.apache.spark.scheduler.cluster.YarnScheduler  - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-22 06:12:34,189 [Timer-0] WARN  org.apache.spark.scheduler.cluster.YarnScheduler  - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-22 06:12:49,190 [Timer-0] WARN  org.apache.spark.scheduler.cluster.YarnScheduler  - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-22 06:13:04,189 [Timer-0] WARN  org.apache.spark.scheduler.cluster.YarnScheduler  - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-22 06:13:19,189 [Timer-0] WARN  org.apache.spark.scheduler.cluster.YarnScheduler  - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-22 06:13:34,189 [Timer-0] WARN  org.apache.spark.scheduler.cluster.YarnScheduler  - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-22 06:13:49,189 [Timer-0] WARN  org.apache.spark.scheduler.cluster.YarnScheduler  - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-22 06:14:04,189 [Timer-0] WARN  org.apache.spark.scheduler.cluster.YarnScheduler  - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-22 06:14:19,189 [Timer-0] WARN  org.apache.spark.scheduler.cluster.YarnScheduler  - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-22 06:14:34,190 [Timer-0] WARN  org.apache.spark.scheduler.cluster.YarnScheduler  - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-22 06:14:49,189 [Timer-0] WARN  org.apache.spark.scheduler.cluster.YarnScheduler  - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-22 06:15:04,189 [Timer-0] WARN  org.apache.spark.scheduler.cluster.YarnScheduler  - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-22 06:15:19,189 [Timer-0] WARN  org.apache.spark.scheduler.cluster.YarnScheduler  - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-22 06:15:34,189 [Timer-0] WARN  org.apache.spark.scheduler.cluster.YarnScheduler  - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-22 06:15:49,189 [Timer-0] WARN  org.apache.spark.scheduler.cluster.YarnScheduler  - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-22 06:16:04,189 [Timer-0] WARN  org.apache.spark.scheduler.cluster.YarnScheduler  - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-22 06:16:19,190 [Timer-0] WARN  org.apache.spark.scheduler.cluster.YarnScheduler  - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-22 06:16:34,189 [Timer-0] WARN  org.apache.spark.scheduler.cluster.YarnScheduler  - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-22 06:16:49,169 [main] WARN  org.apache.spark.util.Utils  - Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
```
And the exception right after this:
```
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [300 seconds]
	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
	at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:136)
	at org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:367)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:144)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:140)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:140)
	at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:135)
	at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:232)
	at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:102)
	at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:181)
	at org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:35)
	at org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:65)
	at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:181)
	at org.apache.spark.sql.execution.FilterExec.consume(basicPhysicalOperators.scala:85)
	at org.apache.spark.sql.execution.FilterExec.doConsume(basicPhysicalOperators.scala:206)
	at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:181)
	at org.apache.spark.sql.execution.FileSourceScanExec.consume(DataSourceScanExec.scala:158)
	at org.apache.spark.sql.execution.ColumnarBatchScan$class.produceBatches(ColumnarBatchScan.scala:138)
	at org.apache.spark.sql.execution.ColumnarBatchScan$class.doProduce(ColumnarBatchScan.scala:78)
	at org.apache.spark.sql.execution.FileSourceScanExec.doProduce(DataSourceScanExec.scala:158)
	at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:88)
	at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:83)
	at org.apache.spark.sql.execution.FileSourceScanExec.produce(DataSourceScanExec.scala:158)
	at org.apache.spark.sql.execution.FilterExec.doProduce(basicPhysicalOperators.scala:125)
	at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:88)
	at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:83)
	at org.apache.spark.sql.execution.FilterExec.produce(basicPhysicalOperators.scala:85)
	at org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:45)
	at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:88)
	at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:83)
	at org.apache.spark.sql.execution.ProjectExec.produce(basicPhysicalOperators.scala:35)
	at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doProduce(BroadcastHashJoinExec.scala:97)
	at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:88)
	at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:83)
	at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.produce(BroadcastHashJoinExec.scala:39)
	at org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:45)
	at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:88)
	at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:83)
	at org.apache.spark.sql.execution.ProjectExec.produce(basicPhysicalOperators.scala:35)
	at org.apache.spark.sql.execution.WholeStageCodegenExec.doCodeGen(WholeStageCodegenExec.scala:524)
	at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:576)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.prepareShuffleDependency(ShuffleExchangeExec.scala:92)
	at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:128)
	at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:119)
	at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
```

Re: Occasional broadcast timeout when dynamic allocation is on

Posted by Abdeali Kothari <ab...@gmail.com>.
I've been facing this issue for the past few months too. I always thought it was an infrastructure issue, but we were never able to figure out what that issue was.

If others are facing this issue too, then maybe it's a valid bug.

Does anyone have any ideas on how we can debug this?
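
One thing worth trying, both as a debugging aid and as a possible workaround: block the driver until some executors have registered before kicking off the first broadcast join. A minimal sketch, assuming a Scala driver; the threshold and poll interval are arbitrary:
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("wait-for-executors").getOrCreate()
val sc = spark.sparkContext

// Arbitrary threshold: align it with spark.dynamicAllocation.minExecutors.
val minExecutors = 2
val deadline = System.currentTimeMillis() + 10 * 60 * 1000L // give up after 10 min

// statusTracker.getExecutorInfos may count the driver as well, depending on
// the Spark version, hence the `+ 1`.
while (sc.statusTracker.getExecutorInfos.length < minExecutors + 1 &&
       System.currentTimeMillis() < deadline) {
  Thread.sleep(5000) // poll every 5 seconds
}

// ... run the query with the broadcast join only after this point ...
```
If the job still times out even after executors have registered, that would point away from the allocation race and back towards an infrastructure problem.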

On Fri, Feb 22, 2019, 16:10 Artem P <yonedadev@gmail.com> wrote:

> Hi!
>
> We have dynamic allocation enabled for our regular jobs and sometimes they
> fail with java.util.concurrent.TimeoutException: Futures timed out after
> [300 seconds]. [...]
>
> [quoted logs and stack trace trimmed]