Posted to issues@flink.apache.org by "Nico Kruber (JIRA)" <ji...@apache.org> on 2017/08/02 12:24:00 UTC
[jira] [Commented] (FLINK-7351) test instability in JobClientActorRecoveryITCase#testJobClientRecovery
[ https://issues.apache.org/jira/browse/FLINK-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16110777#comment-16110777 ]
Nico Kruber commented on FLINK-7351:
------------------------------------
Did another run with the following snippet, and the failure is reproducible (even locally):
{code}
private static final Logger LOG = LoggerFactory.getLogger(JobClientActorRecoveryITCase.class);

@Test
public void testJobClientRecovery1000() throws Exception {
    // Repeat the existing test so that the sporadic failure shows up reliably.
    for (int i = 0; i < 1000; ++i) {
        LOG.info("starting test run " + i);
        testJobClientRecovery();
    }
}
{code}
{code}
12:17:38,304 INFO org.apache.flink.runtime.blob.FileSystemBlobStore - Creating highly available BLOB storage directory at /tmp/junit9004724949110959230/recovery//default/blob
12:17:38,304 INFO org.apache.flink.runtime.util.ZooKeeperUtils - Enforcing default ACL for ZK connections
12:17:38,304 INFO org.apache.flink.runtime.util.ZooKeeperUtils - Using '/flink/default' as Zookeeper namespace.
12:17:38,304 INFO org.apache.curator.framework.imps.CuratorFrameworkImpl - Starting
12:17:38,348 INFO org.apache.flink.runtime.minicluster.FlinkMiniCluster - Disabled queryable state server
12:17:38,348 INFO org.apache.flink.runtime.minicluster.FlinkMiniCluster - Starting FlinkMiniCluster.
12:17:38,348 INFO org.apache.curator.framework.state.ConnectionStateManager - State change: CONNECTED
12:17:38,354 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
12:17:38,355 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory /tmp/blobStore-a2ef16c3-6223-45a6-913b-748781acdb2d
12:17:38,356 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:35687 - max concurrent requests: 50 - max backlog: 1000
12:17:38,356 INFO org.apache.flink.runtime.metrics.MetricRegistry - No metrics reporter configured, no metrics will be exposed/reported.
12:17:38,357 INFO org.apache.flink.runtime.testingUtils.TestingMemoryArchivist - Started memory archivist akka://flink/user/archive_1
12:17:38,359 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory /tmp/blobStore-bc758698-bd94-4802-95d5-da4c6d856883
12:17:38,359 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Starting JobManager at akka://flink/user/jobmanager_1.
12:17:38,359 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Starting ZooKeeperLeaderElectionService org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService@5cb0f0ab.
12:17:38,359 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:32853 - max concurrent requests: 50 - max backlog: 1000
12:17:38,359 INFO org.apache.flink.runtime.metrics.MetricRegistry - No metrics reporter configured, no metrics will be exposed/reported.
12:17:38,360 INFO org.apache.flink.runtime.testingUtils.TestingMemoryArchivist - Started memory archivist akka://flink/user/archive_2
12:17:38,363 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService.
12:17:38,363 DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Grant leadership to contender akka://flink/user/jobmanager_1 with session ID 3f4d9edf-5fa7-48c4-85ae-15bed36d46e4.
12:17:38,363 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Starting JobManager at akka://flink/user/jobmanager_2.
12:17:38,363 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Starting ZooKeeperLeaderElectionService org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService@527cd5eb.
12:17:38,363 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - JobManager akka://flink/user/jobmanager_1 was granted leadership with leader session ID Some(3f4d9edf-5fa7-48c4-85ae-15bed36d46e4).
12:17:38,363 DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Confirm leader session ID 3f4d9edf-5fa7-48c4-85ae-15bed36d46e4 for leader akka://flink/user/jobmanager_1.
12:17:38,364 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService.
12:17:38,365 DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Leader node changed while akka://flink/user/jobmanager_1 is the leader with session ID null.
12:17:38,363 INFO org.apache.flink.runtime.taskexecutor.TaskManagerConfiguration - Messages have a max timeout of 100000 ms
12:17:38,365 DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Write leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4.
12:17:38,365 INFO org.apache.flink.runtime.taskexecutor.TaskManagerServices - Temporary file directory '/tmp': total 9 GB, usable 6 GB (66.67% usable)
12:17:38,415 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Attempting to recover job efa7affb9fafdf7b682886f80a3bdeff.
12:17:38,416 DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Successfully wrote leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4.
12:17:38,416 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Delaying recovery of all jobs by 10000 milliseconds.
12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4.
12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4.
12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4.
12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4.
12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4.
12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4.
12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4.
12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4.
12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4.
12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4.
12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,419 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4.
12:17:38,419 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,419 INFO org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore - Recovered SubmittedJobGraph(efa7affb9fafdf7b682886f80a3bdeff, JobInfo(clients: Set((Actor[akka://flink/user/$a#235524161],EXECUTION_RESULT_AND_STATE_CHANGES)), start: 1501676245446)).
12:17:38,419 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4.
12:17:38,419 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Submitting recovered job efa7affb9fafdf7b682886f80a3bdeff.
12:17:38,419 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Submitting job efa7affb9fafdf7b682886f80a3bdeff (Blocking Test Job) (Recovery).
12:17:38,419 INFO org.apache.flink.runtime.testutils.TestingResourceManager - Trying to associate with JobManager leader akka://flink/user/jobmanager_1
12:17:38,419 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Using restart strategy NoRestartStrategy for efa7affb9fafdf7b682886f80a3bdeff.
12:17:38,419 DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Leader node changed while akka://flink/user/jobmanager_1 is the leader with session ID 3f4d9edf-5fa7-48c4-85ae-15bed36d46e4.
12:17:38,420 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job recovers via failover strategy: full graph restart
12:17:38,420 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Running initialization on master for job Blocking Test Job (efa7affb9fafdf7b682886f80a3bdeff).
12:17:38,420 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Successfully ran initialization on master in 0 ms.
12:17:38,420 INFO org.apache.flink.runtime.testutils.TestingResourceManager - Resource Manager associating with leading JobManager Actor[akka://flink/user/jobmanager_1#-1382860260] - leader session 3f4d9edf-5fa7-48c4-85ae-15bed36d46e4
12:17:38,420 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,420 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Scheduling job efa7affb9fafdf7b682886f80a3bdeff (Blocking Test Job).
12:17:38,420 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4.
12:17:38,420 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Blocking Test Job (efa7affb9fafdf7b682886f80a3bdeff) switched from state CREATED to RUNNING.
12:17:38,420 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Blocking Vertex (1/1) (424ef083d9189043385f9f2b855aeb21) switched from CREATED to SCHEDULED.
12:17:38,420 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Blocking Vertex (1/1) (424ef083d9189043385f9f2b855aeb21) switched from SCHEDULED to FAILED.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334)
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139)
at org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368)
at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309)
at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596)
at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
12:17:38,421 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Blocking Test Job (efa7affb9fafdf7b682886f80a3bdeff) switched from state RUNNING to FAILING.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334)
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139)
at org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368)
at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309)
at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596)
at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
12:17:38,421 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Try to restart or fail the job Blocking Test Job (efa7affb9fafdf7b682886f80a3bdeff) if no longer possible.
12:17:38,421 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Blocking Test Job (efa7affb9fafdf7b682886f80a3bdeff) switched from state FAILING to FAILED.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334)
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139)
at org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368)
at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309)
at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596)
at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
12:17:38,422 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Could not restart the job Blocking Test Job (efa7affb9fafdf7b682886f80a3bdeff) because the restart strategy prevented it.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334)
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139)
at org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368)
at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309)
at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596)
at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
12:17:38,446 INFO org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore - Removed job graph efa7affb9fafdf7b682886f80a3bdeff from ZooKeeper.
12:17:38,488 INFO org.apache.flink.runtime.io.network.buffer.NetworkBufferPool - Allocated 197 MB for network buffer pool (number of memory segments: 6307, bytes per segment: 32768).
12:17:38,488 INFO org.apache.flink.runtime.io.network.NetworkEnvironment - Starting the network environment and its components.
12:17:38,488 INFO org.apache.flink.runtime.taskexecutor.TaskManagerServices - Limiting managed memory to 621 MB, memory will be allocated lazily.
12:17:38,489 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager uses directory /tmp/flink-io-54a003b9-6f13-49c8-a585-238ec48019c5 for spill files.
12:17:38,489 INFO org.apache.flink.runtime.metrics.MetricRegistry - No metrics reporter configured, no metrics will be exposed/reported.
12:17:38,489 INFO org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-b7357f57-18a2-40ad-8448-b251ad3af109
12:17:38,490 INFO org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-d3abe96c-6ec6-416e-9418-b1127cf407de
12:17:38,490 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Starting TaskManager actor at akka://flink/user/taskmanager_1#2009077452.
12:17:38,490 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - TaskManager data connection information: c9a2fe7403e005322f998f352bbe5be5 @ localhost (dataPort=-1)
12:17:38,490 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - TaskManager has 1 task slot(s).
12:17:38,490 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Memory usage stats: [HEAP: 207/247/1979 MB, NON HEAP: 43/44/-1 MB (used/committed/max)]
12:17:38,490 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService.
12:17:38,492 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,492 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4.
12:17:38,493 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Trying to register at JobManager akka://flink/user/jobmanager_1 (attempt 1, timeout: 500 milliseconds)
12:17:38,493 INFO org.apache.flink.runtime.testutils.TestingResourceManager - TaskManager c9a2fe7403e005322f998f352bbe5be5 has started.
12:17:38,493 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at localhost (akka://flink/user/taskmanager_1) as cd09541e56c8913613dc9a58f61d304a. Current number of registered hosts is 1. Current number of alive task slots is 1.
12:17:38,493 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Successful registration at JobManager (akka://flink/user/jobmanager_1), starting network stack and library cache.
12:17:38,493 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Determined BLOB server address to be localhost/127.0.0.1:35687. Starting BLOB cache.
12:17:38,493 INFO org.apache.flink.runtime.blob.BlobCache - Created BLOB cache storage directory /tmp/blobStore-9364d0ae-7fe4-45d9-93c9-d978fae7caa8
12:17:38,495 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Stopping ZooKeeperLeaderElectionService org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService@5cb0f0ab.
12:17:38,495 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - TaskManager akka://flink/user/taskmanager_1 disconnects from JobManager akka://flink/user/jobmanager_1: JobManager is no longer reachable
12:17:38,495 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService.
12:17:38,495 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Disassociating from JobManager
12:17:38,496 INFO org.apache.flink.runtime.blob.BlobCache - Shutting down BlobCache
12:17:38,496 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Received SubmitJobAndWait(JobGraph(jobId: 41b8348843eb617e608df4f200590f37)) but there is no connection to a JobManager yet.
12:17:38,496 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Received job Blocking Test Job (41b8348843eb617e608df4f200590f37).
12:17:38,497 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Trying to register at JobManager akka://flink/user/jobmanager_1 (attempt 1, timeout: 500 milliseconds)
12:17:38,498 DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Grant leadership to contender akka://flink/user/jobmanager_2 with session ID a1124fe4-7739-452a-8b74-ee2b3fb7dad0.
12:17:38,498 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - JobManager akka://flink/user/jobmanager_2 was granted leadership with leader session ID Some(a1124fe4-7739-452a-8b74-ee2b3fb7dad0).
12:17:38,498 DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Confirm leader session ID a1124fe4-7739-452a-8b74-ee2b3fb7dad0 for leader akka://flink/user/jobmanager_2.
12:17:38,498 DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Write leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0.
12:17:38,503 DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Successfully wrote leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0.
12:17:38,503 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,503 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Delaying recovery of all jobs by 10000 milliseconds.
12:17:38,503 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0.
12:17:38,503 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Disconnect from JobManager null.
12:17:38,504 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Connect to JobManager Actor[akka://flink/user/jobmanager_2#2090247331].
12:17:38,504 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Connected to JobManager at Actor[akka://flink/user/jobmanager_2#2090247331] with leader session id a1124fe4-7739-452a-8b74-ee2b3fb7dad0.
12:17:38,504 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Sending message to JobManager akka://flink/user/jobmanager_2 to submit job Blocking Test Job (41b8348843eb617e608df4f200590f37) and wait for progress
12:17:38,505 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Upload jar files to job manager akka://flink/user/jobmanager_2.
12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,505 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Submit job to the job manager akka://flink/user/jobmanager_2.
12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0.
12:17:38,505 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Submitting job 41b8348843eb617e608df4f200590f37 (Blocking Test Job).
12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0.
12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0.
12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0.
12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0.
12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0.
12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0.
12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0.
12:17:38,505 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Using restart strategy NoRestartStrategy for 41b8348843eb617e608df4f200590f37.
12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0.
12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,506 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0.
12:17:38,506 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job recovers via failover strategy: full graph restart
12:17:38,506 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,506 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,506 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0.
12:17:38,506 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0.
12:17:38,506 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Running initialization on master for job Blocking Test Job (41b8348843eb617e608df4f200590f37).
12:17:38,506 INFO org.apache.flink.runtime.testutils.TestingResourceManager - Associated JobManager Actor[akka://flink/user/jobmanager_1#-1382860260] lost leader status
12:17:38,506 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Successfully ran initialization on master in 0 ms.
12:17:38,506 INFO org.apache.flink.runtime.testutils.TestingResourceManager - Trying to associate with JobManager leader akka://flink/user/jobmanager_2
12:17:38,507 DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Leader node changed while akka://flink/user/jobmanager_2 is the leader with session ID a1124fe4-7739-452a-8b74-ee2b3fb7dad0.
12:17:38,507 INFO org.apache.flink.runtime.testutils.TestingResourceManager - Resource Manager associating with leading JobManager Actor[akka://flink/user/jobmanager_2#2090247331] - leader session a1124fe4-7739-452a-8b74-ee2b3fb7dad0
12:17:38,507 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,508 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0.
12:17:38,509 INFO org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore - Added SubmittedJobGraph(41b8348843eb617e608df4f200590f37, JobInfo(clients: Set((Actor[akka://flink/user/$a#282433225],EXECUTION_RESULT_AND_STATE_CHANGES)), start: 1501676258505)) to ZooKeeper.
12:17:38,510 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Scheduling job 41b8348843eb617e608df4f200590f37 (Blocking Test Job).
12:17:38,510 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Blocking Test Job (41b8348843eb617e608df4f200590f37) switched from state CREATED to RUNNING.
12:17:38,510 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Job 41b8348843eb617e608df4f200590f37 was successfully submitted to the JobManager akka://flink/deadLetters.
12:17:38,510 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Blocking Vertex (1/1) (19c331a32d8716bc6cd6d5bf7d1f02dd) switched from CREATED to SCHEDULED.
12:17:38,510 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - 08/02/2017 12:17:38 Job execution switched to status RUNNING.
12:17:38,510 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - 08/02/2017 12:17:38 Blocking Vertex(1/1) switched to SCHEDULED
12:17:38,510 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Blocking Vertex (1/1) (19c331a32d8716bc6cd6d5bf7d1f02dd) switched from SCHEDULED to FAILED.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334)
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139)
at org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368)
at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309)
at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596)
at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
12:17:38,589 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Blocking Test Job (41b8348843eb617e608df4f200590f37) switched from state RUNNING to FAILING.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334)
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139)
at org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368)
at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309)
at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596)
at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
12:17:38,589 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - 08/02/2017 12:17:38 Blocking Vertex(1/1) switched to FAILED
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334)
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139)
at org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368)
at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309)
at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596)
at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
12:17:38,589 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
12:17:38,589 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - 08/02/2017 12:17:38 Job execution switched to status FAILING.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334)
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139)
at org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368)
at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309)
at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596)
at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
12:17:38,589 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Try to restart or fail the job Blocking Test Job (41b8348843eb617e608df4f200590f37) if no longer possible.
12:17:38,590 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Blocking Test Job (41b8348843eb617e608df4f200590f37) switched from state FAILING to FAILED.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334)
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139)
at org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368)
at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309)
at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596)
at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
12:17:38,590 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0.
12:17:38,590 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Could not restart the job Blocking Test Job (41b8348843eb617e608df4f200590f37) because the restart strategy prevented it.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334)
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139)
at org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368)
at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309)
at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596)
at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
12:17:38,591 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - 08/02/2017 12:17:38 Job execution switched to status FAILED.
12:17:38,591 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Trying to register at JobManager akka://flink/user/jobmanager_2 (attempt 1, timeout: 500 milliseconds)
12:17:38,591 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Terminate JobClientActor.
12:17:38,591 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Disconnect from JobManager Actor[akka://flink/user/jobmanager_2#2090247331].
12:17:38,591 INFO org.apache.flink.runtime.client.JobClient - Job execution failed
12:17:38,591 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at localhost (akka://flink/user/taskmanager_1) as 7b417742cd33c7f2e146a52a7e5597b9. Current number of registered hosts is 1. Current number of alive task slots is 1.
12:17:38,592 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Stopping ZooKeeperLeaderRetrievalService.
12:17:38,593 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Stopping TaskManager akka://flink/user/taskmanager_1#2009077452.
12:17:38,593 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Stopping JobManager akka://flink/user/jobmanager_2.
12:17:38,593 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Stopping ZooKeeperLeaderRetrievalService.
12:17:38,593 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Stopping ZooKeeperLeaderRetrievalService.
12:17:38,593 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager removed spill file directory /tmp/flink-io-54a003b9-6f13-49c8-a585-238ec48019c5
12:17:38,593 INFO org.apache.flink.runtime.io.network.NetworkEnvironment - Shutting down the network environment and its components.
12:17:38,594 INFO org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore - Removed job graph 41b8348843eb617e608df4f200590f37 from ZooKeeper.
12:17:38,594 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Task manager akka://flink/user/taskmanager_1 is completely shut down.
12:17:38,594 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Stopping ZooKeeperLeaderElectionService org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService@527cd5eb.
12:17:38,595 INFO org.apache.flink.runtime.blob.BlobServer - Stopped BLOB server at 0.0.0.0:32853
12:17:38,596 ERROR org.apache.flink.runtime.client.JobClientActorRecoveryITCase -
--------------------------------------------------------------------------------
Test testJobClientRecovery1000(org.apache.flink.runtime.client.JobClientActorRecoveryITCase) failed with:
org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:933)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:876)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:876)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334)
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139)
at org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368)
at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309)
at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596)
at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
... 8 more
{code}
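Reading the log above: the job is scheduled at 12:17:38,510 while the scheduler reports "Number of instances=0", and the TaskManager only registers at jobmanager_2 at 12:17:38,591, after the job has already switched to FAILED. That ordering suggests, as an assumption not confirmed in this ticket, that the job submission races with the TaskManager registering at the new leader. A minimal sketch of a generic polling helper a test could use to wait for such a condition before proceeding is below. The class name {{SlotWait}}, the method {{waitUntil}}, and the {{registeredTaskManagers()}} call mentioned in the comment are hypothetical and not part of Flink.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical test helper; not part of Flink. */
public final class SlotWait {
    private SlotWait() {}

    /**
     * Polls the condition until it returns true or the timeout elapses.
     * Returns true if the condition held before the deadline, false on
     * timeout or if the waiting thread was interrupted.
     */
    public static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    // Possible use in a test (registeredTaskManagers() is a hypothetical accessor):
    //   SlotWait.waitUntil(() -> registeredTaskManagers() >= 1, 10_000, 50);
}
```

Whether waiting in the test or retrying the slot allocation in the scheduler is the right fix is a separate question; the sketch only illustrates the waiting side.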
> test instability in JobClientActorRecoveryITCase#testJobClientRecovery
> ----------------------------------------------------------------------
>
> Key: FLINK-7351
> URL: https://issues.apache.org/jira/browse/FLINK-7351
> Project: Flink
> Issue Type: Bug
> Components: Job-Submission, Tests
> Affects Versions: 1.3.2
> Reporter: Nico Kruber
> Priority: Minor
> Labels: test-stability
>
> On a 16-core VM, the following test failed during {{mvn clean verify}}
> {code}
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 22.814 sec <<< FAILURE! - in org.apache.flink.runtime.client.JobClientActorRecoveryITCase
> testJobClientRecovery(org.apache.flink.runtime.client.JobClientActorRecoveryITCase) Time elapsed: 21.299 sec <<< ERROR!
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:933)
> at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:876)
> at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:876)
> at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0
> at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334)
> at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139)
> at org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368)
> at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309)
> at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596)
> at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450)
> at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834)
> at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814)
> at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425)
> at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
> at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
> at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {code}