Posted to issues@flink.apache.org by "Maximilian Michels (JIRA)" <ji...@apache.org> on 2016/08/29 16:21:21 UTC
[jira] [Resolved] (FLINK-4486) JobManager not fully running when yarn-session.sh finishes

     [ https://issues.apache.org/jira/browse/FLINK-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maximilian Michels resolved FLINK-4486.
---------------------------------------
    Resolution: Fixed

master: ab1df63c3fd419c23631f3b55b506e6fdf3cb72f
release-1.1: 4cdeb11854956ac6cf1189d7cfa43628fb3be328

> JobManager not fully running when yarn-session.sh finishes
> ----------------------------------------------------------
>
>                 Key: FLINK-4486
>                 URL: https://issues.apache.org/jira/browse/FLINK-4486
>             Project: Flink
>          Issue Type: Bug
>          Components: YARN Client
>    Affects Versions: 1.1.0
>            Reporter: Niels Basjes
>            Assignee: Maximilian Michels
>             Fix For: 1.2.0, 1.1.2
>
>
> I start a detached yarn-session.sh.
> If the YARN cluster is very busy, the yarn-session.sh script completes before all the task slots have been allocated. As a result, I sometimes have a JobManager without any task slots. YARN assigns these task slots over time, but they are not available for the first job that is submitted.
> Consequently, the first few tasks in my job fail with the error "Not enough free slots available to run the job.".
> I think the desirable behavior is for yarn-session.sh to wait until the JobManager is fully functional and capable of actually running jobs.
> {code}
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Task to schedule: < Attempt #0 (CHAIN DataSource (Read prefix '4') -> Map (Map prefix '4') (8/10)) @ (unassigned) - [SCHEDULED] > with groupID < cd6c37df290564e603da908a8783a9bf > in sharing group < SlotSharingGroup [c0b6eff6ce93967182cdb6dfeae9359b, 8b2c3b39f3a55adf9f123243ab03c9c1, 55fb94dd8a3e5f59a10dbbf5c4925db4, 433b2e4a05a5e685b48c517249755a89, 8c74690c35454064e4815ac3756cdca2, 4b4fbd24f3483030fd852b38ff2249c1, 5e36a56ea4dece18fe5ba04352d90dc8, cd6c37df290564e603da908a8783a9bf, 64eafa845087bee70735f7250df9994f, 706a5d6fe48ae57724a00a9fce5dae8a, 7bee4297e0e839e53a153dfcbcca8624, 21b58f7d408d237540ae7b4734f81a1d, b429b1ff338d9d73677f42717cfc0dbc, cc7491db641f557c6aa8c749ebc2de62, f61cbf0ae00331f67aaf60ace78b05aa, 606f02ea9e0f4ad57f0cc0232dd70842] >. Resources available to scheduler: Number of instances=1, total number of slots=7, available slots=0
> 	at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:256)
> 	at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmediately(Scheduler.java:131)
> 	at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:306)
> 	at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:454)
> 	at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:326)
> 	at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:734)
> 	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1332)
> 	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1291)
> 	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1291)
> 	at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> 	at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> 	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
> 	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
> 	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
> 	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
> 	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> 	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {code}
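
The waiting behavior requested above can also be approximated on the client side, for example on a version without the fix. The following is only a minimal sketch, not the change made in the commits listed above: it assumes the JobManager's monitoring REST API exposes an /overview resource whose JSON contains a "slots-available" field, and the names fetch_overview and wait_for_slots, the base URL, and the timeout values are all hypothetical.

```python
import json
import time
import urllib.request


def fetch_overview(base_url):
    """GET <base_url>/overview and return the parsed JSON.

    Assumption: the JobManager web monitor serves cluster totals here.
    """
    url = base_url.rstrip("/") + "/overview"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_slots(required_slots, get_overview, timeout_s=300.0, poll_s=5.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Block until at least `required_slots` slots are free.

    `get_overview` is any zero-argument callable returning the overview dict,
    which keeps the polling logic independent of the HTTP layer.
    Raises TimeoutError if the slots do not appear within `timeout_s` seconds.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            # Assumed field name; a missing field is treated as zero slots.
            available = int(get_overview().get("slots-available", 0))
        except (OSError, ValueError):
            # JobManager not reachable yet, or it returned invalid JSON.
            available = 0
        if available >= required_slots:
            return available
        if clock() >= deadline:
            raise TimeoutError(
                "only %d of %d slots available" % (available, required_slots))
        sleep(poll_s)
```

A submission script could call wait_for_slots(10, lambda: fetch_overview("http://jobmanager-host:8081")) before submitting the first job, where the host, the port, and the slot count of 10 are placeholders for the actual session. Passing the fetch function as a parameter is a deliberate choice so that the loop can be exercised without a running cluster.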


