Posted to issues@flink.apache.org by "Till Rohrmann (JIRA)" <ji...@apache.org> on 2018/11/08 09:35:00 UTC

[jira] [Commented] (FLINK-10818) RestartStrategies.fixedDelayRestart Occur NoResourceAvailableException: Not enough free slots available to run the job.

    [ https://issues.apache.org/jira/browse/FLINK-10818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679502#comment-16679502 ] 

Till Rohrmann commented on FLINK-10818:
---------------------------------------

Could you check whether your Yarn cluster actually had the required resources? If other jobs are running in the cluster, they may have taken the resources your job needs. Also, could you check whether the problem occurs with Flink {{1.6.2}} in the new mode (not legacy)?
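
As a rough aid for that check, the resource summary at the end of the exception message can be compared with the slots the job needs. The sketch below is only an illustration: {{SlotSummary}} and {{missingSlots}} are hypothetical names, not part of Flink, and it assumes the job needs one slot per parallel subtask of its widest operator (default slot sharing), which is 6 for the task shown as (1/6) in the report.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not a Flink class): reads the resource summary that
 * ends the NoResourceAvailableException message quoted in this issue.
 */
final class SlotSummary {
    // Pattern copied from the wording of the exception message in the report.
    private static final Pattern SUMMARY = Pattern.compile(
            "Number of instances=(\\d+), total number of slots=(\\d+), available slots=(\\d+)");

    final int instances;
    final int totalSlots;
    final int availableSlots;

    SlotSummary(int instances, int totalSlots, int availableSlots) {
        this.instances = instances;
        this.totalSlots = totalSlots;
        this.availableSlots = availableSlots;
    }

    /** Extracts the three counters from an exception message. */
    static SlotSummary parse(String message) {
        Matcher m = SUMMARY.matcher(message);
        if (!m.find()) {
            throw new IllegalArgumentException("no scheduler resource summary found");
        }
        return new SlotSummary(
                Integer.parseInt(m.group(1)),
                Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)));
    }

    /**
     * Slots still missing before a job with the given parallelism could be
     * scheduled. Assumes one slot per parallel subtask (default slot sharing).
     */
    int missingSlots(int parallelism) {
        return Math.max(0, parallelism - availableSlots);
    }

    public static void main(String[] args) {
        String message = "Resources available to scheduler: "
                + "Number of instances=6, total number of slots=6, available slots=0";
        SlotSummary summary = parse(message);
        System.out.println("total slots: " + summary.totalSlots
                + ", available: " + summary.availableSlots
                + ", missing for parallelism 6: " + summary.missingSlots(6));
    }
}
```

In the reported message all 6 slots are registered but none is free at restart time, so the question is what was still holding them at that moment.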

> RestartStrategies.fixedDelayRestart Occur  NoResourceAvailableException: Not enough free slots available to run the job.
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-10818
>                 URL: https://issues.apache.org/jira/browse/FLINK-10818
>             Project: Flink
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.6.2
>         Environment: JDK 1.8
> Flink 1.6.0 
> Hadoop 2.7.3
>            Reporter: ambition
>            Priority: Major
>
>  Our job runs in our online Flink on Yarn environment. The code sets the restart strategy like this:
> {code:java}
> // Up to 5 restart attempts, with a delay of 1000 ms between attempts.
> exeEnv.setRestartStrategy(RestartStrategies.fixedDelayRestart(5, 1000L));
> {code}
> But after the job had been running for some days, this exception occurred:
> {code:java}
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Task to schedule: < Attempt #5 (Source: KafkaJsonTableSource -> Map -> where: (AND(OR(=(app_key, _UTF-16LE'C4FAF9CE1569F541'), =(app_key, _UTF-16LE'F5C7F68C7117630B'), =(app_key, _UTF-16LE'57C6FF4B5A064D29')), OR(=(LOWER(TRIM(FLAG(BOTH), _UTF-16LE' ', os_type)), _UTF-16LE'ios'), =(LOWER(TRIM(FLAG(BOTH), _UTF-16LE' ', os_type)), _UTF-16LE'android')), IS NOT NULL(server_id))), select: (MT_Date_Format_Mode(receive_time, _UTF-16LE'yyyyMMddHHmm', 10) AS date_p, LOWER(TRIM(FLAG(BOTH), _UTF-16LE' ', os_type)) AS os_type, MT_Date_Format_Mode(receive_time, _UTF-16LE'HHmm', 10) AS date_mm, server_id) (1/6)) @ (unassigned) - [SCHEDULED] > with groupID < cbc357ccb763df2852fee8c4fc7d55f2 > in sharing group < 690dbad267a8ff37c8cb5e9dbedd0a6d >. Resources available to scheduler: Number of instances=6, total number of slots=6, available slots=0
>    at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:281)
>    at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:155)
>    at org.apache.flink.runtime.executiongraph.Execution.lambda$allocateAndAssignSlotForExecution$2(Execution.java:491)
>    at org.apache.flink.runtime.executiongraph.Execution$$Lambda$44/1664178385.apply(Unknown Source)
>    at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:981)
>    at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2116)
>    at org.apache.flink.runtime.executiongraph.Execution.allocateAndAssignSlotForExecution(Execution.java:489)
>    at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.allocateResourcesForAll(ExecutionJobVertex.java:521)
>    at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleEager(ExecutionGraph.java:945)
>    at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:875)
>    at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:1262)
>    at org.apache.flink.runtime.executiongraph.restart.ExecutionGraphRestartCallback.triggerFullRecovery(ExecutionGraphRestartCallback.java:59)
>    at org.apache.flink.runtime.executiongraph.restart.FixedDelayRestartStrategy$1.run(FixedDelayRestartStrategy.java:68)
>    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>    at java.lang.Thread.run(Thread.java:745)
> {code}
>  
> This exception happened when the job started. This issue is linked to
> https://issues.apache.org/jira/browse/FLINK-4486



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)