Posted to user@flink.apache.org by 杨力 <bi...@gmail.com> on 2018/05/22 08:33:01 UTC

Jobs running on a yarn per-job cluster fail to restart when a task manager is lost

Hi,

I am running a streaming job without checkpointing enabled. A failure rate
restart strategy has been set with
StreamExecutionEnvironment.setRestartStrategy.
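For reference, here is a minimal sketch of how a failure rate restart strategy is usually set through StreamExecutionEnvironment.setRestartStrategy. It assumes the Flink 1.4 Java DataStream API (RestartStrategies.failureRateRestart with org.apache.flink.api.common.time.Time). The class name, the three parameter values and the toy pipeline are hypothetical placeholders; the post does not say which values the real job uses.

```java
import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical example class; not taken from the original job.
public class FailureRateRestartSketch {

    // Applies a failure rate restart strategy to the given environment.
    // The numbers below are illustrative assumptions only.
    static void configureRestarts(StreamExecutionEnvironment env) {
        env.setRestartStrategy(RestartStrategies.failureRateRestart(
                3,                               // max failures tolerated within one interval
                Time.of(5, TimeUnit.MINUTES),    // interval over which failures are counted
                Time.of(10, TimeUnit.SECONDS))); // delay before each restart attempt
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        configureRestarts(env);
        // Checkpointing is left disabled, matching the setup described in the post.
        env.fromElements(1, 2, 3).print(); // toy pipeline standing in for the real job
        env.execute("failure rate restart sketch");
    }
}
```

Note that the restart strategy only decides when and how often the job is restarted. Each restart still needs enough free task slots, which is where the failure described below occurs.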

When a task manager is lost because of memory problems, the job manager tries
to restart the job without launching a new task manager, and the restart fails with
NoResourceAvailableException: Not enough slots available to run the job.

The job is running on flink 1.4.2 and Hadoop 2.7.4.