Posted to user@flink.apache.org by HarshithBolar <hk...@arity.com> on 2018/08/30 12:19:01 UTC

What are the general reasons for a Flink Task Manager to crash? How to troubleshoot?

We're running Flink on a 5-node cluster with two Job Managers and three
Task Managers.

Lately, about once a day, all three Task Managers get killed. This drops the
number of available task slots to 0 and causes every job running on the
cluster to fail. The only fix so far is to restart the Task Managers
manually.

So I'd like to know the typical reasons that can bring down a Task Manager,
and whether there is a way to bring them back up automatically, without
manual intervention.

Additional info: The jobs running on the cluster read data from Kafka and
write data to Kafka/Cassandra.


