Posted to dev@flink.apache.org by "Stephan Ewen (JIRA)" <ji...@apache.org> on 2016/02/05 16:03:40 UTC
[jira] [Created] (FLINK-3347) TaskManager ActorSystems need to restart themselves in case they notice quarantine
Stephan Ewen created FLINK-3347:
-----------------------------------
Summary: TaskManager ActorSystems need to restart themselves in case they notice quarantine
Key: FLINK-3347
URL: https://issues.apache.org/jira/browse/FLINK-3347
Project: Flink
Issue Type: Improvement
Components: TaskManager
Affects Versions: 0.10.1
Reporter: Stephan Ewen
Fix For: 1.0.0
There are cases where Akka quarantines remote actor systems. Once that happens, no further communication with the quarantined actor system is possible until one of the two actor systems is restarted.
The result is that a TaskManager is up and available but cannot register with the JobManager (Akka refuses the connection because of the quarantined state), which makes the TaskManager a useless process.
I suggest letting the TaskManager restart itself once it notices that either it has quarantined the JobManager or the JobManager has quarantined it.
Both situations can be recognized by listening for certain events on the actor system's event stream: http://stackoverflow.com/questions/32471088/akka-cluster-detecting-quarantined-state
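
As a rough illustration of the decision the TaskManager would have to make, here is a minimal Java sketch. It deliberately does not use Akka: RemotingEvent is a hypothetical stand-in for whatever arrives from the event stream, the kind labels are modelled on Akka's QuarantinedEvent and AssociationErrorEvent, and the matched message wording is an assumption that would need to be checked against the Akka version in use.

```java
import java.util.regex.Pattern;

/**
 * Sketch of the "restart on quarantine" decision only. It does not talk to Akka;
 * in a real TaskManager the inputs would come from the actor system event stream.
 */
final class QuarantineDetector {

    /** Hypothetical stand-in for a remoting event received from the event stream. */
    static final class RemotingEvent {
        final String kind;    // assumed labels: "QuarantinedEvent", "AssociationErrorEvent", ...
        final String message; // textual cause of the event, may be null

        RemotingEvent(String kind, String message) {
            this.kind = kind;
            this.message = message;
        }
    }

    // Assumed wording of the error reported when the remote side has quarantined us.
    private static final Pattern QUARANTINED_BY_REMOTE =
            Pattern.compile("quarantined this system", Pattern.CASE_INSENSITIVE);

    /** True if the local actor system has quarantined a remote one. */
    static boolean localQuarantinedRemote(RemotingEvent event) {
        return "QuarantinedEvent".equals(event.kind);
    }

    /** True if an association error indicates that the remote system quarantined us. */
    static boolean remoteQuarantinedLocal(RemotingEvent event) {
        return "AssociationErrorEvent".equals(event.kind)
                && event.message != null
                && QUARANTINED_BY_REMOTE.matcher(event.message).find();
    }

    /** The TaskManager should restart its actor system in either case. */
    static boolean shouldRestart(RemotingEvent event) {
        return localQuarantinedRemote(event) || remoteQuarantinedLocal(event);
    }
}
```

A listener subscribed to the event stream would call shouldRestart for each event it receives and trigger the restart on the first positive result. Matching on message text is fragile, so a dedicated event type would be preferable wherever the Akka version provides one.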