Posted to issues@flink.apache.org by "Till Rohrmann (Jira)" <ji...@apache.org> on 2019/09/27 09:55:00 UTC
[jira] [Closed] (FLINK-6147) flink client can't detect cluster is down
[ https://issues.apache.org/jira/browse/FLINK-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Till Rohrmann closed FLINK-6147.
--------------------------------
Resolution: Invalid
No longer valid with the Flip-6 changes.
> flink client can't detect cluster is down
> -----------------------------------------
>
> Key: FLINK-6147
> URL: https://issues.apache.org/jira/browse/FLINK-6147
> Project: Flink
> Issue Type: Bug
> Components: Command Line Client
> Affects Versions: 1.2.0, 1.3.0
> Reporter: Yelei Feng
> Priority: Major
> Labels: client
>
> I tested in YARN mode. Steps to reproduce:
> 1. flink run xx.jar
> 2. Kill the YARN application.
> The CLI hangs, only showing "New JobManager elected. Connecting to null", instead of cleaning up and closing itself.
> After some digging, I found that the main logic is in {{JobClientActor}}. It terminates itself once it receives the message {{ConnectionTimeout}}. It receives JobManager status changes from two sources: ZooKeeper and Akka death-watch. On a ZooKeeper notification, the client sets the current {{leaderSessionId}} and unwatches the previous JobManager. On receiving {{Terminated}} for the previous JobManager from Akka death-watch, it sends {{ConnectionTimeout}} to itself after 60s. There is a good chance that these two paths interfere with each other.
>
> Situation 1:
> 1. The client gets notified by ZooKeeper and sets {{leaderSessionId}} to null.
> 2. The client unwatches the previous JobManager.
> 3. The {{Terminated}} message for the previous JobManager is therefore never received.
>
> Situation 2:
> 1. The {{Terminated}} message for the current JobManager is received.
> 2. A {{ConnectionTimeout}} message is scheduled for 60s later.
> 3. Within those 60s, the client gets notified by ZooKeeper and sets {{leaderSessionId}} to null.
> 4. The {{ConnectionTimeout}} is filtered out because its {{leaderSessionId}} no longer matches.
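The two interleavings described in the report can be reproduced without Akka or ZooKeeper in a small logical-time model. This is a hypothetical sketch, not Flink's {{JobClientActor}}. The names {{Flink6147Race}}, {{ClientModel}}, {{leaderChanged}}, {{jobManagerDied}} and {{advanceTo}} are invented for illustration. Death-watch is reduced to the assumption "a {{Terminated}} is delivered only for the currently watched JobManager". Session filtering is modeled as an equality check between the session ID captured when {{ConnectionTimeout}} was scheduled and the current {{leaderSessionId}}.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;
import java.util.UUID;

// Hypothetical model of the FLINK-6147 race; NOT Flink's JobClientActor.
public class Flink6147Race {

    // Delay before the self-sent ConnectionTimeout, per the report (60s).
    static final long CONNECTION_TIMEOUT_MS = 60_000L;

    static final class PendingTimeout {
        final long fireAt;    // logical time at which the message is delivered
        final UUID sessionId; // leader session ID captured when it was scheduled

        PendingTimeout(long fireAt, UUID sessionId) {
            this.fireAt = fireAt;
            this.sessionId = sessionId;
        }
    }

    static final class ClientModel {
        UUID leaderSessionId; // current leader session ID (null = no leader)
        String watched;       // JobManager currently under death-watch (null = none)
        boolean stopped;      // true once a ConnectionTimeout is accepted
        final List<PendingTimeout> pending = new ArrayList<>();

        ClientModel(String jobManager, UUID sessionId) {
            this.watched = jobManager;
            this.leaderSessionId = sessionId;
        }

        // ZooKeeper notification: set the session ID, unwatch the old JobManager.
        void leaderChanged(String newJobManager, UUID newSessionId) {
            leaderSessionId = newSessionId;
            watched = newJobManager; // the previous JobManager is no longer watched
        }

        // A JobManager dies; Terminated is only delivered for a watched actor.
        void jobManagerDied(long now, String jobManager) {
            if (jobManager != null && jobManager.equals(watched)) {
                watched = null;
                pending.add(new PendingTimeout(now + CONNECTION_TIMEOUT_MS, leaderSessionId));
            }
            // Otherwise Terminated is never received (Situation 1).
        }

        // Deliver every ConnectionTimeout due by 'now', dropping stale sessions.
        void advanceTo(long now) {
            Iterator<PendingTimeout> it = pending.iterator();
            while (it.hasNext()) {
                PendingTimeout t = it.next();
                if (t.fireAt <= now) {
                    it.remove();
                    if (Objects.equals(t.sessionId, leaderSessionId)) {
                        stopped = true; // accepted: the client cleans up and exits
                    }
                    // else: filtered out as an old-session message (Situation 2)
                }
            }
        }
    }

    // Baseline: JobManager dies, no ZooKeeper notification arrives.
    static boolean baseline() {
        ClientModel c = new ClientModel("jm-1", UUID.randomUUID());
        c.jobManagerDied(0, "jm-1");
        c.advanceTo(CONNECTION_TIMEOUT_MS);
        return c.stopped;
    }

    // Situation 1: ZooKeeper notification first, so Terminated is never delivered.
    static boolean situation1() {
        ClientModel c = new ClientModel("jm-1", UUID.randomUUID());
        c.leaderChanged(null, null);
        c.jobManagerDied(1_000, "jm-1");
        c.advanceTo(2 * CONNECTION_TIMEOUT_MS);
        return c.stopped;
    }

    // Situation 2: Terminated first, then ZooKeeper clears the session at t = 10s.
    static boolean situation2() {
        ClientModel c = new ClientModel("jm-1", UUID.randomUUID());
        c.jobManagerDied(0, "jm-1");
        c.advanceTo(10_000);         // nothing due yet
        c.leaderChanged(null, null); // arrives before the timeout fires
        c.advanceTo(2 * CONNECTION_TIMEOUT_MS);
        return c.stopped;
    }

    public static void main(String[] args) {
        System.out.println("baseline stops:    " + baseline());
        System.out.println("situation 1 stops: " + situation1());
        System.out.println("situation 2 stops: " + situation2());
    }
}
```

Under these assumptions the baseline prints true (the timeout is accepted and the client stops), while both situations print false: the client never accepts a {{ConnectionTimeout}}, which matches the hang observed in the CLI.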
--
This message was sent by Atlassian Jira
(v8.3.4#803005)