Posted to issues@flink.apache.org by "Till Rohrmann (Jira)" <ji...@apache.org> on 2020/01/31 11:19:00 UTC

[jira] [Closed] (FLINK-8829) Flink in EMR(YARN) is down due to Akka communication issue

     [ https://issues.apache.org/jira/browse/FLINK-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann closed FLINK-8829.
--------------------------------
    Resolution: Abandoned

Closing for inactivity.

> Flink in EMR(YARN) is down due to Akka communication issue
> ----------------------------------------------------------
>
>                 Key: FLINK-8829
>                 URL: https://issues.apache.org/jira/browse/FLINK-8829
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / YARN
>    Affects Versions: 1.3.2
>            Reporter: Aleksandr Filichkin
>            Priority: Major
>
> Hi,
> We are running a Flink 1.3.2 app on Amazon EMR with YARN. Every week our Flink job goes down due to:
> 2018-02-16 19:00:04,595 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@ip-10-97-34-209.tr-fr-nonprod.aws-int.com:42177] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@ip-10-97-34-209.tr-fr-nonprod.aws-int.com:42177]] Caused by: [Connection refused: ip-10-97-34-209.tr-fr-nonprod.aws-int.com/10.97.34.209:42177]
> 2018-02-16 19:00:05,593 WARN akka.remote.RemoteWatcher - Detected unreachable: [akka.tcp://flink@ip-10-97-34-209.tr-fr-nonprod.aws-int.com:42177]
> 2018-02-16 19:00:05,596 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Lost connection to JobManager akka.tcp://flink@ip-10-97-34-209.tr-fr-nonprod.aws-int.com:42177/user/jobmanager. Triggering connection timeout.
> Do you have any ideas on how to troubleshoot this?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)