Posted to dev@flink.apache.org by "Aleksandr Filichkin (JIRA)" <ji...@apache.org> on 2018/03/02 07:44:00 UTC

[jira] [Created] (FLINK-8829) Flink in EMR(YARN) is down due to Akka communication issue

Aleksandr Filichkin created FLINK-8829:
------------------------------------------

             Summary: Flink in EMR(YARN) is down due to Akka communication issue
                 Key: FLINK-8829
                 URL: https://issues.apache.org/jira/browse/FLINK-8829
             Project: Flink
          Issue Type: Bug
          Components: YARN
    Affects Versions: 1.3.2
            Reporter: Aleksandr Filichkin


Hi,

We have a Flink 1.3.2 application running on Amazon EMR. Every week our Flink job goes down with the following errors:

2018-02-16 19:00:04,595 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177]] Caused by: [Connection refused: ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com/10.97.34.209:42177]
2018-02-16 19:00:05,593 WARN akka.remote.RemoteWatcher - Detected unreachable: [akka.tcp://flink@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177]
2018-02-16 19:00:05,596 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Lost connection to JobManager akka.tcp://flink@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177/user/jobmanager. Triggering connection timeout.
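Because the failure recurs weekly, one way to start troubleshooting is to pull every Akka connection failure out of the logs and check whether they cluster at a fixed weekday and hour (the excerpt shows 19:00 on a Friday). The following is a minimal sketch, not a Flink tool. It assumes the log layout seen in the excerpt ("YYYY-MM-DD HH:MM:SS,mmm LEVEL logger - message"). The function names `akka_failures` and `weekly_pattern` are hypothetical. The marker strings are copied from the log lines above.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout (matches the excerpt): "YYYY-MM-DD HH:MM:SS,mmm LEVEL logger - message".
# A different log4j pattern in flink-conf/log4j.properties would need a different regex.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+)\s+-\s+(?P<msg>.*)$"
)

# Message fragments (taken from the log above) that signal the Akka link broke.
MARKERS = (
    "Association with remote system",
    "Detected unreachable",
    "Lost connection to JobManager",
)

# Captures "host:port" from an akka.tcp://flink@host:port[/path] address.
ADDR_RE = re.compile(r"akka\.tcp://flink@([^/\]\s]+)")


def akka_failures(lines):
    """Hypothetical helper: yield (timestamp, logger, remote host:port) per failure line."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or not any(marker in m.group("msg") for marker in MARKERS):
            continue
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
        addr = ADDR_RE.search(m.group("msg"))
        yield ts, m.group("logger"), (addr.group(1) if addr else None)


def weekly_pattern(events):
    """Hypothetical helper: count failures per (weekday, hour) to spot a fixed schedule."""
    return Counter((ts.strftime("%A"), ts.hour) for ts, _, _ in events)
```

For example, feeding it several weeks of a log file (the path is illustrative) with `weekly_pattern(akka_failures(open("jobmanager.log")))` would show whether the failures always land in the same slot. A fixed slot would point toward something scheduled on the cluster or network side rather than load inside the job.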

Do you have any ideas on how to troubleshoot this?

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)