Posted to user@flink.apache.org by Rafi Aroch <ra...@gmail.com> on 2019/09/22 09:34:22 UTC

JM fails connecting to TM Metrics service on AWS ECS

Hi,

I have a Flink 1.9.0 cluster deployed on AWS ECS. The cluster is running,
but metrics are not showing in the UI.

For other services (RPC / Data) it works because the connection is
initiated from the TM to the JM through a load-balancer. It does not work
for metrics, where the JM tries to initiate the connection to the TMs.

Currently, Flink uses the *taskmanager.host* setting as both the 'bind
address' and the 'advertised address'. When the TM starts, it binds to the
internal Docker IP, which is not accessible from the JM.

Also, the TM *metrics.internal.query-service.port* is set to a specific
port, which is dynamically bound to a random ECS host port.

It seems that I need a separate setting for bind-address/port vs
advertised-address/port.
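
To make the bind vs advertised distinction concrete, here is a minimal
Python sketch. It is not Flink code and does not reflect any Flink API; the
Endpoint class, the start_listener function and the example addresses are
hypothetical names used only for illustration. The process listens on one
address/port but registers a different one with its peers.

```python
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical pair of addresses for one service (illustration only)."""
    bind_host: str        # address the process listens on inside the container
    bind_port: int        # 0 lets the OS pick a free port
    advertised_host: str  # address peers should dial, e.g. the ECS host IP
    advertised_port: int  # externally mapped port, e.g. the ECS host port


def start_listener(ep: Endpoint):
    """Bind to the local address, but return the address to register with peers."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((ep.bind_host, ep.bind_port))
    server.listen()
    # Peers never see the bind address; they are only told the advertised one.
    return server, (ep.advertised_host, ep.advertised_port)
```

In this sketch a caller would pass the container-internal address as the
bind side and the host IP plus the dynamically assigned host port as the
advertised side, which is the separation I am missing for the TM metrics
query service.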

I saw several discussions of this issue for Kubernetes as well:
https://issues.apache.org/jira/browse/FLINK-11127
There was also an attempt to solve this by using Akka configurations here:
https://hub.docker.com/r/lzaugg/flink-taskmanager/

Can someone suggest a solution for this issue on AWS ECS?

Would appreciate your help.

Thanks,
Rafi