Posted to user@flink.apache.org by Sergei Poganshev <s....@slice.com> on 2018/12/12 13:07:43 UTC

Flink 1.7 jobmanager tries to lookup taskmanager by its hostname in k8s environment

When I deploy a Flink 1.7 job to Kubernetes, the job itself runs, but when
I open the Flink UI I see no metrics, and there are WARN messages in the
jobmanager's log:

[flink-metrics-14] WARN akka.remote.ReliableDeliverySupervisor
flink-metrics-akka.remote.default-remote-dispatcher-3 - Association with
remote system
[akka.tcp://flink-metrics@adhoc-historical-taskmanager-d4b65dfd4-h5nrx:44491]
has failed, address is now gated for [50] ms. Reason: [Association failed
with [akka.tcp://flink-metrics@adhoc-historical-taskmanager-d4b65dfd4-h5nrx:44491]]
Caused by: [adhoc-historical-taskmanager-d4b65dfd4-h5nrx: Name or service
not known]

Note: adhoc-historical-taskmanager-d4b65dfd4-h5nrx is the hostname of the
pod on which the taskmanager is running.

So the jobmanager tries to reach the taskmanager's metrics endpoint, on a
random port, using the pod's hostname (which it probably received from the
taskmanager itself), and that hostname cannot be resolved. How can this be
mitigated?
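
To confirm that name resolution is what fails, rather than the port being
unreachable, a small check can be run from inside the jobmanager pod. The
sketch below is illustrative only and assumes Python is available in that
pod. It pulls the host and port out of akka.tcp addresses like the one in
the log above and asks the local resolver about each host. The helper names
extract_endpoints and can_resolve are hypothetical and not part of Flink.

```python
import re
import socket

# Matches the host and port inside an Akka remote address such as
# akka.tcp://flink-metrics@some-host:44491
AKKA_ADDRESS = re.compile(r"akka\.tcp://[^@\s]+@([^:\s\]]+):(\d+)")


def extract_endpoints(log_text):
    """Return unique (host, port) pairs found in log_text, in order of first appearance."""
    seen = []
    for host, port in AKKA_ADDRESS.findall(log_text):
        endpoint = (host, int(port))
        if endpoint not in seen:
            seen.append(endpoint)
    return seen


def can_resolve(host):
    """Return True if the local resolver can map host to at least one address."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        # This is the error class behind "Name or service not known".
        return False


if __name__ == "__main__":
    # Sample line copied from the WARN message above; replace with real log text.
    sample = (
        "Association with remote system "
        "[akka.tcp://flink-metrics@adhoc-historical-taskmanager-d4b65dfd4-h5nrx:44491] "
        "has failed"
    )
    for host, port in extract_endpoints(sample):
        status = "resolves" if can_resolve(host) else "does NOT resolve"
        print(f"{host}:{port} {status}")
```

If the hostname does not resolve from the jobmanager pod, that matches the
"Name or service not known" cause in the log. This only diagnoses the
problem; it does not fix it.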

Re: Flink 1.7 jobmanager tries to lookup taskmanager by its hostname in k8s environment

Posted by Chesnay Schepler <ch...@apache.org>.
This is a known issue, see 
https://issues.apache.org/jira/browse/FLINK-11127.

I'm not aware of a workaround.
