Posted to user@flink.apache.org by Sergei Poganshev <s....@slice.com> on 2018/12/12 13:07:43 UTC
Flink 1.7 jobmanager tries to lookup taskmanager by its hostname in
k8s environment
When I deploy a Flink 1.7 job to Kubernetes, the job itself runs, but when I
open the Flink UI I see no metrics, and there are WARN messages in the
jobmanager's log:
[flink-metrics-14] WARN akka.remote.ReliableDeliverySupervisor
flink-metrics-akka.remote.default-remote-dispatcher-3 - Association with
remote system
[akka.tcp://flink-metrics@adhoc-historical-taskmanager-d4b65dfd4-h5nrx:44491]
has failed, address is now gated for [50] ms. Reason: [Association failed
with [akka.tcp://flink-metrics@adhoc-historical-taskmanager-d4b65dfd4-h5nrx:44491]]
Caused by: [adhoc-historical-taskmanager-d4b65dfd4-h5nrx: Name or service
not known]
Note: adhoc-historical-taskmanager-d4b65dfd4-h5nrx is the hostname of the pod
on which the taskmanager is running.
So the jobmanager tries to reach the taskmanager's metrics endpoint by the pod
hostname (which it probably received from the taskmanager itself) on a random
port, and that hostname cannot be resolved from the jobmanager. How can this
be mitigated?
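To check that this is a name-resolution problem rather than a closed port, a small diagnostic along these lines could be run from inside the jobmanager pod. It is only a sketch: the helper names extract_akka_endpoints and can_resolve are made up for illustration and are not part of Flink or Akka, and the regular expression assumes the akka.tcp://system@host:port form seen in the log above.

```python
import re
import socket

# Matches the host and port inside an Akka remote address such as
# akka.tcp://flink-metrics@some-host:44491 (format assumed from the log above).
AKKA_ADDRESS = re.compile(r"akka\.tcp://[^@\s]+@([^:\]\s]+):(\d+)")


def extract_akka_endpoints(log_text):
    """Return the distinct (host, port) pairs found in the log text, in order."""
    seen = []
    for host, port in AKKA_ADDRESS.findall(log_text):
        endpoint = (host, int(port))
        if endpoint not in seen:
            seen.append(endpoint)
    return seen


def can_resolve(host):
    """Return True if the local resolver can map host to at least one address."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        # Raised for lookup failures such as "Name or service not known".
        return False
```

Feeding the jobmanager log text to extract_akka_endpoints and calling can_resolve on each returned host shows which advertised taskmanager hostnames the jobmanager pod cannot look up.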
Re: Flink 1.7 jobmanager tries to lookup taskmanager by its hostname
in k8s environment
Posted by Chesnay Schepler <ch...@apache.org>.
This is a known issue, see
https://issues.apache.org/jira/browse/FLINK-11127.
I'm not aware of a workaround.