Posted to user@flink.apache.org by Le Xu <sh...@gmail.com> on 2023/04/05 04:43:09 UTC
Flink Kubernetes Session sample from Documentation
Hello!
I'm trying out the Kubernetes sample
<https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/resource-providers/native_kubernetes/#starting-a-flink-session-on-kubernetes>
described in the official docs, but I am not able to submit a job. Submission
fails with the following error:
-----------------------------------------------------------------------------------------------------
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: org.apache.flink.client.deployment.ClusterRetrieveException: Could not create the RestClusterClient.
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
    at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:105)
    at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:851)
    at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:245)
    at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1095)
    at org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1189)
    at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
    at org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1189)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157)
Caused by: java.lang.RuntimeException: org.apache.flink.client.deployment.ClusterRetrieveException: Could not create the RestClusterClient.
    at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$1(KubernetesClusterDescriptor.java:121)
    at org.apache.flink.kubernetes.KubernetesClusterDescriptor.retrieve(KubernetesClusterDescriptor.java:148)
    at org.apache.flink.kubernetes.KubernetesClusterDescriptor.retrieve(KubernetesClusterDescriptor.java:69)
    at org.apache.flink.client.deployment.executors.AbstractSessionClusterExecutor.execute(AbstractSessionClusterExecutor.java:80)
    at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2197)
    at org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:189)
    at org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:118)
    at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2058)
    at org.apache.flink.streaming.examples.windowing.TopSpeedWindowing.main(TopSpeedWindowing.java:154)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
    ... 9 more
Caused by: org.apache.flink.client.deployment.ClusterRetrieveException: Could not create the RestClusterClient.
    ... 23 more
Caused by: java.net.UnknownHostException: my-first-flink-cluster-rest.default: Name or service not known
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
    at java.net.InetAddress.getAllByName(InetAddress.java:1193)
    at java.net.InetAddress.getAllByName(InetAddress.java:1127)
    at java.net.InetAddress.getByName(InetAddress.java:1077)
    at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.getWebMonitorAddress(HighAvailabilityServicesUtils.java:229)
    at org.apache.flink.kubernetes.KubernetesClusterDescriptor.getWebMonitorAddress(KubernetesClusterDescriptor.java:140)
    at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$null$0(KubernetesClusterDescriptor.java:119)
    at org.apache.flink.client.program.rest.RestClusterClient.<init>(RestClusterClient.java:237)
    at org.apache.flink.client.program.rest.RestClusterClient.<init>(RestClusterClient.java:197)
    at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$1(KubernetesClusterDescriptor.java:114)
    ... 22 more
-----------------------------------------------------------------------------------------------------
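In a chained Java trace like the one above, the last "Caused by:" entry is usually the one worth reading first. Here it is the java.net.UnknownHostException for my-first-flink-cluster-rest.default. As a rough sketch (the function name root_cause is hypothetical, and the parsing assumes the usual "Caused by:" / "at ..." layout), the root cause can be pulled out of a pasted trace like this, even when a mail client has wrapped the lines:

```python
def root_cause(stack_trace: str) -> str:
    """Return the text of the last 'Caused by:' entry in a Java stack trace."""
    causes = []      # one list of text fragments per 'Caused by:' entry
    current = None   # fragments of the entry being collected, if any
    for line in stack_trace.splitlines():
        stripped = line.strip()
        if stripped.startswith("Caused by:"):
            current = [stripped[len("Caused by:"):].strip()]
            causes.append(current)
        elif stripped == "at" or stripped.startswith("at ") or stripped.startswith("..."):
            # A stack frame or '... N more' ends the message of the current entry.
            current = None
        elif current is not None and stripped:
            # Continuation of a message that was wrapped onto the next line.
            current.append(stripped)
    if not causes:
        return ""
    return " ".join(part for part in causes[-1] if part)


sample = """Caused by: org.apache.flink.client.deployment.ClusterRetrieveException:
Could not create the RestClusterClient.
... 23 more
Caused by: java.net.UnknownHostException:
my-first-flink-cluster-rest.default: Name or service not known
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
"""
print(root_cause(sample))
```

The frames under that last entry (InetAddress.getByName called from KubernetesClusterDescriptor.getWebMonitorAddress) show that the failed lookup happens in the client process started by CliFrontend, not inside the cluster.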
My Kubernetes cluster does have DNS running (see the following):
-----------------------------------------------------------------------------------------------------
root@node0:/mydata/flink-1.17.0# kubectl get pods -n kube-system
NAME                                                                   READY   STATUS             RESTARTS      AGE
calico-kube-controllers-6d674b5f78-6xjv8                               0/1     CrashLoopBackOff   45 (9s ago)   3h33m
calico-node-49qlx                                                      0/1     Running            0             3h33m
calico-node-gds4w                                                      0/1     Running            0             3h33m
calico-node-rc999                                                      0/1     Running            0             3h33m
coredns-787d4945fb-76qw6                                               1/1     Running            0             2d4h
coredns-787d4945fb-wwclv                                               1/1     Running            0             2d4h
etcd-node0.dirigo-newfs.wisr-pg0.utah.cloudlab.us                      1/1     Running            1 (9h ago)    2d4h
kube-apiserver-node0.dirigo-newfs.wisr-pg0.utah.cloudlab.us            1/1     Running            35 (9h ago)   2d4h
kube-controller-manager-node0.dirigo-newfs.wisr-pg0.utah.cloudlab.us   1/1     Running            4 (9h ago)    2d4h
kube-proxy-8g6zk                                                       1/1     Running            1 (9h ago)    2d4h
kube-proxy-p2ph9                                                       1/1     Running            0             7h47m
kube-proxy-w2whd                                                       1/1     Running            0             7h41m
kube-scheduler-node0.dirigo-newfs.wisr-pg0.utah.cloudlab.us            1/1     Running            4 (9h ago)    2d4h
-----------------------------------------------------------------------------------------------------
And my service appears to be running normally (I'm using my own cluster;
changing the exposure type to NodePort produces a similar error):
-----------------------------------------------------------------------------------------------------
root@node0:/mydata/flink-1.17.0# kubectl get services
NAME                          TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
kubernetes                    ClusterIP   10.96.0.1      <none>        443/TCP             2d4h
my-first-flink-cluster        ClusterIP   None           <none>        6123/TCP,6124/TCP   60s
my-first-flink-cluster-rest   ClusterIP   10.98.42.188   <none>        8081/TCP            60s
-----------------------------------------------------------------------------------------------------
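The host name in the exception, my-first-flink-cluster-rest.default, has the shape <service>.<namespace>: it matches the my-first-flink-cluster-rest service listed above, in the default namespace. Kubernetes DNS serves Service names in full as <service>.<namespace>.svc.<cluster-domain>, and the shorter forms normally resolve only where the resolver is configured with the cluster's search domains, which is typically the case inside pods. A minimal sketch listing the forms worth testing; service_dns_names is a hypothetical helper, and cluster.local is assumed as the cluster domain because the thread does not state it:

```python
from typing import List


def service_dns_names(service: str, namespace: str,
                      cluster_domain: str = "cluster.local") -> List[str]:
    """List DNS names for a Kubernetes Service, from shortest to fully qualified.

    cluster_domain defaults to "cluster.local", which is an assumption here;
    a cluster can be configured with a different domain.
    """
    return [
        f"{service}.{namespace}",                       # the form seen in the exception
        f"{service}.{namespace}.svc",
        f"{service}.{namespace}.svc.{cluster_domain}",  # fully qualified form
    ]


for name in service_dns_names("my-first-flink-cluster-rest", "default"):
    print(name)
```

If only the fully qualified form resolves from a given machine, the search domains on that machine are the likely gap; if none resolve, the machine is probably not using the cluster DNS at all.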
Any suggestions on what might be going on with my setup?
Thanks!
Le
Re: Flink Kubernetes Session sample from Documentation
Posted by Le Xu <sh...@gmail.com>.
Thanks -- I fixed the DNS setup and it solved the problem.
Le
On Thu, Apr 6, 2023 at 12:19 AM Weihua Hu <hu...@gmail.com> wrote:
> Hi, Le
>
> It looks like a DNS issue. You could try to ping or nslookup
> 'my-first-flink-cluster-rest.default'
> on the Flink operator pods to check whether the DNS service is working normally.
>
> Best,
> Weihua
Re: Flink Kubernetes Session sample from Documentation
Posted by Weihua Hu <hu...@gmail.com>.
Hi, Le
It looks like a DNS issue. You could try to ping or nslookup
'my-first-flink-cluster-rest.default'
on the Flink operator pods to check whether the DNS service is working normally.
Best,
Weihua
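Where ping or nslookup is not installed, the same check can be sketched with Python's standard library. This is only an illustration of the suggested test, not part of Flink; the helper name resolve is hypothetical. It asks the local resolver for the name, so the result depends on where it is run: the stack trace shows the failing lookup in the client JVM, so the machine running the Flink CLI is one place to try it, and a pod inside the cluster is the other.

```python
import socket
from typing import List


def resolve(hostname: str) -> List[str]:
    """Return the addresses the local resolver gives for hostname, or [] if lookup fails."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # Python's counterpart of the java.net.UnknownHostException in the trace.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    name = "my-first-flink-cluster-rest.default"
    addresses = resolve(name)
    if addresses:
        print(f"{name} resolves to {', '.join(addresses)}")
    else:
        print(f"{name} does not resolve from this machine")
```

An empty result on the client machine combined with a successful lookup inside a pod would point at the client's DNS configuration rather than at the Flink deployment.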