You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user-zh@flink.apache.org by Mori_Tang <49...@qq.com> on 2020/09/10 08:09:44 UTC
关于Fink Native K8S Session模式的两个问题
首先,我使用的flink版本是1.11,K8S版本是v1.17。
启动的集群的脚本命令是:
./bin/kubernetes-session.sh -Dkubernetes.cluster-id=flink-cluster
-Dkubernetes.jobmanager.service-account=flink
-Dtaskmanager.memory.process.size=4096m -Dkubernetes.taskmanager.cpu=2
-Dtaskmanager.numberOfTaskSlots=4 -Dkubernetes.namespace=flink
-Dkubernetes.rest-service.exposed.type=NodePort
-Dkubernetes.container-start-command-template="%java% %classpath% %jvmmem%
%jvmopts% %logging% %class% %args%" -Dakka.framesize=104857600b
-Dkubernetes.container.image=flink:1.11.1
集群应该是启动成功,启动的部分日志如下所示:
2020-09-10 14:08:56,417 INFO
org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Create
flink session cluster flink-cluster successfully, JobManager Web Interface:
http://10.10.102.20:30398
可以通过上述网址成功访问到UI界面。
接下来是我的两个问题:
1、关于提交的命令:
仿照文档中给出的命令是:./bin/flink run -d -e kubernetes-session
-Dkubernetes.cluster-id=flink-cluster
examples/streaming/WindowJoin.jar。但据此命令提交后,会有如下报错:
org.apache.flink.client.program.ProgramInvocationException: The main method
caused an error:
org.apache.flink.client.deployment.ClusterRetrieveException: Could not get
the rest endpoint of flink-cluster
而后我仿照非native的方法,以如下命令提交:
./bin/flink run -m 10.10.101.26:30398
./examples/streaming/WordCount.jar,似乎提交成功了。
Q:文档中的命令要做什么额外的配置才能让它解析到endpoint of flink-cluster呢?以后面这个命令提交是可以的吗?
2、以第二种命令提交后报错。
以第二种命令提交后,有如下报错:
Caused by:
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
Could not allocate the required slot within slot request timeout. Please
make sure that the cluster has enough resources.
at
org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441)
... 47 more
Caused by: java.util.concurrent.CompletionException:
java.util.concurrent.TimeoutException
at
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607)
at
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
... 27 more
Caused by: java.util.concurrent.TimeoutException
Q:看起来似乎是集群资源不够,可是集群的资源应当是充足的,应该如何debug呢?
--
Sent from: http://apache-flink.147419.n8.nabble.com/
Re: 关于Fink Native K8S Session模式的两个问题
Posted by Yang Wang <da...@gmail.com>.
1. 没有找到rest endpoint的原因应该是你flink run提交任务的时候没有指定namespace
你用下面的命令应该就可以了
./bin/flink run -d -e kubernetes-session
-Dkubernetes.cluster-id=flink-cluster -Dkubernetes.namespace=flink
examples/streaming/WindowJoin.jar
2. 没有资源有可能是Flink的ResourceManager没有足够的权限向K8s申请Pod,确认你已经配置了正确的service
account[1]。
注意不同的namespace需要对应的service account是不一样的。
如果不是这个原因的话,可以打开JobManager的webui查看log或者使用kubectl -n blink logs <JM_POD>
[1].
https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html#rbac
Best,
Yang
Mori_Tang <49...@qq.com> 于2020年9月10日周四 下午4:10写道:
> 首先,我使用的flink版本是1.11,K8S版本是v1.17。
> 启动的集群的脚本命令是:
> ./bin/kubernetes-session.sh -Dkubernetes.cluster-id=flink-cluster
> -Dkubernetes.jobmanager.service-account=flink
> -Dtaskmanager.memory.process.size=4096m -Dkubernetes.taskmanager.cpu=2
> -Dtaskmanager.numberOfTaskSlots=4 -Dkubernetes.namespace=flink
> -Dkubernetes.rest-service.exposed.type=NodePort
> -Dkubernetes.container-start-command-template="%java% %classpath% %jvmmem%
> %jvmopts% %logging% %class% %args%" -Dakka.framesize=104857600b
> -Dkubernetes.container.image=flink:1.11.1
>
> 集群应该是启动成功,启动的部分日志如下所示:
> 2020-09-10 14:08:56,417 INFO
> org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Create
> flink session cluster flink-cluster successfully, JobManager Web Interface:
> http://10.10.102.20:30398
>
> 可以通过上述网址成功访问到UI界面。
>
> 接下来是我的两个问题:
> 1、关于提交的命令:
> 仿照文档中给出的命令是:./bin/flink run -d -e kubernetes-session
> -Dkubernetes.cluster-id=flink-cluster
> examples/streaming/WindowJoin.jar。但据此命令提交后,会有如下报错:
> org.apache.flink.client.program.ProgramInvocationException: The main method
> caused an error:
> org.apache.flink.client.deployment.ClusterRetrieveException: Could not get
> the rest endpoint of flink-cluster
>
> 而后我仿照非native的方法,以如下命令提交:
> ./bin/flink run -m 10.10.101.26:30398
> ./examples/streaming/WordCount.jar,似乎提交成功了。
> Q:文档中的命令要做什么额外的配置才能让它解析到endpoint of flink-cluster呢?以后面这个命令提交是可以的吗?
> 2、以第二种命令提交后报错。
> 以第二种命令提交后,有如下报错:
> Caused by:
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
> Could not allocate the required slot within slot request timeout. Please
> make sure that the cluster has enough resources.
> at
>
> org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441)
> ... 47 more
> Caused by: java.util.concurrent.CompletionException:
> java.util.concurrent.TimeoutException
> at
>
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
> at
>
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
> at
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607)
> at
>
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
> ... 27 more
> Caused by: java.util.concurrent.TimeoutException
>
> Q:看起来似乎是集群资源不够,可是集群的资源应当是充足的,应该如何debug呢?
>
>
>
>
> --
> Sent from: http://apache-flink.147419.n8.nabble.com/
>