Posted to user@flink.apache.org by Tamir Sagi <Ta...@niceactimize.com> on 2022/01/05 16:38:12 UTC

Flink Kubernetes library 1.14.2 - Kubernetes client does not read service account permissions properly if NodePort is configured

Hey Flinkers,

Until now, we have been deploying Flink clusters (Flink 1.12.2) in native Kubernetes mode on AWS EKS.

We have a cluster role, defined as follows:

[inline image: cluster role definition, not preserved in the plain-text archive]


The cluster role is bound (using a RoleBinding) to two service accounts in the flink-jobs namespace:

  1.  The service account created for the Flink pods (JM, TM).
  2.  Another service account, in a different namespace, so that it can perform several actions within the tamirs-flink-jobs namespace.

[inline image not preserved in the plain-text archive]

In addition, it is also bound to that service account in the account's own namespace (tamirs), so that it can perform some actions in the tamirs-flink-job namespace from the 'tamirs' namespace.
[inline image not preserved in the plain-text archive]

Everything works well.

I've started testing Flink 1.14.2 and encountered an unexpected exception while deploying a Flink cluster:

Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://172.20.0.1/api/v1/nodes. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. nodes is forbidden: User "system:serviceaccount:tamirs:tamirs-data-aggregation-flink-mgmt-batch-sa" cannot list resource "nodes" in API group "" at the cluster scope.
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:673) ~[flink-kubernetes_2.12-1.14.2.jar!/:1.14.2]
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:610) ~[flink-kubernetes_2.12-1.14.2.jar!/:1.14.2]
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:560) ~[flink-kubernetes_2.12-1.14.2.jar!/:1.14.2]
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:521) ~[flink-kubernetes_2.12-1.14.2.jar!/:1.14.2]
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:504) ~[flink-kubernetes_2.12-1.14.2.jar!/:1.14.2]
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.listRequestHelper(BaseOperation.java:143) ~[flink-kubernetes_2.12-1.14.2.jar!/:1.14.2]
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.list(BaseOperation.java:555) ~[flink-kubernetes_2.12-1.14.2.jar!/:1.14.2]
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.list(BaseOperation.java:90) ~[flink-kubernetes_2.12-1.14.2.jar!/:1.14.2]
at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.getLoadBalancerRestEndpoint(Fabric8FlinkKubeClient.java:449) ~[flink-kubernetes_2.12-1.14.2.jar!/:1.14.2]
at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.getRestEndPointFromService(Fabric8FlinkKubeClient.java:424) ~[flink-kubernetes_2.12-1.14.2.jar!/:1.14.2]
at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.getRestEndpoint(Fabric8FlinkKubeClient.java:191) ~[flink-kubernetes_2.12-1.14.2.jar!/:1.14.2]
at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$1(KubernetesClusterDescriptor.java:98) ~[flink-kubernetes_2.12-1.14.2.jar!/:1.14.2]
at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deployApplicationCluster(KubernetesClusterDescriptor.java:214) ~[flink-kubernetes_2.12-1.14.2.jar!/:1.14.2]
at org.apache.flink.client.deployment.application.cli.ApplicationClusterDeployer.run(ApplicationClusterDeployer.java:67) ~[flink-clients_2.12-1.14.2.jar!/:1.14.2]


However, the service account can definitely access this resource. Running

 kubectl auth can-i list nodes -n tamirs --as system:serviceaccount:tamirs:tamirs-data-aggregation-flink-mgmt-batch-sa

returns "yes".
[inline image not preserved in the plain-text archive]

kubectl auth can-i list nodes -n tamirs-flink-jobs --as system:serviceaccount:tamirs:tamirs-data-aggregation-flink-mgmt-batch-sa

[inline image not preserved in the plain-text archive]

When I changed the property 'kubernetes.rest-service.exposed.type' from NodePort to ClusterIP, the deployment worked.

Looking into the source code, NodePort is handled differently:
https://github.com/apache/flink/blob/release-1.14.2/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesClusterDescriptor.java#L98

If 'kubernetes.rest-service.exposed.type' is set to ClusterIP, this branch is taken:
https://github.com/apache/flink/blob/release-1.14.2/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/kubeclient/Fabric8FlinkKubeClient.java#L183-L189

Otherwise, it calls getRestEndPointFromService:
https://github.com/apache/flink/blob/release-1.14.2/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/kubeclient/Fabric8FlinkKubeClient.java#L191

which fails here, when listing the nodes:
https://github.com/apache/flink/blob/6fd4b1c0ef2ddd12751889218445ce0e60ff6c80/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/kubeclient/Fabric8FlinkKubeClient.java#L449
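
To make the difference between the two paths concrete, here is a minimal, dependency-free sketch of the branching described above. It is a simplified model and not the Flink source: the class RestEndpointSketch, the ForbiddenException type, the Supplier standing in for the "list nodes" call, and the "<cluster-id>-rest.<namespace>" service name are all assumptions made for illustration, and the LoadBalancer case is left out. The only point is that the ClusterIP path never asks the API server for nodes, whereas the NodePort path does and therefore needs the cluster-scoped permission to list nodes.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

/** Simplified model of REST endpoint resolution. NOT the actual Flink implementation. */
class RestEndpointSketch {

    /** Only the two exposed types discussed in this thread are modeled. */
    enum ExposedType { ClusterIP, NodePort }

    /** Hypothetical exception standing in for a "403 Forbidden" from the API server. */
    static class ForbiddenException extends RuntimeException {
        ForbiddenException(String message) {
            super(message);
        }
    }

    /**
     * Resolves "host:port" for the REST service.
     *
     * @param nodeAddresses hypothetical stand-in for the "list nodes" API call;
     *                      it is invoked only on the NodePort path
     */
    static Optional<String> resolve(
            ExposedType type,
            String clusterId,
            String namespace,
            int port,
            Supplier<List<String>> nodeAddresses) {
        if (type == ExposedType.ClusterIP) {
            // ClusterIP: the address is derived from the service name alone,
            // so no node lookup (and no "list nodes" permission) is involved.
            return Optional.of(clusterId + "-rest." + namespace + ":" + port);
        }
        // NodePort: a node address is required, so the nodes are listed.
        // This is the call that can fail with a Forbidden error.
        return nodeAddresses.get().stream().findFirst().map(addr -> addr + ":" + port);
    }

    public static void main(String[] args) {
        // A node lookup that is always denied, as in the stack trace above.
        Supplier<List<String>> denied = () -> {
            throw new ForbiddenException("nodes is forbidden");
        };

        // ClusterIP succeeds even though listing nodes is denied.
        System.out.println(
                resolve(ExposedType.ClusterIP, "my-cluster", "tamirs-flink-jobs", 8081, denied));

        // NodePort triggers the node lookup and surfaces the Forbidden error.
        try {
            resolve(ExposedType.NodePort, "my-cluster", "tamirs-flink-jobs", 30081, denied);
        } catch (ForbiddenException e) {
            System.out.println("NodePort path failed: " + e.getMessage());
        }
    }
}
```

In this model the failure depends only on the exposed type, which matches the observation that switching to ClusterIP made the deployment succeed without any change to the RBAC setup.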

I also noticed that 1.14.0 included a bug fix regarding NodePort, which might be related:
https://issues.apache.org/jira/browse/FLINK-23507


Appreciate your help.

Best,
Tamir.


Re: Flink Kubernetes library 1.14.2 - Kubernetes client does not read service account permissions properly if NodePort is configured

Posted by Yang Wang <da...@gmail.com>.
Hi Tamir,

Sorry for the inconvenience. I think you are right. Currently, the node
list permission is required when the service is exposed with NodePort.
Moreover, the Flink client does not read this permission from the service
account, but from the kube config file instead. The file can be configured
via "kubernetes.config.file".
Please note that the kube config file will not be shipped to the JobManager
and TaskManager pods. It is used only on the client side.
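
As a rough illustration of these client-side settings, the sketch below
assembles a "flink run-application" command line that passes them as -D
dynamic properties. The kube config path and the jar location are
placeholders, and the helper class ClientSideOptionsSketch is hypothetical.
Only the option keys "kubernetes.config.file" and
"kubernetes.rest-service.exposed.type" come from this thread, and setting
the exposed type to ClusterIP is the workaround reported above, not a
required value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that builds a Flink CLI invocation. It does not call any Flink API. */
class ClientSideOptionsSketch {

    /** Turns each option into a -Dkey=value argument, keeping insertion order. */
    static List<String> buildCommand(Map<String, String> options, String jarUri) {
        List<String> cmd = new ArrayList<>();
        cmd.add("flink");
        cmd.add("run-application");
        cmd.add("--target");
        cmd.add("kubernetes-application");
        for (Map.Entry<String, String> option : options.entrySet()) {
            cmd.add("-D" + option.getKey() + "=" + option.getValue());
        }
        cmd.add(jarUri);
        return cmd;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        // Placeholder path: the kube config file read on the client side only.
        options.put("kubernetes.config.file", "/path/to/kubeconfig");
        // Workaround from this thread: avoids the node lookup done for NodePort.
        options.put("kubernetes.rest-service.exposed.type", "ClusterIP");

        // Placeholder jar URI.
        System.out.println(String.join(" ", buildCommand(options, "local:///path/to/job.jar")));
    }
}
```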


Best,
Yang
