Posted to issues@flink.apache.org by "Aitozi (Jira)" <ji...@apache.org> on 2021/10/24 15:16:00 UTC

[jira] [Commented] (FLINK-24624) Add clean up phase when kubernetes session start failed

    [ https://issues.apache.org/jira/browse/FLINK-24624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17433448#comment-17433448 ] 

Aitozi commented on FLINK-24624:
--------------------------------

After looking into the failure, it turns out to be caused by a lack of permission:

{{2021-10-24 23:10:30,385 ERROR org.apache.flink.kubernetes.cli.KubernetesSessionCli         [] - Error while running the Flink session.
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: [https://xxxx/api/v1/nodes]. Message: Forbidden! User xxx doesn't have permission. nodes is forbidden: User "xxx" cannot list resource "nodes" in API group "" at the cluster scope: noopinion by orca and marlin and k8s rbac.
 at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:673) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
 at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:610) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
 at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:560) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
 at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:521) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
 at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:504) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
 at io.fabric8.kubernetes.client.dsl.base.BaseOperation.listRequestHelper(BaseOperation.java:143) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
 at io.fabric8.kubernetes.client.dsl.base.BaseOperation.list(BaseOperation.java:555) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
 at io.fabric8.kubernetes.client.dsl.base.BaseOperation.list(BaseOperation.java:90) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
 at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.getLoadBalancerRestEndpoint(Fabric8FlinkKubeClient.java:463) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
 at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.getRestEndPointFromService(Fabric8FlinkKubeClient.java:438) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
 at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.getRestEndpoint(Fabric8FlinkKubeClient.java:191) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
 at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$1(KubernetesClusterDescriptor.java:98) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
 at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deploySessionCluster(KubernetesClusterDescriptor.java:164) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
 at org.apache.flink.kubernetes.cli.KubernetesSessionCli.run(KubernetesSessionCli.java:114) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
 at org.apache.flink.kubernetes.cli.KubernetesSessionCli.lambda$main$0(KubernetesSessionCli.java:198) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
 at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
 at org.apache.flink.kubernetes.cli.KubernetesSessionCli.main(KubernetesSessionCli.java:198) [flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]}}
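A clean-up phase for this could follow the usual rollback-on-failure pattern: record each Kubernetes resource as it is created during deploySessionCluster, and delete the recorded resources in reverse order when any later step throws. The sketch below is only a minimal illustration of that pattern, not Flink's actual API — the Resource and Step types are hypothetical placeholders for created k8s objects (Deployment, Service, ConfigMap) and deployment steps.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Minimal sketch (hypothetical types, not Flink's kube-client API) of a
 * clean-up phase: every resource created during deployment is recorded,
 * and on failure the records are rolled back in reverse order.
 */
public class SessionDeploySketch {

    /** Hypothetical handle to a created Kubernetes resource. */
    interface Resource {
        void delete();
    }

    /** One deployment step that may fail, e.g. creating a Deployment or Service. */
    interface Step {
        Resource create() throws Exception;
    }

    /** Runs the steps in order; on any failure, deletes what was already created. */
    static void deployOrCleanUp(Step... steps) throws Exception {
        Deque<Resource> created = new ArrayDeque<>();
        try {
            for (Step step : steps) {
                created.push(step.create());
            }
        } catch (Exception e) {
            // Best-effort rollback, most recently created resource first;
            // clean-up failures must not mask the original deployment error.
            while (!created.isEmpty()) {
                try {
                    created.pop().delete();
                } catch (Exception suppressed) {
                    e.addSuppressed(suppressed);
                }
            }
            throw e;
        }
    }
}
```

With this shape, a permission error like the one above would still surface to the user, but the resources created before the failing call would no longer be left behind to break the next deployment.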

> Add clean up phase when kubernetes session start failed
> -------------------------------------------------------
>
>                 Key: FLINK-24624
>                 URL: https://issues.apache.org/jira/browse/FLINK-24624
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.14.0
>            Reporter: Aitozi
>            Priority: Major
>
> Several k8s resources are created when deploying the kubernetes session, but they are left behind when the deployment fails. This can make the next deployment fail, or leak resources. So I think we should add a clean-up phase for when start-up fails.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)