Posted to user@flink.apache.org by Alexey Trenikhun <ye...@msn.com> on 2021/02/20 22:10:00 UTC

stop job with Savepoint

Hello,
I'm running a per-job Flink cluster. The JM is deployed as a Kubernetes Job with restartPolicy: Never, and high availability is set to KubernetesHaServicesFactory. The job runs fine for some time, configmaps are created, etc. Now, in order to upgrade the Flink job, I'm trying to stop the job with a savepoint (flink stop $JOB_ID), but the JM exits with code 2. From the log:

{"ts":"2021-02-20T21:34:18.195Z","message":"Terminating cluster entrypoint process StandaloneApplicationClusterEntryPoint with exit code 2.","logger_name":"org.apache.flink.runtime.entrypoint.ClusterEntrypoint","thread_name":"flink-akka.actor.default-dispatcher-2","level":"INFO","level_value":20000,"stack_trace":"java.util.concurrent.ExecutionException: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://10.96.0.1/api/v1/namespaces/n/configmaps?labelSelector=app%3Dfsp%2Cconfigmap-type%3Dhigh-availability%2Ctype%3Dflink-native-kubernetes. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. configmaps is forbidden: User \"system:serviceaccount:n:fsp\" cannot list resource \"configmaps\" in API group \"\" in the namespace \"n\".\n\tat java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)\n\tat java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)\n\tat org.apache.flink.kubernetes.highavailability.KubernetesHaServices.internalCleanup(KubernetesHaServices.java:142)\n\tat org.apache.flink.runtime.highavailability.AbstractHaServices.closeAndCleanupAllData(AbstractHaServices.java:180)\n\tat org.apache.flink.runtime.entrypoint.ClusterEntrypoint.stopClusterServices(ClusterEntrypoint.java:378)\n\tat org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$shutDownAsync$3(ClusterEntrypoint.java:467)\n\tat org.apache.flink.runtime.concurrent.FutureUtils.lambda$composeAfterwards$19(FutureUtils.java:704)\n\tat java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)\n\tat java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)\n\tat java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)\n\tat java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)\n\tat 
org.apache.flink.runtime.concurrent.FutureUtils.lambda$null$18(FutureUtils.java:715)\n\tat java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)\n\tat java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)\n\tat java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)\n\tat java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)\n\tat org.apache.flink.runtime.entrypoint.component.DispatcherResourceManagerComponent.lambda$closeAsyncInternal$3(DispatcherResourceManagerComponent.java:182)\n\tat java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)\n\tat java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)\n\tat java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)\n\tat java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)\n\tat org.apache.flink.runtime.concurrent.FutureUtils$CompletionConjunctFuture.completeFuture(FutureUtils.java:956)\n\tat java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)\n\tat java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)\n\tat java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)\n\tat java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)\n\tat org.apache.flink.runtime.concurrent.FutureUtils.lambda$forwardTo$22(FutureUtils.java:1323)\n\tat java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)\n\tat java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)\n\tat java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat 
java.lang.Thread.run(Thread.java:748)\nCaused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://10.96.0.1/api/v1/namespaces/n/configmaps?labelSelector=app%3Dfsp%2Cconfigmap-type%3Dhigh-availability%2Ctype%3Dflink-native-kubernetes. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. configmaps is forbidden: User \"system:serviceaccount:n:fsp\" cannot list resource \"configmaps\" in API group \"\" in the namespace \"n\".\n\tat io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:568)\n\tat io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:505)\n\tat io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:471)\n\tat io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:430)\n\tat io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:412)\n\tat io.fabric8.kubernetes.client.dsl.base.BaseOperation.listRequestHelper(BaseOperation.java:151)\n\tat io.fabric8.kubernetes.client.dsl.base.BaseOperation.list(BaseOperation.java:621)\n\tat io.fabric8.kubernetes.client.dsl.base.BaseOperation.deleteList(BaseOperation.java:730)\n\tat io.fabric8.kubernetes.client.dsl.base.BaseOperation.delete(BaseOperation.java:655)\n\tat io.fabric8.kubernetes.client.dsl.base.BaseOperation.delete(BaseOperation.java:70)\n\tat org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.lambda$deleteConfigMapsByLabels$10(Fabric8FlinkKubeClient.java:361)\n\tat java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)\n\t... 3 common frames omitted\n"}

The service account (fsp) role has the following rules:
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - update
  - get
  - create
  - watch
  - patch
  - delete

So the service account seems to be allowed to GET configmaps. It also seems the service account was able to create configmaps during the run (no complaints in the log).

Thanks,
Alexey

Re: stop job with Savepoint

Posted by Arvid Heise <ar...@apache.org>.
Hi Alexey,

The list looks complete to me. Please report back if this is not correct.

On Sat, Feb 20, 2021 at 11:30 PM Alexey Trenikhun <ye...@msn.com> wrote:

> Adding "list" to the verbs helps. Do I need to add anything else?

Re: stop job with Savepoint

Posted by Alexey Trenikhun <ye...@msn.com>.
Adding "list" to the verbs helps. Do I need to add anything else?
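
For reference, a sketch of what the rules look like with the extra verb, assuming the rest of the Role is unchanged from the first message. The failing call in the stack trace (deleteConfigMapsByLabels) is a GET on the configmaps collection with a label selector, and Kubernetes RBAC authorizes that kind of request as "list" rather than "get", which would explain why the existing "get" verb was not enough.

```yaml
# Sketch only: the rules quoted in this thread plus the "list" verb.
# Everything except the last verb is copied from the original message.
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - update
  - get
  - create
  - watch
  - patch
  - delete
  - list   # added: HA cleanup on shutdown lists configmaps by label before deleting them
```

Whether other resources need additional verbs depends on the deployment; this thread only confirms the configmaps rule.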
