Posted to user@flink.apache.org by hjw <hj...@163.com> on 2023/02/19 16:55:41 UTC
Application upgrade rollbacks failed in Flink Kubernetes Operator
I tested the application upgrade rollback feature, but it does not work: the Flink application-mode job does not roll back to the last stable spec.
As shown in the example below, I declared a faulty pod template without a container named flink-main-container to test the rollback feature.
However, the deployment of the Flink application job just fails with the error below, and no rollback happens.
Error:
org.apache.flink.client.deployment.ClusterDeploymentException: Could not create Kubernetes cluster "basic-example".
at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deployClusterInternal(KubernetesClusterDescriptor.java:292)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://*/k8s/clusters/c-fwkxh/apis/apps/v1/namespaces/test-flink/deployments. Message: Deployment.apps "basic-example" is invalid: [spec.template.spec.containers[0].name: Required value, spec.template.spec.containers[0].image: Required value]. Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=spec.template.spec.containers[0].name, message=Required value, reason=FieldValueRequired, additionalProperties={}), StatusCause(field=spec.template.spec.containers[0].image, message=Required value, reason=FieldValueRequired, additionalProperties={})], group=apps, kind=Deployment, name=flink-bdra-sql-application-job-s3p, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Deployment.apps "flink-bdra-sql-application-job-s3p" is invalid: [spec.template.spec.containers[0].name: Required value, spec.template.spec.containers[0].image: Required value], metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:673)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:560)
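The 422 response above names two missing required fields on containers[0]. As a rough illustration only, the Python sketch below mimics that kind of check on a plain list of dicts. The function name `missing_required_fields` and the sample container list are hypothetical; the list is a guess at what the rejected Deployment may have contained (an unnamed container taken from the pod template, plus a separately generated main container), not something taken from the operator's logs.

```python
def missing_required_fields(containers):
    """Return one message per container that lacks a 'name' or an 'image'."""
    problems = []
    for index, container in enumerate(containers):
        for field in ("name", "image"):
            if not container.get(field):
                problems.append(
                    f"spec.template.spec.containers[{index}].{field}: Required value"
                )
    return problems


# Hypothetical container list: the first entry has only env, as in the faulty
# pod template; the second stands in for a generated main container.
containers = [
    {"env": [{"name": "TZ", "value": "Asia/Shanghai"}]},
    {"name": "flink-main-container", "image": "flink:1.16"},
]

for problem in missing_required_fields(containers):
    print(problem)
```

Under that assumption, only the unnamed entry is reported, which would be consistent with both complaints pointing at index 0.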
Env:
Flink version: 1.16
Flink Kubernetes Operator: 1.3.1
Last stable spec:
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: basic-example
spec:
  image: flink:1.16
  flinkVersion: v1_16
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    kubernetes.operator.deployment.rollback.enabled: true
    state.savepoints.dir: s3:///flink-data/savepoints
    state.checkpoints.dir: s3:///flink-data/checkpoints
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: s3:///flink-data/ha
  serviceAccount: flink
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          env:
            - name: TZ
              value: Asia/Shanghai
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
    parallelism: 2
    upgradeMode: stateless
New spec:
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: basic-example
spec:
  image: flink:1.16
  flinkVersion: v1_16
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    kubernetes.operator.deployment.rollback.enabled: true
    state.savepoints.dir: s3:///flink-data/savepoints
    state.checkpoints.dir: s3:///flink-data/checkpoints
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: s3:///flink-data/ha
  serviceAccount: flink
  podTemplate:
    spec:
      containers:
        - env:
            - name: TZ
              value: Asia/Shanghai
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
    parallelism: 2
    upgradeMode: stateless
--
Best,
Hjw
Re:Re: Application upgrade rollbacks failed in Flink Kubernetes Operator
Posted by hjw <hj...@163.com>.
Hi
I declared a faulty pod template without a container named flink-main-container to test the rollback feature.
Please compare the podTemplate sections of the old and new specs.
Last stable spec:
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: basic-example
spec:
  image: flink:1.16
  flinkVersion: v1_16
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    kubernetes.operator.deployment.rollback.enabled: true
    state.savepoints.dir: s3://flink-data/savepoints
    state.checkpoints.dir: s3://flink-data/checkpoints
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: s3://flink-data/ha
  serviceAccount: flink
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          env:
            - name: TZ
              value: Asia/Shanghai
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
    parallelism: 2
    upgradeMode: stateless
New spec:
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: basic-example
spec:
  image: flink:1.16
  flinkVersion: v1_16
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    kubernetes.operator.deployment.rollback.enabled: true
    state.savepoints.dir: s3://flink-data/savepoints
    state.checkpoints.dir: s3://flink-data/checkpoints
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: s3://flink-data/ha
  serviceAccount: flink
  podTemplate:
    spec:
      containers:
        - env:
            - name: TZ
              value: Asia/Shanghai
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
    parallelism: 2
    upgradeMode: stateless
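
Since the two specs differ only inside podTemplate, a line diff may make the change easier to spot. The sketch below uses Python's standard difflib on the two pod template fragments, copied from the specs in this message with indentation restored; the labels "last-stable" and "new" are just illustrative names.

```python
import difflib

# Pod template fragment from the last stable spec.
old_pod_template = """\
podTemplate:
  spec:
    containers:
      - name: flink-main-container
        env:
          - name: TZ
            value: Asia/Shanghai
"""

# Pod template fragment from the new spec: the container has no name.
new_pod_template = """\
podTemplate:
  spec:
    containers:
      - env:
          - name: TZ
            value: Asia/Shanghai
"""

diff = list(
    difflib.unified_diff(
        old_pod_template.splitlines(),
        new_pod_template.splitlines(),
        fromfile="last-stable",
        tofile="new",
        lineterm="",
    )
)
print("\n".join(diff))
```

The removed line carrying `name: flink-main-container` is the only substantive change between the two specs.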
--
Best,
Hjw
At 2023-02-20 08:48:46, "Shammon FY" <zj...@gmail.com> wrote:
Hi
I cannot see the difference between the two configurations, but the error info `Failure executing: POST at: https://*/k8s/clusters/c-fwkxh/apis/apps/v1/namespaces/test-flink/deployments. Message: Deployment.apps "basic-example" is invalid` is strange. Maybe you can check whether the configuration of k8s has changed?
Best,
Shammon
Re: Application upgrade rollbacks failed in Flink Kubernetes Operator
Posted by Shammon FY <zj...@gmail.com>.
Hi
I cannot see the difference between the two configurations, but the error
info `Failure executing: POST at: https://*/k8s/clusters/c-fwkxh/apis/apps/v1/namespaces/test-flink/deployments.
Message: Deployment.apps "basic-example" is invalid` is strange. Maybe you can
check whether the configuration of k8s has changed?
Best,
Shammon