Posted to user@flink.apache.org by hjw <hj...@163.com> on 2023/02/19 16:55:41 UTC

Application upgrade rollbacks failed in Flink Kubernetes Operator

I tested the application upgrade rollback feature, but it does not work: the Flink application-mode job is not rolled back to the last stable spec.
As shown in the example below, I deliberately declared an invalid pod template, without a container named flink-main-container, to test the rollback feature.
However, the deployment of the Flink application job simply fails with the error below, and no rollback takes place.
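The rollback expected here can be summarised as a simplified model. This is a sketch based on the operator's documented rule (an upgraded spec that is not observed as stable within `kubernetes.operator.deployment.readiness.timeout` is rolled back to the last stable spec), not the operator's actual reconciliation code; `UpgradeState` and `next_action` are hypothetical names, and the model does not try to capture how the operator treats a deployment that fails at submission time, as in the error below.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UpgradeState:
    """Hypothetical, simplified state for one upgraded FlinkDeployment."""
    rollback_enabled: bool           # kubernetes.operator.deployment.rollback.enabled
    readiness_timeout_s: float       # assumed: kubernetes.operator.deployment.readiness.timeout, in seconds
    upgraded_at_s: float             # time the new spec was applied
    new_spec_stable: bool            # True once the upgraded job is observed healthy
    last_stable_spec: Optional[str]  # identifier of the last stable spec, if any


def next_action(state: UpgradeState, now_s: float) -> str:
    """Return 'stable', 'waiting' or 'rollback' under the simplified rule."""
    if state.new_spec_stable:
        # The upgrade succeeded; the new spec becomes the last stable spec.
        return "stable"
    if not state.rollback_enabled or state.last_stable_spec is None:
        # Rollback disabled or nothing to roll back to: keep reconciling the new spec.
        return "waiting"
    if now_s - state.upgraded_at_s >= state.readiness_timeout_s:
        # Not stable within the readiness timeout: go back to the last stable spec.
        return "rollback"
    return "waiting"


# Example values only (the 300 s timeout is arbitrary, not a documented default).
state = UpgradeState(rollback_enabled=True, readiness_timeout_s=300.0,
                     upgraded_at_s=0.0, new_spec_stable=False,
                     last_stable_spec="basic-example (last stable spec)")
print(next_action(state, now_s=60.0))   # waiting
print(next_action(state, now_s=301.0))  # rollback
```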


Error:
org.apache.flink.client.deployment.ClusterDeploymentException: Could not create Kubernetes cluster "basic-example".
 at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deployClusterInternal(KubernetesClusterDescriptor.java:292)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://*/k8s/clusters/c-fwkxh/apis/apps/v1/namespaces/test-flink/deployments. Message: Deployment.apps "basic-example" is invalid: [spec.template.spec.containers[0].name: Required value, spec.template.spec.containers[0].image: Required value]. Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=spec.template.spec.containers[0].name, message=Required value, reason=FieldValueRequired, additionalProperties={}), StatusCause(field=spec.template.spec.containers[0].image, message=Required value, reason=FieldValueRequired, additionalProperties={})], group=apps, kind=Deployment, name=flink-bdra-sql-application-job-s3p, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Deployment.apps "flink-bdra-sql-application-job-s3p" is invalid: [spec.template.spec.containers[0].name: Required value, spec.template.spec.containers[0].image: Required value], metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).
 at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:673)
 at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)
 at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:560)
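The 422 response lists exactly two causes, both on the container at index 0. A minimal client-side check that reproduces them might look like the sketch below. It is not Kubernetes' validation code: `missing_container_fields` is a hypothetical helper that only looks at the two fields named in the error, and the second container in the example input is an assumption about what Flink adds to the submitted Deployment.

```python
from typing import Any, Dict, List

# The two fields the API server reported as "Required value" in the 422 response.
REQUIRED_CONTAINER_FIELDS = ("name", "image")


def missing_container_fields(pod_spec: Dict[str, Any],
                             prefix: str = "spec.template.spec") -> List[str]:
    """Return one 'path: Required value' message per missing container field."""
    problems = []
    for i, container in enumerate(pod_spec.get("containers", [])):
        for field in REQUIRED_CONTAINER_FIELDS:
            if not container.get(field):
                problems.append(f"{prefix}.containers[{i}].{field}: Required value")
    return problems


# Assumed shape of the submitted pod spec: the nameless template container at
# index 0 (as the error indicates), followed by a Flink-generated main container.
submitted = {
    "containers": [
        {"env": [{"name": "TZ", "value": "Asia/Shanghai"}]},
        {"name": "flink-main-container", "image": "flink:1.16"},
    ]
}
for message in missing_container_fields(submitted):
    print(message)
# spec.template.spec.containers[0].name: Required value
# spec.template.spec.containers[0].image: Required value
```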


Env:
Flink version: 1.16
Flink Kubernetes Operator: 1.3.1


Last stable spec:
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: basic-example
spec:
  image: flink:1.16
  flinkVersion: v1_16
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    kubernetes.operator.deployment.rollback.enabled: true
    state.savepoints.dir: s3:///flink-data/savepoints
    state.checkpoints.dir: s3:///flink-data/checkpoints
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: s3:///flink-data/ha
  serviceAccount: flink
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          env:
          - name: TZ
            value: Asia/Shanghai
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
    parallelism: 2
    upgradeMode: stateless


New spec:
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: basic-example
spec:
  image: flink:1.16
  flinkVersion: v1_16
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    kubernetes.operator.deployment.rollback.enabled: true
    state.savepoints.dir: s3:///flink-data/savepoints
    state.checkpoints.dir: s3:///flink-data/checkpoints
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: s3:///flink-data/ha
  serviceAccount: flink
  podTemplate:
    spec:
      containers:
        -   env:
          - name: TZ
            value: Asia/Shanghai
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
    parallelism: 2
    upgradeMode: stateless
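The difference between the two pod templates matters because of how the template's containers are matched. A minimal sketch, assuming Flink's documented pod-template rule that only the container named `flink-main-container` is treated as the main container (and merged with the one Flink generates) while any other container in the template is kept as an additional container; `split_main_container` is a hypothetical helper, not Flink's implementation.

```python
from typing import Any, Dict, List, Optional, Tuple

MAIN_CONTAINER_NAME = "flink-main-container"


def split_main_container(
    containers: List[Dict[str, Any]],
) -> Tuple[Optional[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split template containers into (main container, other containers) by name."""
    main: Optional[Dict[str, Any]] = None
    others: List[Dict[str, Any]] = []
    for container in containers:
        if main is None and container.get("name") == MAIN_CONTAINER_NAME:
            main = container  # merged with the Flink-generated main container
        else:
            others.append(container)  # submitted as-is, so it needs its own name and image
    return main, others


tz_env = [{"name": "TZ", "value": "Asia/Shanghai"}]
stable_containers = [{"name": "flink-main-container", "env": tz_env}]  # last stable spec
new_containers = [{"env": tz_env}]                                     # new spec

main, others = split_main_container(stable_containers)
print(main is not None, len(others))  # True 0
main, others = split_main_container(new_containers)
print(main is not None, len(others))  # False 1
```

Under this assumption the nameless container in the new spec is never merged with the main container, which is consistent with the `Required value` errors reported for it.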

--

Best,
Hjw

Re:Re: Application upgrade rollbacks failed in Flink Kubernetes Operator

Posted by hjw <hj...@163.com>.
Hi,
I deliberately declared an invalid pod template, without a container named flink-main-container, to test the rollback feature.
Please compare the podTemplate sections of the old and new specs; that is where they differ.


Last stable spec:
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: basic-example
spec:
  image: flink:1.16
  flinkVersion: v1_16
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    kubernetes.operator.deployment.rollback.enabled: true
    state.savepoints.dir: s3://flink-data/savepoints
    state.checkpoints.dir: s3://flink-data/checkpoints
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: s3://flink-data/ha
  serviceAccount: flink
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          env:
          - name: TZ
            value: Asia/Shanghai
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
    parallelism: 2
    upgradeMode: stateless


New spec:
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: basic-example
spec:
  image: flink:1.16
  flinkVersion: v1_16
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    kubernetes.operator.deployment.rollback.enabled: true
    state.savepoints.dir: s3://flink-data/savepoints
    state.checkpoints.dir: s3://flink-data/checkpoints
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: s3://flink-data/ha
  serviceAccount: flink
  podTemplate:
    spec:
      containers:
        -   env:
          - name: TZ
            value: Asia/Shanghai
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
    parallelism: 2
    upgradeMode: stateless
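Because the two specs are otherwise identical, a structural diff makes the single change easy to spot. The sketch below is a hypothetical helper, not part of Flink or the operator; only the podTemplate sections are reproduced, transcribed by hand from the specs above, since parsing the YAML directly would need a third-party library such as PyYAML.

```python
from typing import Any, Iterator


class _Missing:
    """Sentinel for a key or list element present on only one side."""
    def __repr__(self) -> str:
        return "<missing>"


MISSING = _Missing()


def diff(old: Any, new: Any, path: str = "") -> Iterator[str]:
    """Yield 'path: old -> new' for every leaf that differs between two parsed YAML trees."""
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(set(old) | set(new)):
            child = f"{path}.{key}" if path else key
            yield from diff(old.get(key, MISSING), new.get(key, MISSING), child)
    elif isinstance(old, list) and isinstance(new, list):
        for i in range(max(len(old), len(new))):
            a = old[i] if i < len(old) else MISSING
            b = new[i] if i < len(new) else MISSING
            yield from diff(a, b, f"{path}[{i}]")
    elif old != new:
        yield f"{path}: {old!r} -> {new!r}"


tz_env = [{"name": "TZ", "value": "Asia/Shanghai"}]
old_template = {"spec": {"containers": [{"name": "flink-main-container", "env": tz_env}]}}
new_template = {"spec": {"containers": [{"env": tz_env}]}}

for line in diff(old_template, new_template):
    print(line)
# spec.containers[0].name: 'flink-main-container' -> <missing>
```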

--

Best,
Hjw

At 2023-02-20 08:48:46, "Shammon FY" <zj...@gmail.com> wrote:

Hi


I cannot see the difference between the two configurations, but the error info `Failure executing: POST at: https://*/k8s/clusters/c-fwkxh/apis/apps/v1/namespaces/test-flink/deployments. Message: Deployment.apps "basic-example" is invalid` is strange. Maybe you can check whether the configuration of k8s has changed?



Best,
Shammon

Re: Application upgrade rollbacks failed in Flink Kubernetes Operator

Posted by Shammon FY <zj...@gmail.com>.
Hi

I cannot see the difference between the two configurations, but the error info `Failure executing: POST at: https://*/k8s/clusters/c-fwkxh/apis/apps/v1/namespaces/test-flink/deployments. Message: Deployment.apps "basic-example" is invalid` is strange. Maybe you can check whether the configuration of k8s has changed?

Best,
Shammon
