Posted to dev@flink.apache.org by "hjw (Jira)" <ji...@apache.org> on 2023/02/23 16:34:00 UTC

[jira] [Created] (FLINK-31203) Application upgrade rollbacks failed in Flink Kubernetes Operator

hjw created FLINK-31203:
---------------------------

             Summary: Application upgrade rollbacks failed in Flink Kubernetes Operator
                 Key: FLINK-31203
                 URL: https://issues.apache.org/jira/browse/FLINK-31203
             Project: Flink
          Issue Type: Bug
          Components: Kubernetes Operator
    Affects Versions: kubernetes-operator-1.3.1
            Reporter: hjw


I tested the application upgrade rollback feature, but it does not work: the Flink application-mode job does not roll back to the last stable spec.
As shown in the example below, I declared an invalid pod template, one without a container named flink-main-container, to test the rollback feature.
However, the operator only reports that deploying the Flink application job failed; it never rolls back.
 
Error:
org.apache.flink.client.deployment.ClusterDeploymentException: Could not create Kubernetes cluster "basic-example".
 at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deployClusterInternal(KubernetesClusterDescriptor.java:292)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://*/k8s/clusters/c-fwkxh/apis/apps/v1/namespaces/test-flink/deployments. Message: Deployment.apps "basic-example" is invalid: [spec.template.spec.containers[0].name: Required value, spec.template.spec.containers[0].image: Required value]. Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=spec.template.spec.containers[0].name, message=Required value, reason=FieldValueRequired, additionalProperties={}), StatusCause(field=spec.template.spec.containers[0].image, message=Required value, reason=FieldValueRequired, additionalProperties={})], group=apps, kind=Deployment, name=basic-example, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Deployment.apps "basic-example" is invalid: [spec.template.spec.containers[0].name: Required value, spec.template.spec.containers[0].image: Required value], metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).
 at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:673)
 at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)
 at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:560)
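The 422 response lists two causes: containers[0] has neither a name nor an image. The snippet below is a simplified, hypothetical model of just those two checks (it is not the Kubernetes validation code). It applies them to the unnamed container from the new pod template, under the assumption that the container reaches the API server exactly as written in the template, with no name and no image.

```python
from typing import Any


def missing_required_fields(pod_spec: dict[str, Any]) -> list[str]:
    """Return container fields that would be reported as 'Required value'.

    Simplified model: only checks that every container has a non-empty
    name and image, the two causes listed in the 422 response.
    """
    causes = []
    for i, container in enumerate(pod_spec.get("containers", [])):
        for field in ("name", "image"):
            if not container.get(field):
                causes.append(f"spec.template.spec.containers[{i}].{field}")
    return causes


# Assumption for illustration: the unnamed container from the new pod
# template is passed through unchanged, so it has no name and no image.
rendered = {"containers": [{"env": [{"name": "TZ", "value": "Asia/Shanghai"}]}]}
print(missing_required_fields(rendered))
```

Under that assumption, the function reports the same two fields, `name` and `image` of `containers[0]`, that appear in the StatusCause list above.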
 
Env:
Flink version: Flink 1.16
Flink Kubernetes Operator: 1.3.1
 
*Last stable spec:*
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: basic-example
spec:
  image: flink:1.16
  flinkVersion: v1_16
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    kubernetes.operator.deployment.rollback.enabled: "true"
    state.savepoints.dir: s3://flink-data/savepoints
    state.checkpoints.dir: s3://flink-data/checkpoints
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: s3://flink-data/ha
  serviceAccount: flink
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          env:
          - name: TZ
            value: Asia/Shanghai
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
    parallelism: 2
    upgradeMode: stateless
 
*New spec:*
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: basic-example
spec:
  image: flink:1.16
  flinkVersion: v1_16
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    kubernetes.operator.deployment.rollback.enabled: "true"
    state.savepoints.dir: s3://flink-data/savepoints
    state.checkpoints.dir: s3://flink-data/checkpoints
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: s3://flink-data/ha
  serviceAccount: flink
  podTemplate:
    spec:
      containers:
        - env:
          - name: TZ
            value: Asia/Shanghai
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
    parallelism: 2
    upgradeMode: stateless
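The two specs differ only in the podTemplate: the container in the new spec has no name. Flink identifies the main container in a pod template by the name flink-main-container, so the unnamed entry is not treated as the main container. Below is a minimal sketch of a client-side pre-check that would catch this before submission. It is hypothetical and not part of Flink or the operator. Both templates are written as plain Python dicts rather than parsed from YAML.

```python
from typing import Any

# Name Flink uses to identify the main container in a pod template.
MAIN_CONTAINER = "flink-main-container"


def has_main_container(pod_template: dict[str, Any]) -> bool:
    """Hypothetical pre-check: does the pod template declare the main container?"""
    containers = pod_template.get("spec", {}).get("containers", [])
    return any(c.get("name") == MAIN_CONTAINER for c in containers)


tz_env = [{"name": "TZ", "value": "Asia/Shanghai"}]
last_stable = {"spec": {"containers": [{"name": MAIN_CONTAINER, "env": tz_env}]}}
new = {"spec": {"containers": [{"env": tz_env}]}}

print(has_main_container(last_stable))  # True
print(has_main_container(new))  # False
```

A check like this only flags the misconfiguration. It says nothing about whether the operator should roll back once such a spec has been submitted, which is the behavior this issue reports as broken.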



--
This message was sent by Atlassian Jira
(v8.20.10#820010)