Posted to dev@flink.apache.org by "Thomas Weise (Jira)" <ji...@apache.org> on 2022/08/25 01:49:00 UTC

[jira] [Created] (FLINK-29100) Deployment with last-state upgrade mode stuck after initial error

Thomas Weise created FLINK-29100:
------------------------------------

             Summary: Deployment with last-state upgrade mode stuck after initial error
                 Key: FLINK-29100
                 URL: https://issues.apache.org/jira/browse/FLINK-29100
             Project: Flink
          Issue Type: Bug
          Components: Kubernetes Operator
    Affects Versions: kubernetes-operator-1.1.0
            Reporter: Thomas Weise
            Assignee: Thomas Weise


A deployment with the last-state upgrade mode that never succeeds will be stuck in the deploying state, because no HA metadata exists. This can be reproduced by creating a deployment with an invalid image or with an exception in the entry point. An update to the CR that corrects the issue will not be reconciled, because the reconciler keeps logging:

    o.a.f.k.o.r.d.ApplicationReconciler [INFO ] [default.basic-checkpoint-ha-example] Job is not running yet and HA metadata is not available, waiting for upgradeable state

This forces manual intervention: the CR has to be deleted.

Instead, the operator should check whether this is the initial deployment and, if so, skip the HA metadata check.
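A minimal sketch of the proposed decision logic, written in Python for brevity rather than in the operator's Java. The names Decision, DeploymentObservation, initial_deployment and decide_last_state_upgrade are hypothetical and do not correspond to actual operator classes. How the operator would determine that a deployment is the initial one is left open here and is an assumption of the sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    # Hypothetical outcomes of a last-state upgrade attempt.
    UPGRADE_LAST_STATE = "upgrade, recovering state from HA metadata"
    REDEPLOY_WITHOUT_STATE = "redeploy, nothing to recover"
    WAIT_FOR_UPGRADEABLE_STATE = "wait for upgradeable state"


@dataclass(frozen=True)
class DeploymentObservation:
    job_running: bool
    ha_metadata_available: bool
    # Hypothetical flag: True if no job was ever started for this CR.
    initial_deployment: bool


def decide_last_state_upgrade(obs: DeploymentObservation) -> Decision:
    """Sketch of the reconcile decision for upgradeMode: last-state."""
    if obs.job_running or obs.ha_metadata_available:
        # Normal path: state can be recovered on redeploy.
        return Decision.UPGRADE_LAST_STATE
    if obs.initial_deployment:
        # Proposed fix: the job never ran, so there is no state to lose
        # and the HA metadata check can be skipped.
        return Decision.REDEPLOY_WITHOUT_STATE
    # Existing behavior: a previously running job may have state that
    # cannot be located yet, so keep waiting.
    return Decision.WAIT_FOR_UPGRADEABLE_STATE


if __name__ == "__main__":
    # The stuck case from this issue: bad image, job never started.
    stuck = DeploymentObservation(
        job_running=False, ha_metadata_available=False, initial_deployment=True
    )
    print(decide_last_state_upgrade(stuck).name)
```

The waiting branch is kept for non-initial deployments because redeploying without HA metadata there could silently discard state from a job that did run.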



--
This message was sent by Atlassian Jira
(v8.20.10#820010)