Posted to dev@flink.apache.org by "Gyula Fora (Jira)" <ji...@apache.org> on 2022/02/24 05:08:00 UTC

[jira] [Created] (FLINK-26345) Observer should detect flink job even if deployment status is empty

Gyula Fora created FLINK-26345:
----------------------------------

             Summary: Observer should detect flink job even if deployment status is empty
                 Key: FLINK-26345
                 URL: https://issues.apache.org/jira/browse/FLINK-26345
             Project: Flink
          Issue Type: Bug
            Reporter: Gyula Fora


Currently it is possible to get into a corner case where the job is submitted by the reconciler but the deployment status is not updated to reflect the submission.

In these cases the observer does not attempt to "recover" the cluster. It simply skips the observation step, assuming that the job is not running (status == null).

However, this means that the reconciler will try to submit the job again, leading to the error:
{code:java}
org.apache.flink.client.deployment.ClusterDeploymentException: The Flink cluster job-name already exists.
     at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deployApplicationCluster(KubernetesClusterDescriptor.java:179)
     at org.apache.flink.client.deployment.application.cli.ApplicationClusterDeployer.run(ApplicationClusterDeployer.java:67)
     at org.apache.flink.kubernetes.operator.service.FlinkService.submitApplicationCluster(FlinkService.java:73)
     at org.apache.flink.kubernetes.operator.reconciler.JobReconciler.deployFlinkJob(JobReconciler.java:123)
     at org.apache.flink.kubernetes.operator.reconciler.JobReconciler.reconcile(JobReconciler.java:65)
     at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcileFlinkDeployment(FlinkDeploymentController.java:126)
     at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:102)
     at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:51)
{code}
This is somewhat related to FLINK-26261, cc [~thw] 
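
A minimal sketch of the intended behaviour, not the operator's actual code: when the recorded status is empty, the observer could still ask whether a cluster with that id already exists, and the reconciler would only submit when neither source reports a deployment. All names here (ObserverSketch, ClusterLookup, RecordedStatus, shouldSubmit) are hypothetical and exist only for illustration.

```java
/** Illustrative only: none of these types are part of the Flink Kubernetes operator. */
class ObserverSketch {

    /** Hypothetical stand-in for the status the operator records on the resource. */
    enum RecordedStatus { DEPLOYED }

    /** Hypothetical lookup that reports whether a cluster with this id already exists. */
    interface ClusterLookup {
        boolean clusterExists(String clusterId);
    }

    /**
     * Decides whether the reconciler should submit the job.
     * A null status is not trusted on its own: the cluster is looked up first,
     * so a job that was submitted without a status update is not submitted twice.
     */
    static boolean shouldSubmit(RecordedStatus status, String clusterId, ClusterLookup lookup) {
        if (status != null) {
            // The submission was recorded; nothing to deploy.
            return false;
        }
        // Status is empty: fall back to checking the cluster itself.
        return !lookup.clusterExists(clusterId);
    }

    public static void main(String[] args) {
        ClusterLookup alreadyRunning = id -> true;   // cluster exists, status was lost
        ClusterLookup notRunning = id -> false;      // nothing deployed yet

        System.out.println(shouldSubmit(null, "job-name", alreadyRunning));
        System.out.println(shouldSubmit(null, "job-name", notRunning));
    }
}
```

With a check of this kind the empty-status case no longer reaches deployApplicationCluster when the cluster is already there, which is the path that raises the ClusterDeploymentException above.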

 


