Posted to issues@flink.apache.org by "Gyula Fora (Jira)" <ji...@apache.org> on 2022/04/03 19:19:00 UTC

[jira] [Assigned] (FLINK-26345) Observer should detect flink job even if deployment status is empty

     [ https://issues.apache.org/jira/browse/FLINK-26345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gyula Fora reassigned FLINK-26345:
----------------------------------

    Assignee: Gyula Fora

> Observer should detect flink job even if deployment status is empty
> -------------------------------------------------------------------
>
>                 Key: FLINK-26345
>                 URL: https://issues.apache.org/jira/browse/FLINK-26345
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kubernetes Operator
>            Reporter: Gyula Fora
>            Assignee: Gyula Fora
>            Priority: Major
>             Fix For: kubernetes-operator-1.0.0
>
>
> Currently it is possible to hit a corner case where the reconciler submits the job but the deployment status is not updated to reflect the submission.
> In these cases the observer does not attempt to "recover" the cluster; it simply skips the observation step, assuming that the job is not running (status == null).
> However, this means that the reconciler will try to submit the job again, leading to the error:
> {code:java}
> org.apache.flink.client.deployment.ClusterDeploymentException: The Flink cluster job-name already exists.
>      at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deployApplicationCluster(KubernetesClusterDescriptor.java:179)
>      at org.apache.flink.client.deployment.application.cli.ApplicationClusterDeployer.run(ApplicationClusterDeployer.java:67)
>      at org.apache.flink.kubernetes.operator.service.FlinkService.submitApplicationCluster(FlinkService.java:73)
>      at org.apache.flink.kubernetes.operator.reconciler.JobReconciler.deployFlinkJob(JobReconciler.java:123)
>      at org.apache.flink.kubernetes.operator.reconciler.JobReconciler.reconcile(JobReconciler.java:65)
>      at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcileFlinkDeployment(FlinkDeploymentController.java:126)
>      at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:102)
>      at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:51)
>  {code}
> This is somewhat related to FLINK-26261, cc [~thw] 
>  
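
One way to picture the requested behaviour is sketched below in Java. This is only an illustration, not the operator's code: `ObserverSketch`, `RecordedStatus`, `Action`, and `decide` are hypothetical names, and the `clusterExists` flag stands in for whatever check the observer would use to find an already-running cluster. The idea is that an empty status combined with an existing cluster should lead to recovering the status rather than to a second submission.

```java
// Illustrative sketch only; every name here is hypothetical and not part of
// the Flink Kubernetes Operator API.
class ObserverSketch {

    /** Hypothetical stand-in for the deployment status recorded by the operator. */
    enum RecordedStatus { EMPTY, DEPLOYED }

    /** Hypothetical outcome that a reconciler could act on. */
    enum Action { SUBMIT_JOB, RECOVER_STATUS, NOTHING }

    /**
     * Decides the next step from the recorded status and from whether a
     * cluster for this deployment was found (assumed to be checked elsewhere).
     */
    static Action decide(RecordedStatus status, boolean clusterExists) {
        if (status == RecordedStatus.EMPTY) {
            // Empty status no longer implies "not running": look at the cluster too.
            return clusterExists ? Action.RECOVER_STATUS : Action.SUBMIT_JOB;
        }
        // Status is already recorded; regular observation continues from here.
        return Action.NOTHING;
    }

    public static void main(String[] args) {
        // The corner case from the issue: job submitted, status never written.
        System.out.println(decide(RecordedStatus.EMPTY, true));     // RECOVER_STATUS
        // A genuinely new deployment.
        System.out.println(decide(RecordedStatus.EMPTY, false));    // SUBMIT_JOB
        // The normal, already-tracked case.
        System.out.println(decide(RecordedStatus.DEPLOYED, true));  // NOTHING
    }
}
```

Under these assumptions, resubmission happens only when no cluster is found, which is the one case where it cannot collide with an existing cluster of the same name.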


