
Posted to issues@flink.apache.org by "Aitozi (Jira)" <ji...@apache.org> on 2022/07/10 04:03:00 UTC

[jira] [Updated] (FLINK-28478) Session Cluster will lost if it failed between status recorded and deploy

     [ https://issues.apache.org/jira/browse/FLINK-28478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aitozi updated FLINK-28478:
---------------------------
    Description: 
While testing the scenario from https://issues.apache.org/jira/browse/FLINK-28187, I found that a session cluster deployment cannot recover if it fails between the status being recorded and the deploy itself. In the next reconcile loop, {{checkNewSpecAlreadyDeployed}} does not detect any spec change, so the operator never tries to start the session cluster again.

Application mode does not have this problem. The job state in the deployed spec is SUSPEND, which differs from the desired state, so the operator reconciles the spec change.

  was:
I found that a session cluster deployment cannot recover if it fails between the status being recorded and the deploy itself. In the next reconcile loop, {{checkNewSpecAlreadyDeployed}} does not detect any spec change, so the operator never tries to start the session cluster again.

Application mode does not have this problem. The job state in the deployed spec is SUSPEND, which differs from the desired state, so the operator reconciles the spec change.
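The failure mode can be reproduced with a toy model of the reconcile loop. This is a hypothetical sketch, not operator code: `Spec`, `Status`, `Cluster`, `reconcile` and `check_new_spec_already_deployed` are illustrative stand-ins. Following the description, the sketch assumes that application mode records the job state as SUSPEND before the deploy call, while session mode records the desired spec unchanged. The spec check only compares specs and never looks at the cluster.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical, heavily simplified model of the reconcile loop in FLINK-28478.
# None of these names are taken from the Flink Kubernetes Operator.


@dataclass(frozen=True)
class Spec:
    image: str
    job_state: Optional[str] = None  # None models a session cluster (no job)


@dataclass
class Status:
    last_reconciled_spec: Optional[Spec] = None


class Cluster:
    """Stands in for Kubernetes: tracks whether anything is actually deployed."""

    def __init__(self, fail_next_deploy: bool = False) -> None:
        self.deployed = False
        self.fail_next_deploy = fail_next_deploy

    def deploy(self, spec: Spec) -> None:
        if self.fail_next_deploy:
            self.fail_next_deploy = False  # only the first attempt fails
            raise RuntimeError("deploy failed")
        self.deployed = True


def check_new_spec_already_deployed(desired: Spec, status: Status) -> bool:
    # Compares the desired spec with the recorded one; the cluster is never consulted.
    return status.last_reconciled_spec == desired


def reconcile(desired: Spec, status: Status, cluster: Cluster) -> None:
    if check_new_spec_already_deployed(desired, status):
        return  # "nothing changed", so no deploy is attempted
    # Assumption: status is recorded BEFORE deploying. Application mode records
    # the job as SUSPEND; session mode records the desired spec as-is.
    if desired.job_state is not None:
        status.last_reconciled_spec = replace(desired, job_state="SUSPEND")
    else:
        status.last_reconciled_spec = desired
    cluster.deploy(desired)  # a failure here leaves the recorded status behind
    status.last_reconciled_spec = desired


def run(desired: Spec) -> bool:
    status, cluster = Status(), Cluster(fail_next_deploy=True)
    for _ in range(2):  # loop 1 fails mid-way, loop 2 should retry
        try:
            reconcile(desired, status, cluster)
        except RuntimeError:
            pass
    return cluster.deployed


session_recovered = run(Spec(image="flink:1.15"))
application_recovered = run(Spec(image="flink:1.15", job_state="RUNNING"))
print(session_recovered, application_recovered)  # False True
```

In the session case, the second loop sees a recorded spec equal to the desired one and returns early, so nothing is ever deployed. In the application case, the recorded SUSPEND state differs from RUNNING, so the second loop retries and succeeds.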


> Session Cluster will lost if it failed between status recorded and deploy
> -------------------------------------------------------------------------
>
>                 Key: FLINK-28478
>                 URL: https://issues.apache.org/jira/browse/FLINK-28478
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>            Reporter: Aitozi
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)