Posted to issues@flink.apache.org by "Gyula Fora (Jira)" <ji...@apache.org> on 2024/04/16 08:55:00 UTC

[jira] [Commented] (FLINK-35123) Flink Kubernetes Operator should not do deleteHAData

    [ https://issues.apache.org/jira/browse/FLINK-35123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837610#comment-17837610 ] 

Gyula Fora commented on FLINK-35123:
------------------------------------

I agree that if the REST API is accessible we could call shutdown and not touch the HA metadata. But there are some cases where you need to delete the HA metadata explicitly:
 - The cluster is not in a healthy state (REST API not available)
 - The job was previously suspended with the last-state upgrade mode, which leaves the HA metadata in place
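
To make the cases above concrete, here is a minimal sketch of the decision, not the operator's actual code. The class, enum and method names (DeletionSketch, Action, decide) are hypothetical and only illustrate the branching described in this comment.

```java
public class DeletionSketch {
    /** Hypothetical cleanup actions; these names are illustrative, not operator APIs. */
    enum Action { SHUT_DOWN_VIA_REST, DELETE_HA_METADATA }

    /**
     * Decide how to clean up a deployment that is being deleted.
     * Both flags are assumed to be known to the caller already.
     */
    static Action decide(boolean restApiReachable, boolean suspendedWithLastState) {
        // Suspended with last-state upgrade mode: HA metadata was left behind,
        // so it has to be deleted explicitly.
        if (suspendedWithLastState) {
            return Action.DELETE_HA_METADATA;
        }
        // Healthy cluster: shutting down through the REST API is enough,
        // and the HA metadata does not need to be touched by the caller.
        if (restApiReachable) {
            return Action.SHUT_DOWN_VIA_REST;
        }
        // Unhealthy cluster: there is no REST endpoint to call,
        // so fall back to explicit HA metadata deletion.
        return Action.DELETE_HA_METADATA;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, false));
        System.out.println(decide(false, false));
        System.out.println(decide(true, true));
    }
}
```

The suspended case is checked first because the leftover metadata has to be removed there regardless of what the REST flag says.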

Also, with the Kubernetes HA configuration, which is much more common than ZK, the HA metadata cleanup is much faster than anything else: it is a simple ConfigMap deletion.

> Flink Kubernetes Operator should not do deleteHAData 
> -----------------------------------------------------
>
>                 Key: FLINK-35123
>                 URL: https://issues.apache.org/jira/browse/FLINK-35123
>             Project: Flink
>          Issue Type: Technical Debt
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.7.0, kubernetes-operator-1.8.0
>            Reporter: Fei Feng
>            Priority: Major
>         Attachments: image-2024-04-16-15-56-33-426.png
>
>
> We use Flink HA based on ZooKeeper. When a lot of FlinkDeployments are being deleted, the operator spends too much time in cleanHaData. The jstack output shows that the reconcile thread hangs while disconnecting from ZooKeeper, which makes deleting FlinkDeployments slow.
> !image-2024-04-16-15-56-33-426.png|width=502,height=263!
>  
> I don't understand why the Flink Kubernetes Operator needs cleanHAdata. As [~aitozi] commented in PR [FLINK-26336 Call cancel on deletion & clean up configmaps as well #28|https://github.com/apache/flink-kubernetes-operator/pull/28#discussion_r815968841]:
> {quote}it's a bit of out of scope of the operator responsibility or ability
> {quote}
> I totally agree with his point.
> I also want to know why we don't call the RestClusterClient#shutDownCluster interface, which is
> 1. more graceful and reasonable (the operator need not care whether the Flink app enables HA or not)
> 2. compatible across Flink versions.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)