Posted to issues@flink.apache.org by "Matthias Pohl (Jira)" <ji...@apache.org> on 2022/01/16 17:13:00 UTC

[jira] [Commented] (FLINK-25432) Implement cleanup strategy

    [ https://issues.apache.org/jira/browse/FLINK-25432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476836#comment-17476836 ] 

Matthias Pohl commented on FLINK-25432:
---------------------------------------

We're experiencing {{KubernetesHighAvailabilityRecoverFromSavepointITCase}} failing because of a dependency between shutting down the {{JobMaster}}'s leader election and cleaning up the HA resources for the finished job.

[KubernetesLeaderElectionDriver:229|https://github.com/apache/flink/blob/9c7e3007eea80d7f4ad602fc33d9f58b676a7722/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/highavailability/KubernetesLeaderElectionDriver.java#L229] fails fatally if a {{ConfigMap}} is deleted by an actor that does not hold the right leadership ID (i.e. the Dispatcher).

This means that we still have to wait for the leadership resources to be freed before cleaning up the HA data for the job. This would no longer be necessary with the work done in FLINK-24038, which introduces a single leader election per JobManager.
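
To illustrate the ordering constraint described above, here is a minimal sketch in plain Java. It is not Flink code: {{CleanupOrderingSketch}}, {{stopLeaderElection}} and {{cleanupHaData}} are hypothetical stand-ins for shutting down the {{JobMaster}}'s leader election and for deleting the job's HA data. The only point is that the cleanup is chained so that it starts after the leader election has released its resources.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch, not part of Flink: models "free leadership
// resources first, then clean up the job's HA data".
class CleanupOrderingSketch {

    // Records the order in which the two steps actually ran.
    static final List<String> events = new CopyOnWriteArrayList<>();

    // Assumed stand-in for stopping the JobMaster's leader election.
    static CompletableFuture<Void> stopLeaderElection() {
        return CompletableFuture.runAsync(() -> events.add("leader-election-stopped"));
    }

    // Assumed stand-in for removing the job's HA data (e.g. its ConfigMap).
    static CompletableFuture<Void> cleanupHaData() {
        return CompletableFuture.runAsync(() -> events.add("ha-data-cleaned"));
    }

    static List<String> run() {
        events.clear();
        // thenCompose only invokes cleanupHaData() once the first future
        // has completed, so the cleanup cannot race with the shutdown.
        stopLeaderElection()
                .thenCompose(ignored -> cleanupHaData())
                .join();
        return List.copyOf(events);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

If the two futures were started independently instead of being chained, the cleanup could delete the {{ConfigMap}} while the leader election driver is still watching it, which is the failure mode described above.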

> Implement cleanup strategy
> --------------------------
>
>                 Key: FLINK-25432
>                 URL: https://issues.apache.org/jira/browse/FLINK-25432
>             Project: Flink
>          Issue Type: Sub-task
>    Affects Versions: 1.15.0
>            Reporter: Matthias Pohl
>            Assignee: Matthias Pohl
>            Priority: Major
>
> We want to combine the job-specific cleanup of the different resources and provide a common {{ResourceCleaner}} taking care of the actual cleanup of all resources.
> This needs to be integrated into the {{Dispatcher}}.


