Posted to issues@flink.apache.org by "Matthias Pohl (Jira)" <ji...@apache.org> on 2022/03/11 17:07:00 UTC

[jira] [Resolved] (FLINK-26391) Release Testing: Application Mode recovery does not re-trigger a job which failed during cleanup (FLINK-11813)

     [ https://issues.apache.org/jira/browse/FLINK-26391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Pohl resolved FLINK-26391.
-----------------------------------
    Resolution: Fixed

I'm closing this release testing task. It looks like everything works as expected. Thanks for your efforts, [~wangyang0918]. Feel free to reopen the issue if you think that something is still missing.

> Release Testing: Application Mode recovery does not re-trigger a job which failed during cleanup (FLINK-11813)
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-26391
>                 URL: https://issues.apache.org/jira/browse/FLINK-26391
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.15.0
>            Reporter: Matthias Pohl
>            Assignee: Yang Wang
>            Priority: Blocker
>              Labels: release-testing
>             Fix For: 1.15.0
>
>
> FLINK-11813 is about not being able to determine whether a job has been terminated globally before a failover happened. Testing this behavior can be achieved by running a job in HA mode to enable the file-based {{JobResultStore}} (JRS).
> You can specify [job-result-store.storage-path|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#job-result-store-storage-path] to point to a directory which you can access. [job-result-store.delete-on-commit|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#job-result-store-delete-on-commit] can be used to keep the JRS artifacts from being deleted after a job has finished.
> Let a job finish to generate the JRS artifact for this job in the specified directory. Renaming the generated file from {{<job-id>.json}} to {{<job-id>_DIRTY.json}} simulates the job not having been cleaned up properly. Starting the job in application mode once more (by specifying the corresponding Job ID) should lead to the job not being started again (you might want to enable {{debug}} logging to verify this in the logs), i.e.:
> * Cleanup should be performed. 
> * No JobMaster-related logs should appear in the Flink logs.
> * Cleanup-related logs should appear in the Flink logs.
> * At the end, the {{_DIRTY}} suffix should have been removed from the JRS artifact's file name again, i.e. the file is named {{<job-id>.json}} once more.
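
The rename step and the final check described above could be scripted roughly as follows. This is only a sketch, not part of the original issue: the helper names {{mark_dirty}} and {{artifact_state}} are hypothetical, and it assumes the directory that holds the JRS artifact is on a locally accessible file system (it would not work as-is for an object store or HDFS path).

```python
from pathlib import Path

# File name suffixes as described in the issue text.
CLEAN_SUFFIX = ".json"
DIRTY_SUFFIX = "_DIRTY.json"


def mark_dirty(storage_dir: Path, job_id: str) -> Path:
    """Rename <job-id>.json to <job-id>_DIRTY.json to simulate a failed cleanup.

    Hypothetical helper; storage_dir is assumed to be the local directory
    that contains the JRS artifact of the finished job.
    """
    clean = storage_dir / f"{job_id}{CLEAN_SUFFIX}"
    dirty = storage_dir / f"{job_id}{DIRTY_SUFFIX}"
    if not clean.is_file():
        raise FileNotFoundError(f"no clean JRS artifact found: {clean}")
    clean.rename(dirty)
    return dirty


def artifact_state(storage_dir: Path, job_id: str) -> str:
    """Return 'dirty', 'clean', or 'missing' for the job's JRS artifact."""
    if (storage_dir / f"{job_id}{DIRTY_SUFFIX}").is_file():
        return "dirty"
    if (storage_dir / f"{job_id}{CLEAN_SUFFIX}").is_file():
        return "clean"
    return "missing"
```

Calling {{mark_dirty}} after the first run and checking that {{artifact_state}} reports 'clean' again after the second application-mode run would cover the last bullet point. The log-related expectations still need to be verified by hand.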



--
This message was sent by Atlassian Jira
(v8.20.1#820001)