Posted to issues@flink.apache.org by "Robert Metzger (JIRA)" <ji...@apache.org> on 2019/02/28 12:55:01 UTC

[jira] [Updated] (FLINK-10287) Flink HA Persist Cancelled Job in Zookeeper

     [ https://issues.apache.org/jira/browse/FLINK-10287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger updated FLINK-10287:
-----------------------------------
    Component/s:     (was: Core)
                 Runtime / Coordination

> Flink HA Persist Cancelled Job in Zookeeper
> -------------------------------------------
>
>                 Key: FLINK-10287
>                 URL: https://issues.apache.org/jira/browse/FLINK-10287
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.6.0
>            Reporter: Sayat Satybaldiyev
>            Priority: Major
>         Attachments: Screenshot from 2018-09-05 16-48-34.png
>
>
> Flink HA persists a canceled job in ZooKeeper, which makes HA mode quite fragile. If the JobManager (JM) is restarted, it tries to recover the canceled job and, after some time, fails completely because it cannot recover it.
>  
> How to reproduce:
>  # Run a Flink 1.6 cluster in HA mode.
>  # Cancel a running Flink job.
>  # Observe that Flink does not remove the job's ZooKeeper (ZK) metadata.
> !Screenshot from 2018-09-05 16-48-34.png!
> {code}
> ls /flink/flink_ns/jobgraphs/46d8d3555936c0d8e6b6ec21cc02bb11
> [7f392fd9-cedc-4978-9186-1f54b98eeeb7]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)