Posted to issues@flink.apache.org by "Gyula Fora (Jira)" <ji...@apache.org> on 2022/07/11 07:31:00 UTC

[jira] [Closed] (FLINK-28187) Duplicate job submission for FlinkSessionJob

     [ https://issues.apache.org/jira/browse/FLINK-28187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gyula Fora closed FLINK-28187.
------------------------------
    Resolution: Fixed

Merged to main: 16c9f45061d0b6c2ca31f4f0ed98378e70a9f33b

> Duplicate job submission for FlinkSessionJob
> --------------------------------------------
>
>                 Key: FLINK-28187
>                 URL: https://issues.apache.org/jira/browse/FLINK-28187
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.0.0, kubernetes-operator-1.1.0
>            Reporter: Jeesmon Jacob
>            Assignee: Aitozi
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: kubernetes-operator-1.1.0
>
>         Attachments: flink-operator-log.txt
>
>
> During a session job submission, if a deployment error (e.g. concurrent.TimeoutException) is hit, the operator submits the job again. However, the first submission may already have succeeded on the JobManager side, in which case the second submission results in a duplicate job. Operator log attached.
> Per [~gyfora]:
> The problem is that when a deployment error is hit, the SessionJobObserver cannot tell whether the job was actually submitted, so it simply tries to submit it again. We need a mechanism to correlate jobs on the cluster with the SessionJob CR itself. Maybe we could override the job name for this purpose, or something similar.
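
One way to picture the correlation idea described above is to derive a stable identifier from the SessionJob CR and check for it before resubmitting. The sketch below is illustrative only and is not taken from the merged fix: the class SessionJobCorrelation, the helpers deterministicJobId and shouldSubmit, and the use of the CR's UID as the correlation key are all assumptions, and the set of job identifiers stands in for whatever the operator would actually read from the session cluster.

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch: correlate a SessionJob CR with jobs on the cluster
// through an identifier that is derived deterministically from the CR.
final class SessionJobCorrelation {

    // Assumption: the CR's UID is used as the correlation key.
    // The same UID always yields the same 32-character hex identifier.
    public static String deterministicJobId(String crUid) {
        UUID uuid = UUID.nameUUIDFromBytes(crUid.getBytes(StandardCharsets.UTF_8));
        return uuid.toString().replace("-", "");
    }

    // Submit only if no job carrying the derived identifier is already
    // present on the cluster. jobIdsOnCluster is a stand-in for a real lookup.
    public static boolean shouldSubmit(String crUid, Set<String> jobIdsOnCluster) {
        return !jobIdsOnCluster.contains(deterministicJobId(crUid));
    }

    public static void main(String[] args) {
        String crUid = "example-cr-uid"; // placeholder value
        String jobId = deterministicJobId(crUid);

        // First attempt: nothing on the cluster yet, so submit.
        System.out.println(shouldSubmit(crUid, Set.of()));      // prints "true"

        // Retry after a timeout: the first submission actually succeeded,
        // so the identifier is found and the resubmission is skipped.
        System.out.println(shouldSubmit(crUid, Set.of(jobId))); // prints "false"
    }
}
```

With an identifier of this kind, a retry after a timeout can tell that the earlier submission went through without relying on the outcome of the failed call. Overriding the job name, as suggested in the description, would serve the same purpose with a different key.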


