Posted to dev@flink.apache.org by Gyula Fóra <gy...@apache.org> on 2022/05/10 19:41:08 UTC

Flink job restarted from empty state when execution.shutdown-on-application-finish is enabled

Hi Devs!

I ran into a concerning situation and would like to hear your thoughts on
this.

I am running Flink 1.15 in Kubernetes native mode (using the operator, but
that is beside the point here) with Flink Kubernetes HA enabled.

We have enabled
*execution.shutdown-on-application-finish = true*

I noticed that after the job has failed/finished, if I kill the jobmanager
pod (triggering a jobmanager failover), the job is resubmitted from a
completely empty state (as if starting for the first time).

Has anyone encountered this issue? This makes using this config option
pretty risky.

Thank you!
Gyula

Re: Flink job restarted from empty state when execution.shutdown-on-application-finish is enabled

Posted by Yang Wang <da...@gmail.com>.
I assume this is the responsibility of the job result store [1]. However, it
seems that it does not work as expected.

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=195726435

Best,
Yang

On Wed, 11 May 2022 at 12:55, Gyula Fóra <gy...@gmail.com> wrote:

> Sorry, I messed up the email; I meant false.
>
> So when we set it to not shut down … :)
>
> Gyula

Re: Flink job restarted from empty state when execution.shutdown-on-application-finish is enabled

Posted by Gyula Fóra <gy...@gmail.com>.
Sorry, I messed up the email; I meant false.

So when we set it to not shut down … :)

Gyula

On Wed, 11 May 2022 at 05:06, Yun Tang <my...@live.com> wrote:

> Hi Gyula,
>
> Why are you sure that the execution.shutdown-on-application-finish setting
> is what leads to this error? I noticed that the default value of this
> option is already "true".
>
> From my understanding, the completed checkpoint store should only clear
> its persisted checkpoint information on shutdown when the job status is
> globally terminated.
> Did you check whether the ConfigMap used to store the completed
> checkpoint store is empty right after you trigger a JobManager
> failure?
>
> Best
> Yun Tang

Re: Flink job restarted from empty state when execution.shutdown-on-application-finish is enabled

Posted by Yun Tang <my...@live.com>.
Hi Gyula,

Why are you sure that the execution.shutdown-on-application-finish setting is what leads to this error? I noticed that the default value of this option is already "true".

From my understanding, the completed checkpoint store should only clear its persisted checkpoint information on shutdown when the job status is globally terminated.
Did you check whether the ConfigMap used to store the completed checkpoint store is empty right after you trigger a JobManager failure?

Best
Yun Tang
