Posted to dev@flink.apache.org by Őrhidi Mátyás <ma...@gmail.com> on 2022/05/10 08:55:38 UTC

JobManager High Availability

Hi Folks!

I've been goofing around with the JobManager HA configs using multiple JM
replicas (in the Flink Kubernetes Operator). It's working seemingly fine,
however the job itself is being restarted when you kill the leader JM pod.
Is this expected?

Thanks,
Matyas

Re: JobManager High Availability

Posted by Thomas Weise <th...@apache.org>.

+1 to what Konstantin said. There is no real benefit to running multiple
JMs on k8s unless you need to optimize the JM startup time. Often the
time to get a replacement pod is negligible compared to the job
restart time.

Thomas

On Tue, May 10, 2022 at 2:27 AM Őrhidi Mátyás <ma...@gmail.com> wrote:
>
> Ah, ok. Thanks, Konstantin, for the clarification. I appreciate the quick
> response.
>
> Best,
> Matyas
>
> On Tue, May 10, 2022 at 10:59 AM Konstantin Knauf <kn...@apache.org> wrote:
>
> > Hi Matyas,
> >
> > yes, that's expected. The feature should have never been called "high
> > availability", but something like "Flink Jobmanager failover", because
> > that's what it is.
> >
> > With standby Jobmanagers what you gain is a faster failover, because a new
> > Jobmanager does not need to be started before restarting the Job. That's
> > all.
> >
> > Cheers,
> >
> > Konstantin
> >
> > On Tue, May 10, 2022 at 10:56 AM Őrhidi Mátyás <
> > matyas.orhidi@gmail.com> wrote:
> >
> > > Hi Folks!
> > >
> > > I've been goofing around with the JobManager HA configs using multiple JM
> > > replicas (in the Flink Kubernetes Operator). It's working seemingly fine,
> > > however the job itself is being restarted when you kill the leader JM pod.
> > > Is this expected?
> > >
> > > Thanks,
> > > Matyas
> > >
> >
> >
> > --
> > https://twitter.com/snntrable
> > https://github.com/knaufk
> >

Re: JobManager High Availability

Posted by Őrhidi Mátyás <ma...@gmail.com>.

Ah, ok. Thanks, Konstantin, for the clarification. I appreciate the quick
response.

Best,
Matyas

On Tue, May 10, 2022 at 10:59 AM Konstantin Knauf <kn...@apache.org> wrote:

> Hi Matyas,
>
> yes, that's expected. The feature should have never been called "high
> availability", but something like "Flink Jobmanager failover", because
> that's what it is.
>
> With standby Jobmanagers what you gain is a faster failover, because a new
> Jobmanager does not need to be started before restarting the Job. That's
> all.
>
> Cheers,
>
> Konstantin
>
> On Tue, May 10, 2022 at 10:56 AM Őrhidi Mátyás <
> matyas.orhidi@gmail.com> wrote:
>
> > Hi Folks!
> >
> > I've been goofing around with the JobManager HA configs using multiple JM
> > replicas (in the Flink Kubernetes Operator). It's working seemingly fine,
> > however the job itself is being restarted when you kill the leader JM pod.
> > Is this expected?
> >
> > Thanks,
> > Matyas
> >
>
>
> --
> https://twitter.com/snntrable
> https://github.com/knaufk
>

Re: JobManager High Availability

Posted by Konstantin Knauf <kn...@apache.org>.

Hi Matyas,

yes, that's expected. The feature should have never been called "high
availability", but something like "Flink Jobmanager failover", because
that's what it is.

With standby Jobmanagers what you gain is a faster failover, because a new
Jobmanager does not need to be started before restarting the Job. That's
all.

Cheers,

Konstantin

On Tue, May 10, 2022 at 10:56 AM Őrhidi Mátyás <
matyas.orhidi@gmail.com> wrote:

> Hi Folks!
>
> I've been goofing around with the JobManager HA configs using multiple JM
> replicas (in the Flink Kubernetes Operator). It's working seemingly fine,
> however the job itself is being restarted when you kill the leader JM pod.
> Is this expected?
>
> Thanks,
> Matyas
>


-- 
https://twitter.com/snntrable
https://github.com/knaufk
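
The trade-off discussed in this thread (a standby JobManager only removes the wait for a replacement pod, while the job is restarted either way) can be illustrated with a rough back-of-the-envelope model. This is only a sketch: the `FailoverTimings` class, the `downtime` function, the assumption that the phases run strictly one after another, and all of the numbers are hypothetical and do not come from the thread or from Flink itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverTimings:
    """Hypothetical durations in seconds; illustrative values only."""
    pod_startup: float      # Kubernetes schedules and starts a replacement JM pod
    leader_election: float  # a JobManager acquires leadership
    job_restart: float      # the job is restarted from its latest checkpoint


def downtime(t: FailoverTimings, standby_available: bool) -> float:
    """Estimate job downtime after the leader JM pod is killed.

    Assumes the phases happen sequentially. A standby JM skips only the
    pod startup phase; the job restart happens in both cases.
    """
    startup = 0.0 if standby_available else t.pod_startup
    return startup + t.leader_election + t.job_restart


# Made-up numbers: pod startup is small compared to the job restart.
timings = FailoverTimings(pod_startup=10.0, leader_election=5.0, job_restart=120.0)
single = downtime(timings, standby_available=False)
standby = downtime(timings, standby_available=True)
saved = single - standby

print(f"single JM: {single:.0f}s, with standby: {standby:.0f}s, saved: {saved:.0f}s")
```

Under these assumed numbers the standby saves exactly the pod startup time and nothing else, which is why the benefit shrinks as the job restart time grows. Real timings depend on the cluster, the image pull policy, and the state size of the job.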