Posted to dev@flink.apache.org by Till Rohrmann <tr...@apache.org> on 2021/09/03 06:48:45 UTC

Re: Question about ZooKeeper HA new structure [FLINK-22636]

Hi Juha,

Flink does not guarantee backwards compatibility with respect to its
internal data structures. The recommended way is to stop the jobs with a
savepoint and then resume these jobs on the new Flink cluster. Failing over
the processes with a new version is not guaranteed to work at the moment. I
hope this answers your question.

Cheers,
Till
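
A minimal sketch of the savepoint-based upgrade described above, for readers
who want to script it. It only builds the argument lists for the Flink CLI
(`flink stop --savepointPath` and `flink run --fromSavepoint`). The helper
names, the `./bin/flink` default, and any job ID or path passed in are
assumptions for illustration and do not come from this thread.

```python
from typing import List


def stop_with_savepoint_cmd(
    job_id: str, savepoint_dir: str, flink_bin: str = "./bin/flink"
) -> List[str]:
    """Build the CLI call that stops a job on the OLD cluster with a savepoint.

    flink_bin is a hypothetical path to the old version's `flink` script.
    """
    return [flink_bin, "stop", "--savepointPath", savepoint_dir, job_id]


def resume_from_savepoint_cmd(
    savepoint_path: str, job_jar: str, flink_bin: str = "./bin/flink"
) -> List[str]:
    """Build the CLI call that resumes the job on the NEW cluster.

    savepoint_path is the concrete savepoint location reported by the stop
    command; flink_bin should point at the new version's `flink` script.
    """
    return [flink_bin, "run", "--fromSavepoint", savepoint_path, job_jar]
```

The lists can be passed to `subprocess.run(..., check=True)` one after the
other; the second call needs the savepoint path that the first one reports.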

On Fri, Sep 3, 2021 at 7:35 AM Juha Mynttinen
<ju...@aiven.io.invalid> wrote:

> Hello,
>
> I noticed there's a change [1] coming up in Flink 1.14.0 in the ZooKeeper
> tree structure that the ZooKeeper HA services maintain.
>
> I didn't spot any migration logic from the old (< 1.14.0) structure to the
> new. Did I miss something?
>
> If you have a Flink cluster running 1.13.X and, let's say, add a
> JobManager running 1.14.0 and terminate the original so that the new one
> becomes the leader, how is the new one going to understand the data in
> ZooKeeper? There are naturally more cases where the same compatibility
> issue would have to be handled; this example should illustrate the issue.
>
> Regards,
> Juha
>
> [1] https://issues.apache.org/jira/browse/FLINK-22636
>

Re: Question about ZooKeeper HA new structure [FLINK-22636]

Posted by Juha Mynttinen <ju...@aiven.io.INVALID>.
Thanks,

This clarifies the situation.

Regards,
Juha

Re: Question about ZooKeeper HA new structure [FLINK-22636]

Posted by Till Rohrmann <tr...@apache.org>.
For patch versions the Flink community is very careful not to introduce
breaking changes. Hence, for patch releases it should be possible to
upgrade via failover. However, I don't think that this is properly guarded
by tests at the moment, and it is not an official guarantee we are giving.

Cheers,
Till
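
The rule of thumb from this thread can be written down as a small check. This
is only a sketch of the reasoning above, not an official Flink compatibility
policy. The function names and the returned labels are made up for
illustration, and it assumes plain "major.minor.patch" version strings.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a 'major.minor.patch' string such as '1.13.2' into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def suggested_upgrade_path(old: str, new: str) -> str:
    """Return a hypothetical label for how to move from `old` to `new`.

    Same major.minor (a patch release): failover should work according to
    the thread, but it is untested and not officially guaranteed.
    Anything else: stop with a savepoint and resume on the new cluster.
    """
    if parse_version(old)[:2] == parse_version(new)[:2]:
        return "failover-probably-ok-no-guarantee"
    return "savepoint-and-restart"
```

For example, `suggested_upgrade_path("1.13.2", "1.14.0")` yields the
savepoint route, which matches the ZooKeeper HA layout change in 1.14.0
discussed here.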

Re: Question about ZooKeeper HA new structure [FLINK-22636]

Posted by Juha Mynttinen <ju...@aiven.io.INVALID>.
OK, thanks,

I see, savepointing and creating a new cluster is the documented [1]
way of upgrading the Flink version. However, I think that at least for some
version upgrades it has been fine to just switch the code to the new
version. I might be wrong.

What about patch versions like 1.13.X? The doc says "the general way of
upgrading Flink across versions". Can there be breaking changes in patch
versions too?

Regards,
Juha

[1]
https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/upgrading/#upgrading-the-flink-framework-version


