Posted to user@flink.apache.org by Amit Bhatia <bh...@gmail.com> on 2021/08/26 07:09:49 UTC

Queries regarding Flink upgrade strategies

Hi,

We are using Flink 1.13.2 with the Kubernetes HA solution provided by Flink.
We have created Deployments for the JobManager and TaskManager, with an option
to deploy multiple replicas, and both are bundled in a single Helm chart.
We have the following questions about Flink upgrade strategies and would
appreciate your help in answering them:

1) Which upgrade strategies does Flink support (RollingUpdate/Recreate),
and which one is recommended for production use?

2) During a Flink upgrade from version A to version B with RollingUpdate,
multiple versions of Flink JMs & TMs might be running at some point in time.
Can that cause any corruption/failure for running jobs?

3) During a Flink upgrade from version A to version B with Recreate, at some
point in time all JMs may already be on the new version while the TMs are
still updating, which means TMs are running with different versions. Will
this cause any corruption/failure for running jobs?

Regards,
Amit Bhatia

Re: Queries regarding Flink upgrade strategies

Posted by Amit Bhatia <bh...@gmail.com>.
Hi Matthias,

Thanks for the confirmation.

@Yang Wang <da...@gmail.com>: Any comments from your side?

Regards,
Amit

On Fri, Aug 27, 2021 at 7:27 PM Matthias Pohl <ma...@ververica.com>
wrote:

> Thanks for clarifying that, Amit. Rolling updates with JobManagers and
> TaskManagers coming from different Flink versions in the same Flink cluster
> are not supported.
>
> @Yang Wang <da...@gmail.com> Do you have any recommendations you
> could share in this regard?
>
> Best,
> Matthias

Re: Queries regarding Flink upgrade strategies

Posted by Matthias Pohl <ma...@ververica.com>.
Thanks for clarifying that, Amit. Rolling updates with JobManagers and
TaskManagers coming from different Flink versions in the same Flink cluster
are not supported.

@Yang Wang <da...@gmail.com> Do you have any recommendations you
could share in this regard?

Best,
Matthias

On Fri, Aug 27, 2021 at 2:44 PM Amit Bhatia <bh...@gmail.com>
wrote:

> Hi Matthias,
>
> What you describe is a little tricky. A new cluster will have its own
> volume (PVC), so moving savepoint/checkpoint data from the volume (PVC) of
> the older cluster to the newer cluster is a manual task. I am also not sure
> whether the savepoint/checkpoint data needs to be copied to the newer Flink
> cluster before Flink starts. This approach is more like a blue/green
> upgrade strategy.
>
> I wanted to understand whether Flink supports RollingUpdate, where we update
> the TaskManager and JobManager pods one by one, and what the impact is when
> JobManager & TaskManager pods are on different versions during the upgrade.
> I would also like to know the impact of the Recreate strategy in the same
> context.
>
> Regards,
> Amit

Re: Queries regarding Flink upgrade strategies

Posted by Amit Bhatia <bh...@gmail.com>.
Hi Matthias,

What you describe is a little tricky. A new cluster will have its own
volume (PVC), so moving savepoint/checkpoint data from the volume (PVC) of
the older cluster to the newer cluster is a manual task. I am also not sure
whether the savepoint/checkpoint data needs to be copied to the newer Flink
cluster before Flink starts. This approach is more like a blue/green
upgrade strategy.

I wanted to understand whether Flink supports RollingUpdate, where we update
the TaskManager and JobManager pods one by one, and what the impact is when
JobManager & TaskManager pods are on different versions during the upgrade.
I would also like to know the impact of the Recreate strategy in the same
context.

Regards,
Amit

On Fri, Aug 27, 2021 at 5:32 PM Matthias Pohl <ma...@ververica.com>
wrote:

> The upgrade approach mentioned in my previous answer should also work in
> the context of k8s and pods: a Flink cluster with the newer version should
> be created before migrating the job using a savepoint. But maybe I
> misunderstand your question. Do you have something in mind where you
> upgrade each pod individually, i.e. operating TaskManagers and
> JobManagers with different Flink versions in the same Flink cluster?
>
> Best,
> Matthias

Re: Queries regarding Flink upgrade strategies

Posted by Matthias Pohl <ma...@ververica.com>.
The upgrade approach mentioned in my previous answer should also work in
the context of k8s and pods: a Flink cluster with the newer version should
be created before migrating the job using a savepoint. But maybe I
misunderstand your question. Do you have something in mind where you
upgrade each pod individually, i.e. operating TaskManagers and
JobManagers with different Flink versions in the same Flink cluster?

Best,
Matthias

On Fri, Aug 27, 2021 at 11:05 AM Amit Bhatia <bh...@gmail.com>
wrote:

> Hi Matthias,
>
> Thanks for the information, but this upgrade procedure looks like it is
> meant for a native (physical/virtual) deployment.
> I want to understand the upgrade strategies for Kubernetes deployments
> where Flink is running in pods. If you could help in that area, it would be
> great.
>
> Regards,
> Amit Bhatia

Re: Queries regarding Flink upgrade strategies

Posted by Amit Bhatia <bh...@gmail.com>.
Hi Matthias,

Thanks for the information, but this upgrade procedure looks like it is
meant for a native (physical/virtual) deployment.
I want to understand the upgrade strategies for Kubernetes deployments where
Flink is running in pods. If you could help in that area, it would be great.

Regards,
Amit Bhatia

On Thu, Aug 26, 2021 at 5:25 PM Matthias Pohl <ma...@ververica.com>
wrote:

> Hi Amit,
> upgrading Flink versions means that you should stop your jobs with a
> savepoint first. A new cluster with the new Flink version can be deployed
> next. Then, this cluster can be used to start the jobs from the previously
> created savepoints. Each job should pick up the work from where it stopped.
> See [1] for further details on how to upgrade Flink.
> I'm not sure about any Helm-specifics here. But I'm gonna pull Austin into
> the thread. He might have more insights to share.
>
> Best,
> Matthias
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/upgrading/#upgrading-the-flink-framework-version

Re: Queries regarding Flink upgrade strategies

Posted by Matthias Pohl <ma...@ververica.com>.
Hi Amit,
upgrading Flink versions means that you should stop your jobs with a
savepoint first. A new cluster with the new Flink version can be deployed
next. Then, this cluster can be used to start the jobs from the previously
created savepoints. Each job should pick up the work from where it stopped.
See [1] for further details on how to upgrade Flink.
I'm not sure about any Helm-specifics here. But I'm gonna pull Austin into
the thread. He might have more insights to share.
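
The stop-with-savepoint and restore steps above can be sketched roughly as
follows. This is only a minimal illustration, not a definitive procedure: it
builds the two Flink CLI invocations (`flink stop --savepointPath` and
`flink run --fromSavepoint`) as argument lists without executing them. The
function name, the job ID, the savepoint locations and the jar name are all
hypothetical, and the sketch assumes the savepoint is written to storage that
both the old and the new cluster can reach. Any cluster-targeting options
needed on Kubernetes are left out.

```python
from typing import List


def upgrade_commands(job_id: str, savepoint_dir: str,
                     savepoint_path: str, jar: str) -> List[List[str]]:
    """Return the CLI calls for a stop-with-savepoint upgrade (hypothetical helper).

    savepoint_dir  -- target directory handed to `flink stop`
    savepoint_path -- concrete savepoint location reported by `flink stop`
    """
    # Step 1 (old cluster): stop the job and trigger a savepoint.
    stop = ["flink", "stop", "--savepointPath", savepoint_dir, job_id]

    # Step 2 is not covered here: deploy a cluster with the new Flink version.

    # Step 3 (new cluster): submit the job again, restoring from the savepoint.
    run = ["flink", "run", "--fromSavepoint", savepoint_path, jar]

    return [stop, run]
```

Each returned list could be passed to something like `subprocess.run` against
the matching cluster. The commands are kept as data here so that the order of
the steps is visible without needing a running Flink installation.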

Best,
Matthias

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/upgrading/#upgrading-the-flink-framework-version

On Thu, Aug 26, 2021 at 9:10 AM Amit Bhatia <bh...@gmail.com>
wrote:

> Hi,
>
> We are using Flink 1.13.2 with the Kubernetes HA solution provided by Flink.
> We have created Deployments for the JobManager and TaskManager, with an
> option to deploy multiple replicas, and both are bundled in a single Helm
> chart.
> We have the following questions about Flink upgrade strategies and would
> appreciate your help in answering them:
>
> 1) Which upgrade strategies does Flink support (RollingUpdate/Recreate),
> and which one is recommended for production use?
>
> 2) During a Flink upgrade from version A to version B with RollingUpdate,
> multiple versions of Flink JMs & TMs might be running at some point in time.
> Can that cause any corruption/failure for running jobs?
>
> 3) During a Flink upgrade from version A to version B with Recreate, at some
> point in time all JMs may already be on the new version while the TMs are
> still updating, which means TMs are running with different versions. Will
> this cause any corruption/failure for running jobs?
>
> Regards,
> Amit Bhatia
>