Posted to user@flink.apache.org by Amit Bhatia <bh...@gmail.com> on 2021/09/01 04:07:25 UTC

Re: Queries regarding Flink upgrade strategies

Hi Matthias,

Thanks for the confirmation.

@Yang Wang <da...@gmail.com>: Any comments from your side?

Regards,
Amit

On Fri, Aug 27, 2021 at 7:27 PM Matthias Pohl <ma...@ververica.com>
wrote:

> Thanks for clarifying that, Amit. Rolling updates with JobManagers and
> TaskManagers coming from different Flink versions in the same Flink cluster
> are not supported.
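To make the unsupported intermediate state concrete, here is a minimal Python sketch that models one Flink cluster as a list of pods and upgrades them one at a time, roughly as a Kubernetes RollingUpdate would. `FlinkPod`, `is_supported_state` and `rolling_update_states` are hypothetical helpers and the version strings are illustrative; a real rollout also involves readiness probes and surge pods, which the sketch ignores. Every step before the last mixes Flink versions in one cluster, which is the situation described above as unsupported.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List


@dataclass(frozen=True)
class FlinkPod:
    """Hypothetical model of one pod belonging to a single Flink cluster."""
    name: str
    component: str       # "jobmanager" or "taskmanager"
    flink_version: str


def is_supported_state(pods: Iterable[FlinkPod]) -> bool:
    """Supported only if all JobManagers and TaskManagers share one Flink version."""
    return len({pod.flink_version for pod in pods}) <= 1


def rolling_update_states(pods: List[FlinkPod], new_version: str) -> List[List[FlinkPod]]:
    """Replace pods one at a time, in list order, recording the cluster after each step.

    Roughly mimics a RollingUpdate that swaps one pod at a time; it does not
    model readiness probes, surge pods or scheduling.
    """
    current = list(pods)  # copy so the caller's list is left untouched
    states: List[List[FlinkPod]] = []
    for index in range(len(current)):
        current[index] = replace(current[index], flink_version=new_version)
        states.append(list(current))
    return states


cluster = [
    FlinkPod("jobmanager-0", "jobmanager", "1.13.2"),
    FlinkPod("taskmanager-0", "taskmanager", "1.13.2"),
    FlinkPod("taskmanager-1", "taskmanager", "1.13.2"),
]
for step, state in enumerate(rolling_update_states(cluster, "1.14.0"), start=1):
    print(step, is_supported_state(state))  # 1 False, 2 False, 3 True
```

The same window can appear with the Recreate strategy when the JobManager and TaskManager run as separate Deployments: Recreate only guarantees that a Deployment's old pods are terminated before that same Deployment's new pods start, so the JobManager Deployment can already be on the new version while the TaskManager Deployment still runs the old one.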
>
> @Yang Wang <da...@gmail.com> Do you have any recommendations you
> could share in this regard?
>
> Best,
> Matthias
>
> On Fri, Aug 27, 2021 at 2:44 PM Amit Bhatia <bh...@gmail.com>
> wrote:
>
>> Hi Matthias,
>>
>> What you mention is a little tricky. When we create a new cluster, it will
>> have its own volume (PVC), so moving savepoint/checkpoint data from the
>> volume (PVC) of the older cluster to the newer cluster is a manual task. I'm
>> also not sure whether the savepoint/checkpoint data needs to be copied to
>> the newer Flink cluster before Flink starts. This approach is more like a
>> blue/green upgrade strategy.
>>
>> I wanted to understand whether Flink supports a RollingUpdate, where we
>> update TaskManager and JobManager pods one by one, and what the impact is
>> when, during the upgrade, JobManager and TaskManager pods are on different
>> versions. I'd also like to understand the impact of the Recreate strategy
>> in the same context.
>>
>> Regards,
>> Amit
>>
>> On Fri, Aug 27, 2021 at 5:32 PM Matthias Pohl <ma...@ververica.com>
>> wrote:
>>
>>> The upgrade approach mentioned in my previous answer should also work in
>>> the context of k8s and pods: the Flink cluster running the newer version
>>> should be created before the job is migrated using a savepoint. But maybe I
>>> misunderstood your question. Do you have something in mind where you
>>> upgrade each pod individually, i.e. operating TaskManagers and JobManagers
>>> with different Flink versions in the same Flink cluster?
>>>
>>> Best,
>>> Matthias
>>>
>>> On Fri, Aug 27, 2021 at 11:05 AM Amit Bhatia <bh...@gmail.com>
>>> wrote:
>>>
>>>> Hi Matthias,
>>>>
>>>> Thanks for the information, but this upgrade procedure looks like it is
>>>> meant for native (physical/virtual) deployments.
>>>> I want to understand the upgrade strategies for Kubernetes deployments,
>>>> where Flink is running in pods. If you could help in that area, it would
>>>> be great.
>>>>
>>>> Regards,
>>>> Amit Bhatia
>>>>
>>>> On Thu, Aug 26, 2021 at 5:25 PM Matthias Pohl <ma...@ververica.com>
>>>> wrote:
>>>>
>>>>> Hi Amit,
>>>>> upgrading Flink versions means that you should stop your jobs with a
>>>>> savepoint first. A new cluster with the new Flink version can be deployed
>>>>> next. Then, this cluster can be used to start the jobs from the previously
>>>>> created savepoints. Each job should pick up the work from where it stopped.
>>>>> See [1] for further details on how to upgrade Flink.
>>>>> I'm not sure about any Helm specifics here. But I'm gonna pull Austin
>>>>> into the thread. He might have more insights to share.
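The stop-with-savepoint, redeploy, resume sequence described above can be scripted against Flink's REST API (`POST /jobs/:jobid/stop`, `GET /jobs/:jobid/savepoints/:triggerid`, `POST /jars/:jarid/run`). Below is a minimal Python sketch, assuming both the old and the new cluster expose their REST endpoint, the new cluster is already running, the job jar has already been uploaded to it (its id is the hypothetical `jar_id`), and the savepoint target directory is on storage both clusters can read, rather than a per-cluster PVC. The HTTP transport is injected as a callable so the sequencing can be checked without a live cluster; `make_rest_call`, `stop_with_savepoint`, `resume_from_savepoint` and `upgrade_job` are illustrative names, not part of Flink.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional

# Transport abstraction: (HTTP method, path, JSON body or None) -> parsed JSON response.
RestCall = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def make_rest_call(base_url: str) -> RestCall:
    """Return a RestCall that talks to a JobManager REST endpoint via urllib."""
    def call(method: str, path: str, body: Optional[Dict[str, Any]]) -> Dict[str, Any]:
        data = json.dumps(body).encode("utf-8") if body is not None else None
        request = urllib.request.Request(
            base_url.rstrip("/") + path,
            data=data,
            method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read().decode("utf-8"))
    return call


def stop_with_savepoint(api: RestCall, job_id: str, target_dir: str,
                        poll_interval: float = 1.0, timeout: float = 600.0) -> str:
    """Stop the job with a savepoint on the old cluster and return the savepoint path."""
    trigger = api("POST", f"/jobs/{job_id}/stop",
                  {"targetDirectory": target_dir, "drain": False})
    request_id = trigger["request-id"]
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = api("GET", f"/jobs/{job_id}/savepoints/{request_id}", None)
        if result["status"]["id"] == "COMPLETED":
            operation = result["operation"]
            if operation.get("failure-cause"):
                raise RuntimeError(f"stop-with-savepoint failed: {operation['failure-cause']}")
            return operation["location"]
        time.sleep(poll_interval)  # status is still IN_PROGRESS
    raise TimeoutError(f"savepoint for job {job_id} not completed within {timeout}s")


def resume_from_savepoint(api: RestCall, jar_id: str, savepoint_path: str) -> str:
    """Run an already-uploaded jar on the new cluster, restoring from the savepoint."""
    response = api("POST", f"/jars/{jar_id}/run",
                   {"savepointPath": savepoint_path, "allowNonRestoredState": False})
    return response["jobid"]


def upgrade_job(old_api: RestCall, new_api: RestCall, job_id: str, jar_id: str,
                target_dir: str, poll_interval: float = 1.0) -> str:
    """Savepoint-based migration: stop on the old cluster, resume on the new one."""
    savepoint_path = stop_with_savepoint(old_api, job_id, target_dir, poll_interval)
    return resume_from_savepoint(new_api, jar_id, savepoint_path)
```

Against live clusters this would be called as `upgrade_job(make_rest_call("http://old-jobmanager:8081"), make_rest_call("http://new-jobmanager:8081"), job_id, jar_id, target_dir)`, where the hostnames are placeholders and 8081 is Flink's default REST port.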
>>>>>
>>>>> Best,
>>>>> Matthias
>>>>>
>>>>> [1]
>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/upgrading/#upgrading-the-flink-framework-version
>>>>>
>>>>> On Thu, Aug 26, 2021 at 9:10 AM Amit Bhatia <bh...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We are using Flink 1.13.2 with the Kubernetes HA solution provided by
>>>>>> Flink. We have created a deployment for the JobManager and the
>>>>>> TaskManager, with the option to deploy multiple replicas, and both are
>>>>>> bundled in a single Helm chart.
>>>>>> We have the following questions regarding Flink upgrade strategies;
>>>>>> kindly help us answer them:
>>>>>>
>>>>>> 1) What upgrade strategies are supported by Flink
>>>>>> (RollingUpdate/Recreate) and which one is recommended for production use?
>>>>>>
>>>>>> 2) During a Flink upgrade from version A to version B, if we use
>>>>>> RollingUpdate, then at some point JMs and TMs of multiple Flink versions
>>>>>> might be running at the same time. Can that cause any corruption/failure
>>>>>> for running jobs?
>>>>>>
>>>>>> 3) During a Flink upgrade from version A to version B, if we use
>>>>>> Recreate, then at some point all JMs may already be updated to the new
>>>>>> version while the TMs are still being updated, which means the TMs are
>>>>>> running a different version from the JMs. Will this cause any
>>>>>> corruption/failure for running jobs?
>>>>>>
>>>>>> Regards,
>>>>>> Amit Bhatia
>>>>>>
>>>>>