Posted to user@flink.apache.org by Timo Walther <tw...@apache.org> on 2021/02/01 08:30:09 UTC
Re: Flink SQL and checkpoints and savepoints
I agree with Max.
Within the same Flink release you can perform savepoints, and sometimes
you can also change parts of the query. But the latter must be evaluated
on a case-by-case basis and needs to be tested.
Regards,
Timo
On 30.01.21 11:43, Maximilian Michels wrote:
> It is true that there are no strict upgrade guarantees.
>
> However, looking at the code, it appears RowSerializer supports adding
> new fields to Row, as long as no existing fields are modified or deleted.
> I haven't tried this out, but it looks like the code would restore the
> existing fields and fill in the new ones as null values.
>
> Please correct me if I'm wrong.
>
> -Max
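The pad-with-null restore behavior Max describes can be sketched outside of Flink. This is an illustrative model only, not Flink's actual RowSerializer code; the class and method names are assumptions:

```java
import java.util.Arrays;

// Illustrative model of the restore behavior described above, NOT
// Flink's actual RowSerializer: a row serialized with N fields is
// restored into a schema with N + k fields, and the k new trailing
// fields come back as null.
public class RowRestoreSketch {

    // Restore an old row into a wider schema, padding new fields with null.
    static Object[] restore(Object[] oldRow, int newArity) {
        // Arrays.copyOf pads the extra slots of an Object[] with null.
        return Arrays.copyOf(oldRow, newArity);
    }

    public static void main(String[] args) {
        Object[] oldRow = {"user-1", 42L};     // old schema: (id, count)
        Object[] widened = restore(oldRow, 3); // new schema: (id, count, score)
        System.out.println(Arrays.toString(widened)); // [user-1, 42, null]
    }
}
```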
>
> On 29.01.21 08:54, Dan Hill wrote:
>> I went through a few of the recent Flink Forward videos and didn't see
>> solutions to this problem. It sounds like some companies have
>> solutions but they didn't talk about them in enough detail to do
>> something similar.
>>
>> On Thu, Jan 28, 2021 at 11:45 PM Dan Hill <quietgolfer@gmail.com> wrote:
>>
>> Is this savepoint-recovery issue also true with the Flink Table API?
>> I'd assume so. Just double-checking.
>>
>> On Mon, Jan 18, 2021 at 1:58 AM Timo Walther <twalthr@apache.org> wrote:
>>
>> I would check the past Flink Forward conference talks and blog posts.
>> A couple of companies have developed connectors or modified existing
>> connectors to make this work. Usually, this is based on event
>> timestamps or some external control stream (using the DataStream API
>> around the actual SQL pipeline to handle this).
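The external-control-stream idea above can be sketched as a simple gate around the source. The names here are illustrative assumptions, not a real Flink connector API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the external-control-stream pattern mentioned
// above; the names are assumptions, not a real Flink connector API.
// A gate around the source forwards data records only while a control
// message has opened it.
public class ControlGateSketch {
    private boolean open = false;
    private final List<String> forwarded = new ArrayList<>();

    // A message on the control stream flips the gate.
    void onControl(boolean openGate) { this.open = openGate; }

    // Data records pass through only while the gate is open.
    void onRecord(String record) {
        if (open) {
            forwarded.add(record);
        }
    }

    List<String> forwarded() { return forwarded; }

    public static void main(String[] args) {
        ControlGateSketch gate = new ControlGateSketch();
        gate.onRecord("dropped"); // gate closed: record is discarded
        gate.onControl(true);     // control message opens the gate
        gate.onRecord("kept");
        System.out.println(gate.forwarded()); // [kept]
    }
}
```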
>>
>> Also, there is FLIP-150, which goes in this direction:
>>
>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-150%3A+Introduce+Hybrid+Source
>>
>>
>> Regards,
>> Timo
>>
>>
>> On 18.01.21 10:40, Dan Hill wrote:
>> > Thanks Timo!
>> >
>> > The reason makes sense.
>> >
>> > Do any of the techniques make it easy to support exactly-once?
>> >
>> > I'm inferring what is meant by "dry out". Are there any documented
>> > patterns for it? E.g., sending data to new Kafka topics between
>> > releases?
>> >
>> > On Mon, Jan 18, 2021, 01:04 Timo Walther <twalthr@apache.org> wrote:
>> >
>> > Hi Dan,
>> >
>> > Currently, we cannot provide any savepoint guarantees between
>> > releases. Because SQL abstracts away the runtime operators, a future
>> > execution plan may look completely different, in which case we cannot
>> > map state anymore. This is unavoidable because the optimizer may get
>> > smarter as new optimizer rules are added.
>> >
>> > For such cases, we recommend drying out the old pipeline and/or
>> > warming up a new pipeline with historic data when upgrading Flink. A
>> > change in columns sometimes works, but even this depends on the
>> > operators used.
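The dry-out/warm-up cutover described above can be sketched as two event-time predicates. The cutover timestamp and Event type are assumptions for the sake of the example, not Flink APIs:

```java
// Illustrative sketch of the dry-out/warm-up cutover described above;
// the cutover timestamp and Event type are assumptions, not Flink APIs.
// Both pipelines run in parallel and a fixed event-time cutover decides
// which pipeline's output is kept downstream.
public class CutoverSketch {

    static final long CUTOVER_MILLIS = 1_700_000_000_000L; // assumed cutover

    record Event(long timestampMillis, String payload) {}

    // The old pipeline keeps events strictly before the cutover...
    static boolean keepInOldPipeline(Event e) {
        return e.timestampMillis < CUTOVER_MILLIS;
    }

    // ...and the new pipeline, warmed up with historic data, takes over
    // from the cutover onwards.
    static boolean keepInNewPipeline(Event e) {
        return e.timestampMillis >= CUTOVER_MILLIS;
    }

    public static void main(String[] args) {
        Event before = new Event(CUTOVER_MILLIS - 1, "handled by old job");
        Event after  = new Event(CUTOVER_MILLIS, "handled by new job");
        System.out.println(keepInOldPipeline(before)); // true
        System.out.println(keepInNewPipeline(after));  // true
    }
}
```

Because each event is claimed by exactly one pipeline, no output is duplicated or dropped across the switchover, which is what makes this pattern compatible with exactly-once sinks.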
>> >
>> > Regards,
>> > Timo
>> >
>> >
>> > On 18.01.21 04:46, Dan Hill wrote:
>> > > How well does Flink SQL work with checkpoints and savepoints? I
>> > > tried to find documentation for it in v1.11 but couldn't find it.
>> > >
>> > > E.g., what happens if the Flink SQL is modified between releases?
>> > > New columns? Changed columns? Adding joins?
>> > >
>> > >
>> >
>>
>