Posted to user@flink.apache.org by Timo Walther <tw...@apache.org> on 2021/02/01 08:30:09 UTC
Re: Flink SQL and checkpoints and savepoints
I agree with Max.
Within the same Flink release you can perform savepoints, and sometimes
you can also change parts of the query. But the latter must be evaluated
on a case-by-case basis and needs to be tested.
Regards,
Timo
On 30.01.21 11:43, Maximilian Michels wrote:
> It is true that there are no strict upgrade guarantees.
>
> However, looking at the code, it appears RowSerializer supports adding
> new fields to Row, as long as no existing fields are modified or deleted.
> I haven't tried this out, but it looks like the code would restore the
> existing fields and fill in the new ones as null values.
>
> Please correct me if I'm wrong.
>
> -Max
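The pad-with-null restore behavior Max describes can be sketched outside of Flink. This is an illustrative model only, not Flink's actual RowSerializer code; the class and method names are assumptions:

```java
import java.util.Arrays;

// Illustrative model of the restore behavior described above, NOT
// Flink's actual RowSerializer: a row serialized with N fields is
// restored into a schema with N + k fields, and the k new trailing
// fields come back as null.
public class RowRestoreSketch {

    // Restore an old row into a wider schema, padding new fields with null.
    static Object[] restore(Object[] oldRow, int newArity) {
        // Arrays.copyOf pads the extra slots of an Object[] with null.
        return Arrays.copyOf(oldRow, newArity);
    }

    public static void main(String[] args) {
        Object[] oldRow = {"user-1", 42L};     // old schema: (id, count)
        Object[] widened = restore(oldRow, 3); // new schema: (id, count, score)
        System.out.println(Arrays.toString(widened)); // [user-1, 42, null]
    }
}
```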
>
> On 29.01.21 08:54, Dan Hill wrote:
>> I went through a few of the recent Flink Forward videos and didn't see
>> solutions to this problem. It sounds like some companies have
>> solutions but they didn't talk about them in enough detail to do
>> something similar.
>>
>> On Thu, Jan 28, 2021 at 11:45 PM Dan Hill <quietgolfer@gmail.com> wrote:
>>
>> Is this savepoint-recovery issue also true with the Flink Table API?
>> I'd assume so. Just double-checking.
>>
>> On Mon, Jan 18, 2021 at 1:58 AM Timo Walther <twalthr@apache.org> wrote:
>>
>> I would check the past Flink Forward conference talks and blog posts.
>> A couple of companies have developed connectors or modified existing
>> connectors to make this work. Usually, this is based on event
>> timestamps or some external control stream (using the DataStream API
>> around the actual SQL pipeline to handle this).
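The external-control-stream idea above can be sketched as a simple gate around the source. The names here are illustrative assumptions, not a real Flink connector API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the external-control-stream pattern mentioned
// above; the names are assumptions, not a real Flink connector API.
// A gate around the source forwards data records only while a control
// message has opened it.
public class ControlGateSketch {
    private boolean open = false;
    private final List<String> forwarded = new ArrayList<>();

    // A message on the control stream flips the gate.
    void onControl(boolean openGate) { this.open = openGate; }

    // Data records pass through only while the gate is open.
    void onRecord(String record) {
        if (open) {
            forwarded.add(record);
        }
    }

    List<String> forwarded() { return forwarded; }

    public static void main(String[] args) {
        ControlGateSketch gate = new ControlGateSketch();
        gate.onRecord("dropped"); // gate closed: record is discarded
        gate.onControl(true);     // control message opens the gate
        gate.onRecord("kept");
        System.out.println(gate.forwarded()); // [kept]
    }
}
```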
>>
>> Also, there is FLIP-150, which goes in this direction:
>>
>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-150%3A+Introduce+Hybrid+Source
>>
>>
>> Regards,
>> Timo
>>
>>
>> On 18.01.21 10:40, Dan Hill wrote:
>> > Thanks Timo!
>> >
>> > The reason makes sense.
>> >
>> > Do any of the techniques make it easy to support exactly-once?
>> >
>> > I'm inferring what is meant by "dry out". Are there any documented
>> > patterns for it? E.g., sending data to new Kafka topics between
>> > releases?
>> >
>> > On Mon, Jan 18, 2021, 01:04 Timo Walther <twalthr@apache.org> wrote:
>> >
>> > Hi Dan,
>> >
>> > Currently, we cannot provide any savepoint guarantees between
>> > releases. Because SQL abstracts away the runtime operators, a future
>> > execution plan may look completely different, in which case we cannot
>> > map state anymore. This is unavoidable because the optimizer may get
>> > smarter as new optimizer rules are added.
>> >
>> > For such cases, we recommend drying out the old pipeline and/or
>> > warming up a new pipeline with historic data when upgrading Flink. A
>> > change in columns sometimes works, but even this depends on the
>> > operators used.
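The dry-out/warm-up cutover described above can be sketched as two event-time predicates. The cutover timestamp and Event type are assumptions for the sake of the example, not Flink APIs:

```java
// Illustrative sketch of the dry-out/warm-up cutover described above;
// the cutover timestamp and Event type are assumptions, not Flink APIs.
// Both pipelines run in parallel and a fixed event-time cutover decides
// which pipeline's output is kept downstream.
public class CutoverSketch {

    static final long CUTOVER_MILLIS = 1_700_000_000_000L; // assumed cutover

    record Event(long timestampMillis, String payload) {}

    // The old pipeline keeps events strictly before the cutover...
    static boolean keepInOldPipeline(Event e) {
        return e.timestampMillis < CUTOVER_MILLIS;
    }

    // ...and the new pipeline, warmed up with historic data, takes over
    // from the cutover onwards.
    static boolean keepInNewPipeline(Event e) {
        return e.timestampMillis >= CUTOVER_MILLIS;
    }

    public static void main(String[] args) {
        Event before = new Event(CUTOVER_MILLIS - 1, "handled by old job");
        Event after  = new Event(CUTOVER_MILLIS, "handled by new job");
        System.out.println(keepInOldPipeline(before)); // true
        System.out.println(keepInNewPipeline(after));  // true
    }
}
```

Because each event is claimed by exactly one pipeline, no output is duplicated or dropped across the switchover, which is what makes this pattern compatible with exactly-once sinks.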
>> >
>> > Regards,
>> > Timo
>> >
>> >
>> > On 18.01.21 04:46, Dan Hill wrote:
>> > > How well does Flink SQL work with checkpoints and savepoints? I
>> > > tried to find documentation for it in v1.11 but couldn't find it.
>> > >
>> > > E.g., what happens if the Flink SQL is modified between releases?
>> > > New columns? Changed columns? Adding joins?
>> > >
>> > >
>> >
>>
>