Posted to user@flink.apache.org by Tony Wei <to...@gmail.com> on 2019/03/08 09:45:02 UTC

What does "Continuous incremental cleanup" mean in Flink 1.8 release notes

Hi everyone,

I read the state section of the Flink 1.8 release notes [1], which says:

*Continuous incremental cleanup of old Keyed State with TTL*
> We introduced TTL (time-to-live) for Keyed state in Flink 1.6 (FLINK-9510
> <https://issues.apache.org/jira/browse/FLINK-9510>). This feature allowed
> keyed state entries to be cleaned up and made inaccessible when accessing
> them. In addition, state would now also be cleaned up when writing a
> savepoint/checkpoint.
> Flink 1.8 introduces continuous cleanup of old entries for both the RocksDB
> state backend (FLINK-10471
> <https://issues.apache.org/jira/browse/FLINK-10471>) and the heap state
> backend (FLINK-10473 <https://issues.apache.org/jira/browse/FLINK-10473>).
> This means that old entries (according to the TTL setting) are continuously
> being cleaned up.


I'm not familiar with the TTL implementation in Flink 1.6 or with the new
features introduced in Flink 1.8, and after reading the release notes I
still don't understand the difference between the two releases. Did 1.8
change the outcome of the TTL feature, provide new TTL features, or just
change how the TTL mechanism is executed?

Could you point me to more references to learn about it? A simple example
to illustrate it would be much appreciated. Thank you.

Best,
Tony Wei

[1]
https://ci.apache.org/projects/flink/flink-docs-master/release-notes/flink-1.8.html#state

Re: What does "Continuous incremental cleanup" mean in Flink 1.8 release notes

Posted by Konstantin Knauf <ko...@ververica.com>.

Hi Tony,

Yes, when taking a savepoint, the same strategy is used as during a
non-incremental checkpoint.
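
To make the difference concrete, here is a toy Python model (not Flink
code). It assumes a full snapshot copies state entry by entry and can skip
expired entries, while an incremental checkpoint uploads whole files without
looking inside them. The names full_snapshot and incremental_checkpoint, the
file names, and the TTL value are hypothetical.

```python
TTL = 10  # time units an entry stays valid after it was written (assumed value)


def is_expired(written_at, now, ttl=TTL):
    return now - written_at >= ttl


def full_snapshot(entries, now, ttl=TTL):
    """Copy the state entry by entry, leaving out entries that have expired.

    The live `entries` dict is not modified; only the copy is filtered.
    """
    return {
        key: (value, written_at)
        for key, (value, written_at) in entries.items()
        if not is_expired(written_at, now, ttl)
    }


def incremental_checkpoint(files, already_uploaded):
    """Upload whole files that are new since the last checkpoint.

    File contents are never inspected, so expired entries travel along.
    """
    new_files = {name: data for name, data in files.items() if name not in already_uploaded}
    already_uploaded.update(new_files)
    return new_files


# Live state at time 12: "a" and "b" were written at time 0 and have expired.
live_state = {"a": (1, 0), "b": (2, 0), "c": (3, 8)}

savepoint = full_snapshot(live_state, now=12)
restored_state = dict(savepoint)  # restarting the job from the savepoint
print(sorted(restored_state))

# The same entries stored in two immutable files (stand-ins for SST files).
files = {"file-1": {"a": (1, 0), "b": (2, 0)}, "file-2": {"c": (3, 8)}}
uploaded = {}
incremental_checkpoint(files, uploaded)
print(sorted(key for data in uploaded.values() for key in data))
```

In this model the job restored from the savepoint no longer sees the expired
keys, while the live state that produced the savepoint still holds them
until some other cleanup removes them.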

Best,

Konstantin

On Mon, Mar 11, 2019 at 2:29 AM Tony Wei <to...@gmail.com> wrote:

> Hi Konstantin,
>
> That is really helpful. Thanks.
>
> Another follow-up question: the documentation says that "Cleanup in full
> snapshot" is not applicable to incremental checkpointing in the RocksDB
> state backend. However, when a user manually triggers a savepoint and
> restarts the job from it, the expired state should be cleaned up as well,
> based on Flink 1.6's implementation. Am I right?
>
> Best,
> Tony Wei

-- 

Konstantin Knauf | Solutions Architect

+49 160 91394525

<https://www.ververica.com/>

Follow us @VervericaData

--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Data Artisans GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen

Re: What does "Continuous incremental cleanup" mean in Flink 1.8 release notes

Posted by Tony Wei <to...@gmail.com>.

Hi Konstantin,

That is really helpful. Thanks.

Another follow-up question: the documentation says that "Cleanup in full
snapshot" is not applicable to incremental checkpointing in the RocksDB
state backend. However, when a user manually triggers a savepoint and
restarts the job from it, the expired state should be cleaned up as well,
based on Flink 1.6's implementation. Am I right?

Best,
Tony Wei

On Sat, Mar 9, 2019 at 7:00 AM Konstantin Knauf <ko...@ververica.com> wrote:

> Hi Tony,
>
> Before Flink 1.8, expired state was only cleaned up when you tried to
> access it after expiration, i.e. when user code tried to access the expired
> state, the state value was cleaned up and "null" was returned. There was
> also already the option to clean up expired state during full snapshots (
> https://github.com/apache/flink/pull/6460). With Flink 1.8, expired state
> is cleaned up continuously in the background, regardless of checkpointing
> or any attempt to access it after expiration.
>
> As a reference, the linked JIRA tickets should be a good starting point.
>
> Hope this helps.
>
> Konstantin

Re: What does "Continuous incremental cleanup" mean in Flink 1.8 release notes

Posted by Konstantin Knauf <ko...@ververica.com>.

Hi Tony,

Before Flink 1.8, expired state was only cleaned up when you tried to access
it after expiration, i.e. when user code tried to access the expired state,
the state value was cleaned up and "null" was returned. There was also
already the option to clean up expired state during full snapshots (
https://github.com/apache/flink/pull/6460). With Flink 1.8, expired state is
cleaned up continuously in the background, regardless of checkpointing or
any attempt to access it after expiration.
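
As a rough illustration of the two behaviours, here is a toy Python model
(not Flink's implementation). It assumes an entry expires a fixed number of
time units after it was written, and that incremental cleanup checks a small
batch of entries per call. TtlStore and its method names are hypothetical.

```python
class TtlStore:
    """Toy keyed store: an entry expires `ttl` time units after it was written."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.entries = {}  # key -> (value, written_at), in insertion order
        self._cursor = 0   # index where the next incremental sweep resumes

    def put(self, key, value, now):
        self.entries[key] = (value, now)

    def _is_expired(self, key, now):
        return now - self.entries[key][1] >= self.ttl

    def get(self, key, now):
        """Cleanup on access: an expired entry is removed only when it is read."""
        if key not in self.entries:
            return None
        if self._is_expired(key, now):
            del self.entries[key]
            return None
        return self.entries[key][0]

    def incremental_cleanup(self, now, batch_size):
        """Check at most `batch_size` entries, continuing where the last call stopped."""
        keys = list(self.entries)
        batch = keys[self._cursor:self._cursor + batch_size]
        expired = [key for key in batch if self._is_expired(key, now)]
        for key in expired:
            del self.entries[key]
        # Surviving entries of this batch now sit in front of the cursor.
        self._cursor += len(batch) - len(expired)
        if self._cursor >= len(self.entries):
            self._cursor = 0  # reached the end, start over on the next call
        return len(expired)


store = TtlStore(ttl=10)
store.put("a", 1, now=0)
store.put("b", 2, now=0)
store.put("c", 3, now=8)

# Nobody reads the expired keys, yet repeated small sweeps remove them.
for _ in range(3):
    store.incremental_cleanup(now=12, batch_size=1)
print(sorted(store.entries))
```

With only get(), the entries for "a" and "b" would stay in memory until
something reads them; the incremental sweep removes them without any read.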

As a reference, the linked JIRA tickets should be a good starting point.

Hope this helps.

Konstantin





