Posted to user@flink.apache.org by Ning Shi <ni...@gmail.com> on 2019/03/11 17:15:55 UTC

Migrating Existing TTL State to 1.8

It's exciting to see the TTL state cleanup feature in 1.8. I have a question
regarding the migration of existing TTL state to the newer version.

Looking at the Pull Request [1] that introduced this feature, it seems that
Flink leverages RocksDB's compaction filter to remove stale state. I assume
this means that state will only be cleaned up on compaction. If I have a
significant amount of stale TTL state, some of which may already have been
compacted to higher levels, upgrading to 1.8 may not clean it up. Is this
assumption correct? If so, is the best approach to take a full
snapshot/checkpoint and restore it on 1.8 so that the stale state is cleaned
up on initialization?

Thanks,

[1] https://github.com/dataArtisans/frocksdb/pull/1

--
Ning

Re: Migrating Existing TTL State to 1.8

Posted by Ning Shi <ni...@gmail.com>.
Hi Andrey,

Thank you for the reply.

We are using incremental checkpointing.

Good to know that the incremental cleanup only applies to the heap state
backend. It looks like taking some downtime to take a full savepoint and
restoring from it is inevitable.

Thanks,

--
Ning

On Wed, 15 May 2019 10:53:25 -0400,
Andrey Zagrebin wrote:
> 
> Hi Ning,
> 
> If you have not activated non-incremental (full) checkpointing, then
> taking a savepoint is the only way to trigger a full snapshot. In any
> case, it will take time.
> 
> The incremental cleanup strategy applies only to the heap state backend
> and does nothing for the RocksDB backend. At the moment, with the RocksDB
> backend you can only combine the compaction filter cleanup with full
> snapshot cleanup.
> 
> Best,
> Andrey

Re: Migrating Existing TTL State to 1.8

Posted by Andrey Zagrebin <an...@ververica.com>.
Hi Ning,

If you have not activated non-incremental (full) checkpointing, then taking
a savepoint is the only way to trigger a full snapshot. In any case, it will
take time.

The incremental cleanup strategy applies only to the heap state backend and
does nothing for the RocksDB backend. At the moment, with the RocksDB
backend you can only combine the compaction filter cleanup with full
snapshot cleanup.

Best,
Andrey
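As a sketch of the combination Andrey describes, the two cleanup strategies
that work with the RocksDB backend can both be set on one `StateTtlConfig`.
This assumes the Flink 1.8 builder API; the state name "lastSeen" is a
placeholder, and in 1.8 the compaction filter additionally has to be enabled
globally via `state.backend.rocksdb.ttl.compaction.filter.enabled: true` in
`flink-conf.yaml`:

```java
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

public class TtlCleanupExample {

    public static ValueStateDescriptor<Long> lastSeenDescriptor() {
        StateTtlConfig ttlConfig = StateTtlConfig
            .newBuilder(Time.days(7))
            // drop expired entries whenever a full snapshot (savepoint) is written
            .cleanupFullSnapshot()
            // drop expired entries as RocksDB compacts SST files (Flink 1.8+)
            .cleanupInRocksdbCompactFilter()
            .build();

        // hypothetical state used only for illustration
        ValueStateDescriptor<Long> descriptor =
            new ValueStateDescriptor<>("lastSeen", Long.class);
        descriptor.enableTimeToLive(ttlConfig);
        return descriptor;
    }
}
```

With both strategies set, a savepoint taken for the upgrade is written
without expired entries, and the compaction filter keeps the live RocksDB
instance clean afterwards.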

On Fri, Mar 15, 2019 at 11:56 PM Ning Shi <ni...@gmail.com> wrote:

> Hi Stefan,
>
> Thank you for the confirmation.
>
> Doing a one-time cleanup with a full snapshot and upgrading to Flink 1.8
> could work. However, in our case, the state is quite large (TBs).
> Taking a savepoint takes over an hour, during which we have to pause
> the job or it may process more events.
>
> The JavaDoc of `cleanupFullSnapshot` [1] says "Cleanup expired state
> in full snapshot on checkpoint.". My understanding is that the only
> way to take a full snapshot with RocksDB backend is to take a
> savepoint. Is there another way to take a full checkpoint?
>
> I noticed that Flink 1.8 also added an incremental cleanup strategy
> [2] by iterating through several keys at a time for each state access.
> If I combine this with the new compaction filter cleanup strategy,
> will it eventually remove all expired state without taking a full
> snapshot for upgrade?
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/api/common/state/StateTtlConfig.Builder.html#cleanupFullSnapshot--
> [2]
> https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/api/common/state/StateTtlConfig.Builder.html#cleanupIncrementally-int-boolean-
>
> Thanks,
>
> Ning
>
>
> On Wed, Mar 13, 2019 at 11:22 AM Stefan Richter <s....@ververica.com>
> wrote:
> >
> > Hi,
> >
> > If you are worried about old state, you can combine the compaction
> > filter based TTL with other cleanup strategies (see docs). For example,
> > if you set `cleanupFullSnapshot`, a savepoint you take will be cleared
> > of any expired state, and you can then use it to bring the job into
> > Flink 1.8.
> >
> > Best,
> > Stefan
>

Re: Migrating Existing TTL State to 1.8

Posted by Ning Shi <ni...@gmail.com>.
Hi Stefan,

Thank you for the confirmation.

Doing a one-time cleanup with a full snapshot and upgrading to Flink 1.8
could work. However, in our case, the state is quite large (TBs).
Taking a savepoint takes over an hour, during which we have to pause
the job or it may process more events.

The JavaDoc of `cleanupFullSnapshot` [1] says "Cleanup expired state
in full snapshot on checkpoint.". My understanding is that the only
way to take a full snapshot with RocksDB backend is to take a
savepoint. Is there another way to take a full checkpoint?

I noticed that Flink 1.8 also added an incremental cleanup strategy
[2] by iterating through several keys at a time for each state access.
If I combine this with the new compaction filter cleanup strategy,
will it eventually remove all expired state without taking a full
snapshot for upgrade?

[1] https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/api/common/state/StateTtlConfig.Builder.html#cleanupFullSnapshot--
[2] https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/api/common/state/StateTtlConfig.Builder.html#cleanupIncrementally-int-boolean-

Thanks,

Ning
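For reference, the incremental cleanup strategy in [2] would be configured
roughly as below (a sketch against the Flink 1.8 builder API; the TTL value
is arbitrary, and as noted later in the thread this strategy only has an
effect on the heap state backend):

```java
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.time.Time;

public class IncrementalTtlExample {

    public static StateTtlConfig incrementalTtl() {
        return StateTtlConfig
            .newBuilder(Time.days(7))
            // on each state access, check up to 10 entries for expiration;
            // `true` additionally runs the check for every processed record
            .cleanupIncrementally(10, true)
            .build();
    }
}
```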


On Wed, Mar 13, 2019 at 11:22 AM Stefan Richter <s....@ververica.com> wrote:
>
> Hi,
>
> If you are worried about old state, you can combine the compaction filter based TTL with other cleanup strategies (see docs). For example, if you set `cleanupFullSnapshot`, a savepoint you take will be cleared of any expired state, and you can then use it to bring the job into Flink 1.8.
>
> Best,
> Stefan

Re: Migrating Existing TTL State to 1.8

Posted by Stefan Richter <s....@ververica.com>.
Hi,

If you are worried about old state, you can combine the compaction filter based TTL with other cleanup strategies (see docs). For example, if you set `cleanupFullSnapshot`, a savepoint you take will be cleared of any expired state, and you can then use it to bring the job into Flink 1.8.

Best,
Stefan
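The setting Stefan refers to might look like the following sketch (Flink 1.8
builder API; the state name and TTL duration are placeholders for
illustration):

```java
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

public class FullSnapshotTtlExample {

    public static ValueStateDescriptor<String> sessionDescriptor() {
        StateTtlConfig ttlConfig = StateTtlConfig
            .newBuilder(Time.hours(24))
            // expired entries are filtered out when a full snapshot
            // (e.g. a savepoint) is written, so the snapshot itself is clean
            .cleanupFullSnapshot()
            .build();

        // hypothetical state used only for illustration
        ValueStateDescriptor<String> descriptor =
            new ValueStateDescriptor<>("sessionData", String.class);
        descriptor.enableTimeToLive(ttlConfig);
        return descriptor;
    }
}
```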

> On 13. Mar 2019, at 13:41, Ning Shi <ni...@gmail.com> wrote:
> 
> Just wondering if anyone has any insights into the new TTL state cleanup feature mentioned below.
> 
> Thanks,
> 
> —
> Ning
> 
> On Mar 11, 2019, at 1:15 PM, Ning Shi <ningshi2@gmail.com> wrote:
> 
>> It's exciting to see the TTL state cleanup feature in 1.8. I have a question regarding the migration of existing TTL state to the newer version.
>> 
>> Looking at the Pull Request [1] that introduced this feature, it seems that Flink leverages RocksDB's compaction filter to remove stale state. I assume this means that state will only be cleaned up on compaction. If I have a significant amount of stale TTL state, some of which may already have been compacted to higher levels, upgrading to 1.8 may not clean it up. Is this assumption correct? If so, is the best approach to take a full snapshot/checkpoint and restore it on 1.8 so that the stale state is cleaned up on initialization?
>> 
>> Thanks,
>> 
>> [1] https://github.com/dataArtisans/frocksdb/pull/1
>> 
>> --
>> Ning


Re: Migrating Existing TTL State to 1.8

Posted by Ning Shi <ni...@gmail.com>.
Just wondering if anyone has any insights into the new TTL state cleanup feature mentioned below.

Thanks,

—
Ning

> On Mar 11, 2019, at 1:15 PM, Ning Shi <ni...@gmail.com> wrote:
> 
> It's exciting to see the TTL state cleanup feature in 1.8. I have a question regarding the migration of existing TTL state to the newer version.
> 
> Looking at the Pull Request [1] that introduced this feature, it seems that Flink leverages RocksDB's compaction filter to remove stale state. I assume this means that state will only be cleaned up on compaction. If I have a significant amount of stale TTL state, some of which may already have been compacted to higher levels, upgrading to 1.8 may not clean it up. Is this assumption correct? If so, is the best approach to take a full snapshot/checkpoint and restore it on 1.8 so that the stale state is cleaned up on initialization?
> 
> Thanks,
> 
> [1] https://github.com/dataArtisans/frocksdb/pull/1
> 
> --
> Ning