Posted to issues@flink.apache.org by "Bowen Li (JIRA)" <ji...@apache.org> on 2018/01/09 08:28:00 UTC

[jira] [Comment Edited] (FLINK-3089) State API Should Support Data Expiration (State TTL)

    [ https://issues.apache.org/jira/browse/FLINK-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318013#comment-16318013 ] 

Bowen Li edited comment on FLINK-3089 at 1/9/18 8:27 AM:
---------------------------------------------------------

[~sihuazhou] If we don't enforce deletion, TtlDB won't promise to expire the data right after the TTL elapses, which may cause uncertainty somewhere. Frankly, I think it would probably only cause uncertainty in unit tests and not impact production, but I want this limitation fully discussed ahead of time.

Enforcing *strict TTL*, as you said, is costly in both the heap and RocksDB backends. So, taking a step back, I think Flink should probably adopt *a relaxed TTL policy like TtlDB's* - ["...when key-values inserted are meant to be removed from the db in a non-strict 'ttl' amount of time therefore, this guarantees that key-values inserted will remain in the db for at least ttl amount of time and the db will make efforts to remove the key-values as soon as possible after ttl seconds of their insertion."|https://github.com/facebook/rocksdb/wiki/Time-to-Live] This way, everything becomes much easier and more performant. What do you think?
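To make the relaxed-TTL semantics concrete, here is a minimal plain-Java sketch (not Flink's or RocksDB's actual API; the class and method names are hypothetical). An entry is guaranteed to survive at least the TTL; after that it is removed lazily on read, plus by a best-effort sweep analogous to expiry during RocksDB compaction. A clock supplier is injected so the behavior is deterministic to test.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical sketch of a relaxed-TTL keyed store. Entries live at least
// ttlMillis; expired entries are dropped lazily on read and opportunistically
// by a best-effort sweep, mirroring TtlDB's "at least ttl, removed as soon as
// possible after ttl" guarantee.
public class RelaxedTtlStore<K, V> {
    private static final class Entry<V> {
        final V value;
        final long insertedAt;
        Entry(V value, long insertedAt) { this.value = value; this.insertedAt = insertedAt; }
    }

    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable clock, for deterministic tests

    public RelaxedTtlStore(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    public void put(K key, V value) {
        map.put(key, new Entry<>(value, clock.getAsLong()));
    }

    // Lazy expiration: an expired entry is removed the first time it is read.
    public V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        if (clock.getAsLong() - e.insertedAt >= ttlMillis) {
            map.remove(key);
            return null;
        }
        return e.value;
    }

    // Best-effort cleanup, analogous to dropping expired keys at compaction.
    public void sweep() {
        long now = clock.getAsLong();
        map.values().removeIf(e -> now - e.insertedAt >= ttlMillis);
    }

    public int size() { return map.size(); }
}
```

Note that between expiry and the next read/sweep the entry is still physically present, which is exactly the non-determinism the comment above is flagging for unit tests.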

And how do you distinguish processing time from event time with TtlDB? Do you approximate event time with processing time?



> State API Should Support Data Expiration (State TTL)
> ----------------------------------------------------
>
>                 Key: FLINK-3089
>                 URL: https://issues.apache.org/jira/browse/FLINK-3089
>             Project: Flink
>          Issue Type: New Feature
>          Components: DataStream API, State Backends, Checkpointing
>            Reporter: Niels Basjes
>            Assignee: Bowen Li
>
> In some use cases (web analytics) there is a need to have state per visitor on a website (i.e. keyBy(sessionid)).
> At some point the visitor simply leaves and no longer creates new events (so a special 'end of session' event will not occur).
> The only way to determine that a visitor has left is by choosing a timeout, like "After 30 minutes of no events we consider the visitor 'gone'".
> Only after this (chosen) timeout has expired should we discard this state.
> In the Trigger part of Windows we can set a timer and close/discard this kind of information, but that introduces the buffering effect of the window (which in some scenarios is unwanted).
> What I would like is to be able to set a timeout on a specific state, which I can then refresh afterwards.
> This makes it possible to create a map function that assigns the right value and discards the state automatically.
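The use case quoted above can be sketched in plain Java (the class and method names are hypothetical, not Flink's API): per-key state whose timeout is refreshed on every update, so a session's state is discarded only after the chosen idle period with no new events.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the requested behavior: per-key state with a
// timeout that is refreshed on each update. A session counts events; after
// timeoutMillis without events the state is considered expired.
public class SessionStateTracker {
    private static final class Session {
        long eventCount;
        long lastSeen;
    }

    private final Map<String, Session> sessions = new HashMap<>();
    private final long timeoutMillis;

    public SessionStateTracker(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    // Called per event, keyed by session id; refreshes the state's timeout
    // and returns the running event count for that session.
    public long onEvent(String sessionId, long eventTimeMillis) {
        Session s = sessions.get(sessionId);
        if (s == null || eventTimeMillis - s.lastSeen >= timeoutMillis) {
            s = new Session(); // the old session, if any, is considered gone
            sessions.put(sessionId, s);
        }
        s.eventCount++;
        s.lastSeen = eventTimeMillis;
        return s.eventCount;
    }

    // Timer-driven cleanup: discard state for sessions idle past the timeout.
    public void expire(long nowMillis) {
        sessions.values().removeIf(s -> nowMillis - s.lastSeen >= timeoutMillis);
    }

    public int activeSessions() { return sessions.size(); }
}
```

This avoids the window-buffering effect mentioned above: each event is handled immediately, and only the small per-key state carries a timeout.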



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)