Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/05/02 15:20:00 UTC

[jira] [Commented] (FLINK-8297) RocksDBListState stores whole list in single byte[]

    [ https://issues.apache.org/jira/browse/FLINK-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461160#comment-16461160 ] 

ASF GitHub Bot commented on FLINK-8297:
---------------------------------------

Github user StefanRRichter commented on the issue:

    https://github.com/apache/flink/pull/5185
  
    Just a thought: how about implementing all of this on top of a map state, and storing the current size as a special entry in the map (with the size field having a key that makes it lexicographically the first entry, so that iteration can easily skip it)? Then we could have a util that wraps a map state into a list state, so the user can register a map state and enhance it to operate as a list state. From Flink's perspective it is still a map state in savepoints, and only the user code reinterprets it as a list state. Obviously this does not solve the problem of migrating between different list types, but it also does not need to introduce a second list type and keeps the window operator as is.
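
To make the wrapper idea concrete, here is a rough sketch in plain Java. It is not Flink code: a `TreeMap` stands in for the map state, and the class name `MapBackedList`, the `SIZE_KEY` constant, and the choice of `Long` indices with `String` values are all assumptions made for illustration. In a real RocksDB-backed map state the ordering is over serialized key bytes, so the reserved size key would have to be chosen against that encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical list view over a sorted map; the map stands in for a map state. */
class MapBackedList {
    // Reserved key for the size entry; -1 sorts before every element index (0, 1, 2, ...).
    static final long SIZE_KEY = -1L;

    private final NavigableMap<Long, String> map;

    MapBackedList(NavigableMap<Long, String> map) {
        this.map = map;
    }

    /** Reads the size from the special entry; a missing entry means an empty list. */
    long size() {
        String stored = map.get(SIZE_KEY);
        return stored == null ? 0L : Long.parseLong(stored);
    }

    /** Appends one element under the next free index and updates the size entry. */
    void add(String value) {
        long n = size();
        map.put(n, value);
        map.put(SIZE_KEY, Long.toString(n + 1));
    }

    /** Returns the elements in index order. */
    List<String> get() {
        // tailMap(0, true) starts at index 0, so the size entry is skipped.
        return new ArrayList<>(map.tailMap(0L, true).values());
    }

    public static void main(String[] args) {
        MapBackedList list = new MapBackedList(new TreeMap<>());
        list.add("a");
        list.add("b");
        System.out.println(list.size() + " elements: " + list.get());
    }
}
```

The backing map is all that would be persisted; the list semantics live entirely in the wrapper, which matches the point that only user code reinterprets the state.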



> RocksDBListState stores whole list in single byte[]
> ---------------------------------------------------
>
>                 Key: FLINK-8297
>                 URL: https://issues.apache.org/jira/browse/FLINK-8297
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.4.0, 1.3.2
>            Reporter: Jan Lukavský
>            Priority: Major
>
> RocksDBListState currently keeps the whole list of data in a single RocksDB key-value pair, which implies that the list must actually fit into memory. Larger lists are not supported and end up with OOME or another error. The RocksDBListState could be modified so that individual items in the list are stored under separate keys in RocksDB and can then be iterated over. A simple implementation could reuse the existing RocksDBMapState, with the key as the index into the list and a single RocksDBValueState keeping track of how many items have already been added to the list. Because this implementation might be less efficient in some cases, it would be good to make it opt-in by a construct like
> {{new RocksDBStateBackend().enableLargeListsPerKey()}}
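
As an illustration of the per-item layout described in the issue, the following is a minimal sketch in plain Java, not Flink or RocksDB code. A `TreeMap` ordered by unsigned byte comparison stands in for RocksDB (whose default key ordering is bytewise), a plain `long` field stands in for the value state that tracks the count, and the names `PerElementList` and `indexKey` are hypothetical. It assumes indices are encoded as fixed-width big-endian bytes so that key order matches list order.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.TreeMap;

/** Hypothetical per-element list storage; a byte-ordered TreeMap stands in for RocksDB. */
class PerElementList {
    // Keys compare as unsigned bytes, mimicking a bytewise key comparator.
    private final TreeMap<byte[], String> store = new TreeMap<>(Arrays::compareUnsigned);

    // Stand-in for the separate value state that tracks how many items were added.
    private long count = 0;

    /** Fixed-width big-endian encoding: byte order equals numeric order for indices >= 0. */
    static byte[] indexKey(long index) {
        return ByteBuffer.allocate(Long.BYTES).putLong(index).array();
    }

    /** Stores the item under its own key instead of rewriting one large value. */
    void add(String item) {
        store.put(indexKey(count), item);
        count++;
    }

    long size() {
        return count;
    }

    /** Iterates items in list order, because keys iterate in index order. */
    Iterable<String> items() {
        return store.values();
    }

    public static void main(String[] args) {
        PerElementList list = new PerElementList();
        for (int i = 0; i < 3; i++) {
            list.add("item-" + i);
        }
        for (String item : list.items()) {
            System.out.println(item);
        }
    }
}
```

With this layout an append touches one small key-value pair, and a reader can stream items one at a time rather than deserializing the whole list, at the cost of one store access per element.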



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)