Posted to user@flink.apache.org by Anirudh Mallem <an...@247-inc.com> on 2016/12/01 08:35:20 UTC

Query regarding state backend for Custom Map Function

Hi Everyone,
I am trying to understand the Working With State feature page of the Flink documentation.
My question: if I use a ValueState in my CustomMap class to store my state, with RocksDB as the state backend, then it is clear that every state value is stored in RocksDB.
If, instead of a ValueState, I use a plain Java HashMap to store my state and implement the Checkpointed interface, will the entire HashMap reside in the RocksDB backend, or will the HashMap stay in memory with only the snapshots sent to RocksDB? I am trying to see what I would lose or gain by maintaining state in my own data structure. Thanks.

Regards,
Anirudh

Re: Query regarding state backend for Custom Map Function

Posted by Stefan Richter <s....@data-artisans.com>.
Hi,

Unfortunately, I think it is unlikely that MapState will still make it into 1.2.

Best,
Stefan

> On 01.12.2016 at 20:29, Anirudh Mallem <an...@247-inc.com> wrote:
> 
> Thanks a lot Stefan. I got what I was looking for. Is the MapState functionality coming as a part of the 1.2 release? 


Re: Query regarding state backend for Custom Map Function

Posted by Anirudh Mallem <an...@247-inc.com>.
Thanks a lot Stefan. I got what I was looking for. Is the MapState functionality coming as a part of the 1.2 release?



Re: Query regarding state backend for Custom Map Function

Posted by Stefan Richter <s....@data-artisans.com>.
Hi,

Using ValueState with RocksDB to store a map inside the value state means that you will have a different map for each key, which is automatically swapped on a per-record basis depending on the record’s key. If you use a map and Checkpointed, there is only one map, and your code is responsible for dispatching state between different keys.
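
To illustrate the per-key scoping described above, here is a minimal plain-Java sketch. It does not use the Flink API: KeyedValueSketch, setCurrentKey and countRecord are hypothetical names made up for this example, and only value() and update() are meant to echo the methods of Flink's ValueState. The assumption is that the runtime selects the current key before each record, so the user code never mentions the key itself.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model (not Flink code) of a value state that is scoped to the current key. */
class KeyedValueSketch<V> {
    // One slot per key; in this model the "backend" is just a heap map.
    private final Map<String, V> perKey = new HashMap<>();
    private String currentKey;

    /** Called by the modelled runtime before each record, never by user code. */
    void setCurrentKey(String key) {
        currentKey = key;
    }

    /** Returns the value for the current key, or null if none was stored yet. */
    V value() {
        return perKey.get(currentKey);
    }

    /** Stores the value for the current key only. */
    void update(V newValue) {
        perKey.put(currentKey, newValue);
    }

    /** "User code": counts records without ever referring to the key. */
    static long countRecord(KeyedValueSketch<Long> state) {
        Long current = state.value();
        long next = (current == null) ? 1L : current + 1L;
        state.update(next);
        return next;
    }

    public static void main(String[] args) {
        KeyedValueSketch<Long> state = new KeyedValueSketch<>();
        for (String key : new String[] {"a", "b", "a"}) {
            state.setCurrentKey(key); // the swap happens here, per record
            System.out.println(key + " -> " + countRecord(state));
        }
    }
}
```

The same countRecord call sees a different value depending on which key was selected, which is the "automatically swapped" behaviour in miniature.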

If you use a map and Checkpointed, the map will be on the heap and the checkpoint is written directly to the filesystem; this is independent of the chosen backend, so RocksDB is not involved.
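
The "one map on the heap, snapshotted as a whole" case can be sketched in plain Java as follows. This is only a model under stated assumptions, not the Flink Checkpointed interface: SingleMapCounter and its process, snapshot and restore methods are hypothetical names, and the snapshot is kept in memory here instead of being written to a filesystem.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model (not Flink code) of a function that keeps ONE heap map for all keys. */
class SingleMapCounter {
    // A single map for every key this function instance sees; it lives on the JVM heap.
    private final Map<String, Long> counts = new HashMap<>();

    /** The user code itself has to dispatch on the record's key. */
    long process(String key) {
        return counts.merge(key, 1L, Long::sum);
    }

    /** Models a checkpoint: the whole map is copied out as one snapshot. */
    HashMap<String, Long> snapshot() {
        return new HashMap<>(counts);
    }

    /** Models recovery: the whole map is replaced by the snapshot's contents. */
    void restore(Map<String, Long> snapshot) {
        counts.clear();
        counts.putAll(snapshot);
    }

    public static void main(String[] args) {
        SingleMapCounter fn = new SingleMapCounter();
        fn.process("a");
        fn.process("b");
        fn.process("a");
        HashMap<String, Long> checkpoint = fn.snapshot();
        fn.process("a"); // progress made after the checkpoint is not in the snapshot

        SingleMapCounter recovered = new SingleMapCounter();
        recovered.restore(checkpoint);
        System.out.println(recovered.process("a")); // prints 3
    }
}
```

Note that the map has to fit in memory, and that each snapshot copies the complete map regardless of how few entries changed.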

On a further note, we are working on an alternative to ValueState that is like a MapState. In contrast to ValueState, MapState does not deserialize the whole map on each access, but can access individual key/value pairs. This might be what you are looking for.
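
The access-cost difference described above can be modelled roughly in plain Java. This sketch makes several assumptions: the store only holds serialized bytes (as an out-of-heap backend would), Java serialization stands in for whatever serializer is actually used, and StateAccessSketch with its put/get methods and the "key/field" layout are hypothetical names invented for the example, not Flink or RocksDB API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model (not Flink code): whole-map value vs. individually stored entries. */
class StateAccessSketch {
    // Stand-in for a backend that only stores serialized bytes.
    private final Map<String, byte[]> store = new HashMap<>();
    long bytesDeserialized = 0;

    private static byte[] serialize(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private Object deserialize(byte[] bytes) {
        bytesDeserialized += bytes.length; // track how much each read has to decode
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** "Map inside a value" style: the whole map is one serialized blob per key. */
    void putWholeMap(String key, HashMap<String, Integer> map) {
        store.put(key, serialize(map));
    }

    @SuppressWarnings("unchecked")
    Integer getFromWholeMap(String key, String field) {
        HashMap<String, Integer> map = (HashMap<String, Integer>) deserialize(store.get(key));
        return map.get(field);
    }

    /** "Map state" style: every entry is stored and read on its own. */
    void putEntry(String key, String field, Integer value) {
        store.put(key + "/" + field, serialize(value));
    }

    Integer getEntry(String key, String field) {
        return (Integer) deserialize(store.get(key + "/" + field));
    }

    public static void main(String[] args) {
        StateAccessSketch backend = new StateAccessSketch();
        HashMap<String, Integer> map = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            map.put("field" + i, i);
        }
        backend.putWholeMap("user-1", map);
        map.forEach((field, value) -> backend.putEntry("user-1", field, value));

        backend.getFromWholeMap("user-1", "field42");
        long whole = backend.bytesDeserialized;
        backend.bytesDeserialized = 0;
        backend.getEntry("user-1", "field42");
        long single = backend.bytesDeserialized;
        System.out.println("whole-map read: " + whole + " bytes, single-entry read: " + single + " bytes");
    }
}
```

Reading one field through the whole-map path decodes all 1000 entries, while the per-entry path decodes only the requested one; the exact byte counts depend on the serializer and are not meant as a benchmark.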

Best,
Stefan

