Posted to user@flink.apache.org by Sameer W <sa...@axiomine.com> on 2018/10/23 12:03:53 UTC

Size of Checkpoints increasing with time

Hi,

We are using ValueState to maintain state. It is a pretty simple job: a
keyBy operator on a stream, followed by a map operator that keeps its
state in a ValueState instance. The load is on the order of a billion
transactions per day. However, the state per key is only a list of 18x6
long values, which are constantly updated. We have about 20 million keys,
and transactions are uniformly distributed across those keys.
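A rough back-of-envelope estimate from these figures, written as a small Java sketch. It assumes that "18x6 long values" means 108 longs per key, that every key holds a full list, and that serialization overhead and compression are ignored. The class name StateSizeEstimate is hypothetical.

```java
// Back-of-envelope sizing from the figures in the message above.
// Assumptions (not stated by the author): 18x6 = 108 longs per key,
// every key holds a full list, no serialization overhead, no compression.
public class StateSizeEstimate {
    static final long KEYS = 20_000_000L;              // "about 20 million keys"
    static final long LONGS_PER_KEY = 18L * 6L;        // 108 longs
    static final long BYTES_PER_LONG = Long.BYTES;     // 8 bytes
    static final long TXNS_PER_DAY = 1_000_000_000L;   // "billion transactions per day"

    static long rawBytesPerKey() {
        return LONGS_PER_KEY * BYTES_PER_LONG;         // 864 bytes
    }

    static long rawBytesTotal() {
        return KEYS * rawBytesPerKey();                // 17,280,000,000 bytes
    }

    static long updatesPerKeyPerDay() {
        return TXNS_PER_DAY / KEYS;                    // 50, given a uniform key distribution
    }

    static double txnsPerSecond() {
        return TXNS_PER_DAY / 86_400.0;                // about 11,574 per second
    }

    public static void main(String[] args) {
        System.out.printf("raw bytes per key:       %d%n", rawBytesPerKey());
        System.out.printf("raw bytes, all keys:     %d (~%.1f GB)%n",
                rawBytesTotal(), rawBytesTotal() / 1e9);
        System.out.printf("updates per key per day: %d%n", updatesPerKeyPerDay());
        System.out.printf("transactions per second: %.0f%n", txnsPerSecond());
    }
}
```

Under these assumptions each key is overwritten about 50 times a day. The raw total is not directly comparable to the reported checkpoint sizes, since key overhead, serialization format, and any compression change the on-disk footprint.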

When the job starts, the checkpoint size (RocksDB state backend, with
checkpoints stored in S3) is low, on the order of 500 MB. However, after
12 hours of operation, the checkpoints have grown to about 4-5 GB. The
time to complete a checkpoint starts at around 15-20 seconds and reaches
about a minute after 12 hours.

What is the reason behind the increasing size of checkpoints?

Thanks,
Sameer

Re: Size of Checkpoints increasing with time

Posted by Kien Truong <du...@gmail.com>.
Hi,

Are you using incremental checkpoints?

RocksDB is an append-only (log-structured) store, so you will see a
steady increase in state size until a compaction runs and the old values
of updated keys are garbage-collected.

However, the average state size should stabilize after a while if the
load stays the same.
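To illustrate the mechanism described above, here is a toy Java model. The class AppendOnlyStoreToy and all parameters are hypothetical, and this is not how RocksDB is implemented (there are no SST files, levels, or compaction heuristics). Every write appends a new version of a key, and a periodic full compaction keeps only the newest version of each key. It is a sketch under these simplifying assumptions, meant only to show why size grows between compactions yet stays bounded under a constant load.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy append-only key-value store, loosely in the spirit of an LSM tree.
// NOT RocksDB: just an append-only log plus an occasional full compaction.
public class AppendOnlyStoreToy {
    /** One write: a key and the value written at that time. */
    private static final class Entry {
        final String key;
        final long[] value;
        Entry(String key, long[] value) { this.key = key; this.value = value; }
    }

    // Every put appends; superseded versions of a key stay until compact().
    private final List<Entry> log = new ArrayList<>();

    void put(String key, long[] value) {
        log.add(new Entry(key, value));
    }

    /** Keeps only the newest version of each key, dropping stale ones. */
    void compact() {
        Map<String, long[]> latest = new LinkedHashMap<>();
        for (Entry e : log) {
            latest.put(e.key, e.value);  // later writes overwrite earlier ones
        }
        log.clear();
        for (Map.Entry<String, long[]> e : latest.entrySet()) {
            log.add(new Entry(e.getKey(), e.getValue()));
        }
    }

    /** Number of stored entries, live plus stale. */
    int size() {
        return log.size();
    }

    /**
     * Updates every key once per round and compacts every compactEvery rounds.
     * Returns the store size observed at the end of each round.
     */
    static List<Integer> simulate(int keys, int rounds, int compactEvery) {
        AppendOnlyStoreToy store = new AppendOnlyStoreToy();
        List<Integer> sizes = new ArrayList<>();
        for (int round = 1; round <= rounds; round++) {
            for (int k = 0; k < keys; k++) {
                store.put("key-" + k, new long[18 * 6]);  // 18x6 longs, as in the question
            }
            if (round % compactEvery == 0) {
                store.compact();
            }
            sizes.add(store.size());
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Hypothetical parameters, far smaller than the real job's 20 million keys.
        List<Integer> sizes = simulate(1_000, 12, 5);
        for (int i = 0; i < sizes.size(); i++) {
            System.out.printf("round %2d: %d entries%n", i + 1, sizes.get(i));
        }
    }
}
```

The entry count climbs while stale versions pile up, drops back to one entry per key at each compaction, and then oscillates within a fixed band, which is the stabilization described above for a constant load. Real RocksDB compaction is incremental and driven by its own triggers, so actual sizes will not follow such a clean sawtooth.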

Regards,

Kien

