Posted to users@kafka.apache.org by Guozhang Wang <wa...@gmail.com> on 2016/07/07 17:51:57 UTC

Re: Streams RocksDB State Store Disk Usage

I find this RocksDB tuning guide quite useful regarding your write /
space amplification concerns:

https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide

Guozhang

On Thu, Jun 30, 2016 at 8:36 AM, Avi Flax <av...@parkassist.com> wrote:

> On Jun 29, 2016, at 22:44, Guozhang Wang <wa...@gmail.com> wrote:
> >
> > One way to mentally quantify your state store usage is to consider the
> > total key space in your reduceByKey() operator, and multiply by the
> average
> > key-value pair size. Then you need to consider the RocksDB write / space
> > amplification factor as well.
>
> That makes sense, thank you!
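>
> To make that concrete, a back-of-envelope sketch of the estimate Guozhang
> describes (the key count, pair size, and amplification factor below are
> hypothetical placeholders, not measurements from any real topology):

```python
# Rough state-store sizing estimate: total keys in the reduceByKey()
# operator, times the average key-value pair size, times a RocksDB
# space-amplification factor. All three inputs are illustrative.

def estimate_state_store_bytes(total_keys, avg_pair_bytes, space_amplification):
    """Return an estimated on-disk footprint for one RocksDB state store."""
    return int(total_keys * avg_pair_bytes * space_amplification)

# Example: 10 million keys, ~200-byte pairs, ~1.5x space amplification.
estimate = estimate_state_store_bytes(10_000_000, 200, 1.5)
print(f"~{estimate / 1024**3:.1f} GiB")  # ~2.8 GiB
```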
>
> > Currently Kafka Streams hard-codes some RocksDB config values, such as
> > block size, to achieve good write performance at the cost of write
> > amplification, but we are now working on exposing those configs to the
> > users so that they can override them:
> >
> > https://issues.apache.org/jira/browse/KAFKA-3740
>
> That looks excellent for the next release ;)
>
> In the meantime, do you know anything specific about the RocksDB behavior
> with the LOG and LOG.old.{timestamp} files? (They don’t seem to me to be
> directly related to the storage space required by the actual state itself,
> unless I’m misunderstanding the word “log” — it is a bit overloaded in this
> community.) Is there something I can do in code to affect this? Or some way
> to understand/predict the growth patterns of these files? And does RocksDB
> have some kind of built-in cleanup feature, or do I need to set up a cron
> job of my own?
>
> Thanks!
> Avi
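
For what it's worth, the LOG / LOG.old.{timestamp} files are RocksDB's info
logs (diagnostic text, not state data); RocksDB's own max_log_file_size and
keep_log_file_num options govern their rotation, but Kafka Streams does not
expose them at the time of this thread. Pending that, a cron-style cleanup
could look like the sketch below; the state directory layout assumed here
(LOG.old.* files under per-store subdirectories) and the retention count are
assumptions, so adjust for your deployment:

```python
# Sketch of a cron-style cleanup for RocksDB info-log files
# (LOG.old.<timestamp>); the current LOG file is left untouched.
# The state-directory layout and retention count are assumptions.
import glob
import os
from collections import defaultdict

def prune_old_rocksdb_logs(state_dir, keep=2):
    """Delete all but the newest `keep` LOG.old.* files per store directory.

    Returns the number of files deleted.
    """
    by_dir = defaultdict(list)
    for path in glob.glob(os.path.join(state_dir, "**", "LOG.old.*"),
                          recursive=True):
        by_dir[os.path.dirname(path)].append(path)
    deleted = 0
    for paths in by_dir.values():
        # LOG.old.<timestamp> suffixes sort chronologically, oldest first.
        paths.sort()
        for path in paths[:-keep] if keep > 0 else paths:
            os.remove(path)
            deleted += 1
    return deleted
```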




-- 
-- Guozhang