Posted to users@kafka.apache.org by Giselle van Dongen <Gi...@UGent.be> on 2020/10/27 20:48:54 UTC

Kafka Streams RocksDB CPU usage

Hi all,


We have a Kafka Streams job which has high CPU utilization. When profiling the job, we saw that this was in large part due to RocksDB methods: flush, seek, put, get, iteratorCF. We use the default settings for our RocksDB state store. Which configuration parameters are most important to tune to lower CPU usage? Most documentation focuses on memory as the bottleneck.


Our job does a join and a window step. The commit interval is 1 second. We enabled caching and the cache size is 512 MB. We have 6 instances, each with 6 CPUs and 30 GB RAM.



Thank you for any help!


Re: Kafka Streams RocksDB CPU usage

Posted by Sophie Blee-Goldman <so...@confluent.io>.
You might want to start with a longer commit interval (that is, commit
less often), if you can handle some additional latency. I would bet that
the frequent flushing is a major part of your problem: not just the act of
flushing itself, but the consequences for the structure of the data in
each RocksDB instance. If you end up flushing unfilled memtables, you'll
end up with a large number of small L0 files that then have to be
compacted, and until they are, this can make the iterators/seeks less
efficient. It also means the memtable is less effective as a write cache,
so you miss out on some immediate deduplication of updates to the same key.
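
As a rough illustration of the suggestion above, here is a minimal sketch
of a Streams configuration with a longer commit interval. The keys
`commit.interval.ms` and `cache.max.bytes.buffering` are the Kafka Streams
config names; the class and method names are hypothetical, and the
10-second interval is an assumed value for illustration, not a
recommendation from this thread. The 512 MB cache matches the original
question.

```java
import java.util.Properties;

// Hypothetical helper: only the two property keys are Kafka Streams settings.
class CommitIntervalSketch {

    // Builds the two properties discussed in this thread.
    static Properties tunedConfig(long commitIntervalMs, long cacheBytes) {
        Properties props = new Properties();
        // A longer interval means fewer commits, and so fewer forced
        // flushes of the state stores.
        props.setProperty("commit.interval.ms", Long.toString(commitIntervalMs));
        // Record cache size in bytes, shared across all threads of one instance.
        props.setProperty("cache.max.bytes.buffering", Long.toString(cacheBytes));
        return props;
    }

    public static void main(String[] args) {
        // Assumed values: 10 s commit interval, 512 MB cache.
        Properties props = tunedConfig(10_000L, 512L * 1024 * 1024);
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

These properties would be merged with the rest of the application's
settings before constructing the `KafkaStreams` instance. The right
interval depends on how much end-to-end latency the job can tolerate.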

There's been some recent work to decouple flushing from committing, so
starting in 2.7 you shouldn't have to choose between low latency and
cache/RocksDB performance. This release is currently in progress, but I'd
recommend checking it out when you can.

I'm not sure what version you're using, but in 2.5 we added some RocksDB
metrics that could be useful for further insight. I think they're all
recorded at the DEBUG level. Might be worth investigating.
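
Since metrics recorded at the DEBUG level are not collected by default,
the recording level has to be raised to see them. A minimal sketch,
assuming the application is configured through a plain `Properties`
object: `metrics.recording.level` is the config key, while the class name,
method name, and `application.id` value are hypothetical.

```java
import java.util.Properties;

// Hypothetical helper: only "metrics.recording.level" is a real config key here.
class MetricsLevelSketch {

    // Returns a copy of the given config with DEBUG-level metrics enabled.
    static Properties withDebugMetrics(Properties base) {
        Properties props = new Properties();
        props.putAll(base);
        // The default level is INFO; DEBUG also records the finer-grained metrics.
        props.setProperty("metrics.recording.level", "DEBUG");
        return props;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("application.id", "example-app"); // hypothetical value
        System.out.println(withDebugMetrics(base));
    }
}
```

Recording at DEBUG adds some overhead of its own, so it may be worth
enabling only while investigating.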

We also recently added some additional metrics to expose properties of
RocksDB, which will also be available in the upcoming 2.7 release.

Cheers,
Sophie
