
Posted to users@kafka.apache.org by Tianji Li <sk...@gmail.com> on 2017/03/15 13:02:50 UTC

Kafka Stream: RocksDBKeyValueStoreSupplier performance

Hi there,

It seems that the RocksDB state store is quite slow in my case and I wonder
if I did anything wrong.

I have a topic that I groupBy() and then aggregate() 50 times. That is, I
will create 50 result topics and a lot more changelog and repartition
topics.

There are a few things that are weird, and here I report one of them: the
state store speed.

If I use:

      StateStoreSupplier stateStoreSupplier = Stores.create(storeName)
        .withKeys(stringSerde)
        .withValues(avroSerde)
        .inMemory()
        .build();

Then processing 1 million records takes around 5 minutes on my development
machine.

If I use:

      StateStoreSupplier stateStoreSupplier = Stores.create(storeName)
        .withKeys(stringSerde)
        .withValues(avroSerde)
        .persistent()
        .disableLogging()
        .enableCaching()
        .build();

Processing the same 1 million records takes around 10 minutes.

I believe that in the first case the changelog is backed up to Kafka, and in
the second case only RocksDB is used.

But why is RocksDB so slow?

Eventually, I am hoping to do windowed aggregation, and it seems I have to
use RocksDB, but given the performance, I am hesitating.

Thanks
Tianji

Re: Kafka Stream: RocksDBKeyValueStoreSupplier performance

Posted by Tianji Li <sk...@gmail.com>.

Hi Eno,

Thanks for your help. Much appreciated.

Thanks
Tianji



Re: Kafka Stream: RocksDBKeyValueStoreSupplier performance

Posted by Eno Thereska <en...@gmail.com>.

Tianji,

A couple of things:

- For now, could you use RocksDB without the cache? I've opened a JIRA to investigate why it's slower with the cache: <https://issues.apache.org/jira/browse/KAFKA-4904>

- You can tune RocksDB performance further by increasing its own cache (yes, RocksDB has a separate cache, and its size is quite small by default). See this question on how to do that with the RocksDBConfigSetter: <https://groups.google.com/forum/#!topic/confluent-platform/RgkaUy1TUno>. This might be a bit too much to start with, but it's possible. You'd have to set the blockCacheSize option, for example as done in the openDb call in RocksDBStore.java: <https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBStore.java#L115>

- In summary, I'd recommend you use RocksDB as is, since 7 minutes versus 5 minutes is a reasonable difference.

However, the real performance test comes when you actually enable logging, right? You might want RocksDB to be backed up to Kafka for fault tolerance.

Finally, make sure to use 0.10.2, the latest release.

Thanks
Eno
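
For readers following along, the block cache tuning Eno mentions could look roughly like the sketch below. It is not taken from the thread: the class name LargerBlockCacheConfig and the 64 MB figure are placeholders, and it assumes the kafka-streams and rocksdbjni jars from the 0.10.2 era are on the classpath (the RocksDBConfigSetter interface and the RocksDB Java API have both changed in later releases).

```java
import java.util.Map;

import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Options;

// Hypothetical config setter; the class name is made up for this example.
public class LargerBlockCacheConfig implements RocksDBConfigSetter {

    // Placeholder size chosen only for illustration; tune it for your workload.
    private static final long BLOCK_CACHE_BYTES = 64L * 1024 * 1024;

    @Override
    public void setConfig(String storeName, Options options, Map<String, Object> configs) {
        // Build a table config with a larger block cache and install it
        // on the options of the store that is about to be opened.
        BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
        tableConfig.setBlockCacheSize(BLOCK_CACHE_BYTES);
        options.setTableFormatConfig(tableConfig);
    }
}
```

The class would be registered through the "rocksdb.config.setter" Streams property (StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG). Two caveats, as far as I can tell: setConfig is called for each store instance, so with 50 aggregations the cache memory adds up quickly, and installing a fresh BlockBasedTableConfig replaces the table settings Kafka Streams applied before calling the setter.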



Re: Kafka Stream: RocksDBKeyValueStoreSupplier performance

Posted by Tianji Li <sk...@gmail.com>.

Hi Eno,

RocksDB without caching took around 7 minutes.

Tianji
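
To put the three timings reported in this thread side by side (in-memory about 5 minutes, RocksDB without caching about 7, RocksDB with caching about 10, each for 1 million records), here is a small sketch that converts them to approximate throughput. The class and method names are made up for illustration, and the timings are the rough figures quoted above, not new measurements.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StoreThroughput {

    // Records per second for a run of `records` records that took `minutes` minutes.
    public static double recordsPerSecond(long records, double minutes) {
        return records / (minutes * 60.0);
    }

    public static void main(String[] args) {
        long records = 1_000_000L;

        // Approximate wall-clock times reported in the thread, in minutes.
        Map<String, Double> minutesByStore = new LinkedHashMap<>();
        minutesByStore.put("in-memory", 5.0);
        minutesByStore.put("RocksDB, no caching", 7.0);
        minutesByStore.put("RocksDB, caching", 10.0);

        double baseline = minutesByStore.get("in-memory");
        for (Map.Entry<String, Double> e : minutesByStore.entrySet()) {
            double rate = recordsPerSecond(records, e.getValue());
            double slowdown = e.getValue() / baseline;
            System.out.printf("%-22s %6.0f records/s  %.1fx baseline time%n",
                    e.getKey(), rate, slowdown);
        }
    }
}
```

That works out to roughly 3,333, 2,381 and 1,667 records per second, i.e. 1.4x and 2.0x the in-memory run time for the two RocksDB configurations.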



Re: Kafka Stream: RocksDBKeyValueStoreSupplier performance

Posted by Eno Thereska <en...@gmail.com>.

Tianji,

Could you provide a third data point, running with RocksDB but without caching, i.e.:

      StateStoreSupplier stateStoreSupplier = Stores.create(storeName)
        .withKeys(stringSerde)
        .withValues(avroSerde)
        .persistent()
        .disableLogging()
        .build();


Thanks
Eno

