Posted to users@kafka.apache.org by Adrian Tubio <at...@gmail.com> on 2022/05/19 08:56:30 UTC

Leverage multiple disks for kafka streams stores

Hi there,

My Kafka Streams topology has one store that is particularly busy. Together
with the other stores in the same topology, it is exhausting disk I/O, which
leads to write stalls and increased latency.

This store does about 3 to 4 times more compaction than the others. Since we
have more disks/volumes available, we were wondering whether it would be
possible to set a different path for this store so that it lands on a
different disk.

I can't find any way to do this. Ideally it would be done via
RocksDBConfigSetter, but that doesn't seem to offer the possibility: the
store's directory appears to come from StateStoreContext, which is
initialized from the global STATE_DIR_CONFIG setting.

Has anyone done something similar?

Best regards,

Adrian Tubio
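
The limitation described in the message above can be illustrated with a small
sketch. It is not Kafka Streams' own code: the class and method names are
hypothetical, the application id and mount point are made up, and the
directory layout (<state.dir>/<application.id>/<task id>/rocksdb/<store name>)
is an assumption about how RocksDB-backed stores are usually laid out. Only
the config keys "state.dir" and "application.id" are real Streams settings.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Illustrative model only: shows why one global "state.dir" puts every
// store of an application under the same root directory.
final class StateDirLayout {

    // Assumed layout for RocksDB-backed stores:
    //   <state.dir>/<application.id>/<task id>/rocksdb/<store name>
    static Path storePath(Properties config, String taskId, String storeName) {
        String stateDir = config.getProperty("state.dir");
        String appId = config.getProperty("application.id");
        return Paths.get(stateDir, appId, taskId, "rocksdb", storeName);
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("application.id", "my-app");               // hypothetical
        config.setProperty("state.dir", "/mnt/disk1/kafka-streams");  // hypothetical mount

        // Both stores resolve under the same root, hence the same disk.
        System.out.println(storePath(config, "0_0", "busy-store"));
        System.out.println(storePath(config, "0_0", "quiet-store"));
    }
}
```

Because the store name is only the last path segment, nothing in this model
lets a single store escape the shared root, which matches the behaviour the
message describes.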

Re: Leverage multiple disks for kafka streams stores

Posted by Bruno Cadonna <ca...@apache.org>.
Hi Adrian,

Thank you for the additional information!

One reason to have a single folder is that Streams also stores metadata 
that refers to all state stores in the state directory. That could be 
changed if we have a good reason.

If you have a good idea for solving this issue, please feel free to open a
KIP. I would be glad to discuss it.

Best,
Bruno

On 19.05.22 15:40, Adrian Tubio wrote:
> Hi Bruno,
> 
> Thanks a lot for your answer.
> 
> I have tried to tune each store to the best of my ability, and I have indeed
> managed to improve things considerably. We even changed the disk to a much
> faster one. But it's still not enough.
>
> Yes, we could try splitting the application into sub-applications to make
> use of different disks, but it feels like an artificial solution.
>
> There might be reasons I don't know of for having a single folder for all
> stores, but it feels limiting, especially considering that you can plug in
> other types of stores instead of RocksDB, ones that don't even use local
> disk.
>
> If my CPU is OK, my memory is OK, and the only limiting factor is disk, why
> not allow the use of multiple disks?
> Especially in cloud deployments, where you can attach multiple volumes
> arbitrarily, it is sometimes cheaper to use several inexpensive volumes in
> parallel than a single very expensive one.
> 
> I personally believe that this should be considered for a KIP.
> 
> Best regards,
> 
> Adrian Tubio

Re: Leverage multiple disks for kafka streams stores

Posted by Adrian Tubio <at...@gmail.com>.
Hi Bruno,

Thanks a lot for your answer.

I have tried to tune each store to the best of my ability, and I have indeed
managed to improve things considerably. We even changed the disk to a much
faster one. But it's still not enough.

Yes, we could try splitting the application into sub-applications to make
use of different disks, but it feels like an artificial solution.

There might be reasons I don't know of for having a single folder for all
stores, but it feels limiting, especially considering that you can plug in
other types of stores instead of RocksDB, ones that don't even use local
disk.

If my CPU is OK, my memory is OK, and the only limiting factor is disk, why
not allow the use of multiple disks?
Especially in cloud deployments, where you can attach multiple volumes
arbitrarily, it is sometimes cheaper to use several inexpensive volumes in
parallel than a single very expensive one.

I personally believe that this should be considered for a KIP.

Best regards,

Adrian Tubio
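
The sub-application workaround mentioned in the message above could look
roughly like the following sketch. Everything concrete here is an assumption
for illustration: the helper class, the application ids, the broker address,
and the mount points are all hypothetical. Only the config keys
("bootstrap.servers", "application.id", "state.dir") are real Streams
settings.

```java
import java.util.Properties;

// Sketch of splitting one application into two, each with its own
// application.id and its own state.dir on a separate volume.
final class SplitAppsConfig {

    static Properties configFor(String applicationId, String stateDir) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092"); // hypothetical broker
        p.setProperty("application.id", applicationId);
        p.setProperty("state.dir", stateDir);
        return p;
    }

    public static void main(String[] args) {
        // Hypothetical names and mount points.
        Properties busy = configFor("my-app-busy-store", "/mnt/disk1/kafka-streams");
        Properties rest = configFor("my-app-other-stores", "/mnt/disk2/kafka-streams");

        // Each application now keeps its state on a different disk.
        System.out.println(busy.getProperty("application.id") + " -> " + busy.getProperty("state.dir"));
        System.out.println(rest.getProperty("application.id") + " -> " + rest.getProperty("state.dir"));
    }
}
```

Each Properties object would then be passed to its own KafkaStreams instance
together with the part of the topology it owns. The cost, as the message
notes, is that the split is artificial: the two applications would likely
have to exchange data through an intermediate topic.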



On Thu, May 19, 2022 at 1:49 PM Bruno Cadonna <ca...@apache.org> wrote:

> Hi Adrian,
>
> I am afraid that you cannot set the state directory for a single state
> store to a different directory than all other stores.
>
> Maybe the following blog post can help you debug and solve your issue:
>
>
> https://www.confluent.io/blog/how-to-tune-rocksdb-kafka-streams-state-stores-performance
>
> Specifically look at the section "High disk I/O and write stalls":
>
> https://www.confluent.io/blog/how-to-tune-rocksdb-kafka-streams-state-stores-performance/#write-stalls
>
> Best,
> Bruno

Re: Leverage multiple disks for kafka streams stores

Posted by Bruno Cadonna <ca...@apache.org>.
Hi Adrian,

I am afraid that you cannot set the state directory for a single state 
store to a different directory than all other stores.

Maybe the following blog post can help you debug and solve your issue:

https://www.confluent.io/blog/how-to-tune-rocksdb-kafka-streams-state-stores-performance

Specifically look at the section "High disk I/O and write stalls":
https://www.confluent.io/blog/how-to-tune-rocksdb-kafka-streams-state-stores-performance/#write-stalls

Best,
Bruno


On 19.05.22 10:56, Adrian Tubio wrote:
> Hi there,
> 
> My Kafka Streams topology has one store that is particularly busy. Together
> with the other stores in the same topology, it is exhausting disk I/O, which
> leads to write stalls and increased latency.
>
> This store does about 3 to 4 times more compaction than the others. Since we
> have more disks/volumes available, we were wondering whether it would be
> possible to set a different path for this store so that it lands on a
> different disk.
>
> I can't find any way to do this. Ideally it would be done via
> RocksDBConfigSetter, but that doesn't seem to offer the possibility: the
> store's directory appears to come from StateStoreContext, which is
> initialized from the global STATE_DIR_CONFIG setting.
> 
> Has anyone done something similar?
> 
> Best regards,
> 
> Adrian Tubio
>