Posted to users@kafka.apache.org by mangat rai <ma...@gmail.com> on 2021/08/10 13:10:20 UTC

High disk read with Kafka streams

Hey All,

We are using the low-level Processor API to create Kafka Streams
applications. Each app has one or more in-memory state stores with caching
disabled and the changelog enabled (roughly as in the sketch below, after
the list). Some of the apps also have global stores. We noticed from the
node metrics (Kubernetes) that the stream applications are consuming a lot
of disk I/O. Digging deeper, I found the following:

1. Running locally with Docker, I could see some pretty high disk reads. I
used `docker stats` and got `BLOCK I/O` as `438MB / 0B`. For comparison, we
did only a few gigabytes of net I/O.
2. In Kubernetes, `container_fs_reads_bytes_total` gives us pretty big
numbers, whereas `container_fs_writes_bytes_total` is almost negligible.
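
For concreteness, here is a minimal sketch of the kind of store setup I
mean; the store name, topic name, and processor body are made up for the
example, but the store configuration (in-memory, caching disabled,
changelog enabled) matches what we use:

import java.util.Collections;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.ProcessorSupplier;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

public class InMemoryStoreTopology {

    public static Topology build() {
        // In-memory key-value store: caching disabled, changelog enabled.
        // Gets/puts stay on the JVM heap; only the changelog goes to Kafka.
        StoreBuilder<KeyValueStore<String, String>> storeBuilder =
            Stores.keyValueStoreBuilder(
                    Stores.inMemoryKeyValueStore("my-store"),   // made-up name
                    Serdes.String(),
                    Serdes.String())
                .withCachingDisabled()
                .withLoggingEnabled(Collections.emptyMap());

        Topology topology = new Topology();
        topology.addSource("source", "input-topic");            // made-up topic
        topology.addProcessor("proc", new ProcessorSupplier<String, String, Void, Void>() {
            @Override
            public Processor<String, String, Void, Void> get() {
                return new Processor<String, String, Void, Void>() {
                    private KeyValueStore<String, String> store;

                    @Override
                    public void init(ProcessorContext<Void, Void> context) {
                        store = context.getStateStore("my-store");
                    }

                    @Override
                    public void process(Record<String, String> record) {
                        store.put(record.key(), record.value()); // heap only, no local disk
                    }
                };
            }
        }, "source");
        topology.addStateStore(storeBuilder, "proc");
        return topology;
    }
}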

Now, we are *not* using RocksDB. The pattern is not correlated with having
a global store. I read various documents, but I still can't figure out why
a stream application would perform so many disk reads. It's not even
writing, so that rules out swap space or any write buffering, etc.

I also noticed that the amount of disk reads is directly proportional to
the amount of data consumed. Is it possible that the data is zero-copied
from the network interface to disk and the Kafka app is then reading it
back from there? I am not aware of any mechanism that would do that.
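
To be clear about what I mean by zero copy: as far as I know, the only
zero-copy path in Kafka is on the broker side, where log segments are sent
from disk to the network socket via `FileChannel.transferTo` (sendfile);
I'm not aware of anything in the client going the other way (network to
disk). A rough illustration of that API, with a made-up file and
destination, just to show what I mean:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {
    public static void main(String[] args) throws IOException {
        // Made-up file and destination, only to show the transferTo call.
        try (FileChannel file = FileChannel.open(Paths.get("segment.log"),
                 StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(
                 new InetSocketAddress("localhost", 9999))) {
            long position = 0;
            long remaining = file.size();
            while (remaining > 0) {
                // The kernel moves bytes disk -> socket directly;
                // they never enter the JVM heap.
                long sent = file.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}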

I would really appreciate any help in debugging this issue.

Thanks,
Mangat

Re: High disk read with Kafka streams

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Mangat,

Thanks for reporting your observations. I have some questions:

1) Are your global state stores also in-memory, or persisted on disk? (See
the sketch below for what I mean.)
2) Are your Kafka brokers and Kafka Streams apps colocated?
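
For 1): whether a global store touches local disk depends on the store
supplier used by the builder passed to `addGlobalStore`. The persistent
supplier is RocksDB-backed and keeps its data in files under the state
directory, while the in-memory supplier keeps everything on the heap. A
rough sketch of the two options (store names are made up):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

public class GlobalStoreBuilders {

    // Persistent (RocksDB-backed) supplier: the global store's contents
    // live in files on the local state directory.
    public static StoreBuilder<KeyValueStore<String, String>> persistentGlobalStore() {
        return Stores.keyValueStoreBuilder(
                    Stores.persistentKeyValueStore("global-store"),
                    Serdes.String(), Serdes.String())
                // global stores are restored from the source topic, not a changelog
                .withLoggingDisabled();
    }

    // In-memory supplier: contents live on the JVM heap, so lookups do not
    // read from local disk.
    public static StoreBuilder<KeyValueStore<String, String>> inMemoryGlobalStore() {
        return Stores.keyValueStoreBuilder(
                    Stores.inMemoryKeyValueStore("global-store"),
                    Serdes.String(), Serdes.String())
                .withLoggingDisabled();
    }
}

If your global stores happen to use the persistent supplier, that alone
could explain local disk reads even though your regular stores are
in-memory.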


Guozhang

-- 
-- Guozhang