Posted to users@kafka.apache.org by Elad Eldor <el...@gmail.com> on 2018/03/28 10:50:15 UTC

kafka IO utilization with more disks and brokers

Hi all,

We ran a Kafka benchmark (BM) to find the maximum throughput (TP)
achievable with our current Kafka brokers and disks.

*kafka brokers setup (machine spec & disks):*

3 kafka brokers, Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz, 8 cores.

each broker has an sdb device of size 14.6T mounted at /var/kafka.

the sdb device is built from 16 SAS disks of ~1TB each in RAID-10, which
means 8 of the disks hold mirror copies (RAID-10 mirrors, it has no parity).

*kafka producer configuration:*

   - key=string, value=byteArray
   - enable.auto.commit=false
   - buffer.memory=500000000
   - batch.size=262144
   - retry.backoff.ms=5
   - linger.ms=20000
   - retries=0
   - compression.type=lz4
   - acks=1
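
To make the batching settings above easier to reason about, here is a rough Python sketch of what they imply per batch. It assumes "~1K" means 1024 bytes per message and ignores compression and record overhead, so the real numbers will differ; `batch_estimates` is a hypothetical helper, not part of any Kafka client.

```python
# Producer settings quoted from the list above (values copied verbatim).
producer_config = {
    "buffer.memory": 500_000_000,  # bytes
    "batch.size": 262_144,         # bytes (256 KiB)
    "linger.ms": 20_000,           # milliseconds
    "retry.backoff.ms": 5,
    "retries": 0,
    "compression.type": "lz4",
    "acks": "1",
}

# Assumption: "~1K" is taken as 1024 bytes, uncompressed, no record overhead.
MESSAGE_BYTES = 1024


def batch_estimates(config, message_bytes):
    """Return (messages per full batch, full batches that fit in the
    producer buffer, linger time in seconds) for the given settings."""
    msgs_per_batch = config["batch.size"] // message_bytes
    batches_in_buffer = config["buffer.memory"] // config["batch.size"]
    linger_seconds = config["linger.ms"] / 1000
    return msgs_per_batch, batches_in_buffer, linger_seconds


msgs_per_batch, batches_in_buffer, linger_seconds = batch_estimates(
    producer_config, MESSAGE_BYTES
)
print("messages per full batch:", msgs_per_batch)
print("full batches that fit in buffer.memory:", batches_in_buffer)
print("max wait before a partial batch is sent (s):", linger_seconds)
```

Under these assumptions a batch fills long before the 20-second linger expires at the message rates discussed here, so batch.size rather than linger.ms is probably what triggers most sends.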

*kafka topic configuration*

100 partitions, balanced between all 3 brokers

replication factor = 3
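
As a rough illustration of what this topic layout means for disk load, the Python sketch below counts replicas per broker. It assumes partitions and leaders are spread perfectly evenly; `replica_layout` is a hypothetical helper written for this post.

```python
def replica_layout(partitions, replication_factor, brokers):
    """Estimate replica placement for a perfectly balanced topic.

    Assumption: replicas and leaders are spread evenly over all brokers.
    """
    total_replicas = partitions * replication_factor
    return {
        "total_replicas": total_replicas,
        "replicas_per_broker": total_replicas / brokers,
        "leaders_per_broker": partitions / brokers,
        # Fraction of all produced messages that each broker writes to disk.
        "write_share_per_broker": min(replication_factor / brokers, 1.0),
    }


current = replica_layout(partitions=100, replication_factor=3, brokers=3)
doubled = replica_layout(partitions=100, replication_factor=3, brokers=6)
print("3 brokers:", current)
print("6 brokers:", doubled)
```

With replication factor 3 on 3 brokers, every broker stores a copy of every partition, so each broker writes every message. With 6 brokers and the same replication factor, each broker would write only about half of them.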

*how the kafka BM was performed*

we injected messages using a proprietary tool, KakkaInjector.

each message was ~1K in size, and messages were sent evenly to all 100
partitions for 2.5 consecutive hours.

the goal of the BM was to find the maximum TP that can be achieved without
exceeding ~80%-85% IO utilization.

*kafka BM results (throughput and IO utilization%)*

[image: kafka BM results (throughput and IO utilization%)] <https://i.stack.imgur.com/DdWMI.png>

so with ~85% IO utilization on all 3 brokers, the message rate was
550,000 msgs/sec being read & 550,000 msgs/sec being written.

If we look at the TP in kB, all 3 brokers together reached a total of 380
rKB/s and 495 wKB/s.

*my questions*

these results were achieved with 3 kafka brokers X 16 SAS disks X 1TB. we
want to reach ~1.5M messages/sec instead of the current rate of 550K
msgs/sec.
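
To put the gap in numbers, here is a back-of-the-envelope Python sketch. It assumes throughput scales linearly with the number of brokers, which is exactly the open question in this post, so treat the result as an optimistic lower bound and not a measurement; `brokers_for_target` is a hypothetical helper.

```python
import math


def brokers_for_target(current_brokers, current_rate, target_rate):
    """Return (required scale factor, broker count) for a target rate.

    Assumption: throughput grows linearly with broker count, with the same
    disks, topic layout and replication factor. This is not verified.
    """
    scale = target_rate / current_rate
    return scale, math.ceil(current_brokers * scale)


scale, brokers_needed = brokers_for_target(
    current_brokers=3, current_rate=550_000, target_rate=1_500_000
)
print(f"required scale factor: {scale:.2f}x")
print("brokers needed if scaling is linear:", brokers_needed)
```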

so my questions are:

   - will adding more disks to each broker linearly increase the number of
     msgs being read and written?
   - will adding more brokers with the same disk setup linearly increase
     the number of msgs being read and written?
   - if we change the RAID level from RAID-10 to RAID-0, will the TP
     increase by 2X?
   - if we change the disks from SAS to SSD, will the TP increase?


Thanks