Posted to users@kafka.apache.org by Lawrence Weikum <lw...@pandora.com> on 2016/09/09 21:08:48 UTC

Flickering Kafka Topic

Hello everyone!

We are seeing some odd behavior in Kafka and are wondering whether anyone has come across the same issue and been able to fix it.  Here’s the setup:

8 brokers in the cluster.  Kafka 0.10.0.0.

One topic, and only one topic on this cluster, is having issues where the ISRs continuously shrink and expand but never stabilize.  This starts once roughly 50,000 messages per second come in, and the problem gets worse when the rate increases to 110,000 messages per second.  Messages are small; total inbound is only about 50 MB/s.

There are no errors in the logs. We just get countless messages like these:

[2016-09-09 12:54:07,147] INFO Partition [topic_a,11] on broker 4: Expanding ISR for partition [topic_a,11] from 4 to 4,2 (kafka.cluster.Partition)
[2016-09-09 12:54:23,070] INFO Partition [topic_a,11] on broker 4: Shrinking ISR for partition [topic_a,11] from 4,2 to 4 (kafka.cluster.Partition)
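
To put numbers on the flapping, the log lines above can be tallied per partition. The following is only a minimal sketch, not something taken from our tooling: the regular expression is modeled on the two sample lines shown here and may not match other Kafka versions, and the names ISR_LINE, parse_isr_events and count_flaps are hypothetical helpers.

```python
import re
from collections import Counter
from datetime import datetime

# Pattern modeled on the two sample log lines in this message (an assumption;
# other Kafka versions may word these messages differently).
ISR_LINE = re.compile(
    r"\[(?P<ts>[\d-]+ [\d:,]+)\] INFO Partition \[(?P<topic>[^,\]]+),(?P<partition>\d+)\] "
    r"on broker (?P<broker>\d+): (?P<action>Expanding|Shrinking) ISR for partition \[[^\]]+\] "
    r"from (?P<before>[\d,]+) to (?P<after>[\d,]+)"
)

SAMPLE_LINES = [
    "[2016-09-09 12:54:07,147] INFO Partition [topic_a,11] on broker 4: "
    "Expanding ISR for partition [topic_a,11] from 4 to 4,2 (kafka.cluster.Partition)",
    "[2016-09-09 12:54:23,070] INFO Partition [topic_a,11] on broker 4: "
    "Shrinking ISR for partition [topic_a,11] from 4,2 to 4 (kafka.cluster.Partition)",
]


def parse_isr_events(lines):
    """Return one dict per ISR expand/shrink line; non-matching lines are skipped."""
    events = []
    for line in lines:
        m = ISR_LINE.search(line)
        if m is None:
            continue
        events.append({
            "time": datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f"),
            "topic": m.group("topic"),
            "partition": int(m.group("partition")),
            "broker": int(m.group("broker")),
            "action": m.group("action"),
            "before": [int(b) for b in m.group("before").split(",")],
            "after": [int(b) for b in m.group("after").split(",")],
        })
    return events


def count_flaps(events):
    """Count events keyed by (topic, partition, action)."""
    return Counter((e["topic"], e["partition"], e["action"]) for e in events)


if __name__ == "__main__":
    events = parse_isr_events(SAMPLE_LINES)
    for key, n in sorted(count_flaps(events).items()):
        print(key, n)
    # Time the follower stayed in the ISR between the expand and the shrink.
    print((events[1]["time"] - events[0]["time"]).total_seconds(), "seconds in sync")
```

Fed a whole server.log instead of SAMPLE_LINES, the same counters would show which partitions and which leader brokers flap most often.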

This topic has transient data that is unimportant after 20 minutes, so losing some due to a cluster shutdown isn’t that important, and we also don’t mind if messages are occasionally dropped.  With this in mind we have these settings:
Partitions = 16
Producer ACKs = 1
Replication factor = 2
min.insync.replicas = 1

CPU is sitting fairly idle at ~18%, and a thread dump and profile showed that most threads are sitting idle as well – very little contention if any.

We tried increasing the number of partitions from 16 to 24, but that seems only to have raised CPU usage (from 18% to 23%) and the number of under-replicated partitions.

Any advice or insight is appreciated. Thank you all!

Lawrence