Posted to users@kafka.apache.org by Riccardo Ferrari <fe...@gmail.com> on 2016/11/03 16:52:42 UTC

Java Consumers Threads vs Topics

Hi list,

I need some input on best practices for writing Java Kafka (0.10.1.0)
consumers.

*The scenario:*
A distributed Java system sending/receiving messages, currently based on
Akka + RabbitMQ.
It uses a reasonably low number of channels (about a dozen), each mapped
to a Kafka topic. However, that number can potentially grow into the
thousands as the system scales.

*The requirement:*
Use Kafka as replacement for RabbitMQ (basically as a queue).
Disabled offset auto-commit.

According to the KafkaConsumer Javadoc there are mainly two options:
https://kafka.apache.org/0101/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#multithreaded
#1. One thread per consumer (and per topic)
#2. Decouple message consuming from processing

*The problem with #1:*
While this solution is very easy to implement, it seems that every Kafka
consumer adds (too much) load to the system.
System load gets very high (7.0+ on a dual-core VM). This means 99% CPU
utilization with NO I/O wait.

I actually tried 2 different implementations:
1) The main Thread.run() polls from Kafka and puts messages into a local
concurrent queue. A convenience method (getMessages) retrieves messages
from the locally populated queue.
2) Put the consumer.poll() logic straight into the getMessages call.
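
A minimal sketch of what variant 1 could look like, for illustration only.
It assumes a Supplier as a stand-in for consumer.poll(timeout) so that it
runs without a broker; the class name BufferedTopicReader and the pollOnce
helper are hypothetical and not taken from the actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;

// Variant 1: a dedicated thread polls and buffers; callers drain the buffer.
class BufferedTopicReader implements Runnable {
    // Hypothetical stand-in for consumer.poll(timeout). It is expected to
    // block briefly when idle, as poll does with a non-zero timeout;
    // otherwise the loop in run() would spin.
    private final Supplier<List<String>> source;
    private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>();
    private volatile boolean running = true;

    BufferedTopicReader(Supplier<List<String>> source) {
        this.source = source;
    }

    // One poll cycle: fetch a batch and append it to the local queue.
    void pollOnce() {
        buffer.addAll(source.get());
    }

    @Override
    public void run() {
        while (running) {
            pollOnce();
        }
    }

    // Asks the polling loop to exit after its current cycle.
    void shutdown() {
        running = false;
    }

    // Convenience method: return up to max buffered messages without blocking.
    List<String> getMessages(int max) {
        List<String> out = new ArrayList<>();
        buffer.drainTo(out, max);
        return out;
    }
}
```

Variant 2 would drop the queue and the thread and call the source directly
from getMessages.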

No big difference. Taking thread dumps and checking them against system
threads, everything seems to point the finger at the consumer.poll() logic.

Is there any server- or client-side tuning that can help?
Any suggestions on what to investigate further in order to get a clear
answer on why those few threads are adding so much CPU usage?

*Before trying #2:*
I have not tried this solution yet. My main concern is that my consumer
threads will have to handle multiple topics.
Is there a "best practice" or limit in terms of topics per consumer thread?
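
To make the multi-topic concern concrete, here is a rough sketch of how a
single polling thread might route messages from several topics to
per-topic handlers. It is only an assumption about how #2 could be
structured: TopicMessage and TopicDispatcher are hypothetical names, and
TopicMessage stands in for the client's ConsumerRecord (which exposes
topic() and value()). With the real client, one consumer can subscribe to a
list of topics. Committing offsets after the workers finish is not covered
here.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

// Hypothetical minimal record: only the fields the dispatcher needs.
final class TopicMessage {
    final String topic;
    final String value;

    TopicMessage(String topic, String value) {
        this.topic = topic;
        this.value = value;
    }
}

// Routes each polled message to the handler registered for its topic.
class TopicDispatcher {
    private final Map<String, Consumer<String>> handlers = new ConcurrentHashMap<>();
    private final Executor workers;

    TopicDispatcher(Executor workers) {
        this.workers = workers;
    }

    void register(String topic, Consumer<String> handler) {
        handlers.put(topic, handler);
    }

    // Called by the single polling thread with each batch.
    // Returns how many messages were handed to a worker; messages for
    // topics without a registered handler are skipped.
    int dispatch(List<TopicMessage> batch) {
        int dispatched = 0;
        for (TopicMessage m : batch) {
            Consumer<String> handler = handlers.get(m.topic);
            if (handler != null) {
                workers.execute(() -> handler.accept(m.value));
                dispatched++;
            }
        }
        return dispatched;
    }
}
```

The Executor could be a fixed thread pool in practice; passing
Runnable::run instead runs the handlers on the polling thread itself.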

Any help is much appreciated