Posted to user@storm.apache.org by "Nithin Uppalapati (BLOOMBERG/ 731 LEX)" <nu...@bloomberg.net> on 2018/08/07 13:48:51 UTC

Kafka Spout Performance Tuning

Hi,

CPU utilization climbs to around 400% with our topology. To isolate the source of the high CPU usage, I commented out everything in the topology except the KafkaSpout. With only the KafkaSpout running, CPU utilization is around 150% on a 20-core machine. The topology runs in a single worker process, with the KafkaSpout parallelism set equal to the number of Kafka partitions.

The data load during this phase is a total of 50k records, at a rate of 1600/sec - 2200/sec.

Question: how can I tune the KafkaSpout to reduce CPU utilization, which is around 150% with just the KafkaSpout? The definitions of the parameters below do not make it clear. Also, is there a way to control how the spout reads data from Kafka?

The following are the values of some of the parameters:

* poll.timeout.ms: 200
* offset.commit.period.ms: 30000 (30 seconds)
* max.uncommitted.offsets: 10000000 (ten million)


Re: Kafka Spout Performance Tuning

Posted by Hugo Louro <hm...@gmail.com>.
Hi,

Which Storm and Kafka versions are you using? How many Kafka partitions do you have? Can you run a live profile of the application to see what is happening?

You can control how much data is fetched on each poll using Kafka consumer properties such as:

max.poll.records
fetch.max.bytes
max.partition.fetch.bytes

You can check the Kafka new consumer properties documentation for details.
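
As a rough sketch of how these three properties could be collected before handing them to the consumer: the class name, method name, and the numeric values below are only illustrative assumptions, not recommendations, and should be sized against your own record size and partition count.

```java
import java.util.Properties;

// Hypothetical helper (not part of Storm or Kafka): gathers the three
// consumer properties named above into a java.util.Properties object.
public class KafkaSpoutFetchLimits {

    static Properties fetchLimits(int maxPollRecords,
                                  int fetchMaxBytes,
                                  int maxPartitionFetchBytes) {
        Properties props = new Properties();
        // Upper bound on the number of records returned by a single poll().
        props.setProperty("max.poll.records", Integer.toString(maxPollRecords));
        // Upper bound on the data returned for one fetch request.
        props.setProperty("fetch.max.bytes", Integer.toString(fetchMaxBytes));
        // Upper bound on the data returned per partition in one fetch.
        props.setProperty("max.partition.fetch.bytes",
                Integer.toString(maxPartitionFetchBytes));
        return props;
    }

    public static void main(String[] args) {
        // Illustrative values only (assumptions, not tuned for any workload).
        Properties props = fetchLimits(500, 8 * 1024 * 1024, 1024 * 1024);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

With storm-kafka-client, I believe these can be passed to the spout's consumer through KafkaSpoutConfig.Builder (for example via setProp), but please check the API of the Storm version you are running.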

Hugo
