Posted to users@kafka.apache.org by Krzysztof Nawara <kr...@cern.ch> on 2016/07/19 21:33:35 UTC

Performance of producer sending to many vs to few topics

Hi,

I have been testing Kafka to determine how the number of partitions affects performance. To do that, I set up 2 Kafka nodes and 1 ZooKeeper node, and used 8 producers running on different machines to send messages.
In the first scenario, my producers selected topics in round-robin fashion, so each message went to a different topic. In the second scenario I created just as many topics, but all producers sent to only one of them. In the first scenario, with 1000 topics in the cluster, throughput was ~250 times lower than with a single topic in the cluster. In the second scenario it was only about 1.5-2 times lower - a huge difference.
Do you have any ideas what might be the cause? The two things I came up with were under-utilization of producer batching and a lot of random IO on the brokers. Both are just wild guesses. I have been using the stock configuration - can you think of any particular properties I should play with?

Details:
Machines (Kafka, Zk): STRATOS S810-X52L (32 cores, 64GB RAM), data stored on single dedicated SATA drive
Machines (producers): OpenStack VMs (4 vCPUs, 8GB, 40GB on SSD)
Partitions per topic: 2
Replication factor: 2
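
The batching guess above can be checked with some rough arithmetic. The sketch below is only an illustration, not a measurement: it assumes the stock producer settings of that era (batch.size = 16384 bytes, buffer.memory = 33554432 bytes) and assumes the producer keeps one open batch per topic-partition it writes to. The function name is made up for this example.

```python
# Rough estimate of producer-side batch memory; not a measurement.
# Assumed stock producer defaults (not stated in the post):
DEFAULT_BATCH_SIZE = 16384         # batch.size, in bytes
DEFAULT_BUFFER_MEMORY = 33554432   # buffer.memory, in bytes


def open_batch_footprint(topics, partitions_per_topic,
                         batch_size=DEFAULT_BATCH_SIZE,
                         buffer_memory=DEFAULT_BUFFER_MEMORY):
    """Return (bytes needed, fraction of buffer) with one open batch per partition."""
    open_batches = topics * partitions_per_topic
    needed = open_batches * batch_size
    return needed, needed / buffer_memory


if __name__ == "__main__":
    for topics in (1, 1000):
        needed, fraction = open_batch_footprint(topics, partitions_per_topic=2)
        print(f"{topics} topic(s): {needed} bytes, {fraction:.1%} of buffer")
```

Under these assumptions, round-robin over 1000 topics with 2 partitions each would need 32768000 bytes for open batches, close to the whole default buffer, while a single topic needs 32768 bytes. That may be one reason to look at batch.size, linger.ms and buffer.memory first.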

Cordially,
Chris

Re: Performance of producer sending to many vs to few topics

Posted by Krzysztof Nawara <kr...@cern.ch>.
Hello,

No, in all tests I used two partitions per topic, one per broker. The
only variable in the tests was the number of topics. From what I've read,
what really affects performance is the total number of partitions per
broker (probably including replicas), so 1 topic with 1000 partitions
should offer pretty much the same performance characteristics as 1000
topics with a single partition each (provided, of course, that we use
those partitions in the same manner). Do your findings contradict this?
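
To make the per-broker reasoning concrete, here is a small sketch that counts partition replicas per broker for the setups mentioned in this thread. It assumes replicas are spread evenly over the brokers, and the function name is hypothetical.

```python
def replicas_per_broker(topics, partitions_per_topic, replication_factor, brokers):
    """Average number of partition replicas per broker (even spread assumed)."""
    total_replicas = topics * partitions_per_topic * replication_factor
    return total_replicas / brokers


if __name__ == "__main__":
    # Setup from the original post: 2 brokers, 2 partitions per topic,
    # replication factor 2.
    print(replicas_per_broker(1000, 2, 2, brokers=2))  # cluster with 1000 topics
    print(replicas_per_broker(1, 2, 2, brokers=2))     # cluster with 1 topic
    # Hypothetical comparison: 1 topic x 1000 partitions vs 1000 topics x 1 partition.
    print(replicas_per_broker(1, 1000, 2, brokers=2)
          == replicas_per_broker(1000, 1, 2, brokers=2))
```

By this count, both 1000-topic scenarios from the original post put the same number of replicas on each broker, so the gap between them would have to come from how the traffic is spread over those partitions rather than from the partition count alone.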

As for the metadata exchange - do you mean communication with ZooKeeper,
or do the nodes in the cluster exchange per-partition metadata directly?

Chris

On Tue, 2016-07-19 at 23:47 -0700, R Krishna wrote:
> We did similar testing recently (newbie here). Assuming you used the async
> publisher, did you also test with multiple partitions (1-1000) per topic?
> More topics imply more per-topic metadata exchanged every minute and
> more batches maintained and flushed per topic+partition per producer, and so
> higher CPU/memory usage. What we noticed was that, for the same topic, more
> producers gave more throughput, and increasing partitions reduced
> throughput, probably for similar reasons. You may want to try
> again after increasing the metadata exchange timeouts and memory as you
> increase topics/partitions.



Re: Performance of producer sending to many vs to few topics

Posted by R Krishna <kr...@gmail.com>.
We did similar testing recently (newbie here). Assuming you used the async
publisher, did you also test with multiple partitions (1-1000) per topic?
More topics imply more per-topic metadata exchanged every minute and
more batches maintained and flushed per topic+partition per producer, and so
higher CPU/memory usage. What we noticed was that, for the same topic, more
producers gave more throughput, and increasing partitions reduced
throughput, probably for similar reasons. You may want to try
again after increasing the metadata exchange timeouts and memory as you
increase topics/partitions.
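
As a starting point for that kind of tuning, the sketch below builds a set of producer overrides and renders them in properties-file form. batch.size, linger.ms, buffer.memory and metadata.max.age.ms are standard producer property names, but the function names, the headroom factor and every value here are assumptions for illustration, not settings tested on this cluster.

```python
def suggest_producer_overrides(total_partitions, batch_size=16384, headroom=4,
                               linger_ms=50, metadata_max_age_ms=600000):
    """Hypothetical starting values only; tune them by measurement."""
    return {
        "batch.size": batch_size,
        # Give batches time to fill when traffic is spread over many partitions.
        "linger.ms": linger_ms,
        # Leave room for several batches per partition (headroom is a guess).
        "buffer.memory": total_partitions * batch_size * headroom,
        # Force a refresh of cached metadata less often.
        "metadata.max.age.ms": metadata_max_age_ms,
    }


def to_properties(overrides):
    """Render the overrides as key=value lines, one per property."""
    return "\n".join(f"{key}={value}" for key, value in overrides.items())


if __name__ == "__main__":
    # 1000 topics x 2 partitions, as in the original post.
    print(to_properties(suggest_producer_overrides(total_partitions=2000)))
```

Changing one property at a time and re-running the same test should show which of them, if any, accounts for the difference.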











-- 
Radha Krishna, Proddaturi
253-234-5657