Posted to users@kafka.apache.org by abhijeet kadam <ab...@gmail.com> on 2014/03/05 21:06:10 UTC

Kafka Producer load distribution

Hi, I am new to Kafka and am using Kafka 0.8 to build a distributed queuing
system on an Amazon Web Services cluster.


I have 4 machines: Z1, B1, B2 and B3. One ZooKeeper instance is running on
Z1, and 3 brokers are running on B1, B2 and B3 respectively.


I am running 3 producers, one on each broker machine (B1, B2, B3), and
likewise 3 consumers, one on each broker machine.


I created a topic, let's say 'test', with 12 partitions (test-0, test-1 ...
test-11), 4 partitions on each broker machine:
   B1 - test-0,test-1,test-2,test-3
   B2 - test-4,test-5,test-6,test-7
   B3 - test-8,test-9,test-10,test-11

ZooKeeper assigned the broker on each machine as the leader for the
partitions present on the same machine.
Partition  -  Leader
test-0     -  B1
test-1     -  B1
test-2     -  B1
test-3     -  B1
test-4     -  B2
test-5     -  B2
test-6     -  B2
test-7     -  B2
test-8     -  B3
test-9     -  B3
test-10    -  B3
test-11    -  B3

All 3 producers are producing messages to this topic 'test' and all 3
consumers are trying to consume from the same topic 'test'.

What I am trying to achieve is this: whenever a producer sends a message to
this topic, it should use the broker on the same machine as the producer,
and therefore only the partitions on that machine.
Producer 1 ---> B1 ---->  (test-0,test-1,test-2,test-3) -----> consumer 1
Producer 2 ---> B2 ---->  (test-4,test-5,test-6,test-7) -----> consumer 2
Producer 3 ---> B3 ---->  (test-8,test-9,test-10,test-11) -----> consumer 3
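
The routing described above can be sketched in a few lines. This is a
minimal illustration in Python, not Kafka client code: the LEADERS table
simply copies the partition-to-leader mapping listed earlier, and
local_partitions and LocalRoundRobin are hypothetical names introduced for
the example. It assumes the leader mapping is already known and does not
change; in a real cluster leadership can move between brokers, so the
mapping would need to be refreshed.

```python
from itertools import cycle

# Partition id -> leader broker, copied from the mapping listed above.
LEADERS = {p: "B1" for p in range(0, 4)}
LEADERS.update({p: "B2" for p in range(4, 8)})
LEADERS.update({p: "B3" for p in range(8, 12)})


def local_partitions(leaders, local_broker):
    """Return the sorted partition ids whose leader is local_broker."""
    return sorted(p for p, b in leaders.items() if b == local_broker)


class LocalRoundRobin:
    """Cycle through the partitions led by the local broker (hypothetical helper)."""

    def __init__(self, leaders, local_broker):
        parts = local_partitions(leaders, local_broker)
        if not parts:
            raise ValueError(f"no partitions led by {local_broker}")
        self._cycle = cycle(parts)

    def next_partition(self):
        """Return the next local partition id in round-robin order."""
        return next(self._cycle)


# A producer running on B2 would only ever pick test-4 .. test-7.
chooser = LocalRoundRobin(LEADERS, "B2")
picked = [chooser.next_partition() for _ in range(6)]
print(picked)  # [4, 5, 6, 7, 4, 5]
```

Round-robin is only one possible choice here; any rule that selects from
the local partition list would keep the traffic on the same machine.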

I am assuming this will reduce the inter-machine message transfer and will
improve the performance.

My questions are :

1) Does it really improve performance when messages are produced and
consumed on the same machine in a distributed environment?

2) I read that a producer can fetch metadata from a broker about the
leader-partition mapping for a topic. That would help in picking the leader
on the same machine as the producer. How can a producer fetch this
metadata? I could not find any implementation.

Thanks in advance,
Abhijeet

Re: Kafka Producer load distribution

Posted by Joel Koshy <jj...@gmail.com>.
> I am assuming this will reduce the inter-machine message transfer and will
> improve the performance.
> 
> My questions are :
> 
> 1) Does it really help in improving performance, when message is produced
> and consumed from same machine in a distributed environment.

I doubt that it helps a whole lot - especially if you let the producer
batch messages in a single request (default).

> 2) I read that producer can fetch metadata from broker about all
> leader-partition mapping for a topic. It will help to pick the leader
> present in the same machine as producer. How a producer can fetch this
> metadata ? Could not find any implementation.

You can use the SyncProducer or SimpleConsumer class - both provide a
send(<request>) API that accepts a topic metadata request and returns a
topic metadata response.
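
As a rough sketch of what could be done with such a response: a topic
metadata response lists, for each partition of a topic, the broker that
currently leads it. The Python below uses plain stand-in classes rather
than the Kafka client classes; Broker, PartitionMetadata and
leaders_by_host are hypothetical names for illustration, and port 9092 is
an assumed value. It shows how the partitions could be grouped by leader
host so that a producer can look up the ones led by its own machine. A
partition with no current leader is skipped.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Broker:
    """Stand-in for a broker entry in a metadata response (hypothetical)."""
    host: str
    port: int


@dataclass(frozen=True)
class PartitionMetadata:
    """Stand-in for one partition's metadata (hypothetical)."""
    partition_id: int
    leader: Optional[Broker]  # None while no leader is available


def leaders_by_host(partitions):
    """Group partition ids by the host of their current leader."""
    grouped = defaultdict(list)
    for pm in partitions:
        if pm.leader is None:
            continue  # no leader right now, so nothing to route to
        grouped[pm.leader.host].append(pm.partition_id)
    return {host: sorted(ids) for host, ids in grouped.items()}


# Sample data shaped like part of the 'test' topic; 9092 is an assumed port.
b1, b2 = Broker("B1", 9092), Broker("B2", 9092)
response = [
    PartitionMetadata(0, b1),
    PartitionMetadata(4, b2),
    PartitionMetadata(1, b1),
    PartitionMetadata(5, None),
]
by_host = leaders_by_host(response)
print(by_host["B1"])  # [0, 1]
```

Because leadership can change, a mapping built this way is only a
snapshot and would need to be rebuilt from a fresh metadata request from
time to time.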

-- 
Joel