Posted to users@kafka.apache.org by Sandy Waters <sa...@gmail.com> on 2015/07/16 02:19:35 UTC

Load Balancing Kafka

Hi all,

Do I need to load balance against the brokers?  I am using the Python
driver and it seems to only want a single Kafka broker host.  However, in a
situation where I have 10 brokers, is it still fine to just give it one
host?  Do ZooKeeper and Kafka handle the load balancing and redirect my
push somewhere else?

Would it hurt if I load balanced with Nginx and had it do round robin to
the brokers?

Much thanks for any help.

-Sandy

Re: Load Balancing Kafka

Posted by Dana Powers <da...@rd.io>.
I think the answer here is that the Kafka protocol includes a broker
metadata API. The client uses the broker host(s) you provide to discover
the full list of brokers in the cluster (and the topics and partitions each
one leads). The Java client has a similar interface via
metadata.broker.list / bootstrap.servers.
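
The discovery step described above can be sketched as a toy model. Nothing below talks to a real cluster: the Cluster class, the host names, and the "events" topic are all hypothetical, and the sketch only illustrates the idea that any reachable bootstrap host can hand back metadata for the whole cluster.

```python
# Toy model of bootstrap-based broker discovery (hypothetical names throughout;
# this is not the API of any real Kafka client).

class Cluster:
    """Stands in for a cluster's metadata: brokers plus partition leaders."""

    def __init__(self, brokers, leaders):
        self.brokers = brokers    # broker id -> "host:port"
        self.leaders = leaders    # (topic, partition) -> broker id

    def metadata(self, asked_broker):
        # Any live broker can answer a metadata request for the whole cluster.
        if asked_broker not in self.brokers.values():
            raise ConnectionError(f"no broker at {asked_broker}")
        return dict(self.brokers), dict(self.leaders)


def discover(cluster, bootstrap_servers):
    """Try each bootstrap address in turn until one returns metadata."""
    for address in bootstrap_servers:
        try:
            return cluster.metadata(address)
        except ConnectionError:
            continue  # that bootstrap host is unreachable; try the next one
    raise ConnectionError("no bootstrap server reachable")


def leader_address(brokers, leaders, topic, partition):
    """Return the address a produce request for this partition is sent to."""
    return brokers[leaders[(topic, partition)]]


cluster = Cluster(
    brokers={1: "kafka1:9092", 2: "kafka2:9092", 3: "kafka3:9092"},
    leaders={("events", 0): 1, ("events", 1): 2, ("events", 2): 3},
)
# The first bootstrap host is stale, the second one answers.
brokers, leaders = discover(cluster, ["kafka-old:9092", "kafka2:9092"])
```

In this model a single bootstrap host is enough as long as it is up, and listing more than one only guards against that host being down. It also hints at why a round-robin proxy is a poor fit: a produce request has to reach the leader of its partition, not an arbitrary broker.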

-Dana
Re: Load Balancing Kafka

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.
Ah... It seems you are more focused on producer-side workload balance... If
that is the case, please ignore my previous comments.

Jiangjie (Becket) Qin



Re: Load Balancing Kafka

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.
If you have pretty balanced traffic on each partition and have set
auto.leader.rebalance.enable to true, you might not need to do
further workload balancing.

However, in most cases you probably still need to do some sort of load
balancing based on the traffic and disk utilization of each broker. You
might want to do leader migration and/or partition reassignment.

Leader migration is the cheaper rebalance and mostly addresses CPU and
network imbalance. Partition reassignment is a much more expensive
operation because it moves actual data, but it can help with disk
utilization in addition to CPU and network.
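
For partition reassignment, the plan is usually written as a small JSON file and passed to the kafka-reassign-partitions.sh tool through its --reassignment-json-file option. The sketch below only builds such a file; the helper name, the "events" topic, and the broker ids are made up for illustration, and it does not run the tool or contact a cluster.

```python
import json


def reassignment_plan(topic, assignments):
    """Build a partition reassignment document.

    assignments maps partition number -> list of broker ids that should hold
    its replicas. The first broker in each list is the preferred leader.
    """
    return {
        "version": 1,
        "partitions": [
            {"topic": topic, "partition": p, "replicas": list(replicas)}
            for p, replicas in sorted(assignments.items())
        ],
    }


# Hypothetical example: move partitions 0 and 1 of "events" onto brokers 4-6.
plan = reassignment_plan("events", {0: [4, 5], 1: [5, 6]})
plan_json = json.dumps(plan, indent=2)
```

Because the first replica in each list is the preferred leader, the same plan that moves data also decides where leadership lands once a preferred replica election runs, which is the cheaper "leader migration" step.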

Thanks,

Jiangjie (Becket) Qin



Re: Load Balancing Kafka

Posted by Terry Bates <te...@gmail.com>.
Greetings Sandy,

Folks smarter than me can correct me if I am wrong. Using the Python client
you don't have to connect to ZooKeeper, so just specifying one of the brokers
should be sufficient. In terms of what happens to your messages as your
client produces them, they should be randomly assigned to a partition of
the topic you specify, unless you use keyed messages, which send each
message to a particular partition based on its key:

http://kafka.apache.org/documentation.html#theproducer
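
As a rough illustration of how a key can pick a partition, the usual idea is to hash the key and take the result modulo the number of partitions. The function below is a minimal sketch under that assumption: partition_for is a hypothetical name, and crc32 is used only because it is deterministic; real clients ship their own partitioners with their own hash functions.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a message key to a partition index (illustrative hash, not Kafka's own)."""
    if isinstance(key, str):
        key = key.encode("utf-8")
    # crc32 gives the same value on every run, unlike Python's built-in
    # hash() for strings, which is randomized per process.
    return zlib.crc32(key) % num_partitions


# The same key always lands on the same partition, as long as the
# partition count does not change.
p1 = partition_for("user-42", 10)
p2 = partition_for("user-42", 10)
```

The practical consequence is that all messages with a given key go to one partition, and so to whichever broker currently leads it, which keeps per-key ordering but means a very hot key can load one broker more than the others.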

How to actually do that, relating the keys messages have to particular
partitions, is beyond my realm of knowledge. I suspect the concern is
flooding one broker with messages while the others are underutilized. I
believe Kafka's architecture ensures that, while only one broker is the
leader for a particular partition and takes writes for that partition, the
other brokers that are not the leader for that partition will eventually be
in sync with its leader. So I don't think you need to worry about sending
your messages to a VIP and directing where messages end up with manual
load balancing, even if your messages are assigned to a partition randomly.
hth!



*Terry Bates*


