You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@kafka.apache.org by "Prasad, Karthik" <Ka...@sony.com> on 2017/03/21 22:40:15 UTC

Question about kafka-streams task load balancing

Hey,

I have a typical scenario of a kafka-streams application in a production environment.

We have a kafka-cluster with multiple topics. Messages from one topic is being consumed by a the kafka-streams application. The topic, currently, has 9 partitions. We have configured consumer thread count to 14. We are running 2 instances of this stream application on 2 different machines, thereby consisting of 28 threads across both machines. The group id for the consumers are the same. But, what I observe is that all partitions are being assigned to threads on a single machine. Now, I do understand that if the task on the active machine fails, then the threads in the other machine would take over. My question is that is there a way that kafka-streams can auto-balance across instances of the same stream application ? If yes, how do I go about doing that ? Please let me know. Thanks,

Best,
Karthik Prasad
Senior Software Engineer
Sony Interactive Entertainment



Re: Question about kafka-streams task load balancing

Posted by Guozhang Wang <wa...@gmail.com>.
Hi Karthik,

I think in the current trunk we do effectively load balance across
processes (they are named as "clients" in the partition assignor) already.
More specifically:

1. Consumer clients embedded a "client UUID" in its subscription so that
the leader can group them into a single client, whose "capacity" is the
number of threads it has.
2. Suppose there are N tasks in total, and M capacities (i.e. M
num.total.threads): then for each client with a capacity of m, it will
likely to get (N / M) * m tasks no matter if N > M or N < M.
3. So in the case of N < M even, say in the above example N = 9 and M = 28,
each client should have 9 / 28 * 14 = 4.5 tasks.


You could try to build your app from Kafka trunk and see if this is the
case in your scenario. Never the less, Matthias point is still valid that
we do not recommend you ever have N < M since it will result in idle
threads.

Guozhang


On Tue, Mar 21, 2017 at 4:56 PM, Matthias J. Sax <ma...@confluent.io>
 wrote:

> Hi,
>
> I guess, it's currently not possible to load balance between different
> machines. It might be a nice optimization to add into Streams though.
>
> Right now, you should reduce the number of threads. Load balancing is
> based on threads, and thus, if Streams place tasks to all threads of one
> machine, it will automatically assign the remaining tasks to thread of
> the second machine.
>
> Btw: If you have only 9 input partitions, you will get most likely 9
> tasks (might be more, depending on your topology structure) and thus,
> you cannot utilize more then 9 thread anyway. Thus, running with 28
> thread will most likely result in many idle threads.
>
> See the docs for more details:
>
>  -
> http://docs.confluent.io/current/streams/architecture.
> html#parallelism-model
>  -
> http://docs.confluent.io/current/streams/architecture.html#threading-model
>
>
>
> -Matthias
>
> On 3/21/17 3:40 PM, Prasad, Karthik wrote:
> > Hey,
> >
> > I have a typical scenario of a kafka-streams application in a production
> environment.
> >
> > We have a kafka-cluster with multiple topics. Messages from one topic is
> being consumed by a the kafka-streams application. The topic, currently,
> has 9 partitions. We have configured consumer thread count to 14. We are
> running 2 instances of this stream application on 2 different machines,
> thereby consisting of 28 threads across both machines. The group id for the
> consumers are the same. But, what I observe is that all partitions are
> being assigned to threads on a single machine. Now, I do understand that if
> the task on the active machine fails, then the threads in the other machine
> would take over. My question is that is there a way that kafka-streams can
> auto-balance across instances of the same stream application ? If yes, how
> do I go about doing that ? Please let me know. Thanks,
> >
> > Best,
> > Karthik Prasad
> > Senior Software Engineer
> > Sony Interactive Entertainment
> >
> >
> >
>
>


-- 
Thanks,
Guozhang

*Guozhang Wang | Software Engineer | Confluent | +1 607.339.8352
<607.339.8352> *

On Tue, Mar 21, 2017 at 4:56 PM, Matthias J. Sax <ma...@confluent.io>
wrote:

> Hi,
>
> I guess, it's currently not possible to load balance between different
> machines. It might be a nice optimization to add into Streams though.
>
> Right now, you should reduce the number of threads. Load balancing is
> based on threads, and thus, if Streams place tasks to all threads of one
> machine, it will automatically assign the remaining tasks to thread of
> the second machine.
>
> Btw: If you have only 9 input partitions, you will get most likely 9
> tasks (might be more, depending on your topology structure) and thus,
> you cannot utilize more then 9 thread anyway. Thus, running with 28
> thread will most likely result in many idle threads.
>
> See the docs for more details:
>
>  -
> http://docs.confluent.io/current/streams/architecture.
> html#parallelism-model
>  -
> http://docs.confluent.io/current/streams/architecture.html#threading-model
>
>
>
> -Matthias
>
> On 3/21/17 3:40 PM, Prasad, Karthik wrote:
> > Hey,
> >
> > I have a typical scenario of a kafka-streams application in a production
> environment.
> >
> > We have a kafka-cluster with multiple topics. Messages from one topic is
> being consumed by a the kafka-streams application. The topic, currently,
> has 9 partitions. We have configured consumer thread count to 14. We are
> running 2 instances of this stream application on 2 different machines,
> thereby consisting of 28 threads across both machines. The group id for the
> consumers are the same. But, what I observe is that all partitions are
> being assigned to threads on a single machine. Now, I do understand that if
> the task on the active machine fails, then the threads in the other machine
> would take over. My question is that is there a way that kafka-streams can
> auto-balance across instances of the same stream application ? If yes, how
> do I go about doing that ? Please let me know. Thanks,
> >
> > Best,
> > Karthik Prasad
> > Senior Software Engineer
> > Sony Interactive Entertainment
> >
> >
> >
>
>


-- 
-- Guozhang

Re: Question about kafka-streams task load balancing

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Hi,

I guess, it's currently not possible to load balance between different
machines. It might be a nice optimization to add into Streams though.

Right now, you should reduce the number of threads. Load balancing is
based on threads, and thus, if Streams place tasks to all threads of one
machine, it will automatically assign the remaining tasks to thread of
the second machine.

Btw: If you have only 9 input partitions, you will get most likely 9
tasks (might be more, depending on your topology structure) and thus,
you cannot utilize more then 9 thread anyway. Thus, running with 28
thread will most likely result in many idle threads.

See the docs for more details:

 -
http://docs.confluent.io/current/streams/architecture.html#parallelism-model
 -
http://docs.confluent.io/current/streams/architecture.html#threading-model



-Matthias

On 3/21/17 3:40 PM, Prasad, Karthik wrote:
> Hey,
> 
> I have a typical scenario of a kafka-streams application in a production environment.
> 
> We have a kafka-cluster with multiple topics. Messages from one topic is being consumed by a the kafka-streams application. The topic, currently, has 9 partitions. We have configured consumer thread count to 14. We are running 2 instances of this stream application on 2 different machines, thereby consisting of 28 threads across both machines. The group id for the consumers are the same. But, what I observe is that all partitions are being assigned to threads on a single machine. Now, I do understand that if the task on the active machine fails, then the threads in the other machine would take over. My question is that is there a way that kafka-streams can auto-balance across instances of the same stream application ? If yes, how do I go about doing that ? Please let me know. Thanks,
> 
> Best,
> Karthik Prasad
> Senior Software Engineer
> Sony Interactive Entertainment
> 
> 
>