Posted to users@kafka.apache.org by Helin Xiang <xk...@gmail.com> on 2013/04/17 12:00:18 UTC

Question about Partitions not receive equal number messages

Hi,
We are using Kafka 0.7.2.

The situation is a little complicated:

1. We use the Java API with multiple threads (about 16) to send logs to
Kafka. Each thread has its own kafka.javaapi.producer.Producer object.
2. There is one topic with 4 partitions. We send to it using random
partitioning.
3. We generate messages for this topic at about 100 per second, so each
thread only gets a few logs per second.

But we find that the 4 partitions receive unbalanced data. Partition 0 gets
10 times more logs than partitions 1, 2 and 3. Partitions 1, 2 and 3 get
nearly equal numbers of messages.

After that, we set the number of threads to 1, and the imbalance vanished.

We are not sure what happens inside the Producer Java API.
Could anyone explain it?
Or is it necessary to create a new kafka.javaapi.producer.Producer object
in each thread? I hear the kafka.javaapi.producer.Producer class is
thread-safe, but I don't know whether a single producer object can handle
high throughput.


THANKS


-- 
*Best Regards

Xiang Helin*

Re: Question about Partitions not receive equal number messages

Posted by 王国栋 <wa...@gmail.com>.
Hi Neha,

We cannot understand why the partitions would be unbalanced if each thread
gets a different number of messages.

We went through the producer code, and the partition number is generated
by "random.nextInt(numOfPartitions)", so we think that even if different
threads get different numbers of messages, the partition numbers should,
in theory, be evenly distributed.
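
The reasoning above can be checked with a small simulation. This is only a
sketch, not the Kafka producer code: the class name RandomPartitionSim, the
per-thread message counts, and the seed are assumptions made for
illustration. Each simulated thread owns its own java.util.Random and calls
nextInt(numPartitions) once per message, while the threads deliberately
receive very different numbers of messages.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical stand-in for the scenario in this thread; it does not use Kafka.
final class RandomPartitionSim {

    // Simulates threads that each pick a partition with random.nextInt(numPartitions).
    // messagesPerThread[t] is the number of messages handled by simulated thread t.
    static long[] simulate(int[] messagesPerThread, int numPartitions, long seed) {
        long[] counts = new long[numPartitions];
        Random seeder = new Random(seed); // derives an independent seed per thread
        for (int t = 0; t < messagesPerThread.length; t++) {
            Random random = new Random(seeder.nextLong()); // one generator per thread
            for (int i = 0; i < messagesPerThread[t]; i++) {
                counts[random.nextInt(numPartitions)]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Deliberately skewed load: thread t sends (t + 1) * 1000 messages.
        int[] load = new int[16];
        for (int t = 0; t < load.length; t++) {
            load[t] = (t + 1) * 1000;
        }
        System.out.println(Arrays.toString(simulate(load, 4, 42L)));
    }
}
```

If the per-partition counts printed by this sketch come out close to each
other, then uneven work per thread cannot by itself account for a 10x skew,
as long as every message really goes through an independent random pick.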

Could you please give us a more detailed explanation of this?

Thanks a lot.

Guodong


On Wed, Apr 17, 2013 at 9:29 PM, Neha Narkhede <ne...@gmail.com> wrote:

> I suspect the threads are not each assigned an equal number of messages
> to send. I don't think it matters whether you use one producer or more, as
> long as you distribute work among those threads equally.
>
> Thanks,
> Neha

Re: Question about Partitions not receive equal number messages

Posted by Neha Narkhede <ne...@gmail.com>.
I suspect the threads are not each assigned an equal number of messages to
send. I don't think it matters whether you use one producer or more, as
long as you distribute work among those threads equally.
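
One way to distribute work equally among sender threads is round-robin
assignment. The following is a minimal sketch under assumptions: the class
RoundRobinDispatcher and its methods are hypothetical names, not part of the
Kafka API, and the sketch only decides which worker thread gets each message
without sending anything.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper; not part of Kafka. Assigns messages to workers in turn.
final class RoundRobinDispatcher {
    private final AtomicLong sequence = new AtomicLong();
    private final int workerCount;

    RoundRobinDispatcher(int workerCount) {
        if (workerCount <= 0) {
            throw new IllegalArgumentException("workerCount must be positive");
        }
        this.workerCount = workerCount;
    }

    // Returns the index of the worker thread that should send the next message.
    // Safe to call from several threads because the counter is atomic.
    int nextWorker() {
        return (int) (sequence.getAndIncrement() % workerCount);
    }

    // Counts how many of the given messages each worker would receive.
    static int[] countPerWorker(int messages, int workerCount) {
        RoundRobinDispatcher dispatcher = new RoundRobinDispatcher(workerCount);
        int[] counts = new int[workerCount];
        for (int i = 0; i < messages; i++) {
            counts[dispatcher.nextWorker()]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 100 messages spread over 16 workers, matching the numbers in this thread.
        System.out.println(Arrays.toString(countPerWorker(100, 16)));
    }
}
```

With this scheme the per-worker counts differ by at most one message, so
each thread's producer sees nearly the same load.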

Thanks,
Neha
