Posted to dev@kafka.apache.org by prashant amar <am...@gmail.com> on 2013/09/12 22:56:55 UTC

Producer not distributing across all partitions

I created a topic with 4 partitions and for some reason the producer is
pushing only to one partition.

This is consistently happening across all topics that I created ...

Is there a specific configuration that I need to apply to ensure that load
is evenly distributed across all partitions?


Group           Topic              Pid   Offset   logSize   Lag   Owner
perfgroup1      perfpayload1       0     10965    11220     255   perfgroup1_XXXX-0
perfgroup1      perfpayload1       1     0        0         0     perfgroup1_XXXX-1
perfgroup1      perfpayload1       2     0        0         0     perfgroup1_XXXXX-2
perfgroup1      perfpayload1       3     0        0         0     perfgroup1_XXXXX-3

Re: Producer not distributing across all partitions

Posted by Neha Narkhede <ne...@gmail.com>.
Are you using Kafka 0.7 or 0.8?



On Thu, Sep 12, 2013 at 1:56 PM, prashant amar <am...@gmail.com> wrote:

> I created a topic with 4 partitions and for some reason the producer is
> pushing only to one partition.
>
> This is consistently happening across all topics that I created ...
>
> Is there a specific configuration that I need to apply to ensure that load
> is evenly distributed across all partitions?
>
>
> Group           Topic                          Pid Offset          logSize
>         Lag             Owner
> perfgroup1      perfpayload1                   0   10965           11220
>         255             perfgroup1_XXXX-0
> perfgroup1      perfpayload1                   1   0               0
>         0               perfgroup1_XXXX-1
> perfgroup1      perfpayload1                   2   0               0
>         0               perfgroup1_XXXXX-2
> perfgroup1      perfpayload1                   3   0               0
>         0               perfgroup1_XXXXX-3
>

Re: Producer not distributing across all partitions

Posted by Swapnil Ghike <sg...@linkedin.com>.
I meant that messages were appended to two different partitions: say, out of 10
messages produced, one partition received 5 and the other received the remaining 5.
No messages were duplicated across partitions.

Swapnil

On 9/14/13 11:03 PM, "chetan conikee" <co...@gmail.com> wrote:

>Swapnil
>
>What do you mean by "I did a local test today that showed that choosing
>DefaultPartitioner with
>null key in the messages appended data to multiple partitions"?
>
>Are messages being duplicated across partitions?
>
>-Chetan
>
>
>On Sat, Sep 14, 2013 at 9:02 PM, Swapnil Ghike <sg...@linkedin.com>
>wrote:
>
>> Hi Joe, Drew,
>>
>> In 0.8 HEAD, if the key is null, the DefaultEventHandler randomly
>>chooses
>> an available partition and never calls the partitioner.partition(key,
>> numPartitions) method. This is done in lines 204 to 212 of the github
>> commit Drew pointed to, though that piece of code is slightly different
>>now
>> because of KAFKA-1017 and KAFKA-959.
>>
>> I did a local test today that showed that choosing DefaultPartitioner
>>with
>> null key in the messages appended data to multiple partitions. For this
>> Test, I set topic.metadata.refresh.interval.ms to 1 second because 0.8
>> HEAD
>> Sticks to a partition in a given topic.metadata.refresh.interval.ms (as
>>is
>> being discussed in the other e-mail thread on dev@kafka).
>>
>> Please let me know if you see different results.
>>
>> Thanks,
>> Swapnil
>>
>>
>>
>> On 9/13/13 1:48 PM, "Joe Stein" <cr...@gmail.com> wrote:
>>
>> >Isn't this a bug?
>> >
>> >I don't see why we would want users to have to code and generate random
>> >partition keys to randomly distributed the data to partitions, that is
>> >Kafka's job isn't it?
>> >
>> >Or if supplying a null value tell the user this is not supported (throw
>> >exception) in KeyedMessage like we do for topic and not treat null as a
>> >key
>> >to hash?
>> >
>> >My preference is to put those three lines back in and let key be null
>>and
>> >give folks randomness unless its not a bug and there is a good reason
>>for
>> >it?
>> >
>> >Is there something about
>> >https://issues.apache.org/jira/browse/KAFKA-691that requires the lines
>> >taken out? I haven't had a chance to look through
>> >it yet
>> >
>> >My thought is a new person coming in they would expect to see the
>> >partitions filling up in a round robin fashion as our docs says and
>>unless
>> >we force them in the API to know they have to-do this or give them the
>> >ability for this to happen when passing nothing in
>> >
>> >/*******************************************
>> > Joe Stein
>> > Founder, Principal Consultant
>> > Big Data Open Source Security LLC
>> > http://www.stealth.ly
>> > Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
>> >********************************************/
>> >
>> >
>> >On Fri, Sep 13, 2013 at 4:17 PM, Drew Goya <dr...@gradientx.com> wrote:
>> >
>> >> I ran into this problem as well Prashant.  The default partition key
>>was
>> >> recently changed:
>> >>
>> >>
>> >>
>> >>
>> 
>>https://github.com/apache/kafka/commit/b71e6dc352770f22daec0c9a3682138666
>> >>f032be
>> >>
>> >> It no longer assigns a random partition to data with a null partition
>> >>key.
>> >>  I had to change my code to generate random partition keys to get the
>> >> randomly distributed behavior the producer used to have.
>> >>
>> >>
>> >> On Fri, Sep 13, 2013 at 11:42 AM, prashant amar <am...@gmail.com>
>> >> wrote:
>> >>
>> >> > Thanks Neha
>> >> >
>> >> > I will try applying this property and circle back.
>> >> >
>> >> > Also, I have been attempting to execute kafka-producer-perf-test.sh
>> >>and I
>> >> > receive the following error
>> >> >
>> >> >        Error: Could not find or load main class
>> >> > kafka.perf.ProducerPerformance
>> >> >
>> >> > I am running against 0.8.0-beta1
>> >> >
>> >> > Seems like perf is a separate project in the workspace.
>> >> >
>> >> > Does sbt package-assembly bundle the perf jar as well?
>> >> >
>> >> > Neither producer-perf-test not consumer-test are working with this
>> >>build
>> >> >
>> >> >
>> >> >
>> >> > On Fri, Sep 13, 2013 at 9:56 AM, Neha Narkhede
>> >><neha.narkhede@gmail.com
>> >> > >wrote:
>> >> >
>> >> > > As Jun suggested, one reason could be that the
>> >> > > topic.metadata.refresh.interval.ms is too high. Did you observe
>>if
>> >>the
>> >> > > distribution improves after topic.metadata.refresh.interval.ms
>>has
>> >> > passed
>> >> > > ?
>> >> > >
>> >> > > Thanks
>> >> > > Neha
>> >> > >
>> >> > >
>> >> > > On Fri, Sep 13, 2013 at 4:47 AM, prashant amar
>><amasindhu@gmail.com
>> >
>> >> > > wrote:
>> >> > >
>> >> > > > I am using kafka 08 version ...
>> >> > > >
>> >> > > >
>> >> > > > On Thu, Sep 12, 2013 at 8:44 PM, Jun Rao <ju...@gmail.com>
>> wrote:
>> >> > > >
>> >> > > > > Which revision of 0.8 are you using? In a recent change, a
>> >>producer
>> >> > > will
>> >> > > > > stick to a partition for topic.metadata.refresh.interval.ms
>> >> (defaults
>> >> > > to
>> >> > > > > 10
>> >> > > > > mins) time before picking another partition at random.
>> >> > > > > Thanks,
>> >> > > > > Jun
>> >> > > > >
>> >> > > > >
>> >> > > > > On Thu, Sep 12, 2013 at 1:56 PM, prashant amar <
>> >> amasindhu@gmail.com>
>> >> > > > > wrote:
>> >> > > > >
>> >> > > > > > I created a topic with 4 partitions and for some reason the
>> >> > producer
>> >> > > is
>> >> > > > > > pushing only to one partition.
>> >> > > > > >
>> >> > > > > > This is consistently happening across all topics that I
>> >>created
>> >> ...
>> >> > > > > >
>> >> > > > > > Is there a specific configuration that I need to apply to
>> >>ensure
>> >> > that
>> >> > > > > load
>> >> > > > > > is evenly distributed across all partitions?
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > Group           Topic                          Pid Offset
>> >> > > > >  logSize
>> >> > > > > >         Lag             Owner
>> >> > > > > > perfgroup1      perfpayload1                   0   10965
>> >> > > > 11220
>> >> > > > > >         255             perfgroup1_XXXX-0
>> >> > > > > > perfgroup1      perfpayload1                   1   0
>> >> > 0
>> >> > > > > >         0               perfgroup1_XXXX-1
>> >> > > > > > perfgroup1      perfpayload1                   2   0
>> >> > 0
>> >> > > > > >         0               perfgroup1_XXXXX-2
>> >> > > > > > perfgroup1      perfpayload1                   3   0
>> >> > 0
>> >> > > > > >         0               perfgroup1_XXXXX-3
>> >> > > > > >
>> >> > > > >
>> >> > > >
>> >> > >
>> >> >
>>
>>


Re: Producer not distributing across all partitions

Posted by chetan conikee <co...@gmail.com>.
Swapnil

What do you mean by "I did a local test today that showed that choosing
DefaultPartitioner with
null key in the messages appended data to multiple partitions"?

Are messages being duplicated across partitions?

-Chetan


On Sat, Sep 14, 2013 at 9:02 PM, Swapnil Ghike <sg...@linkedin.com> wrote:

> Hi Joe, Drew,
>
> In 0.8 HEAD, if the key is null, the DefaultEventHandler randomly chooses
> an available partition and never calls the partitioner.partition(key,
> numPartitions) method. This is done in lines 204 to 212 of the github
> commit Drew pointed to, though that piece of code is slightly different now
> because of KAFKA-1017 and KAFKA-959.
>
> I did a local test today that showed that choosing DefaultPartitioner with
> null key in the messages appended data to multiple partitions. For this
> Test, I set topic.metadata.refresh.interval.ms to 1 second because 0.8
> HEAD
> Sticks to a partition in a given topic.metadata.refresh.interval.ms (as is
> being discussed in the other e-mail thread on dev@kafka).
>
> Please let me know if you see different results.
>
> Thanks,
> Swapnil
>
>
>
> On 9/13/13 1:48 PM, "Joe Stein" <cr...@gmail.com> wrote:
>
> >Isn't this a bug?
> >
> >I don't see why we would want users to have to code and generate random
> >partition keys to randomly distributed the data to partitions, that is
> >Kafka's job isn't it?
> >
> >Or if supplying a null value tell the user this is not supported (throw
> >exception) in KeyedMessage like we do for topic and not treat null as a
> >key
> >to hash?
> >
> >My preference is to put those three lines back in and let key be null and
> >give folks randomness unless its not a bug and there is a good reason for
> >it?
> >
> >Is there something about
> >https://issues.apache.org/jira/browse/KAFKA-691that requires the lines
> >taken out? I haven't had a chance to look through
> >it yet
> >
> >My thought is a new person coming in they would expect to see the
> >partitions filling up in a round robin fashion as our docs says and unless
> >we force them in the API to know they have to-do this or give them the
> >ability for this to happen when passing nothing in
> >
> >/*******************************************
> > Joe Stein
> > Founder, Principal Consultant
> > Big Data Open Source Security LLC
> > http://www.stealth.ly
> > Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> >********************************************/
> >
> >
> >On Fri, Sep 13, 2013 at 4:17 PM, Drew Goya <dr...@gradientx.com> wrote:
> >
> >> I ran into this problem as well Prashant.  The default partition key was
> >> recently changed:
> >>
> >>
> >>
> >>
> https://github.com/apache/kafka/commit/b71e6dc352770f22daec0c9a3682138666
> >>f032be
> >>
> >> It no longer assigns a random partition to data with a null partition
> >>key.
> >>  I had to change my code to generate random partition keys to get the
> >> randomly distributed behavior the producer used to have.
> >>
> >>
> >> On Fri, Sep 13, 2013 at 11:42 AM, prashant amar <am...@gmail.com>
> >> wrote:
> >>
> >> > Thanks Neha
> >> >
> >> > I will try applying this property and circle back.
> >> >
> >> > Also, I have been attempting to execute kafka-producer-perf-test.sh
> >>and I
> >> > receive the following error
> >> >
> >> >        Error: Could not find or load main class
> >> > kafka.perf.ProducerPerformance
> >> >
> >> > I am running against 0.8.0-beta1
> >> >
> >> > Seems like perf is a separate project in the workspace.
> >> >
> >> > Does sbt package-assembly bundle the perf jar as well?
> >> >
> >> > Neither producer-perf-test not consumer-test are working with this
> >>build
> >> >
> >> >
> >> >
> >> > On Fri, Sep 13, 2013 at 9:56 AM, Neha Narkhede
> >><neha.narkhede@gmail.com
> >> > >wrote:
> >> >
> >> > > As Jun suggested, one reason could be that the
> >> > > topic.metadata.refresh.interval.ms is too high. Did you observe if
> >>the
> >> > > distribution improves after topic.metadata.refresh.interval.ms has
> >> > passed
> >> > > ?
> >> > >
> >> > > Thanks
> >> > > Neha
> >> > >
> >> > >
> >> > > On Fri, Sep 13, 2013 at 4:47 AM, prashant amar <amasindhu@gmail.com
> >
> >> > > wrote:
> >> > >
> >> > > > I am using kafka 08 version ...
> >> > > >
> >> > > >
> >> > > > On Thu, Sep 12, 2013 at 8:44 PM, Jun Rao <ju...@gmail.com>
> wrote:
> >> > > >
> >> > > > > Which revision of 0.8 are you using? In a recent change, a
> >>producer
> >> > > will
> >> > > > > stick to a partition for topic.metadata.refresh.interval.ms
> >> (defaults
> >> > > to
> >> > > > > 10
> >> > > > > mins) time before picking another partition at random.
> >> > > > > Thanks,
> >> > > > > Jun
> >> > > > >
> >> > > > >
> >> > > > > On Thu, Sep 12, 2013 at 1:56 PM, prashant amar <
> >> amasindhu@gmail.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > > > I created a topic with 4 partitions and for some reason the
> >> > producer
> >> > > is
> >> > > > > > pushing only to one partition.
> >> > > > > >
> >> > > > > > This is consistently happening across all topics that I
> >>created
> >> ...
> >> > > > > >
> >> > > > > > Is there a specific configuration that I need to apply to
> >>ensure
> >> > that
> >> > > > > load
> >> > > > > > is evenly distributed across all partitions?
> >> > > > > >
> >> > > > > >
> >> > > > > > Group           Topic                          Pid Offset
> >> > > > >  logSize
> >> > > > > >         Lag             Owner
> >> > > > > > perfgroup1      perfpayload1                   0   10965
> >> > > > 11220
> >> > > > > >         255             perfgroup1_XXXX-0
> >> > > > > > perfgroup1      perfpayload1                   1   0
> >> > 0
> >> > > > > >         0               perfgroup1_XXXX-1
> >> > > > > > perfgroup1      perfpayload1                   2   0
> >> > 0
> >> > > > > >         0               perfgroup1_XXXXX-2
> >> > > > > > perfgroup1      perfpayload1                   3   0
> >> > 0
> >> > > > > >         0               perfgroup1_XXXXX-3
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
>
>

Re: Producer not distributing across all partitions

Posted by chetan conikee <co...@gmail.com>.
Prashant

I recall you mentioning that you are on the 0.8 branch ...

If so, can you check your producer to verify whether you are using the
DefaultPartitioner, the SimplePartitioner, or null (which falls back to random
partitioning)?

    kafkaProps.put("partitioner.class", "kafka.producer.DefaultPartitioner")

Also, if you are using KeyedMessage, are your keys sufficiently unique to spread
messages randomly across partitions?

    val producerData: KeyedMessage[String, String] = new KeyedMessage[String, String](topicName, key, message)

Note that the DefaultPartitioner is based on the hash of the key, so folks often
make the mistake of setting a constant dummy key (when they do not care about the
key) while choosing the DefaultPartitioner, causing every message to consistently
hash to only ONE partition.

Try choosing a sufficiently random/unique key for each message, then use the
Default or Random partitioner, give it a spin, and let us know.
Chetan





On Sat, Sep 14, 2013 at 12:45 PM, Swapnil Ghike <sg...@linkedin.com> wrote:

> Hi Prashant,
>
> I tried a local test using a very short topic.metadata.refresh.interval.ms
> on the producer. The server had two partitions and both of them appended
> data. Could you check if you have set the
> topic.metadata.refresh.interval.ms on your producer to a very high value?
>
> Swapnil
>
> On 9/13/13 8:46 PM, "Jun Rao" <ju...@gmail.com> wrote:
>
> >Without fixing KAFKA-1017, the issue is that the producer will maintain a
> >socket connection per min(#partitions, #brokers). If you have lots of
> >producers, the open file handlers on the broker could be an issue.
> >
> >So, what KAFKA-1017 fixes is to pick a random partition and stick to it
> >for
> >a configurable amount of time, and then switch to another random
> >partition.
> >This is the behavior in 0.7 when a load balancer is used and reduces #
> >socket connections significantly.
> >
> >The issue you are reporting seems like a bug though. Which revision in 0.8
> >are you using?
> >
> >Thanks,
> >
> >Jun
> >
> >
> >On Fri, Sep 13, 2013 at 8:28 PM, prashant amar <am...@gmail.com>
> >wrote:
> >
> >> Hi Guozhang, Joe, Drew
> >>
> >> In our case we have been running for the past 3 weeks and it has been
> >> consistently writing only to to the first partition. The rest of the
> >> partitions have empty index files.
> >>
> >> Not sure if I am hitting any issue here.
> >>
> >> I am using  offset checker as my barometer. Also introspect r&d the
> >>folder
> >> and it indicates the same.
> >>
> >> On Friday, September 13, 2013, Guozhang Wang wrote:
> >>
> >> > Hello Joe,
> >> >
> >> > The reason we make the producers to produce to a fixed partition for
> >>each
> >> > metadata-refresh interval are the following:
> >> >
> >> > https://issues.apache.org/jira/browse/KAFKA-1017
> >> >
> >> > https://issues.apache.org/jira/browse/KAFKA-959
> >> >
> >> > So in a word the randomness is still preserved but within one
> >> > metadata-refresh interval the assignment is fixed.
> >> >
> >> > I agree that the document should be updated accordingly.
> >> >
> >> > Guozhang
> >> >
> >> >
> >> > On Fri, Sep 13, 2013 at 1:48 PM, Joe Stein <cr...@gmail.com>
> wrote:
> >> >
> >> > > Isn't this a bug?
> >> > >
> >> > > I don't see why we would want users to have to code and generate
> >>random
> >> > > partition keys to randomly distributed the data to partitions, that
> >>is
> >> > > Kafka's job isn't it?
> >> > >
> >> > > Or if supplying a null value tell the user this is not supported
> >>(throw
> >> > > exception) in KeyedMessage like we do for topic and not treat null
> >>as a
> >> > key
> >> > > to hash?
> >> > >
> >> > > My preference is to put those three lines back in and let key be
> >>null
> >> and
> >> > > give folks randomness unless its not a bug and there is a good
> >>reason
> >> for
> >> > > it?
> >> > >
> >> > > Is there something about
> >> > > https://issues.apache.org/jira/browse/KAFKA-691that requires the
> >>lines
> >> > > taken out? I haven't had a chance to look through
> >> > > it yet
> >> > >
> >> > > My thought is a new person coming in they would expect to see the
> >> > > partitions filling up in a round robin fashion as our docs says and
> >> > unless
> >> > > we force them in the API to know they have to-do this or give them
> >>the
> >> > > ability for this to happen when passing nothing in
> >> > >
> >> > > /*******************************************
> >> > >  Joe Stein
> >> > >  Founder, Principal Consultant
> >> > >  Big Data Open Source Security LLC
> >> > >  http://www.stealth.ly
> >> > >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> >> > > ********************************************/
> >> > >
> >> > >
> >> > > On Fri, Sep 13, 2013 at 4:17 PM, Drew Goya <dr...@gradientx.com>
> >>wrote:
> >> > >
> >> > > > I ran into this problem as well Prashant.  The default partition
> >>key
> >> > was
> >> > > > recently changed:
> >> > > >
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> >>
> https://github.com/apache/kafka/commit/b71e6dc352770f22daec0c9a3682138666
> >>f032be
> >> > > >
> >> > > > It no longer assigns a random partition to data with a null
> >>partition
> >> > > key.
> >> > > >  I had to change my code to generate random partition keys to get
> >>the
> >> > > > randomly distributed behavior the producer used to have.
> >> > > >
> >> > > >
> >> > > > On Fri, Sep 13, 2013 at 11:42 AM, prashant amar
> >><amasindhu@gmail.com
> >> >
> >> > > > wrote:
> >> > > >
> >> > > > > Thanks Neha
> >> > > > >
> >> > > > > I will try applying this property and circle back.
> >> > > > >
> >> > > > > Also, I have been attempting to execute
> >>kafka-producer-perf-test.sh
> >> > > and I
> >> > > > > receive the following error
> >> > > > >
> >> > > > >        Error: Could not find or load main class
> >> > > > > kafka.perf.ProducerPerformance
> >> > > > >
> >> > > > > I am running against 0.8.0-beta1
> >> > > > >
> >> > > > > Seems like perf is a separate project in the workspace.
> >> > > > >
> >> > > > > Does sbt package-assembly bundle the perf jar as well?
> >> > > > >
> >> > > > > Neither producer-perf-test not consumer-test are working with
> >>this
> >> > > build
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On Fri, Sep 13, 2013 at 9:56 AM, Neha Narkhede <
> >> > > neha.narkhede@gmail.com
> >> > > > > >wrote:
> >> > > > >
> >> > > > > > As Jun suggested, one reason could be that the
> >> > > > > > topic.metadata.refresh.interval.ms is too high. Did you
> >>observe
> >> if
> >> > > the
> >> > > > > > distribution improves after
> >>topic.metadata.refresh.interval.mshas
> >> > > > > passed
> >> > > > > > ?
> >> > > > > >
> >> > > > > > Thanks
> >> > > > > > Neha
> >> > > > > >
> >> > > > > >
> >> > > > > > On Fri, Sep 13, 2013 at 4:47 AM, prashant amar <
> >> > amasindhu@gmail.com>
> >> > > > > > wrote:
> >> > > > > >
> >> > > > > --
> >> > -- Guozhang
> >> >
> >>
>
>

Re: Producer not distributing across all partitions

Posted by Swapnil Ghike <sg...@linkedin.com>.
Hi Prashant,

I tried a local test using a very short topic.metadata.refresh.interval.ms on the
producer. The topic had two partitions and both of them received data. Could you
check whether you have set topic.metadata.refresh.interval.ms on your producer to
a very high value?
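
For reference, a minimal script-style sketch of the producer configuration being
discussed, assuming the 0.8 Scala producer; the broker address and the 1-second
value are illustrative choices for testing, not production recommendations:

    import java.util.Properties
    import kafka.producer.ProducerConfig

    // Producer config sketch with an artificially low metadata refresh interval.
    // The default is 10 minutes, during which the 0.8 producer sticks to one
    // randomly chosen partition for messages with a null key.
    val props = new Properties()
    props.put("metadata.broker.list", "localhost:9092")              // assumed broker address
    props.put("serializer.class", "kafka.serializer.StringEncoder")
    props.put("topic.metadata.refresh.interval.ms", "1000")          // 1 second, for testing only
    val config = new ProducerConfig(props)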

Swapnil

On 9/13/13 8:46 PM, "Jun Rao" <ju...@gmail.com> wrote:

>Without fixing KAFKA-1017, the issue is that the producer will maintain a
>socket connection per min(#partitions, #brokers). If you have lots of
>producers, the open file handlers on the broker could be an issue.
>
>So, what KAFKA-1017 fixes is to pick a random partition and stick to it
>for
>a configurable amount of time, and then switch to another random
>partition.
>This is the behavior in 0.7 when a load balancer is used and reduces #
>socket connections significantly.
>
>The issue you are reporting seems like a bug though. Which revision in 0.8
>are you using?
>
>Thanks,
>
>Jun
>
>
>On Fri, Sep 13, 2013 at 8:28 PM, prashant amar <am...@gmail.com>
>wrote:
>
>> Hi Guozhang, Joe, Drew
>>
>> In our case we have been running for the past 3 weeks and it has been
>> consistently writing only to to the first partition. The rest of the
>> partitions have empty index files.
>>
>> Not sure if I am hitting any issue here.
>>
>> I am using  offset checker as my barometer. Also introspect r&d the
>>folder
>> and it indicates the same.
>>
>> On Friday, September 13, 2013, Guozhang Wang wrote:
>>
>> > Hello Joe,
>> >
>> > The reason we make the producers to produce to a fixed partition for
>>each
>> > metadata-refresh interval are the following:
>> >
>> > https://issues.apache.org/jira/browse/KAFKA-1017
>> >
>> > https://issues.apache.org/jira/browse/KAFKA-959
>> >
>> > So in a word the randomness is still preserved but within one
>> > metadata-refresh interval the assignment is fixed.
>> >
>> > I agree that the document should be updated accordingly.
>> >
>> > Guozhang
>> >
>> >
>> > On Fri, Sep 13, 2013 at 1:48 PM, Joe Stein <cr...@gmail.com> wrote:
>> >
>> > > Isn't this a bug?
>> > >
>> > > I don't see why we would want users to have to code and generate
>>random
>> > > partition keys to randomly distributed the data to partitions, that
>>is
>> > > Kafka's job isn't it?
>> > >
>> > > Or if supplying a null value tell the user this is not supported
>>(throw
>> > > exception) in KeyedMessage like we do for topic and not treat null
>>as a
>> > key
>> > > to hash?
>> > >
>> > > My preference is to put those three lines back in and let key be
>>null
>> and
>> > > give folks randomness unless its not a bug and there is a good
>>reason
>> for
>> > > it?
>> > >
>> > > Is there something about
>> > > https://issues.apache.org/jira/browse/KAFKA-691that requires the
>>lines
>> > > taken out? I haven't had a chance to look through
>> > > it yet
>> > >
>> > > My thought is a new person coming in they would expect to see the
>> > > partitions filling up in a round robin fashion as our docs says and
>> > unless
>> > > we force them in the API to know they have to-do this or give them
>>the
>> > > ability for this to happen when passing nothing in
>> > >
>> > > /*******************************************
>> > >  Joe Stein
>> > >  Founder, Principal Consultant
>> > >  Big Data Open Source Security LLC
>> > >  http://www.stealth.ly
>> > >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
>> > > ********************************************/
>> > >
>> > >
>> > > On Fri, Sep 13, 2013 at 4:17 PM, Drew Goya <dr...@gradientx.com>
>>wrote:
>> > >
>> > > > I ran into this problem as well Prashant.  The default partition
>>key
>> > was
>> > > > recently changed:
>> > > >
>> > > >
>> > > >
>> > >
>> >
>> 
>>https://github.com/apache/kafka/commit/b71e6dc352770f22daec0c9a3682138666
>>f032be
>> > > >
>> > > > It no longer assigns a random partition to data with a null
>>partition
>> > > key.
>> > > >  I had to change my code to generate random partition keys to get
>>the
>> > > > randomly distributed behavior the producer used to have.
>> > > >
>> > > >
>> > > > On Fri, Sep 13, 2013 at 11:42 AM, prashant amar
>><amasindhu@gmail.com
>> >
>> > > > wrote:
>> > > >
>> > > > > Thanks Neha
>> > > > >
>> > > > > I will try applying this property and circle back.
>> > > > >
>> > > > > Also, I have been attempting to execute
>>kafka-producer-perf-test.sh
>> > > and I
>> > > > > receive the following error
>> > > > >
>> > > > >        Error: Could not find or load main class
>> > > > > kafka.perf.ProducerPerformance
>> > > > >
>> > > > > I am running against 0.8.0-beta1
>> > > > >
>> > > > > Seems like perf is a separate project in the workspace.
>> > > > >
>> > > > > Does sbt package-assembly bundle the perf jar as well?
>> > > > >
>> > > > > Neither producer-perf-test not consumer-test are working with
>>this
>> > > build
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Fri, Sep 13, 2013 at 9:56 AM, Neha Narkhede <
>> > > neha.narkhede@gmail.com
>> > > > > >wrote:
>> > > > >
>> > > > > > As Jun suggested, one reason could be that the
>> > > > > > topic.metadata.refresh.interval.ms is too high. Did you
>>observe
>> if
>> > > the
>> > > > > > distribution improves after
>>topic.metadata.refresh.interval.mshas
>> > > > > passed
>> > > > > > ?
>> > > > > >
>> > > > > > Thanks
>> > > > > > Neha
>> > > > > >
>> > > > > >
>> > > > > > On Fri, Sep 13, 2013 at 4:47 AM, prashant amar <
>> > amasindhu@gmail.com>
>> > > > > > wrote:
>> > > > > >
>> > > > > --
>> > -- Guozhang
>> >
>>


Re: Producer not distributing across all partitions

Posted by Jun Rao <ju...@gmail.com>.
Without fixing KAFKA-1017, the issue is that the producer will maintain a
socket connection per min(#partitions, #brokers). If you have lots of
producers, the open file handles on the broker could be an issue.

So, what KAFKA-1017 fixes is to pick a random partition and stick to it for
a configurable amount of time, and then switch to another random partition.
This is the behavior in 0.7 when a load balancer is used and reduces #
socket connections significantly.
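
As a rough illustration of the behavior described above (a simplified model only,
not Kafka's actual DefaultEventHandler code; the class and method names are made up):

    import scala.util.Random

    // Simplified model of the "sticky random partition" scheme: pick a partition
    // at random and reuse it until the refresh interval elapses, then switch.
    class StickyRandomPartitionChooser(numPartitions: Int, refreshIntervalMs: Long) {
      private var currentPartition = Random.nextInt(numPartitions)
      private var lastSwitchMs = System.currentTimeMillis()

      def choose(): Int = {
        val now = System.currentTimeMillis()
        if (now - lastSwitchMs >= refreshIntervalMs) {
          currentPartition = Random.nextInt(numPartitions)   // pick another partition at random
          lastSwitchMs = now
        }
        currentPartition
      }
    }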

The issue you are reporting seems like a bug though. Which revision in 0.8
are you using?

Thanks,

Jun


On Fri, Sep 13, 2013 at 8:28 PM, prashant amar <am...@gmail.com> wrote:

> Hi Guozhang, Joe, Drew
>
> In our case we have been running for the past 3 weeks and it has been
> consistently writing only to to the first partition. The rest of the
> partitions have empty index files.
>
> Not sure if I am hitting any issue here.
>
> I am using  offset checker as my barometer. Also introspect r&d the folder
> and it indicates the same.
>
> On Friday, September 13, 2013, Guozhang Wang wrote:
>
> > Hello Joe,
> >
> > The reason we make the producers to produce to a fixed partition for each
> > metadata-refresh interval are the following:
> >
> > https://issues.apache.org/jira/browse/KAFKA-1017
> >
> > https://issues.apache.org/jira/browse/KAFKA-959
> >
> > So in a word the randomness is still preserved but within one
> > metadata-refresh interval the assignment is fixed.
> >
> > I agree that the document should be updated accordingly.
> >
> > Guozhang
> >
> >
> > On Fri, Sep 13, 2013 at 1:48 PM, Joe Stein <cr...@gmail.com> wrote:
> >
> > > Isn't this a bug?
> > >
> > > I don't see why we would want users to have to code and generate random
> > > partition keys to randomly distributed the data to partitions, that is
> > > Kafka's job isn't it?
> > >
> > > Or if supplying a null value tell the user this is not supported (throw
> > > exception) in KeyedMessage like we do for topic and not treat null as a
> > key
> > > to hash?
> > >
> > > My preference is to put those three lines back in and let key be null
> and
> > > give folks randomness unless its not a bug and there is a good reason
> for
> > > it?
> > >
> > > Is there something about
> > > https://issues.apache.org/jira/browse/KAFKA-691that requires the lines
> > > taken out? I haven't had a chance to look through
> > > it yet
> > >
> > > My thought is a new person coming in they would expect to see the
> > > partitions filling up in a round robin fashion as our docs says and
> > unless
> > > we force them in the API to know they have to-do this or give them the
> > > ability for this to happen when passing nothing in
> > >
> > > /*******************************************
> > >  Joe Stein
> > >  Founder, Principal Consultant
> > >  Big Data Open Source Security LLC
> > >  http://www.stealth.ly
> > >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > > ********************************************/
> > >
> > >
> > > On Fri, Sep 13, 2013 at 4:17 PM, Drew Goya <dr...@gradientx.com> wrote:
> > >
> > > > I ran into this problem as well Prashant.  The default partition key
> > was
> > > > recently changed:
> > > >
> > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/commit/b71e6dc352770f22daec0c9a3682138666f032be
> > > >
> > > > It no longer assigns a random partition to data with a null partition
> > > key.
> > > >  I had to change my code to generate random partition keys to get the
> > > > randomly distributed behavior the producer used to have.
> > > >
> > > >
> > > > On Fri, Sep 13, 2013 at 11:42 AM, prashant amar <amasindhu@gmail.com
> >
> > > > wrote:
> > > >
> > > > > Thanks Neha
> > > > >
> > > > > I will try applying this property and circle back.
> > > > >
> > > > > Also, I have been attempting to execute kafka-producer-perf-test.sh
> > > and I
> > > > > receive the following error
> > > > >
> > > > >        Error: Could not find or load main class
> > > > > kafka.perf.ProducerPerformance
> > > > >
> > > > > I am running against 0.8.0-beta1
> > > > >
> > > > > Seems like perf is a separate project in the workspace.
> > > > >
> > > > > Does sbt package-assembly bundle the perf jar as well?
> > > > >
> > > > > Neither producer-perf-test not consumer-test are working with this
> > > build
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Sep 13, 2013 at 9:56 AM, Neha Narkhede <
> > > neha.narkhede@gmail.com
> > > > > >wrote:
> > > > >
> > > > > > As Jun suggested, one reason could be that the
> > > > > > topic.metadata.refresh.interval.ms is too high. Did you observe
> if
> > > the
> > > > > > distribution improves after topic.metadata.refresh.interval.mshas
> > > > > passed
> > > > > > ?
> > > > > >
> > > > > > Thanks
> > > > > > Neha
> > > > > >
> > > > > >
> > > > > > On Fri, Sep 13, 2013 at 4:47 AM, prashant amar <
> > amasindhu@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > --
> > -- Guozhang
> >
>

Re: Producer not distributing across all partitions

Posted by prashant amar <am...@gmail.com>.
Hi Guozhang, Joe, Drew

In our case we have been running for the past 3 weeks and it has been
consistently writing only to the first partition. The rest of the
partitions have empty index files.

Not sure if I am hitting any issue here.

I am using the offset checker as my barometer. I also inspected the folder
and it indicates the same.

On Friday, September 13, 2013, Guozhang Wang wrote:

> Hello Joe,
>
> The reason we make the producers to produce to a fixed partition for each
> metadata-refresh interval are the following:
>
> https://issues.apache.org/jira/browse/KAFKA-1017
>
> https://issues.apache.org/jira/browse/KAFKA-959
>
> So in a word the randomness is still preserved but within one
> metadata-refresh interval the assignment is fixed.
>
> I agree that the document should be updated accordingly.
>
> Guozhang
>
>
> On Fri, Sep 13, 2013 at 1:48 PM, Joe Stein <cr...@gmail.com> wrote:
>
> > Isn't this a bug?
> >
> > I don't see why we would want users to have to code and generate random
> > partition keys to randomly distributed the data to partitions, that is
> > Kafka's job isn't it?
> >
> > Or if supplying a null value tell the user this is not supported (throw
> > exception) in KeyedMessage like we do for topic and not treat null as a
> key
> > to hash?
> >
> > My preference is to put those three lines back in and let key be null and
> > give folks randomness unless its not a bug and there is a good reason for
> > it?
> >
> > Is there something about
> > https://issues.apache.org/jira/browse/KAFKA-691that requires the lines
> > taken out? I haven't had a chance to look through
> > it yet
> >
> > My thought is a new person coming in they would expect to see the
> > partitions filling up in a round robin fashion as our docs says and
> unless
> > we force them in the API to know they have to-do this or give them the
> > ability for this to happen when passing nothing in
> >
> > /*******************************************
> >  Joe Stein
> >  Founder, Principal Consultant
> >  Big Data Open Source Security LLC
> >  http://www.stealth.ly
> >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > ********************************************/
> >
> >
> > On Fri, Sep 13, 2013 at 4:17 PM, Drew Goya <dr...@gradientx.com> wrote:
> >
> > > I ran into this problem as well Prashant.  The default partition key
> was
> > > recently changed:
> > >
> > >
> > >
> >
> https://github.com/apache/kafka/commit/b71e6dc352770f22daec0c9a3682138666f032be
> > >
> > > It no longer assigns a random partition to data with a null partition
> > key.
> > >  I had to change my code to generate random partition keys to get the
> > > randomly distributed behavior the producer used to have.
> > >
> > >
> > > On Fri, Sep 13, 2013 at 11:42 AM, prashant amar <am...@gmail.com>
> > > wrote:
> > >
> > > > Thanks Neha
> > > >
> > > > I will try applying this property and circle back.
> > > >
> > > > Also, I have been attempting to execute kafka-producer-perf-test.sh
> > and I
> > > > receive the following error
> > > >
> > > >        Error: Could not find or load main class
> > > > kafka.perf.ProducerPerformance
> > > >
> > > > I am running against 0.8.0-beta1
> > > >
> > > > Seems like perf is a separate project in the workspace.
> > > >
> > > > Does sbt package-assembly bundle the perf jar as well?
> > > >
> > > > Neither producer-perf-test not consumer-test are working with this
> > build
> > > >
> > > >
> > > >
> > > > On Fri, Sep 13, 2013 at 9:56 AM, Neha Narkhede <
> > neha.narkhede@gmail.com
> > > > >wrote:
> > > >
> > > > > As Jun suggested, one reason could be that the
> > > > > topic.metadata.refresh.interval.ms is too high. Did you observe if
> > the
> > > > > distribution improves after topic.metadata.refresh.interval.ms has
> > > > passed
> > > > > ?
> > > > >
> > > > > Thanks
> > > > > Neha
> > > > >
> > > > >
> > > > > On Fri, Sep 13, 2013 at 4:47 AM, prashant amar <
> amasindhu@gmail.com>
> > > > > wrote:
> > > > >
> > > > --
> -- Guozhang
>

Re: Producer not distributing across all partitions

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Joe,

The reasons we make the producers produce to a fixed partition for each
metadata-refresh interval are the following:

https://issues.apache.org/jira/browse/KAFKA-1017

https://issues.apache.org/jira/browse/KAFKA-959

So, in a word, the randomness is still preserved, but within one
metadata-refresh interval the assignment is fixed.

I agree that the documentation should be updated accordingly.

Guozhang


On Fri, Sep 13, 2013 at 1:48 PM, Joe Stein <cr...@gmail.com> wrote:

> Isn't this a bug?
>
> I don't see why we would want users to have to code and generate random
> partition keys to randomly distributed the data to partitions, that is
> Kafka's job isn't it?
>
> Or if supplying a null value tell the user this is not supported (throw
> exception) in KeyedMessage like we do for topic and not treat null as a key
> to hash?
>
> My preference is to put those three lines back in and let key be null and
> give folks randomness unless its not a bug and there is a good reason for
> it?
>
> Is there something about
> https://issues.apache.org/jira/browse/KAFKA-691that requires the lines
> taken out? I haven't had a chance to look through
> it yet
>
> My thought is a new person coming in they would expect to see the
> partitions filling up in a round robin fashion as our docs says and unless
> we force them in the API to know they have to-do this or give them the
> ability for this to happen when passing nothing in
>
> /*******************************************
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> ********************************************/
>
>
> On Fri, Sep 13, 2013 at 4:17 PM, Drew Goya <dr...@gradientx.com> wrote:
>
> > I ran into this problem as well Prashant.  The default partition key was
> > recently changed:
> >
> >
> >
> https://github.com/apache/kafka/commit/b71e6dc352770f22daec0c9a3682138666f032be
> >
> > It no longer assigns a random partition to data with a null partition
> key.
> >  I had to change my code to generate random partition keys to get the
> > randomly distributed behavior the producer used to have.
> >
> >
> > On Fri, Sep 13, 2013 at 11:42 AM, prashant amar <am...@gmail.com>
> > wrote:
> >
> > > Thanks Neha
> > >
> > > I will try applying this property and circle back.
> > >
> > > Also, I have been attempting to execute kafka-producer-perf-test.sh
> and I
> > > receive the following error
> > >
> > >        Error: Could not find or load main class
> > > kafka.perf.ProducerPerformance
> > >
> > > I am running against 0.8.0-beta1
> > >
> > > Seems like perf is a separate project in the workspace.
> > >
> > > Does sbt package-assembly bundle the perf jar as well?
> > >
> > > Neither producer-perf-test not consumer-test are working with this
> build
> > >
> > >
> > >
> > > On Fri, Sep 13, 2013 at 9:56 AM, Neha Narkhede <
> neha.narkhede@gmail.com
> > > >wrote:
> > >
> > > > As Jun suggested, one reason could be that the
> > > > topic.metadata.refresh.interval.ms is too high. Did you observe if
> the
> > > > distribution improves after topic.metadata.refresh.interval.ms has
> > > passed
> > > > ?
> > > >
> > > > Thanks
> > > > Neha
> > > >
> > > >
> > > > On Fri, Sep 13, 2013 at 4:47 AM, prashant amar <am...@gmail.com>
> > > > wrote:
> > > >
> > > > > I am using kafka 08 version ...
> > > > >
> > > > >
> > > > > On Thu, Sep 12, 2013 at 8:44 PM, Jun Rao <ju...@gmail.com> wrote:
> > > > >
> > > > > > Which revision of 0.8 are you using? In a recent change, a
> producer
> > > > will
> > > > > > stick to a partition for topic.metadata.refresh.interval.ms
> > (defaults
> > > > to
> > > > > > 10
> > > > > > mins) time before picking another partition at random.
> > > > > > Thanks,
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > > On Thu, Sep 12, 2013 at 1:56 PM, prashant amar <
> > amasindhu@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > I created a topic with 4 partitions and for some reason the
> > > producer
> > > > is
> > > > > > > pushing only to one partition.
> > > > > > >
> > > > > > > This is consistently happening across all topics that I created
> > ...
> > > > > > >
> > > > > > > Is there a specific configuration that I need to apply to
> ensure
> > > that
> > > > > > load
> > > > > > > is evenly distributed across all partitions?
> > > > > > >
> > > > > > >
> > > > > > > Group           Topic                          Pid Offset
> > > > > >  logSize
> > > > > > >         Lag             Owner
> > > > > > > perfgroup1      perfpayload1                   0   10965
> > > > > 11220
> > > > > > >         255             perfgroup1_XXXX-0
> > > > > > > perfgroup1      perfpayload1                   1   0
> > > 0
> > > > > > >         0               perfgroup1_XXXX-1
> > > > > > > perfgroup1      perfpayload1                   2   0
> > > 0
> > > > > > >         0               perfgroup1_XXXXX-2
> > > > > > > perfgroup1      perfpayload1                   3   0
> > > 0
> > > > > > >         0               perfgroup1_XXXXX-3
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>



-- 
-- Guozhang

Re: Producer not distributing across all partitions

Posted by Swapnil Ghike <sg...@linkedin.com>.
Hi Joe, Drew,

In 0.8 HEAD, if the key is null, the DefaultEventHandler randomly chooses
an available partition and never calls the partitioner.partition(key,
numPartitions) method. This is done in lines 204 to 212 of the github
commit Drew pointed to, though that piece of code is slightly different now
because of KAFKA-1017 and KAFKA-959.

I did a local test today which showed that, with the DefaultPartitioner and a
null key in the messages, data was appended to multiple partitions. For this
test, I set topic.metadata.refresh.interval.ms to 1 second, because 0.8 HEAD
sticks to a partition for a given topic.metadata.refresh.interval.ms (as is
being discussed in the other e-mail thread on dev@kafka).

Please let me know if you see different results.

Thanks,
Swapnil



On 9/13/13 1:48 PM, "Joe Stein" <cr...@gmail.com> wrote:

>Isn't this a bug?
>
>I don't see why we would want users to have to code and generate random
>partition keys to randomly distributed the data to partitions, that is
>Kafka's job isn't it?
>
>Or if supplying a null value tell the user this is not supported (throw
>exception) in KeyedMessage like we do for topic and not treat null as a
>key
>to hash?
>
>My preference is to put those three lines back in and let key be null and
>give folks randomness unless its not a bug and there is a good reason for
>it?
>
>Is there something about
>https://issues.apache.org/jira/browse/KAFKA-691that requires the lines
>taken out? I haven't had a chance to look through
>it yet
>
>My thought is a new person coming in they would expect to see the
>partitions filling up in a round robin fashion as our docs says and unless
>we force them in the API to know they have to-do this or give them the
>ability for this to happen when passing nothing in
>
>/*******************************************
> Joe Stein
> Founder, Principal Consultant
> Big Data Open Source Security LLC
> http://www.stealth.ly
> Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
>********************************************/
>
>
>On Fri, Sep 13, 2013 at 4:17 PM, Drew Goya <dr...@gradientx.com> wrote:
>
>> I ran into this problem as well Prashant.  The default partition key was
>> recently changed:
>>
>>
>> 
>>https://github.com/apache/kafka/commit/b71e6dc352770f22daec0c9a3682138666
>>f032be
>>
>> It no longer assigns a random partition to data with a null partition
>>key.
>>  I had to change my code to generate random partition keys to get the
>> randomly distributed behavior the producer used to have.
>>
>>
>> On Fri, Sep 13, 2013 at 11:42 AM, prashant amar <am...@gmail.com>
>> wrote:
>>
>> > Thanks Neha
>> >
>> > I will try applying this property and circle back.
>> >
>> > Also, I have been attempting to execute kafka-producer-perf-test.sh
>>and I
>> > receive the following error
>> >
>> >        Error: Could not find or load main class
>> > kafka.perf.ProducerPerformance
>> >
>> > I am running against 0.8.0-beta1
>> >
>> > Seems like perf is a separate project in the workspace.
>> >
>> > Does sbt package-assembly bundle the perf jar as well?
>> >
>> > Neither producer-perf-test not consumer-test are working with this
>>build
>> >
>> >
>> >
>> > On Fri, Sep 13, 2013 at 9:56 AM, Neha Narkhede
>><neha.narkhede@gmail.com
>> > >wrote:
>> >
>> > > As Jun suggested, one reason could be that the
>> > > topic.metadata.refresh.interval.ms is too high. Did you observe if
>>the
>> > > distribution improves after topic.metadata.refresh.interval.ms has
>> > passed
>> > > ?
>> > >
>> > > Thanks
>> > > Neha
>> > >
>> > >
>> > > On Fri, Sep 13, 2013 at 4:47 AM, prashant amar <am...@gmail.com>
>> > > wrote:
>> > >
>> > > > I am using kafka 08 version ...
>> > > >
>> > > >
>> > > > On Thu, Sep 12, 2013 at 8:44 PM, Jun Rao <ju...@gmail.com> wrote:
>> > > >
>> > > > > Which revision of 0.8 are you using? In a recent change, a
>>producer
>> > > will
>> > > > > stick to a partition for topic.metadata.refresh.interval.ms
>> (defaults
>> > > to
>> > > > > 10
>> > > > > mins) time before picking another partition at random.
>> > > > > Thanks,
>> > > > > Jun
>> > > > >
>> > > > >
>> > > > > On Thu, Sep 12, 2013 at 1:56 PM, prashant amar <
>> amasindhu@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > I created a topic with 4 partitions and for some reason the
>> > producer
>> > > is
>> > > > > > pushing only to one partition.
>> > > > > >
>> > > > > > This is consistently happening across all topics that I
>>created
>> ...
>> > > > > >
>> > > > > > Is there a specific configuration that I need to apply to
>>ensure
>> > that
>> > > > > load
>> > > > > > is evenly distributed across all partitions?
>> > > > > >
>> > > > > >
>> > > > > > Group           Topic                          Pid Offset
>> > > > >  logSize
>> > > > > >         Lag             Owner
>> > > > > > perfgroup1      perfpayload1                   0   10965
>> > > > 11220
>> > > > > >         255             perfgroup1_XXXX-0
>> > > > > > perfgroup1      perfpayload1                   1   0
>> > 0
>> > > > > >         0               perfgroup1_XXXX-1
>> > > > > > perfgroup1      perfpayload1                   2   0
>> > 0
>> > > > > >         0               perfgroup1_XXXXX-2
>> > > > > > perfgroup1      perfpayload1                   3   0
>> > 0
>> > > > > >         0               perfgroup1_XXXXX-3
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >


Re: Producer not distributing across all partitions

Posted by Joe Stein <cr...@gmail.com>.
Isn't this a bug?

I don't see why we would want users to have to write code that generates random
partition keys just to get data randomly distributed across partitions; that is
Kafka's job, isn't it?

Or, if a null key is supplied, should we tell the user this is not supported
(throw an exception) in KeyedMessage, like we do for the topic, rather than
treat null as a key to hash?

My preference is to put those three lines back in and let the key be null and
give folks randomness, unless it's not a bug and there is a good reason for it.

Is there something about
https://issues.apache.org/jira/browse/KAFKA-691 that requires those lines to be
taken out? I haven't had a chance to look through it yet.

My thought is that a new person coming in would expect to see the partitions
filling up in a round-robin fashion, as our docs say, unless we either force
them in the API to know they have to do this or give them the ability for it to
happen when passing nothing in.
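
For context, the null-key send being discussed looks roughly like this with the
0.8 Scala API (a hedged sketch; the topic, payload, and the surrounding producer
are assumptions, not code from this thread):

    import kafka.producer.KeyedMessage

    // KeyedMessage's two-argument constructor leaves the key null. On 0.8 HEAD the
    // DefaultEventHandler then picks a random partition itself, sticks to it for
    // topic.metadata.refresh.interval.ms, and never calls the configured partitioner.
    val msg = new KeyedMessage[String, String]("perfpayload1", "some payload")
    // producer.send(msg)   // 'producer' is assumed to be a configured Producer[String, String]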

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/


On Fri, Sep 13, 2013 at 4:17 PM, Drew Goya <dr...@gradientx.com> wrote:

> I ran into this problem as well Prashant.  The default partition key was
> recently changed:
>
>
> https://github.com/apache/kafka/commit/b71e6dc352770f22daec0c9a3682138666f032be
>
> It no longer assigns a random partition to data with a null partition key.
>  I had to change my code to generate random partition keys to get the
> randomly distributed behavior the producer used to have.
>
>
> On Fri, Sep 13, 2013 at 11:42 AM, prashant amar <am...@gmail.com>
> wrote:
>
> > Thanks Neha
> >
> > I will try applying this property and circle back.
> >
> > Also, I have been attempting to execute kafka-producer-perf-test.sh and I
> > receive the following error
> >
> >        Error: Could not find or load main class
> > kafka.perf.ProducerPerformance
> >
> > I am running against 0.8.0-beta1
> >
> > Seems like perf is a separate project in the workspace.
> >
> > Does sbt package-assembly bundle the perf jar as well?
> >
> > Neither producer-perf-test not consumer-test are working with this build
> >
> >
> >
> > On Fri, Sep 13, 2013 at 9:56 AM, Neha Narkhede <neha.narkhede@gmail.com
> > >wrote:
> >
> > > As Jun suggested, one reason could be that the
> > > topic.metadata.refresh.interval.ms is too high. Did you observe if the
> > > distribution improves after topic.metadata.refresh.interval.ms has
> > passed
> > > ?
> > >
> > > Thanks
> > > Neha
> > >
> > >
> > > On Fri, Sep 13, 2013 at 4:47 AM, prashant amar <am...@gmail.com>
> > > wrote:
> > >
> > > > I am using kafka 08 version ...
> > > >
> > > >
> > > > On Thu, Sep 12, 2013 at 8:44 PM, Jun Rao <ju...@gmail.com> wrote:
> > > >
> > > > > Which revision of 0.8 are you using? In a recent change, a producer
> > > will
> > > > > stick to a partition for topic.metadata.refresh.interval.ms
> (defaults
> > > to
> > > > > 10
> > > > > mins) time before picking another partition at random.
> > > > > Thanks,
> > > > > Jun
> > > > >
> > > > >
> > > > > On Thu, Sep 12, 2013 at 1:56 PM, prashant amar <
> amasindhu@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > I created a topic with 4 partitions and for some reason the
> > producer
> > > is
> > > > > > pushing only to one partition.
> > > > > >
> > > > > > This is consistently happening across all topics that I created
> ...
> > > > > >
> > > > > > Is there a specific configuration that I need to apply to ensure
> > that
> > > > > load
> > > > > > is evenly distributed across all partitions?
> > > > > >
> > > > > >
> > > > > > Group           Topic                          Pid Offset
> > > > >  logSize
> > > > > >         Lag             Owner
> > > > > > perfgroup1      perfpayload1                   0   10965
> > > > 11220
> > > > > >         255             perfgroup1_XXXX-0
> > > > > > perfgroup1      perfpayload1                   1   0
> > 0
> > > > > >         0               perfgroup1_XXXX-1
> > > > > > perfgroup1      perfpayload1                   2   0
> > 0
> > > > > >         0               perfgroup1_XXXXX-2
> > > > > > perfgroup1      perfpayload1                   3   0
> > 0
> > > > > >         0               perfgroup1_XXXXX-3
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Producer not distributing across all partitions

Posted by Drew Goya <dr...@gradientx.com>.
I ran into this problem as well, Prashant. The default handling of a null
partition key was recently changed:

https://github.com/apache/kafka/commit/b71e6dc352770f22daec0c9a3682138666f032be

It no longer assigns a random partition to data with a null partition key.
 I had to change my code to generate random partition keys to get the
randomly distributed behavior the producer used to have.
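
For reference, a rough sketch of that workaround against the 0.8 Java
producer (broker address, topic name, and key range are placeholders):

import java.util.Properties;
import java.util.Random;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

// Sketch only: supply a random key per message so the default partitioner
// hashes the sends across all of the topic's partitions again.
public class RandomKeySketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("metadata.broker.list", "localhost:9092");   // placeholder broker
    props.put("serializer.class", "kafka.serializer.StringEncoder");
    Producer<String, String> producer =
        new Producer<String, String>(new ProducerConfig(props));
    Random rnd = new Random();
    for (int i = 0; i < 100; i++) {
      String randomKey = Integer.toString(rnd.nextInt(1024));  // new key each send
      producer.send(new KeyedMessage<String, String>("perfpayload1", randomKey, "msg-" + i));
    }
    producer.close();
  }
}

One thing to keep in mind is that the keys are serialized and sent along
with each message, so they add a small amount of overhead on the wire.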


On Fri, Sep 13, 2013 at 11:42 AM, prashant amar <am...@gmail.com> wrote:

> Thanks Neha
>
> I will try applying this property and circle back.
>
> Also, I have been attempting to execute kafka-producer-perf-test.sh and I
> receive the following error
>
>        Error: Could not find or load main class
> kafka.perf.ProducerPerformance
>
> I am running against 0.8.0-beta1
>
> Seems like perf is a separate project in the workspace.
>
> Does sbt package-assembly bundle the perf jar as well?
>
> Neither producer-perf-test not consumer-test are working with this build
>
>
>
> On Fri, Sep 13, 2013 at 9:56 AM, Neha Narkhede <neha.narkhede@gmail.com
> >wrote:
>
> > As Jun suggested, one reason could be that the
> > topic.metadata.refresh.interval.ms is too high. Did you observe if the
> > distribution improves after topic.metadata.refresh.interval.ms has
> passed
> > ?
> >
> > Thanks
> > Neha
> >
> >
> > On Fri, Sep 13, 2013 at 4:47 AM, prashant amar <am...@gmail.com>
> > wrote:
> >
> > > I am using kafka 08 version ...
> > >
> > >
> > > On Thu, Sep 12, 2013 at 8:44 PM, Jun Rao <ju...@gmail.com> wrote:
> > >
> > > > Which revision of 0.8 are you using? In a recent change, a producer
> > will
> > > > stick to a partition for topic.metadata.refresh.interval.ms(defaults
> > to
> > > > 10
> > > > mins) time before picking another partition at random.
> > > > Thanks,
> > > > Jun
> > > >
> > > >
> > > > On Thu, Sep 12, 2013 at 1:56 PM, prashant amar <am...@gmail.com>
> > > > wrote:
> > > >
> > > > > I created a topic with 4 partitions and for some reason the
> producer
> > is
> > > > > pushing only to one partition.
> > > > >
> > > > > This is consistently happening across all topics that I created ...
> > > > >
> > > > > Is there a specific configuration that I need to apply to ensure
> that
> > > > load
> > > > > is evenly distributed across all partitions?
> > > > >
> > > > >
> > > > > Group           Topic                          Pid Offset
> > > >  logSize
> > > > >         Lag             Owner
> > > > > perfgroup1      perfpayload1                   0   10965
> > > 11220
> > > > >         255             perfgroup1_XXXX-0
> > > > > perfgroup1      perfpayload1                   1   0
> 0
> > > > >         0               perfgroup1_XXXX-1
> > > > > perfgroup1      perfpayload1                   2   0
> 0
> > > > >         0               perfgroup1_XXXXX-2
> > > > > perfgroup1      perfpayload1                   3   0
> 0
> > > > >         0               perfgroup1_XXXXX-3
> > > > >
> > > >
> > >
> >
>

Re: Producer not distributing across all partitions

Posted by prashant amar <am...@gmail.com>.
Thanks Neha

I will try applying this property and circle back.

Also, I have been attempting to execute kafka-producer-perf-test.sh and I
receive the following error

       Error: Could not find or load main class
kafka.perf.ProducerPerformance

I am running against 0.8.0-beta1

Seems like perf is a separate project in the workspace.

Does sbt package-assembly bundle the perf jar as well?

Neither producer-perf-test nor consumer-test is working with this build.



On Fri, Sep 13, 2013 at 9:56 AM, Neha Narkhede <ne...@gmail.com>wrote:

> As Jun suggested, one reason could be that the
> topic.metadata.refresh.interval.ms is too high. Did you observe if the
> distribution improves after topic.metadata.refresh.interval.ms has passed
> ?
>
> Thanks
> Neha
>
>
> On Fri, Sep 13, 2013 at 4:47 AM, prashant amar <am...@gmail.com>
> wrote:
>
> > I am using kafka 08 version ...
> >
> >
> > On Thu, Sep 12, 2013 at 8:44 PM, Jun Rao <ju...@gmail.com> wrote:
> >
> > > Which revision of 0.8 are you using? In a recent change, a producer
> will
> > > stick to a partition for topic.metadata.refresh.interval.ms (defaults
> to
> > > 10
> > > mins) time before picking another partition at random.
> > > Thanks,
> > > Jun
> > >
> > >
> > > On Thu, Sep 12, 2013 at 1:56 PM, prashant amar <am...@gmail.com>
> > > wrote:
> > >
> > > > I created a topic with 4 partitions and for some reason the producer
> is
> > > > pushing only to one partition.
> > > >
> > > > This is consistently happening across all topics that I created ...
> > > >
> > > > Is there a specific configuration that I need to apply to ensure that
> > > load
> > > > is evenly distributed across all partitions?
> > > >
> > > >
> > > > Group           Topic                          Pid Offset
> > >  logSize
> > > >         Lag             Owner
> > > > perfgroup1      perfpayload1                   0   10965
> > 11220
> > > >         255             perfgroup1_XXXX-0
> > > > perfgroup1      perfpayload1                   1   0               0
> > > >         0               perfgroup1_XXXX-1
> > > > perfgroup1      perfpayload1                   2   0               0
> > > >         0               perfgroup1_XXXXX-2
> > > > perfgroup1      perfpayload1                   3   0               0
> > > >         0               perfgroup1_XXXXX-3
> > > >
> > >
> >
>

Re: Producer not distributing across all partitions

Posted by Neha Narkhede <ne...@gmail.com>.
As Jun suggested, one reason could be that the
topic.metadata.refresh.interval.ms is too high. Did you observe if the
distribution improves after topic.metadata.refresh.interval.ms has passed?

Thanks
Neha


On Fri, Sep 13, 2013 at 4:47 AM, prashant amar <am...@gmail.com> wrote:

> I am using kafka 08 version ...
>
>
> On Thu, Sep 12, 2013 at 8:44 PM, Jun Rao <ju...@gmail.com> wrote:
>
> > Which revision of 0.8 are you using? In a recent change, a producer will
> > stick to a partition for topic.metadata.refresh.interval.ms (defaults to
> > 10
> > mins) time before picking another partition at random.
> > Thanks,
> > Jun
> >
> >
> > On Thu, Sep 12, 2013 at 1:56 PM, prashant amar <am...@gmail.com>
> > wrote:
> >
> > > I created a topic with 4 partitions and for some reason the producer is
> > > pushing only to one partition.
> > >
> > > This is consistently happening across all topics that I created ...
> > >
> > > Is there a specific configuration that I need to apply to ensure that
> > load
> > > is evenly distributed across all partitions?
> > >
> > >
> > > Group           Topic                          Pid Offset
> >  logSize
> > >         Lag             Owner
> > > perfgroup1      perfpayload1                   0   10965
> 11220
> > >         255             perfgroup1_XXXX-0
> > > perfgroup1      perfpayload1                   1   0               0
> > >         0               perfgroup1_XXXX-1
> > > perfgroup1      perfpayload1                   2   0               0
> > >         0               perfgroup1_XXXXX-2
> > > perfgroup1      perfpayload1                   3   0               0
> > >         0               perfgroup1_XXXXX-3
> > >
> >
>

Re: Producer not distributing across all partitions

Posted by prashant amar <am...@gmail.com>.
I am using kafka 08 version ...


On Thu, Sep 12, 2013 at 8:44 PM, Jun Rao <ju...@gmail.com> wrote:

> Which revision of 0.8 are you using? In a recent change, a producer will
> stick to a partition for topic.metadata.refresh.interval.ms (defaults to
> 10
> mins) time before picking another partition at random.
> Thanks,
> Jun
>
>
> On Thu, Sep 12, 2013 at 1:56 PM, prashant amar <am...@gmail.com>
> wrote:
>
> > I created a topic with 4 partitions and for some reason the producer is
> > pushing only to one partition.
> >
> > This is consistently happening across all topics that I created ...
> >
> > Is there a specific configuration that I need to apply to ensure that
> load
> > is evenly distributed across all partitions?
> >
> >
> > Group           Topic                          Pid Offset
>  logSize
> >         Lag             Owner
> > perfgroup1      perfpayload1                   0   10965           11220
> >         255             perfgroup1_XXXX-0
> > perfgroup1      perfpayload1                   1   0               0
> >         0               perfgroup1_XXXX-1
> > perfgroup1      perfpayload1                   2   0               0
> >         0               perfgroup1_XXXXX-2
> > perfgroup1      perfpayload1                   3   0               0
> >         0               perfgroup1_XXXXX-3
> >
>

Re: Producer not distributing across all partitions

Posted by chetan conikee <co...@gmail.com>.
I am using kafka 0.8.0-beta1 ..

Seems like messages are being delivered only to one partition (since
installation)

Should I upgrade or apply a patch to mitigate this issue?

Please advise.


On Thu, Sep 12, 2013 at 8:44 PM, Jun Rao <ju...@gmail.com> wrote:

> Which revision of 0.8 are you using? In a recent change, a producer will
> stick to a partition for topic.metadata.refresh.interval.ms (defaults to
> 10
> mins) time before picking another partition at random.
> Thanks,
> Jun
>
>
> On Thu, Sep 12, 2013 at 1:56 PM, prashant amar <am...@gmail.com>
> wrote:
>
> > I created a topic with 4 partitions and for some reason the producer is
> > pushing only to one partition.
> >
> > This is consistently happening across all topics that I created ...
> >
> > Is there a specific configuration that I need to apply to ensure that
> load
> > is evenly distributed across all partitions?
> >
> >
> > Group           Topic                          Pid Offset
>  logSize
> >         Lag             Owner
> > perfgroup1      perfpayload1                   0   10965           11220
> >         255             perfgroup1_XXXX-0
> > perfgroup1      perfpayload1                   1   0               0
> >         0               perfgroup1_XXXX-1
> > perfgroup1      perfpayload1                   2   0               0
> >         0               perfgroup1_XXXXX-2
> > perfgroup1      perfpayload1                   3   0               0
> >         0               perfgroup1_XXXXX-3
> >
>

Re: Producer not distributing across all partitions

Posted by Jun Rao <ju...@gmail.com>.
Which revision of 0.8 are you using? In a recent change, a producer will
stick to a partition for topic.metadata.refresh.interval.ms (defaults to 10
minutes) before picking another partition at random.
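
In producer code that corresponds to something like the sketch below (the
broker address and the 10-second value are only illustrative):

import java.util.Properties;

import kafka.producer.ProducerConfig;

// Sketch only: shorten the sticky window so a producer sending null-keyed
// messages picks a new random partition more often than every 10 minutes.
public class ShortRefreshSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("metadata.broker.list", "localhost:9092");        // placeholder broker
    props.put("topic.metadata.refresh.interval.ms", "10000");   // default is 600 * 1000
    ProducerConfig config = new ProducerConfig(props);          // pass this to the Producer
  }
}
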
Thanks,
Jun


On Thu, Sep 12, 2013 at 1:56 PM, prashant amar <am...@gmail.com> wrote:

> I created a topic with 4 partitions and for some reason the producer is
> pushing only to one partition.
>
> This is consistently happening across all topics that I created ...
>
> Is there a specific configuration that I need to apply to ensure that load
> is evenly distributed across all partitions?
>
>
> Group           Topic                          Pid Offset          logSize
>         Lag             Owner
> perfgroup1      perfpayload1                   0   10965           11220
>         255             perfgroup1_XXXX-0
> perfgroup1      perfpayload1                   1   0               0
>         0               perfgroup1_XXXX-1
> perfgroup1      perfpayload1                   2   0               0
>         0               perfgroup1_XXXXX-2
> perfgroup1      perfpayload1                   3   0               0
>         0               perfgroup1_XXXXX-3
>