Posted to users@kafka.apache.org by Prakash Gowri Shankor <pr...@gmail.com> on 2014/06/10 01:13:35 UTC

Strange partitioning behavior with 0.8.1.1

Hi,

This is with 0.8.1.1 and I ran the command line console consumer.
I have one broker, one producer and several consumers. I have one topic
with many partitions (m) and many consumers (n), where m=n, and one
consumer group defined for all the consumers.

From using Kafka Monitor, I see that each partition is assigned to one
consumer now. However, it seems that there is no parallelism in data
consumption. What I see happening is that one consumer gets messages from
time t0 to t1 from partition P1. Then another consumer gets messages from
t1 to t2 from partition P2 and so on.

*Why is there no parallel consumption happening ?* It looks to me that the
producer's data goes into P1 from t0 to t1 and then from t1 to t2 into P2.
I thought that if I don't specify a partitioning key, the producer's data
will get partitioned randomly. It's just that the randomness seems to be
"delayed". Why is this so ?

I tried setting topic.metadata.refresh.interval.ms=100 in the
producer.properties.

That did not seem to change this strange partitioning behavior.

Please help.

Thanks

Re: Strange partitioning behavior with 0.8.1.1

Posted by Guozhang Wang <wa...@gmail.com>.
In the console producer you can specify producer properties on the command
line, for example metadata-expiry-ms.

You can type just ./kafka-console-producer.sh and it will show you all the
configs that you can specify.

Guozhang


On Wed, Jun 11, 2014 at 10:56 AM, Prakash Gowri Shankor <
prakash.shankor@gmail.com> wrote:

> Guozhang,
>
> I set this in my producer.properties
>
> topic.metadata.refresh.interval.ms=1000
>
> Then I start the console producer as
>
> ./kafka-console-producer.sh --broker-list localhost:9092 --topic test2
>
> I still don't see data being written to different partitions after every 1
> second.
>
> I wonder if the producer is picking up the properties file - I don't see it
> being passed explicitly in the script to the kafka.producer.ConsoleProducer
> class.
>
> -Prakash



-- 
-- Guozhang

Re: Strange partitioning behavior with 0.8.1.1

Posted by Prakash Gowri Shankor <pr...@gmail.com>.
Guozhang,

I set this in my producer.properties

topic.metadata.refresh.interval.ms=1000

Then I start the console producer as

./kafka-console-producer.sh --broker-list localhost:9092 --topic test2

I still don't see data being written to different partitions after every 1
second.

I wonder if the producer is picking up the properties file - I don't see it
being passed explicitly in the script to the kafka.producer.ConsoleProducer
class.

-Prakash


On Tue, Jun 10, 2014 at 11:04 AM, Guozhang Wang <wa...@gmail.com> wrote:

> Yes, reducing the refresh interval to 100ms will cause it to try to select
> another partition every 100ms, not necessarily a different partition though,
> since it just gets a next random int % num.partitions.
>
> Setting the key can also resolve this issue, as long as the key values are
> evenly distributed, since the partition selected is effectively based on
> key values.
>
> Guozhang

Re: Strange partitioning behavior with 0.8.1.1

Posted by Guozhang Wang <wa...@gmail.com>.
Yes, reducing the refresh interval to 100ms will cause it to try to select
another partition every 100ms, not necessarily a different partition though,
since it just gets a next random int % num.partitions.
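
To make the "not necessarily a different partition" caveat concrete: if every
refresh draws a random int modulo the partition count, a new draw coincides
with the previous one with a probability of about 1/num.partitions. The
following is only a rough Python sketch of that arithmetic, not the producer's
actual code; `pick_partition` and `repeat_fraction` are hypothetical names, and
Python's `random.Random` stands in for whatever random source the producer
uses.

```python
import random


def pick_partition(rng, num_partitions):
    # Stand-in for "next random int % num.partitions" (an assumption, not Kafka code).
    return rng.randrange(2**31) % num_partitions


def repeat_fraction(num_partitions, refreshes=100_000, seed=0):
    """Fraction of simulated refreshes that re-select the partition already in use."""
    rng = random.Random(seed)
    current = pick_partition(rng, num_partitions)
    repeats = 0
    for _ in range(refreshes):
        nxt = pick_partition(rng, num_partitions)
        if nxt == current:
            repeats += 1
        current = nxt
    return repeats / refreshes
```

For a topic with 4 partitions, `repeat_fraction(4)` should come out near 0.25,
so roughly one refresh in four would leave the producer on the same partition.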

Setting the key can also resolve this issue, as long as the key values are
evenly distributed, since the partition selected is effectively based on
key values.
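
As a sketch of why evenly distributed keys matter: when a key is set, the
partition is derived from the key, so the spread across partitions follows the
spread of the keys. The Python below is illustrative only; `partition_for_key`
and `spread` are hypothetical names, and SHA-256 is just a stand-in for the
hash the partitioner really applies.

```python
import hashlib
from collections import Counter


def partition_for_key(key, num_partitions):
    """Map a string key to a partition; the same key always gets the same partition."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Assumption: SHA-256 is only a stand-in for the partitioner's real hash function.
    return int.from_bytes(digest[:4], "big") % num_partitions


def spread(keys, num_partitions):
    """Count how many messages land on each partition for the given keys."""
    return Counter(partition_for_key(k, num_partitions) for k in keys)
```

Calling `spread` with many distinct keys should give roughly equal counts per
partition, while calling it with one repeated key puts every message on a
single partition.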

Guozhang


On Tue, Jun 10, 2014 at 9:54 AM, Prakash Gowri Shankor <
prakash.shankor@gmail.com> wrote:

> Can you please tell me how to set this property?
> topic.metadata.refresh.interval.ms
> Is a value of 100 low enough to solve this issue?
> I'm guessing I can set it to 100 and restart the command line producer and
> the partitioning should work? Please confirm.
>
> Thanks

Re: Strange partitioning behavior with 0.8.1.1

Posted by Prakash Gowri Shankor <pr...@gmail.com>.
Can you please tell me how to set this property?
topic.metadata.refresh.interval.ms
Is a value of 100 low enough to solve this issue?
I'm guessing I can set it to 100 and restart the command line producer and
the partitioning should work? Please confirm.

Thanks


On Mon, Jun 9, 2014 at 5:09 PM, Prakash Gowri Shankor <
prakash.shankor@gmail.com> wrote:

> Thank you Guozhang.
> I've specified how I set and use the property in my previous mail. Can you
> tell me if that is fine?
> I also noticed that kafka-console-producer.sh takes a custom
> property (key-value) on the command line. Would it help to set this property
> directly on the command line of the producer script?

Re: Strange partitioning behavior with 0.8.1.1

Posted by Prakash Gowri Shankor <pr...@gmail.com>.
Thank you Guozhang.
I've specified how I set and use the property in my previous mail. Can you
tell me if that is fine?
I also noticed that kafka-console-producer.sh takes a custom
property (key-value) on the command line. Would it help to set this property
directly on the command line of the producer script?


On Mon, Jun 9, 2014 at 5:06 PM, Guozhang Wang <wa...@gmail.com> wrote:

> In the new producer we are changing the default behavior back to pure
> random partitioning and letting users customize their own partitioning
> schemes if they want. For now reducing topic.metadata.refresh.interval.ms
> should help because the stickiness only persists until a metadata refresh.
>
> Guozhang

Re: Strange partitioning behavior with 0.8.1.1

Posted by Guozhang Wang <wa...@gmail.com>.
In the new producer we are changing the default behavior back to pure
random partitioning and letting users customize their own partitioning
schemes if they want. For now reducing topic.metadata.refresh.interval.ms
should help because the stickiness only persists until a metadata refresh.
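
To illustrate the stickiness described above, here is a minimal Python sketch
(a toy model, not Kafka's actual code) of a producer that picks a random
partition for key-less messages and keeps using it until the next metadata
refresh. The class `StickyRandomPartitioner`, the helper `partitions_used` and
their parameters are hypothetical; the 600000 ms default is meant to mirror
the 10 minutes mentioned earlier in this thread.

```python
import random


class StickyRandomPartitioner:
    """Toy model: a random partition is chosen and reused until the next refresh."""

    def __init__(self, num_partitions, refresh_interval_ms=600_000, seed=None):
        self.num_partitions = num_partitions
        self.refresh_interval_ms = refresh_interval_ms
        self._rng = random.Random(seed)
        self._cached = None            # partition currently in use
        self._last_refresh_ms = None   # time of the last simulated metadata refresh

    def partition(self, now_ms):
        """Return the partition for a key-less message sent at time now_ms."""
        expired = (
            self._last_refresh_ms is None
            or now_ms - self._last_refresh_ms >= self.refresh_interval_ms
        )
        if expired:
            # A refresh drops the cached choice; the new pick is random,
            # so it may or may not differ from the previous one.
            self._cached = self._rng.randrange(self.num_partitions)
            self._last_refresh_ms = now_ms
        return self._cached


def partitions_used(partitioner, duration_ms, step_ms=10):
    """Send one message every step_ms and collect the distinct partitions hit."""
    return {partitioner.partition(t) for t in range(0, duration_ms, step_ms)}
```

In this model, a one-minute burst with the 10-minute interval lands on a
single partition, whereas the same burst with `refresh_interval_ms=100` is
re-assigned about 600 times and so spreads over several partitions.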

Guozhang


On Mon, Jun 9, 2014 at 4:54 PM, Prakash Gowri Shankor <
prakash.shankor@gmail.com> wrote:

> Is there a way to modify this duration ? This is not adhering to the
> "random" behavior that the documentation talks about.

Re: Strange partitioning behavior with 0.8.1.1

Posted by Prakash Gowri Shankor <pr...@gmail.com>.
Is there a way to modify this duration ? This is not adhering to the
"random" behavior that the documentation talks about.


On Mon, Jun 9, 2014 at 4:41 PM, Kane Kane <ka...@gmail.com> wrote:

> Last time I checked, the producer sticks to a partition for 10 minutes.

Re: Strange partitioning behavior with 0.8.1.1

Posted by Prakash Gowri Shankor <pr...@gmail.com>.
I have seen that mail thread. Here is what I tried:

In my producer.properties I set topic.metadata.refresh.interval.ms=1000. I
guess this means that a different partition will be selected every
second.

Then I restart my producer as: ./kafka-console-producer.sh --broker-list
localhost:9092 --topic test2

Then I generate more data for a minute or so, and I still see it going to one
partition.

Is this the right way to use the refresh property ?


On Mon, Jun 9, 2014 at 4:56 PM, Guozhang Wang <wa...@gmail.com> wrote:

> Kane is right, please see this FAQ for details:
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified?

Re: Strange partitioning behavior with 0.8.1.1

Posted by Guozhang Wang <wa...@gmail.com>.
Kane is right, please see this FAQ for details:

https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified?


On Mon, Jun 9, 2014 at 4:41 PM, Kane Kane <ka...@gmail.com> wrote:

> Last time I checked, the producer sticks to a partition for 10 minutes.

Re: Strange partitioning behavior with 0.8.1.1

Posted by Kane Kane <ka...@gmail.com>.
Last time I checked, the producer sticks to a partition for 10 minutes.

On Mon, Jun 9, 2014 at 4:13 PM, Prakash Gowri Shankor
<pr...@gmail.com> wrote:
> Hi,
>
> This is with 0.8.1.1 and I ran the command line console consumer.
> I have one broker, one producer and several consumers. I have one topic,
> many partitions m, many consumers n, m=n , one consumer group defined for
> all the consumers
>
> From using Kafka Monitor, I see that each partition is assigned to one
> consumer now. However, it seems that there is no parallelism in data
> consumption. What I see happening is that one consumer gets messages from
> time t0 to t1 from partition P1. Then another consumer gets messages from
> t1 to t2 from partition P2 and so on.
>
> *Why is there no parallel consumption happening ?* It looks to me that the
> producer's data goes into P1 from t0 to t1 and then from t1 to t2 into P2.
> I thought that if I don't specify a partitioning key, the producer's data
> will get partitioned randomly. It's just that the randomness seems to be
> "delayed". Why is this so ?
>
> I tried setting topic.metadata.refresh.interval.ms=100 in the
> producer.properties.
>
> That did not seem to change this strange partitioning behavior.
>
> Please help.
>
> Thanks