Posted to users@kafka.apache.org by Virendra Pratap Singh <vp...@yahoo-inc.com.INVALID> on 2014/06/25 01:44:46 UTC
Uneven distribution of kafka topic partitions across multiple brokers
We have a Kafka cluster with 10 brokers (Kafka 0.8.0). All of the brokers were set up upfront; none was added later. The default number of partitions is set to 4 and the default replication factor to 2.
There are 3 topics in the system. None of them was created manually when the cluster was set up, so we rely on Kafka to create each topic automatically the first time a producer sends data to it.
We have multiple producers that can emit data for any of these topics at any time. This means Kafka is hit with simultaneous produce requests for these 3 topics from multiple producers.
What we observe is that the topic partitions do not get spread out evenly in this scenario. There are 10 brokers (ids 1-10), so the expectation is that the 3 * 4 = 12 topic partitions would be spread across all 10 servers. Instead, the first 2 brokers carry most of the load and only a few partitions land elsewhere. The same is true for the replicas.
Here is the output of list topic:
topic: topic1 partition: 0 leader: 1 replicas: 1,2 isr: 1,2
topic: topic1 partition: 1 leader: 2 replicas: 2,1 isr: 2,1
topic: topic1 partition: 2 leader: 1 replicas: 1,2 isr: 1,2
topic: topic1 partition: 3 leader: 2 replicas: 2,1 isr: 2,1
topic: topic2 partition: 0 leader: 9 replicas: 9,4 isr: 9,4
topic: topic2 partition: 1 leader: 10 replicas: 10,5 isr: 10,5
topic: topic2 partition: 2 leader: 1 replicas: 1,6 isr: 1,6
topic: topic2 partition: 3 leader: 2 replicas: 2,7 isr: 2,7
topic: topic3 partition: 0 leader: 2 replicas: 2,1 isr: 2,1
topic: topic3 partition: 1 leader: 1 replicas: 1,2 isr: 1,2
topic: topic3 partition: 2 leader: 2 replicas: 2,1 isr: 2,1
topic: topic3 partition: 3 leader: 1 replicas: 1,2 isr: 1,2
So what are my options to have Kafka distribute the topic partitions evenly? Would pre-creating the topics via the create topic command help?
Regards,
Virendra
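The imbalance in the listing above can be tallied per broker with a short script. This is only an illustrative sketch: the helper name count_per_broker and the regular expression are assumptions made for this example, and the input is the list topic output copied from the post.

```python
import re
from collections import Counter

# The list topic output, copied verbatim from the post above.
LIST_TOPIC_OUTPUT = """\
topic: topic1 partition: 0 leader: 1 replicas: 1,2 isr: 1,2
topic: topic1 partition: 1 leader: 2 replicas: 2,1 isr: 2,1
topic: topic1 partition: 2 leader: 1 replicas: 1,2 isr: 1,2
topic: topic1 partition: 3 leader: 2 replicas: 2,1 isr: 2,1
topic: topic2 partition: 0 leader: 9 replicas: 9,4 isr: 9,4
topic: topic2 partition: 1 leader: 10 replicas: 10,5 isr: 10,5
topic: topic2 partition: 2 leader: 1 replicas: 1,6 isr: 1,6
topic: topic2 partition: 3 leader: 2 replicas: 2,7 isr: 2,7
topic: topic3 partition: 0 leader: 2 replicas: 2,1 isr: 2,1
topic: topic3 partition: 1 leader: 1 replicas: 1,2 isr: 1,2
topic: topic3 partition: 2 leader: 2 replicas: 2,1 isr: 2,1
topic: topic3 partition: 3 leader: 1 replicas: 1,2 isr: 1,2
"""

# Hypothetical pattern matching one line of the listing shown above.
LINE_RE = re.compile(
    r"topic: (\S+) partition: (\d+) leader: (\d+) replicas: (\S+) isr: (\S+)"
)


def count_per_broker(text):
    """Return (leaders, replicas) counters keyed by broker id."""
    leaders, replicas = Counter(), Counter()
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # skip anything that is not a partition line
        leaders[int(match.group(3))] += 1
        for broker in match.group(4).split(","):
            replicas[int(broker)] += 1
    return leaders, replicas


leaders, replicas = count_per_broker(LIST_TOPIC_OUTPUT)
for broker in range(1, 11):
    print(f"broker {broker:2d}: leaders={leaders[broker]} replicas={replicas[broker]}")
```

For this listing, brokers 1 and 2 each lead 5 of the 12 partitions and hold 9 of the 24 replicas, while brokers 3 and 8 hold none.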
Re: Uneven distribution of kafka topic partitions across multiple brokers
Posted by Virendra Pratap Singh <vp...@yahoo-inc.com.INVALID>.
Hi Joe,
Thanks for the info. I am aware of the reassignment thingy. I was
trying to understand why the distribution was uneven in the first place.
Regards,
Virendra
On 6/24/14, 8:41 PM, "Joe Stein" <jo...@stealth.ly> wrote:
Re: Uneven distribution of kafka topic partitions across multiple brokers
Posted by Joe Stein <jo...@stealth.ly>.
Take a look at
bin/kafka-reassign-partitions.sh
Option                                  Description
------                                  -----------
--broker-list <brokerlist>              The list of brokers to which the
                                          partitions need to be reassigned in
                                          the form "0,1,2". This is required
                                          if --topics-to-move-json-file is
                                          used to generate reassignment
                                          configuration
--execute                               Kick off the reassignment as specified
                                          by the --reassignment-json-file
                                          option.
--generate                              Generate a candidate partition
                                          reassignment configuration. Note
                                          that this only generates a candidate
                                          assignment, it does not execute it.
--reassignment-json-file <manual        The JSON file with the partition
  assignment json file path>              reassignment configuration. The
                                          format to use is -
                                        {"partitions":
                                          [{"topic": "foo",
                                            "partition": 1,
                                            "replicas": [1,2,3] }],
                                         "version":1
                                        }
--topics-to-move-json-file <topics to   Generate a reassignment configuration
  reassign json file path>                to move the partitions of the
                                          specified topics to the list of
                                          brokers specified by the
                                          --broker-list option. The format to
                                          use is -
                                        {"topics":
                                          [{"topic": "foo"},{"topic": "foo1"}],
                                         "version":1
                                        }
--verify                                Verify if the reassignment completed
                                          as specified by the
                                          --reassignment-json-file option.
--zookeeper <urls>                      REQUIRED: The connection string for
                                          the zookeeper connection in the form
                                          host:port. Multiple URLS can be
                                          given to allow fail-over.
Command must include exactly one action: --generate, --execute or --verify
/*******************************************
Joe Stein
Founder, Principal Consultant
Big Data Open Source Security LLC
http://www.stealth.ly
Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/
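As a rough illustration of the --reassignment-json-file format quoted in the help text above, the following sketch builds a manual plan that spreads the 12 partitions from the original post across brokers 1-10. The function name build_reassignment and the simple consecutive-broker placement are assumptions made for this example; it is not the plan that --generate would produce.

```python
import json


def build_reassignment(topics, num_partitions, brokers, replication_factor):
    """Build a plan in the {"partitions": [...], "version": 1} format.

    Hypothetical placement: each partition takes the next broker in the
    list as its first replica, and the following brokers as the others.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor exceeds number of brokers")
    n = len(brokers)
    partitions = []
    slot = 0  # running counter so consecutive partitions start on new brokers
    for topic in topics:
        for partition in range(num_partitions):
            replicas = [brokers[(slot + r) % n] for r in range(replication_factor)]
            partitions.append(
                {"topic": topic, "partition": partition, "replicas": replicas}
            )
            slot += 1
    return {"partitions": partitions, "version": 1}


plan = build_reassignment(
    topics=["topic1", "topic2", "topic3"],
    num_partitions=4,
    brokers=list(range(1, 11)),
    replication_factor=2,
)
print(json.dumps(plan, indent=2))
```

The printed JSON could be saved to a file (say reassign.json, a name chosen here for illustration) and passed to the tool with --zookeeper, --reassignment-json-file and --execute, then checked with --verify, as described in the option list above.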
Re: Uneven distribution of kafka topic partitions across multiple brokers
Posted by Neha Narkhede <ne...@gmail.com>.
Cool. Thanks for circling back with the verification.
On Wed, Jun 25, 2014 at 2:49 PM, Virendra Pratap Singh <
vpsingh@yahoo-inc.com.invalid> wrote:
Re: Uneven distribution of kafka topic partitions across multiple brokers
Posted by Virendra Pratap Singh <vp...@yahoo-inc.com.INVALID>.
Hi Neha,
You are correct. I checked controller.log and found that, although I
had assumed the producers were started after the whole Kafka cluster was
up, that was not the case.
The creation requests for topic1 and topic3 came in when only brokers 1
and 2 were alive. A split second later the other 8 brokers came up,
followed by the topic2 creation request (which is why topic2 shows an
even distribution).
Regards,
Virendra
On 6/24/14, 11:06 PM, "Neha Narkhede" <ne...@gmail.com> wrote:
Re: Uneven distribution of kafka topic partitions across multiple brokers
Posted by Neha Narkhede <ne...@gmail.com>.
Looking at the output of list topics, here is what I think happened. When
topic1 and topic3 were created, only brokers 1&2 were online and alive.
When topic2 was created, almost all brokers were online. Only brokers that
are alive at the time of topic creation can be assigned replicas for the
topic. I would suggest ensuring that all brokers are alive and repeating
the experiment with 0.8.1.1, which is the latest stable release.
Thanks,
Neha
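The effect described above can be reproduced with a simplified round-robin sketch. This is not Kafka's actual assignment code: the function name assign_replicas and the start_index and shift parameters are assumptions made for illustration, standing in for whatever starting point the broker picks. The only point is that replicas are drawn from the brokers alive at creation time.

```python
import random


def assign_replicas(alive_brokers, num_partitions, replication_factor,
                    start_index=None, shift=None):
    """Simplified sketch: rotate leaders over the alive brokers and place
    each follower at an offset from its leader."""
    brokers = sorted(alive_brokers)
    n = len(brokers)
    if replication_factor > n:
        raise ValueError("replication factor exceeds number of alive brokers")
    if start_index is None:
        start_index = random.randrange(n)
    if shift is None:
        shift = random.randrange(n)
    assignment = {}
    for partition in range(num_partitions):
        first = (start_index + partition) % n
        replicas = [brokers[first]]
        for r in range(replication_factor - 1):
            # Offset stays in 1..n-1, so a follower never lands on its leader.
            offset = 1 + (shift + r) % (n - 1)
            replicas.append(brokers[(first + offset) % n])
        assignment[partition] = replicas
    return assignment


# Only brokers 1 and 2 alive: every replica has to land on those two.
topic1 = assign_replicas([1, 2], 4, 2, start_index=0, shift=0)
topic3 = assign_replicas([1, 2], 4, 2, start_index=1, shift=0)
# All 10 brokers alive: partitions spread out.
topic2 = assign_replicas(range(1, 11), 4, 2, start_index=8, shift=4)

print(topic1)  # {0: [1, 2], 1: [2, 1], 2: [1, 2], 3: [2, 1]}
print(topic3)  # {0: [2, 1], 1: [1, 2], 2: [2, 1], 3: [1, 2]}
print(topic2)  # {0: [9, 4], 1: [10, 5], 2: [1, 6], 3: [2, 7]}
```

With these hand-picked start_index and shift values the sketch happens to match the replica lists in the original post, which is consistent with topic1 and topic3 having been created while only brokers 1 and 2 were alive.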