Posted to users@kafka.apache.org by Virendra Pratap Singh <vp...@yahoo-inc.com.INVALID> on 2014/06/25 01:44:46 UTC

Uneven distribution of kafka topic partitions across multiple brokers

We have a Kafka cluster with 10 brokers (Kafka 0.8.0). All of the brokers were set up upfront; none was added later. The default number of partitions is set to 4 and the default replication factor to 2.
There are 3 topics in the system. None of them was created manually when the cluster was set up, so we rely on Kafka to create each topic automatically the first time a producer sends data for it.
We have multiple producers, each of which may emit data for any of these topics at any time. This means Kafka will be hit with produce requests for these 3 topics from multiple producers simultaneously.

What we observe is that the topic partitions do not get spread out evenly in this scenario. There are 10 brokers (ids 1-10), so the expectation is that the 3 * 4 = 12 topic partitions would be spread across all 10 servers. However, in this case the first 2 brokers carry most of the load and only a few partitions are spread out. The same is true for the replicas.

Here is the output of list topic:

topic: topic1    partition: 0    leader: 1     replicas: 1,2     isr: 1,2
topic: topic1    partition: 1    leader: 2     replicas: 2,1     isr: 2,1
topic: topic1    partition: 2    leader: 1     replicas: 1,2     isr: 1,2
topic: topic1    partition: 3    leader: 2     replicas: 2,1     isr: 2,1
topic: topic2    partition: 0    leader: 9     replicas: 9,4     isr: 9,4
topic: topic2    partition: 1    leader: 10    replicas: 10,5    isr: 10,5
topic: topic2    partition: 2    leader: 1     replicas: 1,6     isr: 1,6
topic: topic2    partition: 3    leader: 2     replicas: 2,7     isr: 2,7
topic: topic3    partition: 0    leader: 2     replicas: 2,1     isr: 2,1
topic: topic3    partition: 1    leader: 1     replicas: 1,2     isr: 1,2
topic: topic3    partition: 2    leader: 2     replicas: 2,1     isr: 2,1
topic: topic3    partition: 3    leader: 1     replicas: 1,2     isr: 1,2
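
To put numbers on the skew, the listing above can be tallied per broker. The following is only a minimal Python sketch: it assumes the listing is available as text in the format shown, and the names `LISTING` and `replica_counts` are made up for illustration.

```python
import re
from collections import Counter

# The list-topic output from this message, pasted in as text.
LISTING = """\
topic: topic1  partition: 0  leader: 1   replicas: 1,2   isr: 1,2
topic: topic1  partition: 1  leader: 2   replicas: 2,1   isr: 2,1
topic: topic1  partition: 2  leader: 1   replicas: 1,2   isr: 1,2
topic: topic1  partition: 3  leader: 2   replicas: 2,1   isr: 2,1
topic: topic2  partition: 0  leader: 9   replicas: 9,4   isr: 9,4
topic: topic2  partition: 1  leader: 10  replicas: 10,5  isr: 10,5
topic: topic2  partition: 2  leader: 1   replicas: 1,6   isr: 1,6
topic: topic2  partition: 3  leader: 2   replicas: 2,7   isr: 2,7
topic: topic3  partition: 0  leader: 2   replicas: 2,1   isr: 2,1
topic: topic3  partition: 1  leader: 1   replicas: 1,2   isr: 1,2
topic: topic3  partition: 2  leader: 2   replicas: 2,1   isr: 2,1
topic: topic3  partition: 3  leader: 1   replicas: 1,2   isr: 1,2
"""

# Matches the "leader: N  replicas: a,b" part of each listing line.
LINE_PATTERN = re.compile(r"leader:\s*(\d+)\s+replicas:\s*([\d,]+)")


def replica_counts(listing):
    """Return (leaders, replicas): Counters keyed by broker id."""
    leaders, replicas = Counter(), Counter()
    for line in listing.splitlines():
        match = LINE_PATTERN.search(line)
        if match is None:
            continue  # skip anything that is not a partition line
        leaders[int(match.group(1))] += 1
        for broker in match.group(2).split(","):
            replicas[int(broker)] += 1
    return leaders, replicas


if __name__ == "__main__":
    leaders, replicas = replica_counts(LISTING)
    print("broker  leaders  replicas")
    for broker in range(1, 11):
        print(f"{broker:>6}  {leaders[broker]:>7}  {replicas[broker]:>8}")
```

For this listing, brokers 1 and 2 each lead 5 of the 12 partitions and hold 9 of the 24 replicas, while brokers 3 and 8 hold none.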

So what are my options for having Kafka distribute the topic partitions evenly? Would pre-creating the topics via the create topic command help?

Regards,
Virendra

Re: Uneven distribution of kafka topic partitions across multiple brokers

Posted by Virendra Pratap Singh <vp...@yahoo-inc.com.INVALID>.
Hi Joe,

   Thanks for the info. I am aware of the reassignment thingy. I was
trying to understand why the uneven distribution in the first place.

Regards,
Virendra


Re: Uneven distribution of kafka topic partitions across multiple brokers

Posted by Joe Stein <jo...@stealth.ly>.
Take a look at

bin/kafka-reassign-partitions.sh

Option                                  Description
------                                  -----------
--broker-list <brokerlist>              The list of brokers to which the
                                          partitions need to be reassigned in
                                          the form "0,1,2". This is required
                                          if --topics-to-move-json-file is
                                          used to generate reassignment
                                          configuration
--execute                               Kick off the reassignment as specified
                                          by the --reassignment-json-file
                                          option.
--generate                              Generate a candidate partition
                                          reassignment configuration. Note
                                          that this only generates a candidate
                                          assignment, it does not execute it.
--reassignment-json-file <manual        The JSON file with the partition
  assignment json file path>              reassignment configuration. The
                                          format to use is -
                                        {"partitions":
                                        [{"topic": "foo",
                                          "partition": 1,
                                          "replicas": [1,2,3] }],
                                        "version":1
                                        }
--topics-to-move-json-file <topics to   Generate a reassignment configuration
  reassign json file path>                to move the partitions of the
                                          specified topics to the list of
                                          brokers specified by the
                                          --broker-list option. The format to
                                          use is -
                                        {"topics":
                                        [{"topic": "foo"},{"topic": "foo1"}],
                                        "version":1
                                        }
--verify                                Verify if the reassignment completed
                                          as specified by the
                                          --reassignment-json-file option.
--zookeeper <urls>                      REQUIRED: The connection string for
                                          the zookeeper connection in the form
                                          host:port. Multiple URLS can be
                                          given to allow fail-over.

Command must include exactly one action: --generate, --execute or --verify
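
As an illustration of how the options above fit together for this thread's case, here is a rough Python sketch that builds the topics-to-move JSON for topic1 and topic3 and assembles a `--generate` command line targeting brokers 1-10. The flags and JSON layout are copied from the help text above; the ZooKeeper address `localhost:2181`, the file name `topics.json`, and the helper names are placeholders, and the flags available in other Kafka versions may differ.

```python
import json


def topics_to_move(topics):
    """Build the document expected by --topics-to-move-json-file."""
    return {"topics": [{"topic": name} for name in topics], "version": 1}


def generate_command(zookeeper, json_path, brokers):
    """Assemble the argument list for a --generate run (nothing is executed)."""
    return [
        "bin/kafka-reassign-partitions.sh",
        "--zookeeper", zookeeper,
        "--topics-to-move-json-file", json_path,
        "--broker-list", ",".join(str(broker) for broker in brokers),
        "--generate",
    ]


if __name__ == "__main__":
    # Contents to save as topics.json (placeholder file name).
    print(json.dumps(topics_to_move(["topic1", "topic3"]), indent=2))
    # Placeholder ZooKeeper address; substitute the real connection string.
    print(" ".join(generate_command("localhost:2181", "topics.json", range(1, 11))))
```

Per the help text, `--generate` only proposes a candidate assignment; the proposal would then be saved to a file and passed to `--reassignment-json-file` with `--execute`, and later with `--verify`.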

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/



Re: Uneven distribution of kafka topic partitions across multiple brokers

Posted by Neha Narkhede <ne...@gmail.com>.
Cool. Thanks for circling back with the verification.



Re: Uneven distribution of kafka topic partitions across multiple brokers

Posted by Virendra Pratap Singh <vp...@yahoo-inc.com.INVALID>.
Hi Neha,

    You are correct. I checked the controller.log and found that, although
I had assumed the producers were started after the whole Kafka cluster was
up, that was not the case.
The creation requests for topic1 and topic3 came in when only brokers 1 and 2
were alive. Within a split second the other 8 brokers came up, followed by
the topic2 creation request (which is why topic2 is evenly distributed).
  
Regards,
Virendra



Re: Uneven distribution of kafka topic partitions across multiple brokers

Posted by Neha Narkhede <ne...@gmail.com>.
Looking at the output of list topics, here is what I think happened. When
topic1 and topic3 were created, only brokers 1&2 were online and alive.
When topic2 was created, almost all brokers were online. Only brokers that
are alive at the time of topic creation can be assigned replicas for the
topic. I would suggest ensuring that all brokers are alive and repeating
the experiment with 0.8.1.1, which is the latest stable release.
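
The effect described above can be illustrated with a simplified round-robin placement over whichever brokers are alive. This Python sketch is not Kafka's source code: the function name, the fixed `start` index and the fixed follower `shift` are assumptions chosen to make the output deterministic.

```python
def assign_replicas(live_brokers, partitions, replication_factor, start=0, shift=0):
    """Place replicas round-robin over the brokers that are currently alive.

    Returns {partition: [broker ids]}, with the first id acting as the
    preferred leader. `start` picks the broker for partition 0 and `shift`
    offsets where followers land; both are illustrative parameters.
    """
    brokers = sorted(live_brokers)
    n = len(brokers)
    if replication_factor > n:
        raise ValueError("replication factor exceeds number of live brokers")
    assignment = {}
    for partition in range(partitions):
        first = (start + partition) % n
        replicas = [brokers[first]]
        for follower in range(replication_factor - 1):
            # Offset of 1..n-1 from the leader, so a follower never
            # lands on the leader's own broker.
            offset = 1 + (shift + follower) % (n - 1)
            replicas.append(brokers[(first + offset) % n])
        assignment[partition] = replicas
    return assignment


if __name__ == "__main__":
    # Only brokers 1 and 2 alive: every replica has to go to one of them.
    print(assign_replicas([1, 2], partitions=4, replication_factor=2))
    # All 10 brokers alive: the same procedure spreads the replicas out.
    print(assign_replicas(range(1, 11), partitions=4, replication_factor=2,
                          start=8, shift=4))
```

With two live brokers the sketch yields the alternating 1,2 / 2,1 pattern seen for topic1; with all ten alive and the `start` and `shift` values picked here, it yields the 9,4 / 10,5 / 1,6 / 2,7 placement seen for topic2.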

Thanks,
Neha

