Posted to users@kafka.apache.org by Alexis Midon <al...@airbedandbreakfast.com> on 2014/09/03 09:12:00 UTC

Re: Migrating data from old brokers to new brokers question

Hi Marcin,

A few weeks ago, I upgraded to 0.8.1.1 and then expanded the cluster
from 3 to 9 brokers. All went smoothly.
In a dev environment, we found that the biggest pain point was having
to deal with the JSON file and the error-prone command-line interface.
So to make our lives easier, my teammate Nelson [1] came up with kafkat:
https://github.com/airbnb/kafkat

We now install kafkat on every broker. Note that kafkat does NOT connect to
a broker, but to ZooKeeper, so you can actually use it from any machine
that can reach ZooKeeper.

For reassignment, please see:
`kafkat reassign [topic] [--brokers <ids>] [--replicas <n>]`
It will transparently generate and kick off a balanced assignment.

Feedback and contributions are welcome! Enjoy!

Alexis

[1] https://github.com/nelgau



On Tue, Aug 26, 2014 at 10:27 AM, Marcin Michalski <mm...@tagged.com>
wrote:

> I am running 0.8.1.1, and I thought that the partition reassignment tool
> could do this job. I just was not sure whether this is the best way to do it.
> I will try this out in a staging environment first and then do the same in prod.
>
> Thanks,
> marcin
>
>
> On Mon, Aug 25, 2014 at 7:23 PM, Joe Stein <jo...@stealth.ly> wrote:
>
> > Marcin, that is a typical task now. What version of Kafka are you
> > running?
> >
> > Take a look at
> > https://kafka.apache.org/documentation.html#basic_ops_cluster_expansion
> > and
> > https://kafka.apache.org/documentation.html#basic_ops_increase_replication_factor
> >
> > Basically, you can run --generate to get the existing JSON topology, take
> > the output under "Current partition replica assignment" (the first JSON
> > that is printed), make whatever changes you want (for example, sed the old
> > node for the new node, or add more replicas, which increases the
> > replication factor), and then run --execute.
> >
> > With lots of data this takes time, so you will want to run --verify to see
> > what is in progress. It is a good idea to do one node at a time (or even
> > one topic at a time), however you want to manage it, and wait for each
> > step to finish.
> >
> > The "preferred" replica is simply the first one in the list of replicas.
> > kafka-preferred-replica-election.sh just makes that replica the leader,
> > as this is not automatic yet.
> >
> > If you are running a version prior to 0.8.1.1, it might make sense to
> > upgrade the old nodes first and then run the reassignment to the new servers.
> >
> >
> > /*******************************************
> >  Joe Stein
> >  Founder, Principal Consultant
> >  Big Data Open Source Security LLC
> >  http://www.stealth.ly
> >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > ********************************************/
> >
> >
> > On Mon, Aug 25, 2014 at 8:59 PM, Marcin Michalski <mmichalski@tagged.com>
> > wrote:
> >
> > > Hi, I would like to migrate my Kafka setup from old servers to new
> > > servers. Let's say I have 8 really old servers that have the Kafka
> > > topics/partitions replicated 4 ways, and I want to migrate the data to
> > > 4 brand new servers with a replication factor of 3. I wonder if anyone
> > > has ever performed this type of migration?
> > >
> > > Will auto rebalancing take care of this automatically if I do the
> > > following?
> > >
> > > Let's say I bring old broker id 1 down and start up a new server as
> > > broker id 100. Is there a way to migrate all of the data of the
> > > topics/partitions (where broker id 1 was the leader) over to the new
> > > broker 100?
> > >
> > > Or do I need to use bin/kafka-preferred-replica-election.sh to reassign
> > > the topics/partitions from old broker 1 to broker 100, and then just keep
> > > doing the same thing until all of the old brokers are decommissioned?
> > >
> > > Also, would kafka-preferred-replica-election.sh let me actually lower the
> > > number of replicas as well, if I simply make sure that a given
> > > topic/partition was only elected 3 times versus 4?
> > >
> > > Thanks for your insight,
> > > Marcin
> > >
> >
>
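
To make the "edit the JSON" step from Joe's reply concrete, here is a minimal
sketch in Python. It assumes the "Current partition replica assignment" output
has the usual shape ({"version": 1, "partitions": [{"topic", "partition",
"replicas"}]}). The function name remap_assignment, the topic name "events",
and the broker ids are made up for illustration; none of this is part of Kafka
or kafkat.

```python
import json


def remap_assignment(current, broker_map, replication_factor):
    """Swap old broker ids for new ones and trim each replica list.

    `current` is the parsed "Current partition replica assignment" JSON.
    `broker_map` maps old broker id -> new broker id (hypothetical ids).
    """
    partitions = []
    for entry in current["partitions"]:
        replicas = []
        for broker in entry["replicas"]:
            new_broker = broker_map.get(broker, broker)
            if new_broker not in replicas:  # a broker may appear only once
                replicas.append(new_broker)
        if len(replicas) < replication_factor:
            raise ValueError(
                "not enough distinct brokers for %s-%d"
                % (entry["topic"], entry["partition"])
            )
        partitions.append({
            "topic": entry["topic"],
            "partition": entry["partition"],
            # The first broker in the list is the "preferred" replica.
            "replicas": replicas[:replication_factor],
        })
    return {"version": 1, "partitions": partitions}


# Hypothetical input: one topic, two partitions, replicated 4 ways.
current = {
    "version": 1,
    "partitions": [
        {"topic": "events", "partition": 0, "replicas": [1, 2, 3, 4]},
        {"topic": "events", "partition": 1, "replicas": [2, 3, 4, 1]},
    ],
}
broker_map = {1: 100, 2: 101, 3: 102, 4: 103}

proposed = remap_assignment(current, broker_map, replication_factor=3)
print(json.dumps(proposed))

# Preferred leader per partition after the move (first replica in each list).
preferred = {
    (p["topic"], p["partition"]): p["replicas"][0]
    for p in proposed["partitions"]
}
```

The printed JSON could be saved to a file and handed to
kafka-reassign-partitions.sh with --reassignment-json-file and --execute, then
checked with --verify, as described in the cluster expansion docs linked above.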

Re: Migrating data from old brokers to new brokers question

Posted by Neha Narkhede <ne...@gmail.com>.
The idea is to bake the functionality of such a tool into Kafka itself. In an
ideal world, a Kafka cluster would automatically detect leader and data
imbalance and trigger a rebalance operation that leads to optimal
performance. I'm not sure whether we have a JIRA for this, though, so feel
free to create one.

On Wed, Sep 17, 2014 at 5:51 PM, Alexis Midon <
alexis.midon@airbedandbreakfast.com> wrote:

> We would be very happy to contribute. However, a description of the current
> plan and status of the tooling would be helpful; it would shorten the
> learning curve. You mentioned some JIRA tickets?
>
> (maybe I should sign up to the developer mailing list and take the
> conversation over there)

Re: Migrating data from old brokers to new brokers question

Posted by Alexis Midon <al...@airbedandbreakfast.com>.
We would be very happy to contribute. However, a description of the current
plan and status of the tooling would be helpful; it would shorten the
learning curve. You mentioned some JIRA tickets?

(maybe I should sign up to the developer mailing list and take the
conversation over there)

On Tue, Sep 16, 2014 at 6:46 PM, Gwen Shapira <gs...@cloudera.com> wrote:

> Since these tools are so useful, I wonder what it would require (from both
> Airbnb and Kafka) to merge this into the Kafka project. I think there are a
> couple of JIRAs regarding improved tool usability that this would resolve.

Re: Migrating data from old brokers to new brokers question

Posted by Gwen Shapira <gs...@cloudera.com>.
Since these tools are so useful, I wonder what it would require (from both
Airbnb and Kafka) to merge this into the Kafka project. I think there are a
couple of JIRAs regarding improved tool usability that this would resolve.

On Mon, Sep 15, 2014 at 11:45 AM, Alexis Midon
<al...@airbedandbreakfast.com> wrote:
> The distribution will be even in terms of the number of partitions.
> It uses the same logic as AdminUtils.
> See:
> https://github.com/airbnb/kafkat/blob/master/lib/kafkat/command/reassign.rb#L39

Re: Migrating data from old brokers to new brokers question

Posted by Alexis Midon <al...@airbedandbreakfast.com>.
The distribution will be even in terms of the number of partitions.
It uses the same logic as AdminUtils.
See:
https://github.com/airbnb/kafkat/blob/master/lib/kafkat/command/reassign.rb#L39
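
As a rough illustration of what "even in terms of the number of partitions"
means, here is a simplified round-robin sketch in Python. It is NOT the code
from kafkat or from Kafka's AdminUtils (which, among other things, picks a
random starting broker); the function name round_robin_assignment and the
broker ids 100-103 are assumptions made up for this example.

```python
from collections import Counter


def round_robin_assignment(num_partitions, brokers, replication_factor,
                           start_index=0):
    """Spread partition replicas over brokers by count, not by data size."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor exceeds number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        first = (start_index + partition) % len(brokers)
        # Consecutive brokers (wrapping around) hold the replicas; the first
        # entry is the preferred replica for the partition.
        assignment[partition] = [
            brokers[(first + offset) % len(brokers)]
            for offset in range(replication_factor)
        ]
    return assignment


# Hypothetical target: 8 partitions on 4 new brokers, replication factor 3.
assignment = round_robin_assignment(8, [100, 101, 102, 103], 3)

# Count how many replicas and how many preferred leaders each broker gets.
replica_counts = Counter(b for replicas in assignment.values() for b in replicas)
leader_counts = Counter(replicas[0] for replicas in assignment.values())
print(dict(replica_counts))
print(dict(leader_counts))
```

With these inputs every broker ends up with the same number of replicas and
the same number of preferred leaders, but nothing here looks at how much data
each partition actually holds.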

On Sun, Sep 14, 2014 at 6:05 PM, Neha Narkhede <ne...@gmail.com>
wrote:

> This is great. Thanks for sharing! Does kafkat automatically figure out the
> right reassignment strategy based on even data distribution?

Re: Migrating data from old brokers to new brokers question

Posted by Neha Narkhede <ne...@gmail.com>.
This is great. Thanks for sharing! Does kafkat automatically figure out the
right reassignment strategy based on even data distribution?

On Wed, Sep 3, 2014 at 12:12 AM, Alexis Midon <
alexis.midon@airbedandbreakfast.com> wrote:

> Hi Marcin,
>
> A few weeks ago, I upgraded to 0.8.1.1 and then expanded the cluster
> from 3 to 9 brokers. All went smoothly.
> In a dev environment, we found that the biggest pain point was having
> to deal with the JSON file and the error-prone command-line interface.
> So to make our lives easier, my teammate Nelson [1] came up with kafkat:
> https://github.com/airbnb/kafkat
>
> We now install kafkat on every broker. Note that kafkat does NOT connect to
> a broker, but to ZooKeeper, so you can actually use it from any machine
> that can reach ZooKeeper.
>
> For reassignment, please see:
> `kafkat reassign [topic] [--brokers <ids>] [--replicas <n>]`
> It will transparently generate and kick off a balanced assignment.
>
> Feedback and contributions are welcome! Enjoy!
>
> Alexis
>
> [1] https://github.com/nelgau