Posted to users@kafka.apache.org by Marcin Michalski <mm...@tagged.com> on 2014/08/26 02:59:26 UTC

Migrating data from old brokers to new brokers question

Hi, I would like to migrate my Kafka setup from old servers to new servers.
Let's say I have 8 really old servers that have the Kafka topics/partitions
replicated 4 ways, and I want to migrate the data to 4 brand new servers
with a replication factor of 3. Has anyone ever performed this type of
migration?

Will auto rebalancing take care of this automatically if I do the following?

Let's say I bring old broker id 1 down and start up a new server as broker id
100. Is there a way to migrate all of the data for the topics/partitions
(where broker id 1 was the leader) over to the new broker 100?

Or do I need to use bin/kafka-preferred-replica-election.sh to reassign
the topics/partitions from old broker 1 to broker 100, and then just keep
doing the same thing until all of the old brokers are decommissioned?

Also, would kafka-preferred-replica-election.sh let me actually lower the
number of replicas as well, if I simply make sure that a given
topic/partition is only elected 3 times instead of 4?

Thanks for your insight,
Marcin

Re: Migrating data from old brokers to new brokers question

Posted by Kashyap Paidimarri <ka...@gmail.com>.
If you are also planning on a version upgrade as part of this, it might be
safer to create the new cluster separately and use the mirror maker to
copy the data over.

Re: Migrating data from old brokers to new brokers question

Posted by Neha Narkhede <ne...@gmail.com>.
The idea is to bake the functionality of such a tool in Kafka itself. In an
ideal world, a Kafka cluster would automatically detect leader and data
imbalance and trigger a rebalance operation that leads to optimal
performance. I'm not sure if we have a JIRA for this though. So feel free
to create one.

On Wed, Sep 17, 2014 at 5:51 PM, Alexis Midon <
alexis.midon@airbedandbreakfast.com> wrote:

> we would be very happy to contribute. However a description of the current
> plan and status regarding tooling would be helpful.
> It would speed up the learning curve. You mentioned some jira tickets?
>
> (maybe I should sign up to the developer mailing list and take the
> conversation over there)

Re: Migrating data from old brokers to new brokers question

Posted by Alexis Midon <al...@airbedandbreakfast.com>.
we would be very happy to contribute. However a description of the current
plan and status regarding tooling would be helpful.
It would speed up the learning curve. You mentioned some jira tickets?

(maybe I should sign up to the developer mailing list and take the
conversation over there)

On Tue, Sep 16, 2014 at 6:46 PM, Gwen Shapira <gs...@cloudera.com> wrote:

> Since these tools are so useful, I wonder what it would require (from both
> Airbnb and Kafka) to merge this into the Kafka project. I think there are a
> couple of JIRAs regarding improved tool usability that this would resolve.

Re: Migrating data from old brokers to new brokers question

Posted by Gwen Shapira <gs...@cloudera.com>.
Since these tools are so useful, I wonder what it would require (from both
Airbnb and Kafka) to merge this into the Kafka project. I think there are a
couple of JIRAs regarding improved tool usability that this would resolve.

On Mon, Sep 15, 2014 at 11:45 AM, Alexis Midon
<al...@airbedandbreakfast.com> wrote:
> The distribution will be even in terms of the number of partitions per broker.
> It is the same logic as AdminUtils.
> see
> https://github.com/airbnb/kafkat/blob/master/lib/kafkat/command/reassign.rb#L39

Re: Migrating data from old brokers to new brokers question

Posted by Alexis Midon <al...@airbedandbreakfast.com>.
The distribution will be even in terms of the number of partitions per broker.
It is the same logic as AdminUtils.
see
https://github.com/airbnb/kafkat/blob/master/lib/kafkat/command/reassign.rb#L39
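
To illustrate what "even based on the number of partitions" means, here is a
minimal Python sketch of a round-robin replica assignment. It is a simplified
stand-in, not the actual AdminUtils or kafkat code (which, among other things,
picks a random starting broker). The function names, the broker ids 100-103
and the partition count are assumptions made up for the example.

```python
def assign_replicas(partitions, brokers, replication_factor):
    """Spread replicas over brokers round-robin, counting partitions only.

    Simplified illustration: data size per partition is not considered.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor exceeds number of brokers")
    assignment = {}
    for p in range(partitions):
        # Replica i of partition p lands on the broker i slots after slot p.
        assignment[p] = [brokers[(p + i) % len(brokers)]
                         for i in range(replication_factor)]
    return assignment


def replicas_per_broker(assignment):
    """Count how many partition replicas each broker holds."""
    counts = {}
    for replicas in assignment.values():
        for broker in replicas:
            counts[broker] = counts.get(broker, 0) + 1
    return counts


# Hypothetical cluster: 8 partitions, 4 new brokers, replication factor 3.
plan = assign_replicas(partitions=8, brokers=[100, 101, 102, 103],
                       replication_factor=3)
print(plan[0])
print(replicas_per_broker(plan))
```

Each broker ends up with the same number of replicas, but a broker holding a
few very large partitions can still carry more data than the others.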

On Sun, Sep 14, 2014 at 6:05 PM, Neha Narkhede <ne...@gmail.com>
wrote:

> This is great. Thanks for sharing! Does kafkat automatically figure out the
> right reassignment strategy based on even data distribution?

Re: Migrating data from old brokers to new brokers question

Posted by Neha Narkhede <ne...@gmail.com>.
This is great. Thanks for sharing! Does kafkat automatically figure out the
right reassignment strategy based on even data distribution?

On Wed, Sep 3, 2014 at 12:12 AM, Alexis Midon <
alexis.midon@airbedandbreakfast.com> wrote:

> Hi Marcin,
>
> A few weeks ago, I upgraded to 0.8.1.1 and then expanded the cluster
> from 3 to 9 brokers. All went smoothly.
> In a dev environment, we found that the biggest pain point is having
> to deal with the JSON file and the error-prone command line interface.
> So to make our lives easier, my teammate Nelson [1] came up with kafkat:
> https://github.com/airbnb/kafkat
>
> We now install kafkat on every broker. Note that kafkat does NOT connect to
> a broker, but to ZooKeeper, so you can actually use it from any machine.
>
> For reassignment, please see:
> `kafkat reassign [topic] [--brokers <ids>] [--replicas <n>]`
> It will transparently generate and kick off a balanced assignment.
>
> feedback and contributions welcome! Enjoy!
>
> Alexis
>
> [1] https://github.com/nelgau

Re: Migrating data from old brokers to new brokers question

Posted by Alexis Midon <al...@airbedandbreakfast.com>.
Hi Marcin,

A few weeks ago, I did an upgrade to 0.8.1.1 and then augmented the cluster
from 3 to 9 brokers. All went smoothly.
In a dev environment, we found out that the biggest pain point is to have
to deal with the json file and the error-prone command line interface.
So to make our life easier, my team mate Nelson [1] came up with kafkat:
https://github.com/airbnb/kafkat

We now install kafkat on every broker. Note that kafkat does NOT connect to
a broker, but to ZooKeeper, so you can actually use it from any machine.

For reassignment, please see:
`kafkat reassign [topic] [--brokers <ids>] [--replicas <n>] `
It will transparently generate and kick off a balanced assignment.
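
To give a feel for what a "balanced" assignment can look like, here is a
minimal Python sketch. It is NOT kafkat's actual algorithm (I have not
reproduced its source here); it is a plain round-robin placement, and the
function name, the broker ids 100-103 and the partition count are all made up
for illustration.

```python
def balanced_assignment(num_partitions, brokers, replicas):
    """Round-robin placement: partition p starts at broker p mod N.

    Hypothetical helper for illustration only, not kafkat's implementation.
    """
    if replicas > len(brokers):
        raise ValueError("replication factor cannot exceed broker count")
    plan = {}
    for p in range(num_partitions):
        # Take `replicas` consecutive brokers, wrapping around the list.
        plan[p] = [brokers[(p + r) % len(brokers)] for r in range(replicas)]
    return plan


# Assumed example: 8 partitions, 4 new brokers, replication factor 3.
plan = balanced_assignment(8, [100, 101, 102, 103], 3)
for partition, replica_list in sorted(plan.items()):
    print(partition, replica_list)
```

With these inputs every broker ends up first in the replica list for two
partitions and holds six replicas in total, which is the kind of even spread
you would want after a migration.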

Feedback and contributions are welcome. Enjoy!

Alexis

[1] https://github.com/nelgau




Re: Migrating data from old brokers to new brokers question

Posted by Marcin Michalski <mm...@tagged.com>.
I am running 0.8.1.1, and I thought the partition reassignment tool could
do this job. I just was not sure whether it is the best way to do it.
I will try this out in our staging environment first and then do the same in
prod.

Thanks,
marcin



Re: Migrating data from old brokers to new brokers question

Posted by Joe Stein <jo...@stealth.ly>.
Marcin, that is a typical task now.  What version of Kafka are you running?

Take a look at
https://kafka.apache.org/documentation.html#basic_ops_cluster_expansion and
https://kafka.apache.org/documentation.html#basic_ops_increase_replication_factor

Basically, you can run --generate to get the existing topology as JSON. Take
the "Current partition replica assignment" (the first JSON it outputs) and
make whatever changes you want, e.g. sed the old node id for the new one, or
add more replicas to increase the replication factor. Then run --execute
with the edited JSON.
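
As a rough illustration of the "edit the JSON" step, here is a small Python
sketch that does the same job as the sed approach. The topic name, the broker
ids and the old-to-new mapping are invented for the example. It also assumes
that a shorter replica list is how you would lower the replication factor
(Marcin's 4 -> 3 case), which is worth confirming in a staging environment
before relying on it.

```python
import json


def rewrite_assignment(current, broker_map, replication_factor):
    """Swap old broker ids for new ones and trim each replica list.

    Hypothetical helper; `broker_map` maps old broker id -> new broker id.
    """
    partitions = []
    for entry in current["partitions"]:
        replicas = []
        for broker in entry["replicas"]:
            new_id = broker_map.get(broker, broker)
            # Several old brokers may map to one new broker; keep it once.
            if new_id not in replicas:
                replicas.append(new_id)
        if len(replicas) < replication_factor:
            raise ValueError("not enough distinct brokers for %s-%d"
                             % (entry["topic"], entry["partition"]))
        partitions.append({
            "topic": entry["topic"],
            "partition": entry["partition"],
            "replicas": replicas[:replication_factor],
        })
    return {"version": current.get("version", 1), "partitions": partitions}


# Assumed sample of the "Current partition replica assignment" JSON.
current = json.loads("""
{"version": 1, "partitions": [
  {"topic": "events", "partition": 0, "replicas": [1, 2, 3, 4]},
  {"topic": "events", "partition": 1, "replicas": [2, 3, 4, 5]}
]}
""")
broker_map = {1: 100, 2: 101, 3: 102, 4: 103, 5: 100}  # made-up mapping

new_plan = rewrite_assignment(current, broker_map, 3)
print(json.dumps(new_plan))
```

The printed JSON is what you would save to a file and hand to the
reassignment tool for the --execute step.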

With lots of data this takes time, so you will want to run --verify to see
what is still in progress. It is a good idea to move one node at a time (or
even one topic at a time), however you want to manage it, and wait for each
step to finish before starting the next.
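
If you go the topic-at-a-time route, one way to prepare it is to split the
full plan into one JSON document per topic and run them one after another.
A minimal Python sketch, with a made-up plan and a hypothetical helper name:

```python
import json
from collections import defaultdict


def split_by_topic(plan):
    """Return {topic: plan containing only that topic's partitions}."""
    grouped = defaultdict(list)
    for entry in plan["partitions"]:
        grouped[entry["topic"]].append(entry)
    return {
        topic: {"version": plan.get("version", 1), "partitions": entries}
        for topic, entries in grouped.items()
    }


# Assumed example plan covering two topics.
plan = {"version": 1, "partitions": [
    {"topic": "events", "partition": 0, "replicas": [100, 101, 102]},
    {"topic": "events", "partition": 1, "replicas": [101, 102, 103]},
    {"topic": "clicks", "partition": 0, "replicas": [102, 103, 100]},
]}

per_topic = split_by_topic(plan)
for topic in sorted(per_topic):
    # Each of these would be executed and verified before moving on.
    print(topic, json.dumps(per_topic[topic]))
```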

The "preferred" replica is simply the first one in the list of replicas.
kafka-preferred-replica-election.sh just makes that replica the leader,
as this is not automatic yet.
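
Since the preferred replica is just the first entry, you can check ahead of
time where leadership would land after an election by reading it straight off
the plan. A small Python sketch with invented sample data:

```python
from collections import Counter


def preferred_leaders(plan):
    """Map (topic, partition) -> first replica in the list."""
    return {
        (entry["topic"], entry["partition"]): entry["replicas"][0]
        for entry in plan["partitions"]
    }


# Assumed example plan; topic name and broker ids are illustrative only.
plan = {"version": 1, "partitions": [
    {"topic": "events", "partition": 0, "replicas": [100, 101, 102]},
    {"topic": "events", "partition": 1, "replicas": [101, 102, 103]},
    {"topic": "events", "partition": 2, "replicas": [100, 102, 103]},
]}

leaders = preferred_leaders(plan)
leaders_per_broker = Counter(leaders.values())
print(leaders_per_broker)
```

Here broker 100 would be preferred for two of the three partitions, so
reordering one of those replica lists before running --execute would spread
leadership more evenly.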

If you are running a version prior to 0.8.1.1, it might make sense to
upgrade the old nodes first and then run the reassignment to the new servers.


/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/

