Posted to dev@kafka.apache.org by Joe Stein <jo...@stealth.ly> on 2015/01/22 07:11:13 UTC

[DISCUSS] KIP-6 - New reassignment partition logic for re-balancing

Posted a KIP for --re-balance for partition assignment in the reassignment tool.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-6+-+New+reassignment+partition+logic+for+re-balancing

JIRA https://issues.apache.org/jira/browse/KAFKA-1792

While going through the KIP I thought of one thing from the JIRA that we
should change: we should preserve --generate as existing functionality
for the next release it appears in. If folks want to use --re-balance then
great; it just won't break any upgrade paths yet.

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/

Re: [DISCUSS] KIP-6 - New reassignment partition logic for re-balancing

Posted by Jun Rao <ju...@confluent.io>.
Hi, Joe,

A couple of comments.

1. When creating a new topic, our replica assignment algorithm tries to
achieve a few things: (a) all replicas are spread evenly across brokers;
(b) the preferred replicas (the first replica in each partition's assigned
replica list) are spread evenly across brokers; (c) the non-preferred
replicas are spread out in such a way that if we lose a broker, the load on
the failed broker is spread evenly among the remaining brokers.

For example, consider the following replica assignment on brokers b1,
b2, and b3 (with replication factor 2). Broker b1 will be the leader for
partitions p0 and p3, broker b2 will be the leader for partitions p1 and p4,
and broker b3 will be the leader for partitions p2 and p5. If b1 is gone, b2
will take over as the leader for p0 and b3 will take over as the leader for
p3. This strategy makes sure that the load is even in the normal case as
well as in the failure case.

b1 b2 b3
p0 p1 p2   (leaders / first replicas)
p2 p0 p1   (followers)
p3 p4 p5   (leaders)
p4 p5 p3   (followers)

The current reassignment strategy actually maintains properties (a), (b)
and (c) after the reassignment completes.
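
To make the layout concrete, here is a rough Scala sketch (illustrative
only, not the actual Kafka assignment code) that reproduces the table above:
leaders rotate round-robin across the brokers, and the follower offset
changes on each wrap-around so a failed leader's partitions fan out over the
remaining brokers.

    // Illustrative sketch only; names and structure are hypothetical.
    def assignReplicas(numPartitions: Int, brokers: Seq[Int], replicationFactor: Int): Map[Int, Seq[Int]] = {
      val n = brokers.size
      (0 until numPartitions).map { p =>
        val leaderIndex = p % n                             // property (b): leaders spread evenly
        val shift = 1 + (p / n) % math.max(n - 1, 1)        // property (c): follower offset varies per round
        val replicas = (0 until replicationFactor).map(r => brokers((leaderIndex + r * shift) % n))
        p -> replicas
      }.toMap
    }

    // With brokers Seq(1, 2, 3) and replication factor 2, partitions 0..5 come
    // out exactly as in the table: p0 -> (b1, b2), p1 -> (b2, b3), p2 -> (b3, b1),
    // p3 -> (b1, b3), p4 -> (b2, b1), p5 -> (b3, b2).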

The new algorithm takes the last few replicas from an overloaded broker and
moves them to an underloaded broker. It does reduce the data movement
compared with the current algorithm. It also maintains property (a).
However, it doesn't seem to explicitly maintain properties (b) and (c).
Data movement is a one-time cost, while maintaining balance after the data
movement has a long-term benefit. So, it would be useful to try to maintain
these properties, perhaps even at the expense of a bit more data movement.

Also, I think the new algorithm needs to make sure that we don't move the
same replica to a new broker more than once.
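
As a hedged sketch of that count-based, minimal-movement idea (hypothetical
code, not the KAFKA-1792 patch itself): repeatedly move one replica from the
most loaded broker to the least loaded one, skipping replicas that have
already been moved once. Balancing counts this way preserves (a) but does
nothing in particular for (b) or (c).

    // Hypothetical sketch of a count-based rebalance with minimal movement.
    def rebalanceByCount(assignment: Map[Int, Seq[Int]], brokers: Seq[Int]): Map[Int, Seq[Int]] = {
      var current = assignment
      var placed  = Set.empty[(Int, Int)]   // (partition, broker) replicas that have already been moved once
      def counts  = brokers.map(b => b -> current.values.count(_.contains(b))).toMap

      var done = false
      while (!done) {
        val byLoad = counts.toSeq.sortBy(_._2)
        val (under, underCount) = byLoad.head
        val (over,  overCount)  = byLoad.last
        if (overCount - underCount <= 1) done = true
        else {
          current.find { case (p, reps) =>
            reps.contains(over) && !reps.contains(under) && !placed((p, over))
          } match {
            case Some((p, reps)) =>
              current += p -> reps.map(b => if (b == over) under else b)  // move one replica from over to under
              placed  += ((p, under))                                     // never move it a second time
            case None => done = true                                      // nothing left we are allowed to move
          }
        }
      }
      current
    }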

2. I am not sure that we need to add a new --rebalance option. All we are
changing is the assignment strategy. If that's a better strategy than
before, there is no reason for anyone to use the old strategy. So, the new
strategy should just be used in the --generate mode.

Thanks,

Jun




On Wed, Mar 11, 2015 at 12:12 PM, Joe Stein <jo...@stealth.ly> wrote:

> Sorry for not catching up on this thread earlier, I wanted to-do this
> before the KIP got its updates so we could discuss if need be and not waste
> more time re-writing/working things that folks have issues with or such. I
> captured all the comments so far here with responses.
>
> << So fair assignment by count (taking into account the current partition
> count of each broker) is very good. However, it's worth noting that all
> partitions are not created equal. We have actually been performing more
> rebalance work based on the partition size on disk, as given equal
> retention of all topics, the size on disk is a better indicator of the
> amount of traffic a partition gets, both in terms of storage and network
> traffic. Overall, this seems to be a better balance.
>
> Agreed though this is out of scope (imho) for what the motivations for the
> KIP were. The motivations section is blank (that is on me) but honestly it
> is because we did all the development, went back and forth with Neha on the
> testing and then had to back it all into the KIP process... Its a
> time/resource/scheduling and hope to update this soon on the KIP ... all of
> this is in the JIRA and code patch so its not like it is not there just not
> in the place maybe were folks are looking since we changed where folks
> should look.
>
> Initial cut at "Motivations": the --generate is not used by a lot of folks
> because they don't trust it. Issues such as giving different results
> sometimes when you run it. Also other feedback from the community that it
> does not account for specific uses cases like "adding new brokers" and
> "removing brokers" (which is where that patch started
> https://issues.apache.org/jira/browse/KAFKA-1678 but then we changed it
> after review into just --rebalance
> https://issues.apache.org/jira/browse/KAFKA-1792). The use case for add
> and
> remove brokers is one that happens in AWS and auto scailing. There are
> other reasons for this too of course.  The goal originally was to make what
> folks are already coding today (with the output of " available in the
> project for the community. Based on the discussion in the JIRA with Neha we
> all agreed that making it be a faire rebalance would fulfill both uses
> cases.
>
> << In addition to this, I think there is very much a need to have Kafka be
> rack-aware. That is, to be able to assure that for a given cluster, you
> never assign all replicas for a given partition in the same rack. This
> would allow us to guard against maintenances or power failures that affect
> a full rack of systems (or a given switch).
>
> Agreed, this though I think is out of scope for this change and something
> we can also do in the future. There is more that we have to figure out for
> rack aware specifically answering "how do we know what rack the broker is
> on". I really really (really) worry that we keep trying to put too much
> into a single change the discussions go into rabbit holes and good
> important features (that are community driven) that could get out there
> will get bogged down with different uses cases and scope creep. So, I think
> rack awareness is its own KIP that has two parts... setting broker rack and
> rebalancing for that. That features doesn't invalidate the need for
> --rebalance but can be built on top of it.
>
> << I think it would make sense to implement the reassignment logic as a
> pluggable component. That way it would be easy to select a scheme when
> performing a reassignment (count, size, rack aware). Configuring a default
> scheme for a cluster would allow for the brokers to create new topics and
> partitions in compliance with the requested policy.
>
> I don't agree with this because right now you get back "the current state
> of the partitions" so you can (today) write whatever logic you want (with
> the information that is there). With --rebalance you also get that back so
> moving forward. Moving forward we can maybe expose more information so that
> folks can write different logic they want
> (like partition number, location (label string for rack), size, throughput
> average, etc, etc, etc... but again... that to me is a different
> KIP entirely from the motivations of this one. If eventually we want to
> make it plugable then we should have a KIP and discussion around it I just
> don't see how it relates to the original motivations of the change.
>
> << Is it possible to describe the proposed partition reassignment algorithm
> in more detail on the KIP? In fact, it would be really easy to understand
> if we had some concrete examples comparing partition assignment with the
> old algorithm and the new.
>
> sure, it is in the JIRA linked to the KIP too though
> https://issues.apache.org/jira/browse/KAFKA-1792 and documented in
> comments
> in the patch also as requested. Let me know if this is what you are looking
> for and we can simply update the KIP with this information or give more
> detail specifically what you think might be missing please.
>
> << Would we want to
> support some kind of automated reassignment of existing partitions
> (personally - no. I want to trigger that manually because it is a very disk
> and network intensive process)?
>
> You can automate the reassignment with a line of code that takes the
> response and calls --execute if folks want that... I don't think we should
> ever link these (or at least not yet) because of the reasons you say. I
> think as long as we have a way
>
> ********
>
> If there is anything else I missed please let me know so I can make sure
> that the detail gets update so we minimize the back and forth both in
> efforts and elapsed time. This was always supposed to be a very small fix
> for something that pains A LOT of people and I want to make sure that we
> aren't running scope creep on the change but are making sure that folks
> understand the motivation behind a new feature.
>
> ~ Joe Stein
> - - - - - - - - - - - - - - - - -
>
>   http://www.stealth.ly
> - - - - - - - - - - - - - - - - -
>
> On Sun, Mar 8, 2015 at 1:21 PM, Joe Stein <jo...@stealth.ly> wrote:
>
> > Jay,
> >
> > That makes sense.  I think what folks are bringing up all sounds great
> but
> > I feel can/should be done afterwards as further improvements as the scope
> > for this change has a very specific focus to resolve problems folks have
> > today with --generate (with a patch tested and ready to go ). I should be
> > able to update the KIP this week and followup.
> >
> > ~ Joestein
> > On Mar 8, 2015 12:54 PM, "Jay Kreps" <ja...@gmail.com> wrote:
> >
> >> Hey Joe,
> >>
> >> This still seems pretty incomplete. It still has most the sections just
> >> containing the default text you are supposed to replace. It is really
> hard
> >> to understand what is being proposed and why and how much of the problem
> >> we
> >> are addressing. For example the motivation section just says
> >> "operational".
> >>
> >> I'd really like us to do a good job of this. I actually think putting
> the
> >> time in to convey context really matters. For example I think (but can't
> >> really know) that what you are proposing is just a simple fix to the
> JSON
> >> output of the command line tool. But you can see that on the thread it
> is
> >> quickly going to spiral into automatic balancing, rack awareness, data
> >> movement throttling, etc.
> >>
> >> Just by giving people a fairly clear description of the change and how
> it
> >> fits into other efforts that could happen in the area really helps keep
> >> things focused on what you want.
> >>
> >> -Jay
> >>
> >>
> >> On Wed, Jan 21, 2015 at 10:11 PM, Joe Stein <jo...@stealth.ly>
> wrote:
> >>
> >> > Posted a KIP for --re-balance for partition assignment in reassignment
> >> > tool.
> >> >
> >> >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-6+-+New+reassignment+partition+logic+for+re-balancing
> >> >
> >> > JIRA https://issues.apache.org/jira/browse/KAFKA-1792
> >> >
> >> > While going through the KIP I thought of one thing from the JIRA that
> we
> >> > should change. We should preserve --generate to be existing
> >> functionality
> >> > for the next release it is in. If folks want to use --re-balance then
> >> > great, it just won't break any upgrade paths, yet.
> >> >
> >> > /*******************************************
> >> >  Joe Stein
> >> >  Founder, Principal Consultant
> >> >  Big Data Open Source Security LLC
> >> >  http://www.stealth.ly
> >> >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> >> > ********************************************/
> >> >
> >>
> >
>

Re: [DISCUSS] KIP-6 - New reassignment partition logic for re-balancing

Posted by Todd Palino <tp...@gmail.com>.
I understand the desire to not bloat this one change with too much more
work, and it's a good change to start with. That said, I have one note on
your comments:

"I don't agree with this because right now you get back "the current state of
the partitions" so you can (today) write whatever logic you want (with the
information that is there)." (with regards to pluggable schemes)

I think this is a really bad place to be. While we're in agreement that
reshuffling the cluster from one scheme to another should not be automated,
the creation and placement of new topics and partitions must be, and you
can't rely on an external process to handle that for you. That leaves gaps
in what is getting done, and a large failure point. Where to put a
partition has to be a decision that the controller makes correctly (where
correctly is defined as "the way I want the cluster balanced") upon
creation, and not something that we come in and fix after the fact.

We're in agreement that this should be a new KIP, and that the sourcing and
handling of metadata for something like rack-awareness is non-trivial and
is going to require a lot of discussion. Plus, we're going to not only want
to be rack-aware but also balanced by partition size and/or count. That's
going to be somewhat tricky to get right.

-Todd


On Wed, Mar 11, 2015 at 12:12 PM, Joe Stein <jo...@stealth.ly> wrote:

> Sorry for not catching up on this thread earlier, I wanted to-do this
> before the KIP got its updates so we could discuss if need be and not waste
> more time re-writing/working things that folks have issues with or such. I
> captured all the comments so far here with responses.
>
> << So fair assignment by count (taking into account the current partition
> count of each broker) is very good. However, it's worth noting that all
> partitions are not created equal. We have actually been performing more
> rebalance work based on the partition size on disk, as given equal
> retention of all topics, the size on disk is a better indicator of the
> amount of traffic a partition gets, both in terms of storage and network
> traffic. Overall, this seems to be a better balance.
>
> Agreed though this is out of scope (imho) for what the motivations for the
> KIP were. The motivations section is blank (that is on me) but honestly it
> is because we did all the development, went back and forth with Neha on the
> testing and then had to back it all into the KIP process... Its a
> time/resource/scheduling and hope to update this soon on the KIP ... all of
> this is in the JIRA and code patch so its not like it is not there just not
> in the place maybe were folks are looking since we changed where folks
> should look.
>
> Initial cut at "Motivations": the --generate is not used by a lot of folks
> because they don't trust it. Issues such as giving different results
> sometimes when you run it. Also other feedback from the community that it
> does not account for specific uses cases like "adding new brokers" and
> "removing brokers" (which is where that patch started
> https://issues.apache.org/jira/browse/KAFKA-1678 but then we changed it
> after review into just --rebalance
> https://issues.apache.org/jira/browse/KAFKA-1792). The use case for add
> and
> remove brokers is one that happens in AWS and auto scailing. There are
> other reasons for this too of course.  The goal originally was to make what
> folks are already coding today (with the output of " available in the
> project for the community. Based on the discussion in the JIRA with Neha we
> all agreed that making it be a faire rebalance would fulfill both uses
> cases.
>
> << In addition to this, I think there is very much a need to have Kafka be
> rack-aware. That is, to be able to assure that for a given cluster, you
> never assign all replicas for a given partition in the same rack. This
> would allow us to guard against maintenances or power failures that affect
> a full rack of systems (or a given switch).
>
> Agreed, this though I think is out of scope for this change and something
> we can also do in the future. There is more that we have to figure out for
> rack aware specifically answering "how do we know what rack the broker is
> on". I really really (really) worry that we keep trying to put too much
> into a single change the discussions go into rabbit holes and good
> important features (that are community driven) that could get out there
> will get bogged down with different uses cases and scope creep. So, I think
> rack awareness is its own KIP that has two parts... setting broker rack and
> rebalancing for that. That features doesn't invalidate the need for
> --rebalance but can be built on top of it.
>
> << I think it would make sense to implement the reassignment logic as a
> pluggable component. That way it would be easy to select a scheme when
> performing a reassignment (count, size, rack aware). Configuring a default
> scheme for a cluster would allow for the brokers to create new topics and
> partitions in compliance with the requested policy.
>
> I don't agree with this because right now you get back "the current state
> of the partitions" so you can (today) write whatever logic you want (with
> the information that is there). With --rebalance you also get that back so
> moving forward. Moving forward we can maybe expose more information so that
> folks can write different logic they want
> (like partition number, location (label string for rack), size, throughput
> average, etc, etc, etc... but again... that to me is a different
> KIP entirely from the motivations of this one. If eventually we want to
> make it plugable then we should have a KIP and discussion around it I just
> don't see how it relates to the original motivations of the change.
>
> << Is it possible to describe the proposed partition reassignment algorithm
> in more detail on the KIP? In fact, it would be really easy to understand
> if we had some concrete examples comparing partition assignment with the
> old algorithm and the new.
>
> sure, it is in the JIRA linked to the KIP too though
> https://issues.apache.org/jira/browse/KAFKA-1792 and documented in
> comments
> in the patch also as requested. Let me know if this is what you are looking
> for and we can simply update the KIP with this information or give more
> detail specifically what you think might be missing please.
>
> << Would we want to
> support some kind of automated reassignment of existing partitions
> (personally - no. I want to trigger that manually because it is a very disk
> and network intensive process)?
>
> You can automate the reassignment with a line of code that takes the
> response and calls --execute if folks want that... I don't think we should
> ever link these (or at least not yet) because of the reasons you say. I
> think as long as we have a way
>
> ********
>
> If there is anything else I missed please let me know so I can make sure
> that the detail gets update so we minimize the back and forth both in
> efforts and elapsed time. This was always supposed to be a very small fix
> for something that pains A LOT of people and I want to make sure that we
> aren't running scope creep on the change but are making sure that folks
> understand the motivation behind a new feature.
>
> ~ Joe Stein
> - - - - - - - - - - - - - - - - -
>
>   http://www.stealth.ly
> - - - - - - - - - - - - - - - - -
>
> On Sun, Mar 8, 2015 at 1:21 PM, Joe Stein <jo...@stealth.ly> wrote:
>
> > Jay,
> >
> > That makes sense.  I think what folks are bringing up all sounds great
> but
> > I feel can/should be done afterwards as further improvements as the scope
> > for this change has a very specific focus to resolve problems folks have
> > today with --generate (with a patch tested and ready to go ). I should be
> > able to update the KIP this week and followup.
> >
> > ~ Joestein
> > On Mar 8, 2015 12:54 PM, "Jay Kreps" <ja...@gmail.com> wrote:
> >
> >> Hey Joe,
> >>
> >> This still seems pretty incomplete. It still has most the sections just
> >> containing the default text you are supposed to replace. It is really
> hard
> >> to understand what is being proposed and why and how much of the problem
> >> we
> >> are addressing. For example the motivation section just says
> >> "operational".
> >>
> >> I'd really like us to do a good job of this. I actually think putting
> the
> >> time in to convey context really matters. For example I think (but can't
> >> really know) that what you are proposing is just a simple fix to the
> JSON
> >> output of the command line tool. But you can see that on the thread it
> is
> >> quickly going to spiral into automatic balancing, rack awareness, data
> >> movement throttling, etc.
> >>
> >> Just by giving people a fairly clear description of the change and how
> it
> >> fits into other efforts that could happen in the area really helps keep
> >> things focused on what you want.
> >>
> >> -Jay
> >>
> >>
> >> On Wed, Jan 21, 2015 at 10:11 PM, Joe Stein <jo...@stealth.ly>
> wrote:
> >>
> >> > Posted a KIP for --re-balance for partition assignment in reassignment
> >> > tool.
> >> >
> >> >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-6+-+New+reassignment+partition+logic+for+re-balancing
> >> >
> >> > JIRA https://issues.apache.org/jira/browse/KAFKA-1792
> >> >
> >> > While going through the KIP I thought of one thing from the JIRA that
> we
> >> > should change. We should preserve --generate to be existing
> >> functionality
> >> > for the next release it is in. If folks want to use --re-balance then
> >> > great, it just won't break any upgrade paths, yet.
> >> >
> >> > /*******************************************
> >> >  Joe Stein
> >> >  Founder, Principal Consultant
> >> >  Big Data Open Source Security LLC
> >> >  http://www.stealth.ly
> >> >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> >> > ********************************************/
> >> >
> >>
> >
>

Re: [DISCUSS] KIP-6 - New reassignment partition logic for re-balancing

Posted by Joe Stein <jo...@stealth.ly>.
Sorry for not catching up on this thread earlier; I wanted to do this
before the KIP got its updates so we could discuss if need be and not waste
more time re-writing/re-working things that folks have issues with. I
captured all the comments so far here, with responses.

<< So fair assignment by count (taking into account the current partition
count of each broker) is very good. However, it's worth noting that all
partitions are not created equal. We have actually been performing more
rebalance work based on the partition size on disk, as given equal
retention of all topics, the size on disk is a better indicator of the
amount of traffic a partition gets, both in terms of storage and network
traffic. Overall, this seems to be a better balance.

Agreed, though this is out of scope (imho) for the motivations of the
KIP. The motivations section is blank (that is on me), but honestly that is
because we did all the development, went back and forth with Neha on the
testing, and then had to fold it all back into the KIP process... It's a
time/resource/scheduling issue and I hope to update the KIP soon... all of
this is in the JIRA and code patch, so it's not that the information is
missing, it's just not in the place folks are looking, since we changed
where folks should look.

Initial cut at "Motivations": --generate is not used by a lot of folks
because they don't trust it. There are issues such as it giving different
results when you run it more than once. There is also feedback from the
community that it does not account for specific use cases like "adding new
brokers" and "removing brokers" (which is where that patch started,
https://issues.apache.org/jira/browse/KAFKA-1678, but then we changed it
after review into just --rebalance,
https://issues.apache.org/jira/browse/KAFKA-1792). The use case for adding
and removing brokers is one that happens in AWS with auto scaling. There are
other reasons for this too, of course. The goal originally was to make what
folks are already coding today (against the tool's output) available in the
project for the community. Based on the discussion in the JIRA with Neha we
all agreed that making it a fair rebalance would fulfill both use
cases.

<< In addition to this, I think there is very much a need to have Kafka be
rack-aware. That is, to be able to assure that for a given cluster, you
never assign all replicas for a given partition in the same rack. This
would allow us to guard against maintenances or power failures that affect
a full rack of systems (or a given switch).

Agreed, though I think this is out of scope for this change and something
we can also do in the future. There is more that we have to figure out for
rack awareness, specifically answering "how do we know what rack the broker
is on". I really really (really) worry that when we keep trying to put too
much into a single change, the discussions go into rabbit holes, and good,
important, community-driven features that could get out there
get bogged down with different use cases and scope creep. So, I think
rack awareness is its own KIP that has two parts... setting the broker rack
and rebalancing for that. That feature doesn't invalidate the need for
--rebalance but can be built on top of it.

<< I think it would make sense to implement the reassignment logic as a
pluggable component. That way it would be easy to select a scheme when
performing a reassignment (count, size, rack aware). Configuring a default
scheme for a cluster would allow for the brokers to create new topics and
partitions in compliance with the requested policy.

I don't agree with this, because right now you get back "the current state
of the partitions", so you can (today) write whatever logic you want with
the information that is there. With --rebalance you still get that back.
Moving forward we could expose more information so that
folks can write whatever logic they want
(partition number, location (a label string for rack), size, throughput
average, etc.)... but again, that to me is a different
KIP entirely from the motivations of this one. If eventually we want to
make it pluggable then we should have a KIP and discussion around it; I just
don't see how it relates to the original motivations of this change.

<< Is it possible to describe the proposed partition reassignment algorithm
in more detail on the KIP? In fact, it would be really easy to understand
if we had some concrete examples comparing partition assignment with the
old algorithm and the new.

Sure, it is in the JIRA linked from the KIP
(https://issues.apache.org/jira/browse/KAFKA-1792) and also documented in
comments in the patch, as requested. Let me know if this is what you are
looking for and we can simply update the KIP with this information, or
please say specifically what you think might be missing.

<< Would we want to
support some kind of automated reassignment of existing partitions
(personally - no. I want to trigger that manually because it is a very disk
and network intensive process)?

You can automate the reassignment with a line of code that takes the
response and calls --execute if folks want that... I don't think we should
ever link these (or at least not yet) because of the reasons you say. I
think as long as we have a way
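
(For illustration, a minimal Scala sketch of that kind of wrapper around the
tool; the kafka-reassign-partitions.sh option names and output format below
are assumptions and should be checked against the release in use.)

    // Hedged sketch: shells out to kafka-reassign-partitions.sh, captures the
    // proposed assignment JSON, and feeds it back to --execute.
    import sys.process._
    import java.io.PrintWriter

    object AutoReassign extends App {
      val zk  = "localhost:2181"
      val cmd = "bin/kafka-reassign-partitions.sh"

      val generated = Seq(cmd,
        "--zookeeper", zk,
        "--topics-to-move-json-file", "topics-to-move.json",
        "--broker-list", "1,2,3",
        "--generate").!!

      // The tool prints the current assignment JSON followed by the proposed
      // one; naively keep the last JSON-looking line.
      val proposed = generated.linesIterator.map(_.trim).filter(_.startsWith("{")).toSeq.last
      new PrintWriter("reassignment.json") { write(proposed); close() }

      Seq(cmd, "--zookeeper", zk,
        "--reassignment-json-file", "reassignment.json",
        "--execute").!!
    }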

********

If there is anything else I missed please let me know so I can make sure
that the detail gets updated and we minimize the back and forth, both in
effort and elapsed time. This was always supposed to be a very small fix
for something that pains A LOT of people, and I want to make sure that we
aren't letting scope creep onto the change but are making sure that folks
understand the motivation behind a new feature.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Sun, Mar 8, 2015 at 1:21 PM, Joe Stein <jo...@stealth.ly> wrote:

> Jay,
>
> That makes sense.  I think what folks are bringing up all sounds great but
> I feel can/should be done afterwards as further improvements as the scope
> for this change has a very specific focus to resolve problems folks have
> today with --generate (with a patch tested and ready to go ). I should be
> able to update the KIP this week and followup.
>
> ~ Joestein
> On Mar 8, 2015 12:54 PM, "Jay Kreps" <ja...@gmail.com> wrote:
>
>> Hey Joe,
>>
>> This still seems pretty incomplete. It still has most the sections just
>> containing the default text you are supposed to replace. It is really hard
>> to understand what is being proposed and why and how much of the problem
>> we
>> are addressing. For example the motivation section just says
>> "operational".
>>
>> I'd really like us to do a good job of this. I actually think putting the
>> time in to convey context really matters. For example I think (but can't
>> really know) that what you are proposing is just a simple fix to the JSON
>> output of the command line tool. But you can see that on the thread it is
>> quickly going to spiral into automatic balancing, rack awareness, data
>> movement throttling, etc.
>>
>> Just by giving people a fairly clear description of the change and how it
>> fits into other efforts that could happen in the area really helps keep
>> things focused on what you want.
>>
>> -Jay
>>
>>
>> On Wed, Jan 21, 2015 at 10:11 PM, Joe Stein <jo...@stealth.ly> wrote:
>>
>> > Posted a KIP for --re-balance for partition assignment in reassignment
>> > tool.
>> >
>> >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-6+-+New+reassignment+partition+logic+for+re-balancing
>> >
>> > JIRA https://issues.apache.org/jira/browse/KAFKA-1792
>> >
>> > While going through the KIP I thought of one thing from the JIRA that we
>> > should change. We should preserve --generate to be existing
>> functionality
>> > for the next release it is in. If folks want to use --re-balance then
>> > great, it just won't break any upgrade paths, yet.
>> >
>> > /*******************************************
>> >  Joe Stein
>> >  Founder, Principal Consultant
>> >  Big Data Open Source Security LLC
>> >  http://www.stealth.ly
>> >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
>> > ********************************************/
>> >
>>
>

Re: [DISCUSS] KIP-6 - New reassignment partition logic for re-balancing

Posted by Joe Stein <jo...@stealth.ly>.
Jay,

That makes sense.  I think what folks are bringing up all sounds great, but
I feel it can/should be done afterwards as further improvements, since the
scope of this change has a very specific focus: resolving the problems folks
have today with --generate (with a patch tested and ready to go). I should
be able to update the KIP this week and follow up.

~ Joestein
On Mar 8, 2015 12:54 PM, "Jay Kreps" <ja...@gmail.com> wrote:

> Hey Joe,
>
> This still seems pretty incomplete. It still has most the sections just
> containing the default text you are supposed to replace. It is really hard
> to understand what is being proposed and why and how much of the problem we
> are addressing. For example the motivation section just says "operational".
>
> I'd really like us to do a good job of this. I actually think putting the
> time in to convey context really matters. For example I think (but can't
> really know) that what you are proposing is just a simple fix to the JSON
> output of the command line tool. But you can see that on the thread it is
> quickly going to spiral into automatic balancing, rack awareness, data
> movement throttling, etc.
>
> Just by giving people a fairly clear description of the change and how it
> fits into other efforts that could happen in the area really helps keep
> things focused on what you want.
>
> -Jay
>
>
> On Wed, Jan 21, 2015 at 10:11 PM, Joe Stein <jo...@stealth.ly> wrote:
>
> > Posted a KIP for --re-balance for partition assignment in reassignment
> > tool.
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-6+-+New+reassignment+partition+logic+for+re-balancing
> >
> > JIRA https://issues.apache.org/jira/browse/KAFKA-1792
> >
> > While going through the KIP I thought of one thing from the JIRA that we
> > should change. We should preserve --generate to be existing functionality
> > for the next release it is in. If folks want to use --re-balance then
> > great, it just won't break any upgrade paths, yet.
> >
> > /*******************************************
> >  Joe Stein
> >  Founder, Principal Consultant
> >  Big Data Open Source Security LLC
> >  http://www.stealth.ly
> >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > ********************************************/
> >
>

Re: [DISCUSS] KIP-6 - New reassignment partition logic for re-balancing

Posted by Jay Kreps <ja...@gmail.com>.
Hey Joe,

This still seems pretty incomplete. It still has most of the sections just
containing the default text you are supposed to replace. It is really hard
to understand what is being proposed, why, and how much of the problem we
are addressing. For example, the motivation section just says "operational".

I'd really like us to do a good job of this. I actually think putting the
time in to convey context really matters. For example I think (but can't
really know) that what you are proposing is just a simple fix to the JSON
output of the command line tool. But you can see that on the thread it is
quickly going to spiral into automatic balancing, rack awareness, data
movement throttling, etc.

Just giving people a fairly clear description of the change and how it
fits into other efforts that could happen in the area really helps keep
things focused on what you want.

-Jay


On Wed, Jan 21, 2015 at 10:11 PM, Joe Stein <jo...@stealth.ly> wrote:

> Posted a KIP for --re-balance for partition assignment in reassignment
> tool.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-6+-+New+reassignment+partition+logic+for+re-balancing
>
> JIRA https://issues.apache.org/jira/browse/KAFKA-1792
>
> While going through the KIP I thought of one thing from the JIRA that we
> should change. We should preserve --generate to be existing functionality
> for the next release it is in. If folks want to use --re-balance then
> great, it just won't break any upgrade paths, yet.
>
> /*******************************************
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> ********************************************/
>

RE: [DISCUSS] KIP-6 - New reassignment partition logic for re-balancing

Posted by Aditya Auradkar <aa...@linkedin.com.INVALID>.
Thanks for the writeup and RB Joe/Dmitry.

Is it possible to describe the proposed partition reassignment algorithm in more detail on the KIP? In fact, it would be really easy to understand if we had some concrete examples comparing partition assignment with the old algorithm and the new.

Aditya
________________________________________
From: Tong Li [litong01@us.ibm.com]
Sent: Wednesday, March 04, 2015 7:33 PM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-6 - New reassignment partition logic for re-balancing

Todd,
    I think plugable design is good with solid default. The only issue I
feel is when you use one and switch to another, will we end up with some
unread messages hanging around and no one thinks or knows it is their
responsibility to take care of them?

Thanks.

Tong

Sent from my iPhone

> On Mar 5, 2015, at 10:46 AM, Todd Palino <tp...@gmail.com> wrote:
>
> Apologize for the late comment on this...
>
> So fair assignment by count (taking into account the current partition
> count of each broker) is very good. However, it's worth noting that all
> partitions are not created equal. We have actually been performing more
> rebalance work based on the partition size on disk, as given equal
> retention of all topics, the size on disk is a better indicator of the
> amount of traffic a partition gets, both in terms of storage and network
> traffic. Overall, this seems to be a better balance.
>
> In addition to this, I think there is very much a need to have Kafka be
> rack-aware. That is, to be able to assure that for a given cluster, you
> never assign all replicas for a given partition in the same rack. This
> would allow us to guard against maintenances or power failures that
affect
> a full rack of systems (or a given switch).
>
> I think it would make sense to implement the reassignment logic as a
> pluggable component. That way it would be easy to select a scheme when
> performing a reassignment (count, size, rack aware). Configuring a
default
> scheme for a cluster would allow for the brokers to create new topics and
> partitions in compliance with the requested policy.
>
> -Todd
>
>
> On Thu, Jan 22, 2015 at 10:13 PM, Joe Stein <jo...@stealth.ly> wrote:
>
> > I will go back through the ticket and code and write more up. Should be
> > able to-do that sometime next week. The intention was to not replace
> > existing functionality by issue a WARN on use. The following version it
is
> > released we could then deprecate it... I will fix the KIP for that too.
> >
> > On Fri, Jan 23, 2015 at 12:34 AM, Neha Narkhede <ne...@confluent.io>
wrote:
> >
> > > Hey Joe,
> > >
> > > 1. Could you add details to the Public Interface section of the KIP?
This
> > > should include the proposed changes to the partition reassignment
tool.
> > > Also, maybe the new option can be named --rebalance instead of
> > > --re-balance?
> > > 2. It makes sense to list --decommission-broker as part of this KIP.
> > > Similarly, shouldn't we also have an --add-broker option? The way I
see
> > > this is that there are several events when a partition reassignment
is
> > > required. Before this functionality is automated on the broker, the
tool
> > > will generate an ideal replica placement for each such event. The
users
> > > should merely have to specify the nature of the event e.g. adding a
> > broker
> > > or decommissioning an existing broker or merely rebalancing.
> > > 3. If I understand the KIP correctly, the upgrade plan for this
feature
> > > includes removing the existing --generate option on the partition
> > > reassignment tool in 0.8.3 while adding all the new options in the
same
> > > release. Is that correct?
> > >
> > > Thanks,
> > > Neha
> > >
> > > On Thu, Jan 22, 2015 at 9:23 PM, Jay Kreps <ja...@gmail.com>
wrote:
> > >
> > > > Ditto on this one. Can you give the algorithm we want to implement?
> > > >
> > > > Also I think in terms of scope this is just proposing to change the
> > logic
> > > > in ReassignPartitionsCommand? I think we've had the discussion
various
> > > > times on the mailing list that what people really want is just for
> > Kafka
> > > to
> > > > do it's best to balance data in an online fashion (for some
definition
> > of
> > > > balance). i.e. if you add a new node partitions would slowly
migrate to
> > > it,
> > > > and if a node dies, partitions slowly migrate off it. This could
> > > > potentially be more work, but I'm not sure how much more. Has
anyone
> > > > thought about how to do it?
> > > >
> > > > -Jay
> > > >
> > > > On Wed, Jan 21, 2015 at 10:11 PM, Joe Stein <jo...@stealth.ly>
> > > wrote:
> > > >
> > > > > Posted a KIP for --re-balance for partition assignment in
> > reassignment
> > > > > tool.
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-6+-+New
+reassignment+partition+logic+for+re-balancing
> > > > >
> > > > > JIRA https://issues.apache.org/jira/browse/KAFKA-1792
> > > > >
> > > > > While going through the KIP I thought of one thing from the JIRA
that
> > > we
> > > > > should change. We should preserve --generate to be existing
> > > functionality
> > > > > for the next release it is in. If folks want to use --re-balance
then
> > > > > great, it just won't break any upgrade paths, yet.
> > > > >
> > > > > /*******************************************
> > > > >  Joe Stein
> > > > >  Founder, Principal Consultant
> > > > >  Big Data Open Source Security LLC
> > > > >  http://www.stealth.ly
> > > > >  Twitter: @allthingshadoop
<http://www.twitter.com/allthingshadoop>
> > > > > ********************************************/
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks,
> > > Neha
> > >
> >

Re: [DISCUSS] KIP-6 - New reassignment partition logic for re-balancing

Posted by Guozhang Wang <wa...@gmail.com>.
I am +1 on Todd's suggestion: the default reassignment scheme is only used
when a reassignment command is issued with no scheme specified, and
changing this default scheme should not automatically trigger a
reassignment of all existing topics; it will only take effect the next time
a reassignment command with no specific scheme is issued.

On Thu, Mar 5, 2015 at 10:16 AM, Todd Palino <tp...@gmail.com> wrote:

> I would not think that partitions moving would cause any orphaned messages
> like that. I would be more concerned about what happens when you change the
> default on a running cluster from one scheme to another. Would we want to
> support some kind of automated reassignment of existing partitions
> (personally - no. I want to trigger that manually because it is a very disk
> and network intensive process)?
>
> -Todd
>
> On Wed, Mar 4, 2015 at 7:33 PM, Tong Li <li...@us.ibm.com> wrote:
>
> >
> >
> > Todd,
> >     I think plugable design is good with solid default. The only issue I
> > feel is when you use one and switch to another, will we end up with some
> > unread messages hanging around and no one thinks or knows it is their
> > responsibility to take care of them?
> >
> > Thanks.
> >
> > Tong
> >
> > Sent from my iPhone
> >
> > > On Mar 5, 2015, at 10:46 AM, Todd Palino <tp...@gmail.com> wrote:
> > >
> > > Apologize for the late comment on this...
> > >
> > > So fair assignment by count (taking into account the current partition
> > > count of each broker) is very good. However, it's worth noting that all
> > > partitions are not created equal. We have actually been performing more
> > > rebalance work based on the partition size on disk, as given equal
> > > retention of all topics, the size on disk is a better indicator of the
> > > amount of traffic a partition gets, both in terms of storage and
> network
> > > traffic. Overall, this seems to be a better balance.
> > >
> > > In addition to this, I think there is very much a need to have Kafka be
> > > rack-aware. That is, to be able to assure that for a given cluster, you
> > > never assign all replicas for a given partition in the same rack. This
> > > would allow us to guard against maintenances or power failures that
> > affect
> > > a full rack of systems (or a given switch).
> > >
> > > I think it would make sense to implement the reassignment logic as a
> > > pluggable component. That way it would be easy to select a scheme when
> > > performing a reassignment (count, size, rack aware). Configuring a
> > default
> > > scheme for a cluster would allow for the brokers to create new topics
> and
> > > partitions in compliance with the requested policy.
> > >
> > > -Todd
> > >
> > >
> > > On Thu, Jan 22, 2015 at 10:13 PM, Joe Stein <jo...@stealth.ly>
> > wrote:
> > >
> > > > I will go back through the ticket and code and write more up. Should
> be
> > > > able to-do that sometime next week. The intention was to not replace
> > > > existing functionality by issue a WARN on use. The following version
> it
> > is
> > > > released we could then deprecate it... I will fix the KIP for that
> too.
> > > >
> > > > On Fri, Jan 23, 2015 at 12:34 AM, Neha Narkhede <ne...@confluent.io>
> > wrote:
> > > >
> > > > > Hey Joe,
> > > > >
> > > > > 1. Could you add details to the Public Interface section of the
> KIP?
> > This
> > > > > should include the proposed changes to the partition reassignment
> > tool.
> > > > > Also, maybe the new option can be named --rebalance instead of
> > > > > --re-balance?
> > > > > 2. It makes sense to list --decommission-broker as part of this
> KIP.
> > > > > Similarly, shouldn't we also have an --add-broker option? The way I
> > see
> > > > > this is that there are several events when a partition reassignment
> > is
> > > > > required. Before this functionality is automated on the broker, the
> > tool
> > > > > will generate an ideal replica placement for each such event. The
> > users
> > > > > should merely have to specify the nature of the event e.g. adding a
> > > > broker
> > > > > or decommissioning an existing broker or merely rebalancing.
> > > > > 3. If I understand the KIP correctly, the upgrade plan for this
> > feature
> > > > > includes removing the existing --generate option on the partition
> > > > > reassignment tool in 0.8.3 while adding all the new options in the
> > same
> > > > > release. Is that correct?
> > > > >
> > > > > Thanks,
> > > > > Neha
> > > > >
> > > > > On Thu, Jan 22, 2015 at 9:23 PM, Jay Kreps <ja...@gmail.com>
> > wrote:
> > > > >
> > > > > > Ditto on this one. Can you give the algorithm we want to
> implement?
> > > > > >
> > > > > > Also I think in terms of scope this is just proposing to change
> the
> > > > logic
> > > > > > in ReassignPartitionsCommand? I think we've had the discussion
> > various
> > > > > > times on the mailing list that what people really want is just
> for
> > > > Kafka
> > > > > to
> > > > > > do it's best to balance data in an online fashion (for some
> > definition
> > > > of
> > > > > > balance). i.e. if you add a new node partitions would slowly
> > migrate to
> > > > > it,
> > > > > > and if a node dies, partitions slowly migrate off it. This could
> > > > > > potentially be more work, but I'm not sure how much more. Has
> > anyone
> > > > > > thought about how to do it?
> > > > > >
> > > > > > -Jay
> > > > > >
> > > > > > On Wed, Jan 21, 2015 at 10:11 PM, Joe Stein <
> joe.stein@stealth.ly>
> > > > > wrote:
> > > > > >
> > > > > > > Posted a KIP for --re-balance for partition assignment in
> > > > reassignment
> > > > > > > tool.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-6+-+New
> > +reassignment+partition+logic+for+re-balancing
> > > > > > >
> > > > > > > JIRA https://issues.apache.org/jira/browse/KAFKA-1792
> > > > > > >
> > > > > > > While going through the KIP I thought of one thing from the
> JIRA
> > that
> > > > > we
> > > > > > > should change. We should preserve --generate to be existing
> > > > > functionality
> > > > > > > for the next release it is in. If folks want to use
> --re-balance
> > then
> > > > > > > great, it just won't break any upgrade paths, yet.
> > > > > > >
> > > > > > > /*******************************************
> > > > > > >  Joe Stein
> > > > > > >  Founder, Principal Consultant
> > > > > > >  Big Data Open Source Security LLC
> > > > > > >  http://www.stealth.ly
> > > > > > >  Twitter: @allthingshadoop
> > <http://www.twitter.com/allthingshadoop>
> > > > > > > ********************************************/
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Thanks,
> > > > > Neha
> > > > >
> > > >
> >
>



-- 
-- Guozhang

Re: [DISCUSS] KIP-6 - New reassignment partition logic for re-balancing

Posted by Todd Palino <tp...@gmail.com>.
I would not think that partitions moving would cause any orphaned messages
like that. I would be more concerned about what happens when you change the
default on a running cluster from one scheme to another. Would we want to
support some kind of automated reassignment of existing partitions
(personally - no. I want to trigger that manually because it is a very disk
and network intensive process)?

-Todd

On Wed, Mar 4, 2015 at 7:33 PM, Tong Li <li...@us.ibm.com> wrote:

>
>
> Todd,
>     I think plugable design is good with solid default. The only issue I
> feel is when you use one and switch to another, will we end up with some
> unread messages hanging around and no one thinks or knows it is their
> responsibility to take care of them?
>
> Thanks.
>
> Tong
>
> Sent from my iPhone
>
> > On Mar 5, 2015, at 10:46 AM, Todd Palino <tp...@gmail.com> wrote:
> >
> > Apologize for the late comment on this...
> >
> > So fair assignment by count (taking into account the current partition
> > count of each broker) is very good. However, it's worth noting that all
> > partitions are not created equal. We have actually been performing more
> > rebalance work based on the partition size on disk, as given equal
> > retention of all topics, the size on disk is a better indicator of the
> > amount of traffic a partition gets, both in terms of storage and network
> > traffic. Overall, this seems to be a better balance.
> >
> > In addition to this, I think there is very much a need to have Kafka be
> > rack-aware. That is, to be able to assure that for a given cluster, you
> > never assign all replicas for a given partition in the same rack. This
> > would allow us to guard against maintenances or power failures that
> affect
> > a full rack of systems (or a given switch).
> >
> > I think it would make sense to implement the reassignment logic as a
> > pluggable component. That way it would be easy to select a scheme when
> > performing a reassignment (count, size, rack aware). Configuring a
> default
> > scheme for a cluster would allow for the brokers to create new topics and
> > partitions in compliance with the requested policy.
> >
> > -Todd
> >
> >
> > On Thu, Jan 22, 2015 at 10:13 PM, Joe Stein <jo...@stealth.ly>
> wrote:
> >
> > > I will go back through the ticket and code and write more up. Should be
> > > able to-do that sometime next week. The intention was to not replace
> > > existing functionality by issue a WARN on use. The following version it
> is
> > > released we could then deprecate it... I will fix the KIP for that too.
> > >
> > > On Fri, Jan 23, 2015 at 12:34 AM, Neha Narkhede <ne...@confluent.io>
> wrote:
> > >
> > > > Hey Joe,
> > > >
> > > > 1. Could you add details to the Public Interface section of the KIP?
> This
> > > > should include the proposed changes to the partition reassignment
> tool.
> > > > Also, maybe the new option can be named --rebalance instead of
> > > > --re-balance?
> > > > 2. It makes sense to list --decommission-broker as part of this KIP.
> > > > Similarly, shouldn't we also have an --add-broker option? The way I
> see
> > > > this is that there are several events when a partition reassignment
> is
> > > > required. Before this functionality is automated on the broker, the
> tool
> > > > will generate an ideal replica placement for each such event. The
> users
> > > > should merely have to specify the nature of the event e.g. adding a
> > > broker
> > > > or decommissioning an existing broker or merely rebalancing.
> > > > 3. If I understand the KIP correctly, the upgrade plan for this
> feature
> > > > includes removing the existing --generate option on the partition
> > > > reassignment tool in 0.8.3 while adding all the new options in the
> same
> > > > release. Is that correct?
> > > >
> > > > Thanks,
> > > > Neha
> > > >
> > > > On Thu, Jan 22, 2015 at 9:23 PM, Jay Kreps <ja...@gmail.com>
> wrote:
> > > >
> > > > > Ditto on this one. Can you give the algorithm we want to implement?
> > > > >
> > > > > Also I think in terms of scope this is just proposing to change the
> > > logic
> > > > > in ReassignPartitionsCommand? I think we've had the discussion
> various
> > > > > times on the mailing list that what people really want is just for
> > > Kafka
> > > > to
> > > > > do it's best to balance data in an online fashion (for some
> definition
> > > of
> > > > > balance). i.e. if you add a new node partitions would slowly
> migrate to
> > > > it,
> > > > > and if a node dies, partitions slowly migrate off it. This could
> > > > > potentially be more work, but I'm not sure how much more. Has
> anyone
> > > > > thought about how to do it?
> > > > >
> > > > > -Jay
> > > > >
> > > > > On Wed, Jan 21, 2015 at 10:11 PM, Joe Stein <jo...@stealth.ly>
> > > > wrote:
> > > > >
> > > > > > Posted a KIP for --re-balance for partition assignment in
> > > reassignment
> > > > > > tool.
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-6+-+New
> +reassignment+partition+logic+for+re-balancing
> > > > > >
> > > > > > JIRA https://issues.apache.org/jira/browse/KAFKA-1792
> > > > > >
> > > > > > While going through the KIP I thought of one thing from the JIRA
> that
> > > > we
> > > > > > should change. We should preserve --generate to be existing
> > > > functionality
> > > > > > for the next release it is in. If folks want to use --re-balance
> then
> > > > > > great, it just won't break any upgrade paths, yet.
> > > > > >
> > > > > > /*******************************************
> > > > > >  Joe Stein
> > > > > >  Founder, Principal Consultant
> > > > > >  Big Data Open Source Security LLC
> > > > > >  http://www.stealth.ly
> > > > > >  Twitter: @allthingshadoop
> <http://www.twitter.com/allthingshadoop>
> > > > > > ********************************************/
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks,
> > > > Neha
> > > >
> > >
>

Re: [DISCUSS] KIP-6 - New reassignment partition logic for re-balancing

Posted by Tong Li <li...@us.ibm.com>.

Todd,
    I think a pluggable design is good, with a solid default. The only issue
I see is that when you use one scheme and switch to another, will we end up
with some unread messages hanging around that no one thinks or knows it is
their responsibility to take care of?

Thanks.

Tong

Sent from my iPhone

> On Mar 5, 2015, at 10:46 AM, Todd Palino <tp...@gmail.com> wrote:
>
> Apologize for the late comment on this...
>
> So fair assignment by count (taking into account the current partition
> count of each broker) is very good. However, it's worth noting that all
> partitions are not created equal. We have actually been performing more
> rebalance work based on the partition size on disk, as given equal
> retention of all topics, the size on disk is a better indicator of the
> amount of traffic a partition gets, both in terms of storage and network
> traffic. Overall, this seems to be a better balance.
>
> In addition to this, I think there is very much a need to have Kafka be
> rack-aware. That is, to be able to assure that for a given cluster, you
> never assign all replicas for a given partition in the same rack. This
> would allow us to guard against maintenances or power failures that
affect
> a full rack of systems (or a given switch).
>
> I think it would make sense to implement the reassignment logic as a
> pluggable component. That way it would be easy to select a scheme when
> performing a reassignment (count, size, rack aware). Configuring a
default
> scheme for a cluster would allow for the brokers to create new topics and
> partitions in compliance with the requested policy.
>
> -Todd
>
>
> On Thu, Jan 22, 2015 at 10:13 PM, Joe Stein <jo...@stealth.ly> wrote:
>
> > I will go back through the ticket and code and write more up. Should be
> > able to-do that sometime next week. The intention was to not replace
> > existing functionality by issue a WARN on use. The following version it
is
> > released we could then deprecate it... I will fix the KIP for that too.
> >
> > On Fri, Jan 23, 2015 at 12:34 AM, Neha Narkhede <ne...@confluent.io>
wrote:
> >
> > > Hey Joe,
> > >
> > > 1. Could you add details to the Public Interface section of the KIP?
This
> > > should include the proposed changes to the partition reassignment
tool.
> > > Also, maybe the new option can be named --rebalance instead of
> > > --re-balance?
> > > 2. It makes sense to list --decommission-broker as part of this KIP.
> > > Similarly, shouldn't we also have an --add-broker option? The way I
see
> > > this is that there are several events when a partition reassignment
is
> > > required. Before this functionality is automated on the broker, the
tool
> > > will generate an ideal replica placement for each such event. The
users
> > > should merely have to specify the nature of the event e.g. adding a
> > broker
> > > or decommissioning an existing broker or merely rebalancing.
> > > 3. If I understand the KIP correctly, the upgrade plan for this
feature
> > > includes removing the existing --generate option on the partition
> > > reassignment tool in 0.8.3 while adding all the new options in the
same
> > > release. Is that correct?
> > >
> > > Thanks,
> > > Neha
> > >
> > > On Thu, Jan 22, 2015 at 9:23 PM, Jay Kreps <ja...@gmail.com>
wrote:
> > >
> > > > Ditto on this one. Can you give the algorithm we want to implement?
> > > >
> > > > Also I think in terms of scope this is just proposing to change the
> > > > logic in ReassignPartitionsCommand? I think we've had the discussion
> > > > various times on the mailing list that what people really want is just
> > > > for Kafka to do its best to balance data in an online fashion (for some
> > > > definition of balance). i.e. if you add a new node partitions would
> > > > slowly migrate to it, and if a node dies, partitions slowly migrate off
> > > > it. This could potentially be more work, but I'm not sure how much more.
> > > > Has anyone thought about how to do it?
> > > >
> > > > -Jay
> > > >
> > > > On Wed, Jan 21, 2015 at 10:11 PM, Joe Stein <jo...@stealth.ly> wrote:
> > > >
> > > > > Posted a KIP for --re-balance for partition assignment in
> > reassignment
> > > > > tool.
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-6+-+New+reassignment+partition+logic+for+re-balancing
> > > > >
> > > > > JIRA https://issues.apache.org/jira/browse/KAFKA-1792
> > > > >
> > > > > While going through the KIP I thought of one thing from the JIRA that
> > > > > we should change. We should preserve --generate to be existing
> > > > > functionality for the next release it is in. If folks want to use
> > > > > --re-balance then great, it just won't break any upgrade paths, yet.
> > > > >
> > > > > /*******************************************
> > > > >  Joe Stein
> > > > >  Founder, Principal Consultant
> > > > >  Big Data Open Source Security LLC
> > > > >  http://www.stealth.ly
> > > > >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > > > > ********************************************/
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks,
> > > Neha
> > >
> >

Re: [DISCUSS] KIP-6 - New reassignment partition logic for re-balancing

Posted by Todd Palino <tp...@gmail.com>.
Apologize for the late comment on this...

So fair assignment by count (taking into account the current partition
count of each broker) is very good. However, it's worth noting that all
partitions are not created equal. We have actually been performing more
rebalance work based on the partition size on disk, as given equal
retention of all topics, the size on disk is a better indicator of the
amount of traffic a partition gets, both in terms of storage and network
traffic. Overall, this seems to be a better balance.
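
To make the size-based idea concrete, here is a minimal sketch of a greedy
balancer that weights partitions by their on-disk size instead of treating
them all equally. It is only an illustration: the Replica type and the
SizeWeightedBalancer name are made up here, and this is not the algorithm
the KIP proposes.

  object SizeWeightedBalancer {
    // Hypothetical view of one replica and its on-disk size in bytes.
    case class Replica(topic: String, partition: Int, sizeBytes: Long)

    // Greedily place each replica (largest first) on the broker that
    // currently holds the fewest bytes, so brokers end up balanced by
    // size rather than by raw partition count.
    def assign(replicas: Seq[Replica],
               brokers: Seq[Int]): Map[Int, Seq[Replica]] = {
      val load = scala.collection.mutable.Map(brokers.map(_ -> 0L): _*)
      val out  = scala.collection.mutable.Map(
        brokers.map(_ -> Vector.empty[Replica]): _*)
      for (r <- replicas.sortBy(-_.sizeBytes)) {
        val target = load.minBy(_._2)._1   // least-loaded broker by bytes
        load(target) = load(target) + r.sizeBytes
        out(target)  = out(target) :+ r
      }
      out.toMap
    }
  }

A real pass would also have to keep replicas of the same partition on
distinct brokers and account for how much data each move shuffles, which
this sketch ignores.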

In addition to this, I think there is very much a need to have Kafka be
rack-aware. That is, to be able to assure that for a given cluster, you
never assign all replicas for a given partition in the same rack. This
would allow us to guard against maintenances or power failures that affect
a full rack of systems (or a given switch).
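
As a rough illustration of the rack constraint (again just a sketch, not
anything the KIP specifies), a generator could reject any candidate replica
list whose brokers all sit in the same rack, assuming it is given a
broker-to-rack mapping:

  object RackConstraint {
    // rackOf maps a broker id to its rack name; replicas is one
    // partition's candidate replica list. Accept the list only if its
    // brokers do not all sit in the same rack.
    def spansRacks(replicas: Seq[Int], rackOf: Map[Int, String]): Boolean = {
      val racks = replicas.flatMap(rackOf.get).toSet
      replicas.size < 2 || racks.size > 1
    }
  }

  // e.g. with rackOf = Map(1 -> "rack-a", 2 -> "rack-a", 3 -> "rack-b"):
  //   spansRacks(Seq(1, 2), rackOf)  ==> false (both replicas in rack-a)
  //   spansRacks(Seq(1, 3), rackOf)  ==> true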

I think it would make sense to implement the reassignment logic as a
pluggable component. That way it would be easy to select a scheme when
performing a reassignment (count, size, rack aware). Configuring a default
scheme for a cluster would allow for the brokers to create new topics and
partitions in compliance with the requested policy.
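
The plug-in point could be as small as a trait the tool selects by name,
with count, size, and rack-aware schemes as separate implementations. The
shape below is purely hypothetical, not an interface the KIP defines:

  // Hypothetical plug-in point: a policy turns a cluster description into
  // a full replica map (topic-partition -> ordered broker list).
  trait ReassignmentPolicy {
    def name: String
    def assign(partitions: Seq[(String, Int)],
               brokers: Seq[Int],
               replicationFactor: Int): Map[(String, Int), Seq[Int]]
  }

  // A count-based default; size-based or rack-aware policies would be
  // further implementations selected by name. Assumes replicationFactor
  // is no larger than the number of brokers.
  class RoundRobinPolicy extends ReassignmentPolicy {
    val name = "count"
    def assign(partitions: Seq[(String, Int)],
               brokers: Seq[Int],
               replicationFactor: Int): Map[(String, Int), Seq[Int]] =
      partitions.zipWithIndex.map { case (tp, i) =>
        val replicas = (0 until replicationFactor)
          .map(r => brokers((i + r) % brokers.size))
        tp -> replicas
      }.toMap
  }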

-Todd


On Thu, Jan 22, 2015 at 10:13 PM, Joe Stein <jo...@stealth.ly> wrote:

> I will go back through the ticket and code and write more up. Should be
> able to do that sometime next week. The intention was to not replace the
> existing functionality but to issue a WARN on use. In the version following
> the one it is released in, we could then deprecate it... I will fix the KIP
> for that too.
>
> On Fri, Jan 23, 2015 at 12:34 AM, Neha Narkhede <ne...@confluent.io> wrote:
>
> > Hey Joe,
> >
> > 1. Could you add details to the Public Interface section of the KIP? This
> > should include the proposed changes to the partition reassignment tool.
> > Also, maybe the new option can be named --rebalance instead of
> > --re-balance?
> > 2. It makes sense to list --decommission-broker as part of this KIP.
> > Similarly, shouldn't we also have an --add-broker option? The way I see
> > this is that there are several events when a partition reassignment is
> > required. Before this functionality is automated on the broker, the tool
> > will generate an ideal replica placement for each such event. The users
> > should merely have to specify the nature of the event e.g. adding a
> broker
> > or decommissioning an existing broker or merely rebalancing.
> > 3. If I understand the KIP correctly, the upgrade plan for this feature
> > includes removing the existing --generate option on the partition
> > reassignment tool in 0.8.3 while adding all the new options in the same
> > release. Is that correct?
> >
> > Thanks,
> > Neha
> >
> > On Thu, Jan 22, 2015 at 9:23 PM, Jay Kreps <ja...@gmail.com> wrote:
> >
> > > Ditto on this one. Can you give the algorithm we want to implement?
> > >
> > > Also I think in terms of scope this is just proposing to change the
> logic
> > > in ReassignPartitionsCommand? I think we've had the discussion various
> > > times on the mailing list that what people really want is just for
> Kafka
> > to
> > > do its best to balance data in an online fashion (for some definition
> of
> > > balance). i.e. if you add a new node partitions would slowly migrate to
> > it,
> > > and if a node dies, partitions slowly migrate off it. This could
> > > potentially be more work, but I'm not sure how much more. Has anyone
> > > thought about how to do it?
> > >
> > > -Jay
> > >
> > > On Wed, Jan 21, 2015 at 10:11 PM, Joe Stein <jo...@stealth.ly>
> > wrote:
> > >
> > > > Posted a KIP for --re-balance for partition assignment in
> reassignment
> > > > tool.
> > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-6+-+New+reassignment+partition+logic+for+re-balancing
> > > >
> > > > JIRA https://issues.apache.org/jira/browse/KAFKA-1792
> > > >
> > > > While going through the KIP I thought of one thing from the JIRA that
> > we
> > > > should change. We should preserve --generate to be existing
> > functionality
> > > > for the next release it is in. If folks want to use --re-balance then
> > > > great, it just won't break any upgrade paths, yet.
> > > >
> > > > /*******************************************
> > > >  Joe Stein
> > > >  Founder, Principal Consultant
> > > >  Big Data Open Source Security LLC
> > > >  http://www.stealth.ly
> > > >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > > > ********************************************/
> > > >
> > >
> >
> >
> >
> > --
> > Thanks,
> > Neha
> >
>

Re: [DISCUSS] KIP-6 - New reassignment partition logic for re-balancing

Posted by Joe Stein <jo...@stealth.ly>.
I will go back through the ticket and code and write more up. Should be
able to do that sometime next week. The intention was to not replace the
existing functionality but to issue a WARN on use. In the version following
the one it is released in, we could then deprecate it... I will fix the KIP
for that too.

On Fri, Jan 23, 2015 at 12:34 AM, Neha Narkhede <ne...@confluent.io> wrote:

> Hey Joe,
>
> 1. Could you add details to the Public Interface section of the KIP? This
> should include the proposed changes to the partition reassignment tool.
> Also, maybe the new option can be named --rebalance instead of
> --re-balance?
> 2. It makes sense to list --decommission-broker as part of this KIP.
> Similarly, shouldn't we also have an --add-broker option? The way I see
> this is that there are several events when a partition reassignment is
> required. Before this functionality is automated on the broker, the tool
> will generate an ideal replica placement for each such event. The users
> should merely have to specify the nature of the event e.g. adding a broker
> or decommissioning an existing broker or merely rebalancing.
> 3. If I understand the KIP correctly, the upgrade plan for this feature
> includes removing the existing --generate option on the partition
> reassignment tool in 0.8.3 while adding all the new options in the same
> release. Is that correct?
>
> Thanks,
> Neha
>
> On Thu, Jan 22, 2015 at 9:23 PM, Jay Kreps <ja...@gmail.com> wrote:
>
> > Ditto on this one. Can you give the algorithm we want to implement?
> >
> > Also I think in terms of scope this is just proposing to change the logic
> > in ReassignPartitionsCommand? I think we've had the discussion various
> > times on the mailing list that what people really want is just for Kafka
> to
> > do its best to balance data in an online fashion (for some definition of
> > balance). i.e. if you add a new node partitions would slowly migrate to
> it,
> > and if a node dies, partitions slowly migrate off it. This could
> > potentially be more work, but I'm not sure how much more. Has anyone
> > thought about how to do it?
> >
> > -Jay
> >
> > On Wed, Jan 21, 2015 at 10:11 PM, Joe Stein <jo...@stealth.ly>
> wrote:
> >
> > > Posted a KIP for --re-balance for partition assignment in reassignment
> > > tool.
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-6+-+New+reassignment+partition+logic+for+re-balancing
> > >
> > > JIRA https://issues.apache.org/jira/browse/KAFKA-1792
> > >
> > > While going through the KIP I thought of one thing from the JIRA that
> we
> > > should change. We should preserve --generate to be existing
> functionality
> > > for the next release it is in. If folks want to use --re-balance then
> > > great, it just won't break any upgrade paths, yet.
> > >
> > > /*******************************************
> > >  Joe Stein
> > >  Founder, Principal Consultant
> > >  Big Data Open Source Security LLC
> > >  http://www.stealth.ly
> > >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > > ********************************************/
> > >
> >
>
>
>
> --
> Thanks,
> Neha
>

Re: [DISCUSS] KIP-6 - New reassignment partition logic for re-balancing

Posted by Neha Narkhede <ne...@confluent.io>.
Hey Joe,

1. Could you add details to the Public Interface section of the KIP? This
should include the proposed changes to the partition reassignment tool.
Also, maybe the new option can be named --rebalance instead of
--re-balance?
2. It makes sense to list --decommission-broker as part of this KIP.
Similarly, shouldn't we also have an --add-broker option? The way I see
this is that there are several events when a partition reassignment is
required. Before this functionality is automated on the broker, the tool
will generate an ideal replica placement for each such event. The users
should merely have to specify the nature of the event e.g. adding a broker
or decommissioning an existing broker or merely rebalancing (see the sketch
after this list).
3. If I understand the KIP correctly, the upgrade plan for this feature
includes removing the existing --generate option on the partition
reassignment tool in 0.8.3 while adding all the new options in the same
release. Is that correct?
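
To illustrate the event-driven view in point 2 (a hypothetical sketch of
the workflow only, not the tool's actual interface), each event can reduce
to running the same placement routine over a different broker set:

  object ReassignmentEvents {
    // Hypothetical placement routine: given a broker set, return the ideal
    // replica list per topic-partition (the policy itself is elided here).
    type Placement = Seq[Int] => Map[(String, Int), Seq[Int]]

    // Each event is just the same routine run over a different broker set.
    def rebalance(brokers: Set[Int], place: Placement) =
      place(brokers.toSeq.sorted)
    def addBroker(brokers: Set[Int], added: Int, place: Placement) =
      place((brokers + added).toSeq.sorted)
    def decommission(brokers: Set[Int], removed: Int, place: Placement) =
      place((brokers - removed).toSeq.sorted)
  }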

Thanks,
Neha

On Thu, Jan 22, 2015 at 9:23 PM, Jay Kreps <ja...@gmail.com> wrote:

> Ditto on this one. Can you give the algorithm we want to implement?
>
> Also I think in terms of scope this is just proposing to change the logic
> in ReassignPartitionsCommand? I think we've had the discussion various
> times on the mailing list that what people really want is just for Kafka to
> do its best to balance data in an online fashion (for some definition of
> balance). i.e. if you add a new node partitions would slowly migrate to it,
> and if a node dies, partitions slowly migrate off it. This could
> potentially be more work, but I'm not sure how much more. Has anyone
> thought about how to do it?
>
> -Jay
>
> On Wed, Jan 21, 2015 at 10:11 PM, Joe Stein <jo...@stealth.ly> wrote:
>
> > Posted a KIP for --re-balance for partition assignment in reassignment
> > tool.
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-6+-+New+reassignment+partition+logic+for+re-balancing
> >
> > JIRA https://issues.apache.org/jira/browse/KAFKA-1792
> >
> > While going through the KIP I thought of one thing from the JIRA that we
> > should change. We should preserve --generate to be existing functionality
> > for the next release it is in. If folks want to use --re-balance then
> > great, it just won't break any upgrade paths, yet.
> >
> > /*******************************************
> >  Joe Stein
> >  Founder, Principal Consultant
> >  Big Data Open Source Security LLC
> >  http://www.stealth.ly
> >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > ********************************************/
> >
>



-- 
Thanks,
Neha

Re: [DISCUSS] KIP-6 - New reassignment partition logic for re-balancing

Posted by Jay Kreps <ja...@gmail.com>.
Ditto on this one. Can you give the algorithm we want to implement?

Also I think in terms of scope this is just proposing to change the logic
in ReassignPartitionsCommand? I think we've had the discussion various
times on the mailing list that what people really want is just for Kafka to
do its best to balance data in an online fashion (for some definition of
balance). i.e. if you add a new node partitions would slowly migrate to it,
and if a node dies, partitions slowly migrate off it. This could
potentially be more work, but I'm not sure how much more. Has anyone
thought about how to do it?
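
For what it's worth, one very rough way to picture the online version (a
sketch only, nothing the broker does today) is to diff the current
assignment against a target and only submit a handful of moves per round,
letting the cluster catch up in between:

  object IncrementalRebalance {
    type Assignment = Map[(String, Int), Seq[Int]]

    // Diff the current assignment against a target and return at most
    // maxMoves partitions whose replica lists need to change. Running
    // this repeatedly, waiting for each batch to complete, migrates the
    // cluster gradually instead of all at once.
    def nextBatch(current: Assignment,
                  target: Assignment,
                  maxMoves: Int): Assignment = {
      val pending = target.filter { case (tp, replicas) =>
        current.get(tp).forall(_ != replicas)
      }
      pending.take(maxMoves)
    }
  }

A real version would probably throttle by bytes in flight rather than by a
fixed partition count.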

-Jay

On Wed, Jan 21, 2015 at 10:11 PM, Joe Stein <jo...@stealth.ly> wrote:

> Posted a KIP for --re-balance for partition assignment in reassignment
> tool.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-6+-+New+reassignment+partition+logic+for+re-balancing
>
> JIRA https://issues.apache.org/jira/browse/KAFKA-1792
>
> While going through the KIP I thought of one thing from the JIRA that we
> should change. We should preserve --generate to be existing functionality
> for the next release it is in. If folks want to use --re-balance then
> great, it just won't break any upgrade paths, yet.
>
> /*******************************************
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> ********************************************/
>