Posted to dev@kafka.apache.org by Mickael Maison <mi...@gmail.com> on 2022/05/02 16:28:31 UTC

Re: [DISCUSS] KIP-660: Pluggable ReplicaPlacer

Hi,

If there are no further comments, I'll start a vote in the next few days.

Thanks,
Mickael

On Wed, Mar 30, 2022 at 3:51 AM Luke Chen <sh...@gmail.com> wrote:
>
> Hi Mickael,
>
> Thanks for the update.
> It answered my questions!
>
> Thank you.
> Luke
>
> On Wed, Mar 30, 2022 at 12:09 AM Mickael Maison <mi...@gmail.com>
> wrote:
>
> > Hi Luke,
> >
> > Thanks for the feedback.
> >
> > 1. Thanks, fixed!
> > 2. Yes that's right. It's the same behavior for topic policies
> > 3. I've added details about how the mentioned scenarios could be
> > addressed. The information required to make such decisions is not part
> > of the Kafka cluster metadata so an external input is necessary. This
> > KIP does not propose a specific mechanism for doing it.
> >
> > I hope this answers your questions.
> >
> > Thanks,
> > Mickael
> >
> >
> > On Tue, Mar 29, 2022 at 5:42 PM Mickael Maison <mi...@gmail.com>
> > wrote:
> > >
> > > Hi Ryanne,
> > >
> > > That's a good point!
> > >
> > > There's no value in having all implementations perform the same sanity
> > > checks. If the replication factor is < 1 or larger than the current
> > > number of registered brokers, the controller should directly throw
> > > InvalidReplicationFactorException and not call the ReplicaPlacer. I've
> > > updated the KIP so the place() method now only throws
> > > ReplicaPlacementException.
> > >
> > > Thanks,
> > > Mickael
> > >
> > >
> > >
> > > On Mon, Mar 28, 2022 at 6:20 PM Ryanne Dolan <ry...@gmail.com>
> > wrote:
> > > >
> > > > Wondering about InvalidReplicationFactorException. Why would an
> > > > implementation throw this? Given the information passed to the method,
> > > > seems like this could only be thrown if there were obviously invalid
> > > > arguments, like a negative number or zero. Can we just guarantee such
> > > > invalid arguments aren't passed in?
> > > >
> > > > Ryanne
> > > >
> > > > On Sat, Mar 26, 2022, 8:51 AM Luke Chen <sh...@gmail.com> wrote:
> > > >
> > > > > Hi Mickael,
> > > > >
> > > > > Thanks for the KIP!
> > > > > It's indeed a pain point for the Kafka admins.
> > > > >
> > > > > I have some comments:
> > > > > 1. Typo in motivation section: When administrators [when to] remove
> > brokers
> > > > > from a cluster,....
> > > > > 2. If different `replica.placer.class.name` configs are set in all
> > > > > controllers, I think only the config for the "active controller" will
> > > > > take effect, right?
> > > > > 3. Could you explain more about how the proposal fixes some of the
> > > > > scenarios you listed, e.g. the newly added broker case. How could we
> > > > > know the broker was newly added? I guess it's by checking the broker
> > > > > load via some metrics dynamically, right?
> > > > >
> > > > >
> > > > > Thank you.
> > > > > Luke
> > > > >
> > > > > On Fri, Mar 18, 2022 at 10:30 AM Ryanne Dolan <ryannedolan@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > Thanks Mickael, this makes sense to me! I've been wanting
> > something like
> > > > > > this in order to decommission a broker without new partitions
> > getting
> > > > > > accidentally assigned to it.
> > > > > >
> > > > > > Ryanne
> > > > > >
> > > > > > On Thu, Mar 17, 2022, 5:56 AM Mickael Maison <
> > mickael.maison@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I'd like to start a new discussion on KIP-660. I originally
> > wrote this
> > > > > > > KIP in 2020 and the initial discussion
> > > > > > > (
> > https://lists.apache.org/thread/xn7xyb74nyt281brto4x28r9rzxm4lp9)
> > > > > > > raised some concerns especially around KRaft (which did not
> > exist at
> > > > > > > that time) and scalability.
> > > > > > >
> > > > > > > Since then, we got a new KRaft controller so I've been able to
> > revisit
> > > > > > > this KIP. I kept the KIP number as it's essentially the same
> > idea, but
> > > > > > > the proposal is significantly different:
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-660%3A+Pluggable+ReplicaPlacer
> > > > > > >
> > > > > > > Please take a look and let me know if you have any feedback.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Mickael
> > > > > > >
> > > > > >
> > > > >
> >
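
A small aside to make the sanity-check point above (Mickael's reply to
Ryanne) concrete: a hypothetical sketch of the controller-side guard that
would run before the pluggable placer is ever invoked. The class and method
names here are invented purely for illustration.

import org.apache.kafka.common.errors.InvalidReplicationFactorException;

// Hypothetical controller-side check: reject an obviously invalid
// replication factor up front, so ReplicaPlacer implementations never
// see such arguments and only ever throw ReplicaPlacementException.
final class ReplicationFactorGuard {
    static void validate(int replicationFactor, int registeredBrokerCount) {
        if (replicationFactor < 1 || replicationFactor > registeredBrokerCount) {
            throw new InvalidReplicationFactorException(
                "Replication factor " + replicationFactor
                    + " is invalid for a cluster with " + registeredBrokerCount
                    + " registered brokers");
        }
    }
}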

Re: [DISCUSS] KIP-660: Pluggable ReplicaPlacer

Posted by Divij Vaidya <di...@gmail.com>.
Nice proposal. I can definitely see the value it could add based on
varied custom use cases.

1. When exposing an interface external to Kafka, could we strive to ensure
that the implementation cannot mutate the incoming parameters? This would
ensure that implementations don't unintentionally change some internal
state of the cluster. Also, this would be a two-way door, so later if we
decide to relax the constraint and allow implementations to mutate certain
things, we should be able to do that. This immutability can be achieved,
for example, by passing a clone of the list of usable brokers to the
interface.
2. Why is the interface Closeable? What resource does the interface expect
to close? Please add a note in the class documentation about this.
3. Regarding the iterator that is being passed as an argument to `place()`:
do we expect the place() method to mutate the input? If not, an iterator
may not be necessary. Could it be an unmodifiable collection instead?
4. Please add documentation on the behaviour expected of the place()
function, such as: should it be idempotent? Is it called on a
latency-sensitive path? What are the performance expectations for this
function? etc.
5. Please add documentation on the behaviour expected of the place()
function in error scenarios, e.g. would it be able to set a message in the
ReplicaPlacementException?
6. The interface should be extensible in the future: e.g. if we want to add
a new parameter to the place() function later, we shouldn't have to require
code changes to all concrete implementations. This can be achieved by
grouping the parameters of place() into higher-level abstractions, say,
PlacementSpec (which already exists). The advantage is that a new parameter
could then be added to PlacementSpec with a getter/setter without requiring
any code modification to custom implementations of ReplicaPlacer.
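
To make points 1, 3 and 6 concrete, here is a rough sketch of the kind of
shape being discussed. The names ReplicaPlacer, PlacementSpec and
ReplicaPlacementException come from this thread, but their exact
definitions live in the KIP; everything else below is an assumption made
purely for illustration, not the proposed API itself.

import java.io.Closeable;
import java.util.List;
import org.apache.kafka.common.Configurable;
import org.apache.kafka.common.KafkaException;

// Hypothetical exception type named in this thread; its shape is assumed here.
class ReplicaPlacementException extends KafkaException {
    public ReplicaPlacementException(String message) { super(message); }
}

// Hypothetical immutable holder for the inputs to place() (point 6: new
// fields can be added here later without breaking implementations).
final class PlacementSpec {
    private final int numPartitions;
    private final short replicationFactor;
    private final List<Integer> usableBrokers;

    public PlacementSpec(int numPartitions, short replicationFactor,
                         List<Integer> usableBrokers) {
        this.numPartitions = numPartitions;
        this.replicationFactor = replicationFactor;
        // Defensive, unmodifiable copy so implementations cannot mutate
        // controller state (points 1 and 3).
        this.usableBrokers = List.copyOf(usableBrokers);
    }

    public int numPartitions() { return numPartitions; }
    public short replicationFactor() { return replicationFactor; }
    public List<Integer> usableBrokers() { return usableBrokers; }
}

// Hypothetical public interface shape: Configurable for plugin configs,
// Closeable in case an implementation holds external resources (point 2).
interface ReplicaPlacer extends Configurable, Closeable {
    // For each new partition, return the ordered list of broker ids that
    // should host its replicas.
    List<List<Integer>> place(PlacementSpec spec) throws ReplicaPlacementException;
}

With the inputs grouped this way, adding e.g. rack information later would
only mean adding another accessor to PlacementSpec rather than changing the
place() signature.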

Divij Vaidya



On Mon, May 2, 2022 at 6:28 PM Mickael Maison <mi...@gmail.com>
wrote:

> Hi,
>
> If there are no further comments, I'll start a vote in the next few days.
>
> Thanks,
> Mickael
>
> On Wed, Mar 30, 2022 at 3:51 AM Luke Chen <sh...@gmail.com> wrote:
> >
> > Hi Mickael,
> >
> > Thanks for the update.
> > It answered my questions!
> >
> > Thank you.
> > Luke
> >
> > On Wed, Mar 30, 2022 at 12:09 AM Mickael Maison <
> mickael.maison@gmail.com>
> > wrote:
> >
> > > Hi Luke,
> > >
> > > Thanks for the feedback.
> > >
> > > 1. Thanks, fixed!
> > > 2. Yes that's right. It's the same behavior for topic policies
> > > 3. I've added details about how the mentioned scenarios could be
> > > addressed. The information required to make such decisions is not part
> > > of the Kafka cluster metadata so an external input is necessary. This
> > > KIP does not propose a specific mechanism for doing it.
> > >
> > > I hope this answers your questions.
> > >
> > > Thanks,
> > > Mickael
> > >
> > >
> > > On Tue, Mar 29, 2022 at 5:42 PM Mickael Maison <
> mickael.maison@gmail.com>
> > > wrote:
> > > >
> > > > Hi Ryanne,
> > > >
> > > > That's a good point!
> > > >
> > > > There's no value in having all implementations perform the same
> sanity
> > > > checks. If the replication factor is < 1 or larger than the current
> > > > number of registered brokers, the controller should directly throw
> > > > InvalidReplicationFactorException and not call the ReplicaPlacer.
> I've
> > > > updated the KIP so the place() method now only throws
> > > > ReplicaPlacementException.
> > > >
> > > > Thanks,
> > > > Mickael
> > > >
> > > >
> > > >
> > > > On Mon, Mar 28, 2022 at 6:20 PM Ryanne Dolan <ry...@gmail.com>
> > > wrote:
> > > > >
> > > > > Wondering about InvalidReplicationFactorException. Why would an
> > > > > implementation throw this? Given the information passed to the
> method,
> > > > > seems like this could only be thrown if there were obviously
> invalid
> > > > > arguments, like a negative number or zero. Can we just guarantee
> such
> > > > > invalid arguments aren't passed in?
> > > > >
> > > > > Ryanne
> > > > >
> > > > > On Sat, Mar 26, 2022, 8:51 AM Luke Chen <sh...@gmail.com> wrote:
> > > > >
> > > > > > Hi Mickael,
> > > > > >
> > > > > > Thanks for the KIP!
> > > > > > It's indeed a pain point for the Kafka admins.
> > > > > >
> > > > > > I have some comments:
> > > > > > 1. Typo in motivation section: When administrators [when to]
> remove
> > > brokers
> > > > > > from a cluster,....
> > > > > > 2. If different `replica.placer.class.name` configs are set in
> all
> > > > > > controllers, I think only the config for  "active controller"
> will
> > > take
> > > > > > effect, right?
> > > > > > 3. Could you explain more about how the proposal fixes some
> > > scenarios you
> > > > > > listed, ex: the new added broker case. How could we know the
> broker
> > > is new
> > > > > > added? I guess it's by checking the broker load via some metrics
> > > > > > dynamically, right?
> > > > > >
> > > > > >
> > > > > > Thank you.
> > > > > > Luke
> > > > > >
> > > > > > On Fri, Mar 18, 2022 at 10:30 AM Ryanne Dolan <
> ryannedolan@gmail.com
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks Mickael, this makes sense to me! I've been wanting
> > > something like
> > > > > > > this in order to decommission a broker without new partitions
> > > getting
> > > > > > > accidentally assigned to it.
> > > > > > >
> > > > > > > Ryanne
> > > > > > >
> > > > > > > On Thu, Mar 17, 2022, 5:56 AM Mickael Maison <
> > > mickael.maison@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I'd like to start a new discussion on KIP-660. I originally
> > > wrote this
> > > > > > > > KIP in 2020 and the initial discussion
> > > > > > > > (
> > > https://lists.apache.org/thread/xn7xyb74nyt281brto4x28r9rzxm4lp9)
> > > > > > > > raised some concerns especially around KRaft (which did not
> > > exist at
> > > > > > > > that time) and scalability.
> > > > > > > >
> > > > > > > > Since then, we got a new KRaft controller so I've been able
> to
> > > revisit
> > > > > > > > this KIP. I kept the KIP number as it's essentially the same
> > > idea, but
> > > > > > > > the proposal is significantly different:
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-660%3A+Pluggable+ReplicaPlacer
> > > > > > > >
> > > > > > > > Please take a look and let me know if you have any feedback.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Mickael
> > > > > > > >
> > > > > > >
> > > > > >
> > >
>

Re: [DISCUSS] KIP-660: Pluggable ReplicaPlacer

Posted by Mickael Maison <mi...@gmail.com>.
Hi Viktor,

Since the use cases targeted by this KIP are common, I plan to add
support for them in the default replica placement logic instead of
requiring custom implementations. This is why I retired this KIP. Feel
free to take over KIP-660 or reuse parts of it if you want as I won't
pursue it.

Thanks,
Mickael

On Thu, Jul 13, 2023 at 5:07 PM Viktor Somogyi-Vass
<vi...@cloudera.com> wrote:
>
> Mickael, have you had some time to review this by any chance?
>
> On Tue, Jun 20, 2023 at 5:23 PM Viktor Somogyi-Vass <vi...@cloudera.com> wrote:
>>
>> Hey all,
>>
>> I'd like to revive this discussion. I've created https://cwiki.apache.org/confluence/display/KAFKA/KIP-879%3A+Multi-level+Rack+Awareness last November and it seems to be that there is a nice overlap between the two and would be good to merge. Should we revive KIP-660 and merge the two KIPs?
>> If you don't have time for this Mickael currently, I'm happy to take it over from you and merge the two interfaces, it seems like they're somewhat similar (and also with the current internal interface).
>>
>> Best,
>> Viktor
>>
>> On Tue, May 31, 2022 at 3:57 PM Mickael Maison <mi...@gmail.com> wrote:
>>>
>>> Hi Vikas,
>>>
>>> You make some very good points and most importantly I agree that being
>>> able to prevent putting new partitions on a broker should be part of
>>> Kafka itself and not require a plugin.
>>>
>>> This feature would address 2 out of the 3 scenarios mentioned in the
>>> motivation section. The last one "When adding brokers to a cluster,
>>> Kafka currently does not necessarily place new partitions on new
>>> brokers" is clearly less important.
>>>
>>> So I think I'll retire this KIP and I'll follow up with a new KIP to
>>> focus on that feature.
>>>
>>> Thanks,
>>> Mickael
>>>
>>>
>>> On Mon, May 9, 2022 at 8:11 PM Vikas Singh <vi...@confluent.io.invalid> wrote:
>>> >
>>> > Hi Mickael,
>>> >
>>> > It's a nice proposal. It's appealing to have a pluggable way to override
>>> > default kafka placement decisions, and the motivation section lists some of
>>> > them. Here are few comments:
>>> >
>>> > * The motivation section has "When adding brokers to a cluster, Kafka
>>> > currently does not necessarily place new partitions on new brokers". I am
>>> > not sure how valuable doing this will be. A newly created kafka topic takes
>>> > time to reach the same usage level as existing topics, say because the
>>> > topic created by a new workload that is getting onboarded, or the expansion
>>> > was done to relieve disk pressure on existing nodes etc. While new topics
>>> > catch up to existing workload, the new brokers are not sharing equal load
>>> > in the cluster, which probably defeats the purpose of adding new brokers.
>>> > In addition to that clustering new topics like this on new brokers have
>>> > implications from fault domain perspective. A reasonable way to approach it
>>> > is to indeed use CruiseControl to move things around so that the newly
>>> > added nodes become immediately involved and share cluster load.
>>> > * Regarding "When administrators want to remove brokers from a cluster,
>>> > there is no way to prevent Kafka from placing partitions on them", this is
>>> > indeed an issue. I would argue that this is needed by everyone and should
>>> > be part of Kafka, instead of being implemented as part of a plugin
>>> > interface by multiple teams.
>>> > * For "When some brokers are near their storage/throughput limit, Kafka
>>> > could avoid putting new partitions on them", while this can help relieve
>>> > short term overload I think again the correct solution here is something
>>> > like CruiseControl where the system is monitored and things moved around to
>>> > maintain a balanced cluster. A new topic will not take any disk space, so
>>> > placing them anywhere normally isn't going to add to the storage overload.
>>> > Similar to the previous case, maybe a mechanism in Kafka to put nodes in a
>>> > quarantine state is a better way to approach this.
>>> >
>>> > In terms of the proposed api, I have a couple of comments:
>>> >
>>> > * It is not clear if the proposal applies to partitions of new topics or
>>> > addition on partitions to an existing topic. Explicitly stating that will
>>> > be helpful.
>>> > * Regarding part "To address the use cases identified in the motivation
>>> > section, some knowledge about the current state of the cluster is
>>> > necessary. Details whether a new broker has just been added or is being
>>> > decommissioned are not part of the cluster metadata. Therefore such
>>> > knowledge has to be provided via an external means to the ReplicaPlacer,
>>> > for example via the configuration". It's not clear how this will be done.
>>> > If I have to implement this interface, it will be helpful to have clear
>>> > guidance/examples here which hopefully ties to the use cases in the
>>> > motivation section. It also allows us to figure out if the proposed
>>> > interface is complete and helps future implementers of the interface.
>>> >
>>> > Couple of minor comments:
>>> > * The KIP is not listed in the main KIP page (
>>> > https://cwiki-test.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals).
>>> > Can you please add it there.
>>> > * The page has "This is especially true for the 4 scenarios listed in the
>>> > Motivation section", but there are only 3 scenarios listed.
>>> >
>>> > Regards,
>>> > Vikas
>>> >
>>> >
>>> > On Tue, May 3, 2022 at 5:51 PM Colin McCabe <cm...@apache.org> wrote:
>>> >
>>> > > Hi Mickael,
>>> > >
>>> > > We did discuss this earlier, and I remember not being too enthusiastic
>>> > > about a pluggable policy here :)
>>> > >
>>> > > There have been several changes to the placement code in the last few
>>> > > weeks. (These are examples of the kind of changes that are impossible to do
>>> > > once an API is established, by the way.) Can you please revise the KIP to
>>> > > take these into account?
>>> > >
>>> > > I'd also like to understand a little bit better why we need this API when
>>> > > we have the explicit placement API for createTopics and createPartitions.
>>> > > Can you give me a few scenarios where the manual placement API would be
>>> > > insufficient?
>>> > >
>>> > > best,
>>> > > Colin
>>> > >
>>> > >
>>> > > On Mon, May 2, 2022, at 09:28, Mickael Maison wrote:
>>> > > > Hi,
>>> > > >
>>> > > > If there are no further comments, I'll start a vote in the next few days.
>>> > > >
>>> > > > Thanks,
>>> > > > Mickael
>>> > > >
>>> > > > On Wed, Mar 30, 2022 at 3:51 AM Luke Chen <sh...@gmail.com> wrote:
>>> > > >>
>>> > > >> Hi Mickael,
>>> > > >>
>>> > > >> Thanks for the update.
>>> > > >> It answered my questions!
>>> > > >>
>>> > > >> Thank you.
>>> > > >> Luke
>>> > > >>
>>> > > >> On Wed, Mar 30, 2022 at 12:09 AM Mickael Maison <
>>> > > mickael.maison@gmail.com>
>>> > > >> wrote:
>>> > > >>
>>> > > >> > Hi Luke,
>>> > > >> >
>>> > > >> > Thanks for the feedback.
>>> > > >> >
>>> > > >> > 1. Thanks, fixed!
>>> > > >> > 2. Yes that's right. It's the same behavior for topic policies
>>> > > >> > 3. I've added details about how the mentioned scenarios could be
>>> > > >> > addressed. The information required to make such decisions is not part
>>> > > >> > of the Kafka cluster metadata so an external input is necessary. This
>>> > > >> > KIP does not propose a specific mechanism for doing it.
>>> > > >> >
>>> > > >> > I hope this answers your questions.
>>> > > >> >
>>> > > >> > Thanks,
>>> > > >> > Mickael
>>> > > >> >
>>> > > >> >
>>> > > >> > On Tue, Mar 29, 2022 at 5:42 PM Mickael Maison <
>>> > > mickael.maison@gmail.com>
>>> > > >> > wrote:
>>> > > >> > >
>>> > > >> > > Hi Ryanne,
>>> > > >> > >
>>> > > >> > > That's a good point!
>>> > > >> > >
>>> > > >> > > There's no value in having all implementations perform the same
>>> > > sanity
>>> > > >> > > checks. If the replication factor is < 1 or larger than the current
>>> > > >> > > number of registered brokers, the controller should directly throw
>>> > > >> > > InvalidReplicationFactorException and not call the ReplicaPlacer.
>>> > > I've
>>> > > >> > > updated the KIP so the place() method now only throws
>>> > > >> > > ReplicaPlacementException.
>>> > > >> > >
>>> > > >> > > Thanks,
>>> > > >> > > Mickael
>>> > > >> > >
>>> > > >> > >
>>> > > >> > >
>>> > > >> > > On Mon, Mar 28, 2022 at 6:20 PM Ryanne Dolan <ryannedolan@gmail.com
>>> > > >
>>> > > >> > wrote:
>>> > > >> > > >
>>> > > >> > > > Wondering about InvalidReplicationFactorException. Why would an
>>> > > >> > > > implementation throw this? Given the information passed to the
>>> > > method,
>>> > > >> > > > seems like this could only be thrown if there were obviously
>>> > > invalid
>>> > > >> > > > arguments, like a negative number or zero. Can we just guarantee
>>> > > such
>>> > > >> > > > invalid arguments aren't passed in?
>>> > > >> > > >
>>> > > >> > > > Ryanne
>>> > > >> > > >
>>> > > >> > > > On Sat, Mar 26, 2022, 8:51 AM Luke Chen <sh...@gmail.com>
>>> > > wrote:
>>> > > >> > > >
>>> > > >> > > > > Hi Mickael,
>>> > > >> > > > >
>>> > > >> > > > > Thanks for the KIP!
>>> > > >> > > > > It's indeed a pain point for the Kafka admins.
>>> > > >> > > > >
>>> > > >> > > > > I have some comments:
>>> > > >> > > > > 1. Typo in motivation section: When administrators [when to]
>>> > > remove
>>> > > >> > brokers
>>> > > >> > > > > from a cluster,....
>>> > > >> > > > > 2. If different `replica.placer.class.name` configs are set in
>>> > > all
>>> > > >> > > > > controllers, I think only the config for  "active controller"
>>> > > will
>>> > > >> > take
>>> > > >> > > > > effect, right?
>>> > > >> > > > > 3. Could you explain more about how the proposal fixes some
>>> > > >> > scenarios you
>>> > > >> > > > > listed, ex: the new added broker case. How could we know the
>>> > > broker
>>> > > >> > is new
>>> > > >> > > > > added? I guess it's by checking the broker load via some metrics
>>> > > >> > > > > dynamically, right?
>>> > > >> > > > >
>>> > > >> > > > >
>>> > > >> > > > > Thank you.
>>> > > >> > > > > Luke
>>> > > >> > > > >
>>> > > >> > > > > On Fri, Mar 18, 2022 at 10:30 AM Ryanne Dolan <
>>> > > ryannedolan@gmail.com
>>> > > >> > >
>>> > > >> > > > > wrote:
>>> > > >> > > > >
>>> > > >> > > > > > Thanks Mickael, this makes sense to me! I've been wanting
>>> > > >> > something like
>>> > > >> > > > > > this in order to decommission a broker without new partitions
>>> > > >> > getting
>>> > > >> > > > > > accidentally assigned to it.
>>> > > >> > > > > >
>>> > > >> > > > > > Ryanne
>>> > > >> > > > > >
>>> > > >> > > > > > On Thu, Mar 17, 2022, 5:56 AM Mickael Maison <
>>> > > >> > mickael.maison@gmail.com>
>>> > > >> > > > > > wrote:
>>> > > >> > > > > >
>>> > > >> > > > > > > Hi,
>>> > > >> > > > > > >
>>> > > >> > > > > > > I'd like to start a new discussion on KIP-660. I originally
>>> > > >> > wrote this
>>> > > >> > > > > > > KIP in 2020 and the initial discussion
>>> > > >> > > > > > > (
>>> > > >> > https://lists.apache.org/thread/xn7xyb74nyt281brto4x28r9rzxm4lp9)
>>> > > >> > > > > > > raised some concerns especially around KRaft (which did not
>>> > > >> > exist at
>>> > > >> > > > > > > that time) and scalability.
>>> > > >> > > > > > >
>>> > > >> > > > > > > Since then, we got a new KRaft controller so I've been able
>>> > > to
>>> > > >> > revisit
>>> > > >> > > > > > > this KIP. I kept the KIP number as it's essentially the same
>>> > > >> > idea, but
>>> > > >> > > > > > > the proposal is significantly different:
>>> > > >> > > > > > >
>>> > > >> > > > > > >
>>> > > >> > > > > >
>>> > > >> > > > >
>>> > > >> >
>>> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-660%3A+Pluggable+ReplicaPlacer
>>> > > >> > > > > > >
>>> > > >> > > > > > > Please take a look and let me know if you have any feedback.
>>> > > >> > > > > > >
>>> > > >> > > > > > > Thanks,
>>> > > >> > > > > > > Mickael
>>> > > >> > > > > > >
>>> > > >> > > > > >
>>> > > >> > > > >
>>> > > >> >
>>> > >

Re: [DISCUSS] KIP-660: Pluggable ReplicaPlacer

Posted by Viktor Somogyi-Vass <vi...@cloudera.com.INVALID>.
Mickael, have you had some time to review this by any chance?

On Tue, Jun 20, 2023 at 5:23 PM Viktor Somogyi-Vass <
viktor.somogyi@cloudera.com> wrote:

> Hey all,
>
> I'd like to revive this discussion. I've created
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-879%3A+Multi-level+Rack+Awareness
> last November and it seems to be that there is a nice overlap between the
> two and would be good to merge. Should we revive KIP-660 and merge the two
> KIPs?
> If you don't have time for this Mickael currently, I'm happy to take it
> over from you and merge the two interfaces, it seems like they're somewhat
> similar (and also with the current internal interface).
>
> Best,
> Viktor
>
> On Tue, May 31, 2022 at 3:57 PM Mickael Maison <mi...@gmail.com>
> wrote:
>
>> Hi Vikas,
>>
>> You make some very good points and most importantly I agree that being
>> able to prevent putting new partitions on a broker should be part of
>> Kafka itself and not require a plugin.
>>
>> This feature would address 2 out of the 3 scenarios mentioned in the
>> motivation section. The last one "When adding brokers to a cluster,
>> Kafka currently does not necessarily place new partitions on new
>> brokers" is clearly less important.
>>
>> So I think I'll retire this KIP and I'll follow up with a new KIP to
>> focus on that feature.
>>
>> Thanks,
>> Mickael
>>
>>
>> On Mon, May 9, 2022 at 8:11 PM Vikas Singh <vi...@confluent.io.invalid>
>> wrote:
>> >
>> > Hi Mickael,
>> >
>> > It's a nice proposal. It's appealing to have a pluggable way to override
>> > default kafka placement decisions, and the motivation section lists
>> some of
>> > them. Here are few comments:
>> >
>> > * The motivation section has "When adding brokers to a cluster, Kafka
>> > currently does not necessarily place new partitions on new brokers". I
>> am
>> > not sure how valuable doing this will be. A newly created kafka topic
>> takes
>> > time to reach the same usage level as existing topics, say because the
>> > topic created by a new workload that is getting onboarded, or the
>> expansion
>> > was done to relieve disk pressure on existing nodes etc. While new
>> topics
>> > catch up to existing workload, the new brokers are not sharing equal
>> load
>> > in the cluster, which probably defeats the purpose of adding new
>> brokers.
>> > In addition to that clustering new topics like this on new brokers have
>> > implications from fault domain perspective. A reasonable way to
>> approach it
>> > is to indeed use CruiseControl to move things around so that the newly
>> > added nodes become immediately involved and share cluster load.
>> > * Regarding "When administrators want to remove brokers from a cluster,
>> > there is no way to prevent Kafka from placing partitions on them", this
>> is
>> > indeed an issue. I would argue that this is needed by everyone and
>> should
>> > be part of Kafka, instead of being implemented as part of a plugin
>> > interface by multiple teams.
>> > * For "When some brokers are near their storage/throughput limit, Kafka
>> > could avoid putting new partitions on them", while this can help relieve
>> > short term overload I think again the correct solution here is something
>> > like CruiseControl where the system is monitored and things moved
>> around to
>> > maintain a balanced cluster. A new topic will not take any disk space,
>> so
>> > placing them anywhere normally isn't going to add to the storage
>> overload.
>> > Similar to the previous case, maybe a mechanism in Kafka to put nodes
>> in a
>> > quarantine state is a better way to approach this.
>> >
>> > In terms of the proposed api, I have a couple of comments:
>> >
>> > * It is not clear if the proposal applies to partitions of new topics or
>> > addition on partitions to an existing topic. Explicitly stating that
>> will
>> > be helpful.
>> > * Regarding part "To address the use cases identified in the motivation
>> > section, some knowledge about the current state of the cluster is
>> > necessary. Details whether a new broker has just been added or is being
>> > decommissioned are not part of the cluster metadata. Therefore such
>> > knowledge has to be provided via an external means to the ReplicaPlacer,
>> > for example via the configuration". It's not clear how this will be
>> done.
>> > If I have to implement this interface, it will be helpful to have clear
>> > guidance/examples here which hopefully ties to the use cases in the
>> > motivation section. It also allows us to figure out if the proposed
>> > interface is complete and helps future implementers of the interface.
>> >
>> > Couple of minor comments:
>> > * The KIP is not listed in the main KIP page (
>> >
>> https://cwiki-test.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>> ).
>> > Can you please add it there.
>> > * The page has "This is especially true for the 4 scenarios listed in
>> the
>> > Motivation section", but there are only 3 scenarios listed.
>> >
>> > Regards,
>> > Vikas
>> >
>> >
>> > On Tue, May 3, 2022 at 5:51 PM Colin McCabe <cm...@apache.org> wrote:
>> >
>> > > Hi Mickael,
>> > >
>> > > We did discuss this earlier, and I remember not being too enthusiastic
>> > > about a pluggable policy here :)
>> > >
>> > > There have been several changes to the placement code in the last few
>> > > weeks. (These are examples of the kind of changes that are impossible
>> to do
>> > > once an API is established, by the way.) Can you please revise the
>> KIP to
>> > > take these into account?
>> > >
>> > > I'd also like to understand a little bit better why we need this API
>> when
>> > > we have the explicit placement API for createTopics and
>> createPartitions.
>> > > Can you give me a few scenarios where the manual placement API would
>> be
>> > > insufficient?
>> > >
>> > > best,
>> > > Colin
>> > >
>> > >
>> > > On Mon, May 2, 2022, at 09:28, Mickael Maison wrote:
>> > > > Hi,
>> > > >
>> > > > If there are no further comments, I'll start a vote in the next few
>> days.
>> > > >
>> > > > Thanks,
>> > > > Mickael
>> > > >
>> > > > On Wed, Mar 30, 2022 at 3:51 AM Luke Chen <sh...@gmail.com>
>> wrote:
>> > > >>
>> > > >> Hi Mickael,
>> > > >>
>> > > >> Thanks for the update.
>> > > >> It answered my questions!
>> > > >>
>> > > >> Thank you.
>> > > >> Luke
>> > > >>
>> > > >> On Wed, Mar 30, 2022 at 12:09 AM Mickael Maison <
>> > > mickael.maison@gmail.com>
>> > > >> wrote:
>> > > >>
>> > > >> > Hi Luke,
>> > > >> >
>> > > >> > Thanks for the feedback.
>> > > >> >
>> > > >> > 1. Thanks, fixed!
>> > > >> > 2. Yes that's right. It's the same behavior for topic policies
>> > > >> > 3. I've added details about how the mentioned scenarios could be
>> > > >> > addressed. The information required to make such decisions is
>> not part
>> > > >> > of the Kafka cluster metadata so an external input is necessary.
>> This
>> > > >> > KIP does not propose a specific mechanism for doing it.
>> > > >> >
>> > > >> > I hope this answers your questions.
>> > > >> >
>> > > >> > Thanks,
>> > > >> > Mickael
>> > > >> >
>> > > >> >
>> > > >> > On Tue, Mar 29, 2022 at 5:42 PM Mickael Maison <
>> > > mickael.maison@gmail.com>
>> > > >> > wrote:
>> > > >> > >
>> > > >> > > Hi Ryanne,
>> > > >> > >
>> > > >> > > That's a good point!
>> > > >> > >
>> > > >> > > There's no value in having all implementations perform the same
>> > > sanity
>> > > >> > > checks. If the replication factor is < 1 or larger than the
>> current
>> > > >> > > number of registered brokers, the controller should directly
>> throw
>> > > >> > > InvalidReplicationFactorException and not call the
>> ReplicaPlacer.
>> > > I've
>> > > >> > > updated the KIP so the place() method now only throws
>> > > >> > > ReplicaPlacementException.
>> > > >> > >
>> > > >> > > Thanks,
>> > > >> > > Mickael
>> > > >> > >
>> > > >> > >
>> > > >> > >
>> > > >> > > On Mon, Mar 28, 2022 at 6:20 PM Ryanne Dolan <
>> ryannedolan@gmail.com
>> > > >
>> > > >> > wrote:
>> > > >> > > >
>> > > >> > > > Wondering about InvalidReplicationFactorException. Why would
>> an
>> > > >> > > > implementation throw this? Given the information passed to
>> the
>> > > method,
>> > > >> > > > seems like this could only be thrown if there were obviously
>> > > invalid
>> > > >> > > > arguments, like a negative number or zero. Can we just
>> guarantee
>> > > such
>> > > >> > > > invalid arguments aren't passed in?
>> > > >> > > >
>> > > >> > > > Ryanne
>> > > >> > > >
>> > > >> > > > On Sat, Mar 26, 2022, 8:51 AM Luke Chen <sh...@gmail.com>
>> > > wrote:
>> > > >> > > >
>> > > >> > > > > Hi Mickael,
>> > > >> > > > >
>> > > >> > > > > Thanks for the KIP!
>> > > >> > > > > It's indeed a pain point for the Kafka admins.
>> > > >> > > > >
>> > > >> > > > > I have some comments:
>> > > >> > > > > 1. Typo in motivation section: When administrators [when
>> to]
>> > > remove
>> > > >> > brokers
>> > > >> > > > > from a cluster,....
>> > > >> > > > > 2. If different `replica.placer.class.name` configs are
>> set in
>> > > all
>> > > >> > > > > controllers, I think only the config for  "active
>> controller"
>> > > will
>> > > >> > take
>> > > >> > > > > effect, right?
>> > > >> > > > > 3. Could you explain more about how the proposal fixes some
>> > > >> > scenarios you
>> > > >> > > > > listed, ex: the new added broker case. How could we know
>> the
>> > > broker
>> > > >> > is new
>> > > >> > > > > added? I guess it's by checking the broker load via some
>> metrics
>> > > >> > > > > dynamically, right?
>> > > >> > > > >
>> > > >> > > > >
>> > > >> > > > > Thank you.
>> > > >> > > > > Luke
>> > > >> > > > >
>> > > >> > > > > On Fri, Mar 18, 2022 at 10:30 AM Ryanne Dolan <
>> > > ryannedolan@gmail.com
>> > > >> > >
>> > > >> > > > > wrote:
>> > > >> > > > >
>> > > >> > > > > > Thanks Mickael, this makes sense to me! I've been wanting
>> > > >> > something like
>> > > >> > > > > > this in order to decommission a broker without new
>> partitions
>> > > >> > getting
>> > > >> > > > > > accidentally assigned to it.
>> > > >> > > > > >
>> > > >> > > > > > Ryanne
>> > > >> > > > > >
>> > > >> > > > > > On Thu, Mar 17, 2022, 5:56 AM Mickael Maison <
>> > > >> > mickael.maison@gmail.com>
>> > > >> > > > > > wrote:
>> > > >> > > > > >
>> > > >> > > > > > > Hi,
>> > > >> > > > > > >
>> > > >> > > > > > > I'd like to start a new discussion on KIP-660. I
>> originally
>> > > >> > wrote this
>> > > >> > > > > > > KIP in 2020 and the initial discussion
>> > > >> > > > > > > (
>> > > >> > https://lists.apache.org/thread/xn7xyb74nyt281brto4x28r9rzxm4lp9
>> )
>> > > >> > > > > > > raised some concerns especially around KRaft (which
>> did not
>> > > >> > exist at
>> > > >> > > > > > > that time) and scalability.
>> > > >> > > > > > >
>> > > >> > > > > > > Since then, we got a new KRaft controller so I've been
>> able
>> > > to
>> > > >> > revisit
>> > > >> > > > > > > this KIP. I kept the KIP number as it's essentially
>> the same
>> > > >> > idea, but
>> > > >> > > > > > > the proposal is significantly different:
>> > > >> > > > > > >
>> > > >> > > > > > >
>> > > >> > > > > >
>> > > >> > > > >
>> > > >> >
>> > >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-660%3A+Pluggable+ReplicaPlacer
>> > > >> > > > > > >
>> > > >> > > > > > > Please take a look and let me know if you have any
>> feedback.
>> > > >> > > > > > >
>> > > >> > > > > > > Thanks,
>> > > >> > > > > > > Mickael
>> > > >> > > > > > >
>> > > >> > > > > >
>> > > >> > > > >
>> > > >> >
>> > >
>>
>

Re: [DISCUSS] KIP-660: Pluggable ReplicaPlacer

Posted by Viktor Somogyi-Vass <vi...@cloudera.com.INVALID>.
Hey all,

I'd like to revive this discussion. I've created
https://cwiki.apache.org/confluence/display/KAFKA/KIP-879%3A+Multi-level+Rack+Awareness
last November, and it seems that there is a nice overlap between the
two that would be good to merge. Should we revive KIP-660 and merge the two
KIPs?
If you don't have time for this currently, Mickael, I'm happy to take it
over from you and merge the two interfaces; they seem somewhat
similar (and also similar to the current internal interface).

Best,
Viktor

On Tue, May 31, 2022 at 3:57 PM Mickael Maison <mi...@gmail.com>
wrote:

> Hi Vikas,
>
> You make some very good points and most importantly I agree that being
> able to prevent putting new partitions on a broker should be part of
> Kafka itself and not require a plugin.
>
> This feature would address 2 out of the 3 scenarios mentioned in the
> motivation section. The last one "When adding brokers to a cluster,
> Kafka currently does not necessarily place new partitions on new
> brokers" is clearly less important.
>
> So I think I'll retire this KIP and I'll follow up with a new KIP to
> focus on that feature.
>
> Thanks,
> Mickael
>
>
> On Mon, May 9, 2022 at 8:11 PM Vikas Singh <vi...@confluent.io.invalid>
> wrote:
> >
> > Hi Mickael,
> >
> > It's a nice proposal. It's appealing to have a pluggable way to override
> > default kafka placement decisions, and the motivation section lists some
> of
> > them. Here are few comments:
> >
> > * The motivation section has "When adding brokers to a cluster, Kafka
> > currently does not necessarily place new partitions on new brokers". I am
> > not sure how valuable doing this will be. A newly created kafka topic
> takes
> > time to reach the same usage level as existing topics, say because the
> > topic created by a new workload that is getting onboarded, or the
> expansion
> > was done to relieve disk pressure on existing nodes etc. While new topics
> > catch up to existing workload, the new brokers are not sharing equal load
> > in the cluster, which probably defeats the purpose of adding new brokers.
> > In addition to that clustering new topics like this on new brokers have
> > implications from fault domain perspective. A reasonable way to approach
> it
> > is to indeed use CruiseControl to move things around so that the newly
> > added nodes become immediately involved and share cluster load.
> > * Regarding "When administrators want to remove brokers from a cluster,
> > there is no way to prevent Kafka from placing partitions on them", this
> is
> > indeed an issue. I would argue that this is needed by everyone and should
> > be part of Kafka, instead of being implemented as part of a plugin
> > interface by multiple teams.
> > * For "When some brokers are near their storage/throughput limit, Kafka
> > could avoid putting new partitions on them", while this can help relieve
> > short term overload I think again the correct solution here is something
> > like CruiseControl where the system is monitored and things moved around
> to
> > maintain a balanced cluster. A new topic will not take any disk space, so
> > placing them anywhere normally isn't going to add to the storage
> overload.
> > Similar to the previous case, maybe a mechanism in Kafka to put nodes in
> a
> > quarantine state is a better way to approach this.
> >
> > In terms of the proposed api, I have a couple of comments:
> >
> > * It is not clear if the proposal applies to partitions of new topics or
> > addition on partitions to an existing topic. Explicitly stating that will
> > be helpful.
> > * Regarding part "To address the use cases identified in the motivation
> > section, some knowledge about the current state of the cluster is
> > necessary. Details whether a new broker has just been added or is being
> > decommissioned are not part of the cluster metadata. Therefore such
> > knowledge has to be provided via an external means to the ReplicaPlacer,
> > for example via the configuration". It's not clear how this will be done.
> > If I have to implement this interface, it will be helpful to have clear
> > guidance/examples here which hopefully ties to the use cases in the
> > motivation section. It also allows us to figure out if the proposed
> > interface is complete and helps future implementers of the interface.
> >
> > Couple of minor comments:
> > * The KIP is not listed in the main KIP page (
> >
> https://cwiki-test.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> ).
> > Can you please add it there.
> > * The page has "This is especially true for the 4 scenarios listed in the
> > Motivation section", but there are only 3 scenarios listed.
> >
> > Regards,
> > Vikas
> >
> >
> > On Tue, May 3, 2022 at 5:51 PM Colin McCabe <cm...@apache.org> wrote:
> >
> > > Hi Mickael,
> > >
> > > We did discuss this earlier, and I remember not being too enthusiastic
> > > about a pluggable policy here :)
> > >
> > > There have been several changes to the placement code in the last few
> > > weeks. (These are examples of the kind of changes that are impossible
> to do
> > > once an API is established, by the way.) Can you please revise the KIP
> to
> > > take these into account?
> > >
> > > I'd also like to understand a little bit better why we need this API
> when
> > > we have the explicit placement API for createTopics and
> createPartitions.
> > > Can you give me a few scenarios where the manual placement API would be
> > > insufficient?
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Mon, May 2, 2022, at 09:28, Mickael Maison wrote:
> > > > Hi,
> > > >
> > > > If there are no further comments, I'll start a vote in the next few
> days.
> > > >
> > > > Thanks,
> > > > Mickael
> > > >
> > > > On Wed, Mar 30, 2022 at 3:51 AM Luke Chen <sh...@gmail.com> wrote:
> > > >>
> > > >> Hi Mickael,
> > > >>
> > > >> Thanks for the update.
> > > >> It answered my questions!
> > > >>
> > > >> Thank you.
> > > >> Luke
> > > >>
> > > >> On Wed, Mar 30, 2022 at 12:09 AM Mickael Maison <
> > > mickael.maison@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > Hi Luke,
> > > >> >
> > > >> > Thanks for the feedback.
> > > >> >
> > > >> > 1. Thanks, fixed!
> > > >> > 2. Yes that's right. It's the same behavior for topic policies
> > > >> > 3. I've added details about how the mentioned scenarios could be
> > > >> > addressed. The information required to make such decisions is not
> part
> > > >> > of the Kafka cluster metadata so an external input is necessary.
> This
> > > >> > KIP does not propose a specific mechanism for doing it.
> > > >> >
> > > >> > I hope this answers your questions.
> > > >> >
> > > >> > Thanks,
> > > >> > Mickael
> > > >> >
> > > >> >
> > > >> > On Tue, Mar 29, 2022 at 5:42 PM Mickael Maison <
> > > mickael.maison@gmail.com>
> > > >> > wrote:
> > > >> > >
> > > >> > > Hi Ryanne,
> > > >> > >
> > > >> > > That's a good point!
> > > >> > >
> > > >> > > There's no value in having all implementations perform the same
> > > sanity
> > > >> > > checks. If the replication factor is < 1 or larger than the
> current
> > > >> > > number of registered brokers, the controller should directly
> throw
> > > >> > > InvalidReplicationFactorException and not call the
> ReplicaPlacer.
> > > I've
> > > >> > > updated the KIP so the place() method now only throws
> > > >> > > ReplicaPlacementException.
> > > >> > >
> > > >> > > Thanks,
> > > >> > > Mickael
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > On Mon, Mar 28, 2022 at 6:20 PM Ryanne Dolan <
> ryannedolan@gmail.com
> > > >
> > > >> > wrote:
> > > >> > > >
> > > >> > > > Wondering about InvalidReplicationFactorException. Why would
> an
> > > >> > > > implementation throw this? Given the information passed to the
> > > method,
> > > >> > > > seems like this could only be thrown if there were obviously
> > > invalid
> > > >> > > > arguments, like a negative number or zero. Can we just
> guarantee
> > > such
> > > >> > > > invalid arguments aren't passed in?
> > > >> > > >
> > > >> > > > Ryanne
> > > >> > > >
> > > >> > > > On Sat, Mar 26, 2022, 8:51 AM Luke Chen <sh...@gmail.com>
> > > wrote:
> > > >> > > >
> > > >> > > > > Hi Mickael,
> > > >> > > > >
> > > >> > > > > Thanks for the KIP!
> > > >> > > > > It's indeed a pain point for the Kafka admins.
> > > >> > > > >
> > > >> > > > > I have some comments:
> > > >> > > > > 1. Typo in motivation section: When administrators [when to]
> > > remove
> > > >> > brokers
> > > >> > > > > from a cluster,....
> > > >> > > > > 2. If different `replica.placer.class.name` configs are
> set in
> > > all
> > > >> > > > > controllers, I think only the config for  "active
> controller"
> > > will
> > > >> > take
> > > >> > > > > effect, right?
> > > >> > > > > 3. Could you explain more about how the proposal fixes some
> > > >> > scenarios you
> > > >> > > > > listed, ex: the new added broker case. How could we know the
> > > broker
> > > >> > is new
> > > >> > > > > added? I guess it's by checking the broker load via some
> metrics
> > > >> > > > > dynamically, right?
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > Thank you.
> > > >> > > > > Luke
> > > >> > > > >
> > > >> > > > > On Fri, Mar 18, 2022 at 10:30 AM Ryanne Dolan <
> > > ryannedolan@gmail.com
> > > >> > >
> > > >> > > > > wrote:
> > > >> > > > >
> > > >> > > > > > Thanks Mickael, this makes sense to me! I've been wanting
> > > >> > something like
> > > >> > > > > > this in order to decommission a broker without new
> partitions
> > > >> > getting
> > > >> > > > > > accidentally assigned to it.
> > > >> > > > > >
> > > >> > > > > > Ryanne
> > > >> > > > > >
> > > >> > > > > > On Thu, Mar 17, 2022, 5:56 AM Mickael Maison <
> > > >> > mickael.maison@gmail.com>
> > > >> > > > > > wrote:
> > > >> > > > > >
> > > >> > > > > > > Hi,
> > > >> > > > > > >
> > > >> > > > > > > I'd like to start a new discussion on KIP-660. I
> originally
> > > >> > wrote this
> > > >> > > > > > > KIP in 2020 and the initial discussion
> > > >> > > > > > > (
> > > >> > https://lists.apache.org/thread/xn7xyb74nyt281brto4x28r9rzxm4lp9)
> > > >> > > > > > > raised some concerns especially around KRaft (which did
> not
> > > >> > exist at
> > > >> > > > > > > that time) and scalability.
> > > >> > > > > > >
> > > >> > > > > > > Since then, we got a new KRaft controller so I've been
> able
> > > to
> > > >> > revisit
> > > >> > > > > > > this KIP. I kept the KIP number as it's essentially the
> same
> > > >> > idea, but
> > > >> > > > > > > the proposal is significantly different:
> > > >> > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-660%3A+Pluggable+ReplicaPlacer
> > > >> > > > > > >
> > > >> > > > > > > Please take a look and let me know if you have any
> feedback.
> > > >> > > > > > >
> > > >> > > > > > > Thanks,
> > > >> > > > > > > Mickael
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> >
> > >
>

Re: [DISCUSS] KIP-660: Pluggable ReplicaPlacer

Posted by Mickael Maison <mi...@gmail.com>.
Hi Vikas,

You make some very good points and most importantly I agree that being
able to prevent putting new partitions on a broker should be part of
Kafka itself and not require a plugin.

This feature would address 2 out of the 3 scenarios mentioned in the
motivation section. The last one "When adding brokers to a cluster,
Kafka currently does not necessarily place new partitions on new
brokers" is clearly less important.

So I think I'll retire this KIP and I'll follow up with a new KIP to
focus on that feature.

Thanks,
Mickael


On Mon, May 9, 2022 at 8:11 PM Vikas Singh <vi...@confluent.io.invalid> wrote:
>
> Hi Mickael,
>
> It's a nice proposal. It's appealing to have a pluggable way to override
> default kafka placement decisions, and the motivation section lists some of
> them. Here are few comments:
>
> * The motivation section has "When adding brokers to a cluster, Kafka
> currently does not necessarily place new partitions on new brokers". I am
> not sure how valuable doing this will be. A newly created kafka topic takes
> time to reach the same usage level as existing topics, say because the
> topic created by a new workload that is getting onboarded, or the expansion
> was done to relieve disk pressure on existing nodes etc. While new topics
> catch up to existing workload, the new brokers are not sharing equal load
> in the cluster, which probably defeats the purpose of adding new brokers.
> In addition to that clustering new topics like this on new brokers have
> implications from fault domain perspective. A reasonable way to approach it
> is to indeed use CruiseControl to move things around so that the newly
> added nodes become immediately involved and share cluster load.
> * Regarding "When administrators want to remove brokers from a cluster,
> there is no way to prevent Kafka from placing partitions on them", this is
> indeed an issue. I would argue that this is needed by everyone and should
> be part of Kafka, instead of being implemented as part of a plugin
> interface by multiple teams.
> * For "When some brokers are near their storage/throughput limit, Kafka
> could avoid putting new partitions on them", while this can help relieve
> short term overload I think again the correct solution here is something
> like CruiseControl where the system is monitored and things moved around to
> maintain a balanced cluster. A new topic will not take any disk space, so
> placing them anywhere normally isn't going to add to the storage overload.
> Similar to the previous case, maybe a mechanism in Kafka to put nodes in a
> quarantine state is a better way to approach this.
>
> In terms of the proposed api, I have a couple of comments:
>
> * It is not clear if the proposal applies to partitions of new topics or
> addition on partitions to an existing topic. Explicitly stating that will
> be helpful.
> * Regarding part "To address the use cases identified in the motivation
> section, some knowledge about the current state of the cluster is
> necessary. Details whether a new broker has just been added or is being
> decommissioned are not part of the cluster metadata. Therefore such
> knowledge has to be provided via an external means to the ReplicaPlacer,
> for example via the configuration". It's not clear how this will be done.
> If I have to implement this interface, it will be helpful to have clear
> guidance/examples here which hopefully ties to the use cases in the
> motivation section. It also allows us to figure out if the proposed
> interface is complete and helps future implementers of the interface.
>
> Couple of minor comments:
> * The KIP is not listed in the main KIP page (
> https://cwiki-test.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals).
> Can you please add it there.
> * The page has "This is especially true for the 4 scenarios listed in the
> Motivation section", but there are only 3 scenarios listed.
>
> Regards,
> Vikas
>
>
> On Tue, May 3, 2022 at 5:51 PM Colin McCabe <cm...@apache.org> wrote:
>
> > Hi Mickael,
> >
> > We did discuss this earlier, and I remember not being too enthusiastic
> > about a pluggable policy here :)
> >
> > There have been several changes to the placement code in the last few
> > weeks. (These are examples of the kind of changes that are impossible to do
> > once an API is established, by the way.) Can you please revise the KIP to
> > take these into account?
> >
> > I'd also like to understand a little bit better why we need this API when
> > we have the explicit placement API for createTopics and createPartitions.
> > Can you give me a few scenarios where the manual placement API would be
> > insufficient?
> >
> > best,
> > Colin
> >
> >
> > On Mon, May 2, 2022, at 09:28, Mickael Maison wrote:
> > > Hi,
> > >
> > > If there are no further comments, I'll start a vote in the next few days.
> > >
> > > Thanks,
> > > Mickael
> > >
> > > On Wed, Mar 30, 2022 at 3:51 AM Luke Chen <sh...@gmail.com> wrote:
> > >>
> > >> Hi Mickael,
> > >>
> > >> Thanks for the update.
> > >> It answered my questions!
> > >>
> > >> Thank you.
> > >> Luke
> > >>
> > >> On Wed, Mar 30, 2022 at 12:09 AM Mickael Maison <
> > mickael.maison@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi Luke,
> > >> >
> > >> > Thanks for the feedback.
> > >> >
> > >> > 1. Thanks, fixed!
> > >> > 2. Yes that's right. It's the same behavior for topic policies
> > >> > 3. I've added details about how the mentioned scenarios could be
> > >> > addressed. The information required to make such decisions is not part
> > >> > of the Kafka cluster metadata so an external input is necessary. This
> > >> > KIP does not propose a specific mechanism for doing it.
> > >> >
> > >> > I hope this answers your questions.
> > >> >
> > >> > Thanks,
> > >> > Mickael
> > >> >
> > >> >
> > >> > On Tue, Mar 29, 2022 at 5:42 PM Mickael Maison <
> > mickael.maison@gmail.com>
> > >> > wrote:
> > >> > >
> > >> > > Hi Ryanne,
> > >> > >
> > >> > > That's a good point!
> > >> > >
> > >> > > There's no value in having all implementations perform the same
> > sanity
> > >> > > checks. If the replication factor is < 1 or larger than the current
> > >> > > number of registered brokers, the controller should directly throw
> > >> > > InvalidReplicationFactorException and not call the ReplicaPlacer.
> > I've
> > >> > > updated the KIP so the place() method now only throws
> > >> > > ReplicaPlacementException.
> > >> > >
> > >> > > Thanks,
> > >> > > Mickael
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Mon, Mar 28, 2022 at 6:20 PM Ryanne Dolan <ryannedolan@gmail.com
> > >
> > >> > wrote:
> > >> > > >
> > >> > > > Wondering about InvalidReplicationFactorException. Why would an
> > >> > > > implementation throw this? Given the information passed to the
> > method,
> > >> > > > seems like this could only be thrown if there were obviously
> > invalid
> > >> > > > arguments, like a negative number or zero. Can we just guarantee
> > such
> > >> > > > invalid arguments aren't passed in?
> > >> > > >
> > >> > > > Ryanne
> > >> > > >
> > >> > > > On Sat, Mar 26, 2022, 8:51 AM Luke Chen <sh...@gmail.com>
> > wrote:
> > >> > > >
> > >> > > > > Hi Mickael,
> > >> > > > >
> > >> > > > > Thanks for the KIP!
> > >> > > > > It's indeed a pain point for the Kafka admins.
> > >> > > > >
> > >> > > > > I have some comments:
> > >> > > > > 1. Typo in motivation section: When administrators [when to]
> > remove
> > >> > brokers
> > >> > > > > from a cluster,....
> > >> > > > > 2. If different `replica.placer.class.name` configs are set in
> > all
> > >> > > > > controllers, I think only the config for  "active controller"
> > will
> > >> > take
> > >> > > > > effect, right?
> > >> > > > > 3. Could you explain more about how the proposal fixes some
> > >> > scenarios you
> > >> > > > > listed, ex: the new added broker case. How could we know the
> > broker
> > >> > is new
> > >> > > > > added? I guess it's by checking the broker load via some metrics
> > >> > > > > dynamically, right?
> > >> > > > >
> > >> > > > >
> > >> > > > > Thank you.
> > >> > > > > Luke
> > >> > > > >
> > >> > > > > On Fri, Mar 18, 2022 at 10:30 AM Ryanne Dolan <
> > ryannedolan@gmail.com
> > >> > >
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > Thanks Mickael, this makes sense to me! I've been wanting
> > >> > something like
> > >> > > > > > this in order to decommission a broker without new partitions
> > >> > getting
> > >> > > > > > accidentally assigned to it.
> > >> > > > > >
> > >> > > > > > Ryanne
> > >> > > > > >
> > >> > > > > > On Thu, Mar 17, 2022, 5:56 AM Mickael Maison <
> > >> > mickael.maison@gmail.com>
> > >> > > > > > wrote:
> > >> > > > > >
> > >> > > > > > > Hi,
> > >> > > > > > >
> > >> > > > > > > I'd like to start a new discussion on KIP-660. I originally
> > >> > wrote this
> > >> > > > > > > KIP in 2020 and the initial discussion
> > >> > > > > > > (
> > >> > https://lists.apache.org/thread/xn7xyb74nyt281brto4x28r9rzxm4lp9)
> > >> > > > > > > raised some concerns especially around KRaft (which did not
> > >> > exist at
> > >> > > > > > > that time) and scalability.
> > >> > > > > > >
> > >> > > > > > > Since then, we got a new KRaft controller so I've been able
> > to
> > >> > revisit
> > >> > > > > > > this KIP. I kept the KIP number as it's essentially the same
> > >> > idea, but
> > >> > > > > > > the proposal is significantly different:
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-660%3A+Pluggable+ReplicaPlacer
> > >> > > > > > >
> > >> > > > > > > Please take a look and let me know if you have any feedback.
> > >> > > > > > >
> > >> > > > > > > Thanks,
> > >> > > > > > > Mickael
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> >
> >

Re: [DISCUSS] KIP-660: Pluggable ReplicaPlacer

Posted by Vikas Singh <vi...@confluent.io.INVALID>.
Hi Mickael,

It's a nice proposal. It's appealing to have a pluggable way to override
default Kafka placement decisions, and the motivation section lists some of
them. Here are a few comments:

* The motivation section has "When adding brokers to a cluster, Kafka
currently does not necessarily place new partitions on new brokers". I am
not sure how valuable doing this will be. A newly created Kafka topic takes
time to reach the same usage level as existing topics, say because the
topic is created by a new workload that is still being onboarded, or the
expansion was done to relieve disk pressure on existing nodes, etc. While
new topics catch up to the existing workload, the new brokers are not
sharing equal load in the cluster, which probably defeats the purpose of
adding new brokers. In addition, clustering new topics like this on new
brokers has implications from a fault domain perspective. A reasonable way
to approach it
is to indeed use CruiseControl to move things around so that the newly
added nodes become immediately involved and share cluster load.
* Regarding "When administrators want to remove brokers from a cluster,
there is no way to prevent Kafka from placing partitions on them", this is
indeed an issue. I would argue that this is needed by everyone and should
be part of Kafka, instead of being implemented as part of a plugin
interface by multiple teams.
* For "When some brokers are near their storage/throughput limit, Kafka
could avoid putting new partitions on them", while this can help relieve
short term overload I think again the correct solution here is something
like CruiseControl where the system is monitored and things moved around to
maintain a balanced cluster. A new topic will not take any disk space, so
placing them anywhere normally isn't going to add to the storage overload.
Similar to the previous case, maybe a mechanism in Kafka to put nodes in a
quarantine state is a better way to approach this.

In terms of the proposed API, I have a couple of comments:

* It is not clear whether the proposal applies to partitions of new topics
or to the addition of partitions to an existing topic. Explicitly stating
that would be helpful.
* Regarding the part "To address the use cases identified in the
motivation section, some knowledge about the current state of the cluster
is necessary. Details whether a new broker has just been added or is being
decommissioned are not part of the cluster metadata. Therefore such
knowledge has to be provided via an external means to the ReplicaPlacer,
for example via the configuration", it's not clear how this will be done.
If I have to implement this interface, it would be helpful to have clear
guidance/examples here that hopefully tie to the use cases in the
motivation section. It would also allow us to figure out whether the
proposed interface is complete, and it would help future implementers of
the interface.
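
To make this concrete, here is a minimal sketch of what an
externally-configured placer could look like. The exact place() signature,
the configure() hook and the replica.placer.excluded.brokers property are
assumptions made up for this example rather than anything the KIP defines:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.stream.Collectors;

    // Hypothetical sketch only: the real KIP-660 interface and parameter
    // types may differ. The point is that "which brokers to avoid" is not
    // in the cluster metadata, so it has to arrive through the plugin's
    // own configuration.
    public class ExcludeBrokersPlacer {

        private Set<Integer> excludedBrokers = Set.of();

        // Called with the controller's configuration, e.g.
        //   replica.placer.excluded.brokers=4,5
        // (the property name is invented for this example)
        public void configure(Map<String, ?> configs) {
            Object value = configs.get("replica.placer.excluded.brokers");
            if (value != null && !value.toString().isBlank()) {
                excludedBrokers = Arrays.stream(value.toString().split(","))
                    .map(String::trim)
                    .map(Integer::valueOf)
                    .collect(Collectors.toSet());
            }
        }

        // Assumed shape of the placement callback: pick replicationFactor
        // brokers per partition, round-robin over the non-excluded brokers.
        // A real placer would also honour racks and existing load.
        public List<List<Integer>> place(int partitions,
                                         short replicationFactor,
                                         List<Integer> registeredBrokers) {
            List<Integer> candidates = registeredBrokers.stream()
                .filter(id -> !excludedBrokers.contains(id))
                .collect(Collectors.toList());
            if (candidates.size() < replicationFactor) {
                throw new IllegalStateException("Not enough non-excluded brokers");
            }
            List<List<Integer>> assignments = new ArrayList<>();
            for (int p = 0; p < partitions; p++) {
                List<Integer> replicas = new ArrayList<>();
                for (int r = 0; r < replicationFactor; r++) {
                    replicas.add(candidates.get((p + r) % candidates.size()));
                }
                assignments.add(replicas);
            }
            return assignments;
        }
    }

Even a rough example along these lines in the KIP would make it much
clearer how the external input is expected to reach the plugin.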

A couple of minor comments:
* The KIP is not listed on the main KIP page (
https://cwiki-test.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals).
Can you please add it there?
* The page has "This is especially true for the 4 scenarios listed in the
Motivation section", but there are only 3 scenarios listed.

Regards,
Vikas


On Tue, May 3, 2022 at 5:51 PM Colin McCabe <cm...@apache.org> wrote:

> Hi Mickael,
>
> We did discuss this earlier, and I remember not being too enthusiastic
> about a pluggable policy here :)
>
> There have been several changes to the placement code in the last few
> weeks. (These are examples of the kind of changes that are impossible to do
> once an API is established, by the way.) Can you please revise the KIP to
> take these into account?
>
> I'd also like to understand a little bit better why we need this API when
> we have the explicit placement API for createTopics and createPartitions.
> Can you give me a few scenarios where the manual placement API would be
> insufficient?
>
> best,
> Colin
>
>

Re: [DISCUSS] KIP-660: Pluggable ReplicaPlacer

Posted by Colin McCabe <cm...@apache.org>.
Hi Mickael,

We did discuss this earlier, and I remember not being too enthusiastic about a pluggable policy here :)

There have been several changes to the placement code in the last few weeks. (These are examples of the kind of changes that are impossible to do once an API is established, by the way.) Can you please revise the KIP to take these into account?

I'd also like to understand a little bit better why we need this API when we have the explicit placement API for createTopics and createPartitions. Can you give me a few scenarios where the manual placement API would be insufficient?
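
For reference, that manual placement path goes through the Admin client
roughly like this (the topic name and broker ids are made up, and error
handling is omitted):

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.NewPartitions;
    import org.apache.kafka.clients.admin.NewTopic;

    public class ManualPlacementExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            try (Admin admin = Admin.create(props)) {
                // createTopics with an explicit replica assignment:
                // partition 0 -> brokers 1,2 and partition 1 -> brokers 2,3
                NewTopic topic = new NewTopic("example-topic",
                    Map.of(0, List.of(1, 2), 1, List.of(2, 3)));
                admin.createTopics(List.of(topic)).all().get();

                // createPartitions with an explicit assignment for the
                // partition being added (partition 2 -> brokers 3,1)
                NewPartitions newPartitions =
                    NewPartitions.increaseTo(3, List.of(List.of(3, 1)));
                admin.createPartitions(Map.of("example-topic", newPartitions))
                    .all().get();
            }
        }
    }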

best,
Colin


On Mon, May 2, 2022, at 09:28, Mickael Maison wrote:
> Hi,
>
> If there are no further comments, I'll start a vote in the next few days.
>
> Thanks,
> Mickael
>