Posted to dev@kafka.apache.org by Stanislav Kozlovski <st...@confluent.io> on 2019/11/01 08:49:55 UTC

Re: [DISCUSS] KIP-542: Partition Reassignment Throttling

Hey Viktor. Thanks for the KIP!

> We will introduce two new configs in order to eventually replace
> *.replication.throttled.rate.
Just to clarify, you mean to replace said config in the context of
reassignment throttling, right? We are not planning to remove that config.

And also to clarify, *.throttled.replicas will not apply to the new
*reassignment* configs, correct? We will throttle all reassigning replicas.
(I am +1 on this, I believe it is easier to reason about. We could always
add a new config later)

I have one comment about backwards-compatibility - should we ensure that
the old `*.replication.throttled.rate` and `*.throttled.replicas` still
apply to reassigning traffic if set? We could have the new config take
precedence, but still preserve backwards compatibility.
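
For illustration, the precedence I have in mind would be something like
this (just a sketch with a hypothetical helper; -1 stands for "not set"):

    // Sketch: the new reassignment throttle takes precedence for
    // reassigning traffic, falling back to the old replication
    // throttle when the new config is unset.
    long effectiveReassignmentRate(long reassignmentRate, long replicationRate) {
        return reassignmentRate >= 0 ? reassignmentRate : replicationRate;
    }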

Thanks,
Stanislav

On Thu, Oct 24, 2019 at 1:38 PM Viktor Somogyi-Vass <vi...@gmail.com>
wrote:

> Hi People,
>
> I've created a KIP to improve replication quotas by handling
> reassignment-related throttling as a separate case with its own
> configurable limits, and to change the kafka-reassign-partitions tool to
> use these new configs going forward.
> Please have a look, I'd be happy to receive any feedback and answer
> all your questions.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-542%3A+Partition+Reassignment+Throttling
>
> Thanks,
> Viktor
>


-- 
Best,
Stanislav

Re: [DISCUSS] KIP-542: Partition Reassignment Throttling

Posted by Viktor Somogyi-Vass <vi...@gmail.com>.
Hi Stan,

I was about to start a vote on this one but I think I have one more idea
regarding your last point about the total cap.
What if we said that the (leader|follower).replication.throttled.rate is
the overall limit which we allow for leader/follower replication (so the
total cap) and (leader|follower).reassignment.throttled.rate must have a
value lower than that. By default it'd be -1, which would mean that
replication.throttled.rate should be applied (so the backward compatible
behavior). Setting it would mean that we put a dedicated limit on
reassignment traffic. For other replication traffic,
replication.throttled.rate - reassignment.throttled.rate would be applied.
If replication.throttled.rate is not specified but
reassignment.throttled.rate is specified, then the reassignment is bounded
and other replication traffic isn't. Finally,
replication.throttled.replicas would be applied to reassignment too if
specified, so the reassignment won't "escape" the boundaries given on the
replication throttling side.
I think this is a fair solution to the total cap problem and it would be
aligned with the current configs.
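
To spell out the semantics I have in mind (a sketch only, not
implementation code; -1 means the config is not set):

    // Effective limit for reassigning replicas.
    long reassignmentLimit(long replicationRate, long reassignmentRate) {
        if (reassignmentRate < 0)
            return replicationRate;   // backward compatible: old throttle applies
        if (replicationRate < 0)
            return reassignmentRate;  // only reassignment is bounded
        return Math.min(reassignmentRate, replicationRate); // stays under the total cap
    }

    // Effective limit for all other (non-reassignment) replication traffic.
    long otherReplicationLimit(long replicationRate, long reassignmentRate) {
        if (replicationRate < 0)
            return Long.MAX_VALUE;    // unbounded
        if (reassignmentRate < 0)
            return replicationRate;
        return replicationRate - reassignmentRate; // remainder of the total cap
    }

This way the reassignment traffic and the remaining replication traffic
together can never exceed replication.throttled.rate.
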
What do you think?

Viktor

On Mon, Nov 4, 2019 at 3:55 PM Viktor Somogyi-Vass <vi...@gmail.com>
wrote:


Re: [DISCUSS] KIP-542: Partition Reassignment Throttling

Posted by Viktor Somogyi-Vass <vi...@gmail.com>.
> Exactly. I also can't envision scenarios where we would like to throttle
> the reassignment traffic to only a subset of the reassigning replicas.

The other day I was wondering whether with specialized quotas we could
solve incremental partition reassignment too. Basically the controller
would throttle most of the partitions to 0 and let only some of them
reassign, but I discarded the idea because it is more intuitive to actually
break up a big reassignment into smaller steps (and more traceable too).
But perhaps there is a need for throttling the reassigning replicas
differently depending on the produce rate on those partitions. However, in
my mind I was planning for incremental partition reassignment, so perhaps
it'd be best if the controller could decide how many partitions fit into
the given bandwidth and we'd just expose simple configs.

> If we always take the lowest value, this means that the reassignment
> throttle must always be equal to or lower than the replication throttle.
> Doesn't that mean that the reassigning partitions may never catch up? I
> guess not, since we expect to always be moving less than the total number
> of partitions at one time.
> I have mixed feelings about this - I like the flexibility of being able to
> configure whatever value we please, yet I struggle to come up with a
> scenario where we would want a higher reassignment throttle than
> replication. Perhaps your suggestion is better.

Yes, it could mean that. However, my concern with preferring reassignment
quotas is that it could cause the "bootstrapping broker problem": the sum
of the follower reassignment + replication quotas would eat away the
bandwidth from the leaders. I think it's a better problem to have a
reassignment that you can't finish than leaders unable to answer fetch
requests fast enough. The reassignment problem can be mitigated by
carefully increasing the replication & reassignment quotas for the given
partition. I'll set up a test environment for this though and get back if
something doesn't add up.

> This begs another question - since we're separating the replication
> throttle from the reassignment throttle, the maximum traffic a broker may
> replicate now becomes `replication.throttled.rate` +
> `reassignment.throttled.rate`.
> Seems like we would benefit from having a total cap to ensure users don't
> shoot themselves in the foot.
>
> We could have a new config that denotes the total possible throttle rate
> and we then divide that between reassignment and replication. But that
> assumes that we would set the replication.throttled.rate much lower than
> what the broker could handle.
>
> Perhaps the best approach would be to denote how much the broker can
> handle (total.replication.throttle.rate) and then allow only up to N% of
> that to go towards reassignments (reassignment.throttled.rate) in a
> best-effort way (preferring replication traffic). That sounds tricky to
> implement, though. Interested to hear what others think.

Good catch. I'm also leaning towards having simpler configs and improving
the broker/controller code to make more intelligent decisions. I also
agree with having a total.replication.throttle.rate but I think we should
stay with the byte-based notation as that is more conventional in the
quota world and easier to handle. That way you can say that your total
replication quota is 10, your leader and follower replication quotas are 3
each, the reassignment ones are 2 each, and then you've maxed out your
limit. We can print warnings/errors if the overall value doesn't add up to
the max.
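
For example, with the numbers above (in MB/s; a sketch of the check only,
not actual broker code):

    // Warn if the per-type quotas don't fit under the total cap.
    long total = 10, leader = 3, follower = 3, leaderReassign = 2, followerReassign = 2;
    long sum = leader + follower + leaderReassign + followerReassign;
    if (sum > total)
        System.err.printf("Quotas (%d MB/s) exceed total.replication.throttle.rate (%d MB/s)%n",
                          sum, total);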

Viktor

On Mon, Nov 4, 2019 at 12:27 PM Stanislav Kozlovski <st...@confluent.io>
wrote:


Re: [DISCUSS] KIP-542: Partition Reassignment Throttling

Posted by Stanislav Kozlovski <st...@confluent.io>.
Hi Viktor,

> As for the first question, I think there is no need for
> *.throttled.replicas in case of reassignment because the
> LeaderAndIsrRequest exactly specifies the replicas needed to be throttled.

Exactly. I also can't envision scenarios where we would like to throttle
the reassignment traffic to only a subset of the reassigning replicas.

> For instance, on a bootstrapping server where all replicas are throttled,
> there are reassigning replicas, and the reassignment throttle is set
> higher, I think we should still apply the replication throttle to ensure
> the broker won't have problems. What do you think?

If we always take the lowest value, this means that the reassignment
throttle must always be equal to or lower than the replication throttle.
Doesn't that mean that the reassigning partitions may never catch up? I
guess not, since we expect to always be moving less than the total number
of partitions at one time.
I have mixed feelings about this - I like the flexibility of being able to
configure whatever value we please, yet I struggle to come up with a
scenario where we would want a higher reassignment throttle than
replication. Perhaps your suggestion is better.

This begs another question - since we're separating the replication
throttle from the reassignment throttle, the maximum traffic a broker may
replicate now becomes `replication.throttled.rate` +
`reassignment.throttled.rate`.
Seems like we would benefit from having a total cap to ensure users don't
shoot themselves in the foot.

We could have a new config that denotes the total possible throttle rate
and we then divide that between reassignment and replication. But that
assumes that we would set the replication.throttled.rate much lower than
what the broker could handle.

Perhaps the best approach would be to denote how much the broker can handle
(total.replication.throttle.rate) and then allow only up to N% of that to
go towards reassignments (reassignment.throttled.rate) in a best-effort way
(preferring replication traffic). That sounds tricky to implement, though.
Interested to hear what others think.
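
To illustrate the N% idea roughly (all names are hypothetical and this is
only a sketch of the intent, not a proposed implementation):

    // total.replication.throttle.rate: what the broker can handle overall.
    static final long TOTAL_RATE = 100L * 1024 * 1024; // 100 MB/s
    // At most 25% of the total may go towards reassignments.
    static final double REASSIGNMENT_PCT = 0.25;

    // Best effort: regular replication traffic is served first; reassignment
    // only gets what's left over, capped at its percentage of the total.
    static long reassignmentBudget(long replicationBytesUsed) {
        long leftover = Math.max(0, TOTAL_RATE - replicationBytesUsed);
        return Math.min((long) (TOTAL_RATE * REASSIGNMENT_PCT), leftover);
    }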

Best,
Stanislav


On Mon, Nov 4, 2019 at 11:08 AM Viktor Somogyi-Vass <vi...@gmail.com>
wrote:


Re: [DISCUSS] KIP-542: Partition Reassignment Throttling

Posted by Viktor Somogyi-Vass <vi...@gmail.com>.
Hey Stan,

> We will introduce two new configs in order to eventually replace
> *.replication.throttled.rate.
> Just to clarify, you mean to replace said config in the context of
> reassignment throttling, right? We are not planning to remove that config.

Yes, I don't want to remove that config either. Removed that sentence.

> And also to clarify, *.throttled.replicas will not apply to the new
> *reassignment* configs, correct? We will throttle all reassigning replicas.
> (I am +1 on this, I believe it is easier to reason about. We could always
> add a new config later)

Are you asking whether there is a need for a
leader.reassignment.throttled.replicas and
follower.reassignment.throttled.replicas config, or are you interested in
the behavior between the old and the new configs?
As for the first question, I think there is no need for
*.throttled.replicas in case of reassignment because the
LeaderAndIsrRequest exactly specifies the replicas needed to be throttled.
As for the second, see below.

> I have one comment about backwards-compatibility - should we ensure that
> the old `*.replication.throttled.rate` and `*.throttled.replicas` still
> apply to reassigning traffic if set? We could have the new config take
> precedence, but still preserve backwards compatibility.

Sure, we should apply replication throttling to reassignment too if set.
But instead of the new taking precedence I'd apply whichever has the lower
value.
For instance, on a bootstrapping server where all replicas are throttled,
there are reassigning replicas, and the reassignment throttle is set
higher, I think we should still apply the replication throttle to ensure
the broker won't have problems. What do you think?
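
In other words, something like this (a sketch only; -1 means "not set"):

    // Whichever throttle is lower wins for reassigning traffic.
    long effectiveRate(long replicationRate, long reassignmentRate) {
        if (replicationRate < 0) return reassignmentRate;
        if (reassignmentRate < 0) return replicationRate;
        return Math.min(replicationRate, reassignmentRate);
    }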

Thanks,
Viktor

