Posted to dev@kafka.apache.org by Tom Bentley <t....@gmail.com> on 2017/11/01 10:20:18 UTC

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

This thread has been very quiet for a while now. It's unclear whether this
is because no one has anything more to say, or whether no one has taken a
look at it in its current form. I suspect the latter, so I'm not calling
the vote today, but instead asking for more review.

What's currently proposed – in addition to the reassignPartitions() API
itself – is to have a pair of RPCs for managing throttles. This is quite
different from the earlier proposal to reuse alterConfigs(). The benefits
of this specific API include:

* being more typesafe,
* allowing for the automatic removal of throttles when reassignment has
completed,
* being careful about correct management of the throttles wrt controller
failover
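
To make the first two points concrete, here is a minimal sketch. It is not the API proposed in the KIP: the class names ThrottleRate and BrokerThrottles, and the removeWhenDone flag, are hypothetical and exist only for illustration. It shows a throttle rate carried as a validated, typed value and dropped automatically when a reassignment completes. Controller failover is not modelled.

```java
import java.util.HashMap;
import java.util.Map;

/** A throttle rate as a typed value; hypothetical, not a class from Kafka or KIP-179. */
final class ThrottleRate {
    final long bytesPerSecond;
    final boolean removeWhenDone;

    ThrottleRate(long bytesPerSecond, boolean removeWhenDone) {
        // A typed constructor can reject bad values up front, unlike a free-form config string.
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("rate must be positive: " + bytesPerSecond);
        }
        this.bytesPerSecond = bytesPerSecond;
        this.removeWhenDone = removeWhenDone;
    }
}

/** Hypothetical in-memory stand-in for the throttle state that the cluster would hold. */
final class BrokerThrottles {
    private final Map<Integer, ThrottleRate> rateByBroker = new HashMap<>();

    void set(int brokerId, ThrottleRate rate) {
        rateByBroker.put(brokerId, rate);
    }

    boolean isThrottled(int brokerId) {
        return rateByBroker.containsKey(brokerId);
    }

    /** Models the end of a reassignment: only throttles flagged for auto-removal are dropped. */
    void onReassignmentComplete() {
        rateByBroker.values().removeIf(rate -> rate.removeWhenDone);
    }
}
```

A throttle set with removeWhenDone = false survives onReassignmentComplete(), which corresponds to keeping a "sticky" rate for next time.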

Surely someone has something to say about this, before we reach the vote
stage?

https://cwiki.apache.org/confluence/display/KAFKA/KIP-179+-+Change+ReassignPartitionsCommand+to+use+AdminClient

Thanks,

Tom


On 25 October 2017 at 10:33, Tom Bentley <t....@gmail.com> wrote:

> If there are no further comments, I will start a vote on this next week.
>
> Thanks,
>
> Tom
>
> On 20 October 2017 at 08:33, Tom Bentley <t....@gmail.com> wrote:
>
>> Hi,
>>
>> I've made a fairly major update to KIP-179 to propose APIs for setting
>> throttled rates and throttled replicas with the ability to remove these
>> automatically at the end of reassignment.
>>
>> I'd be grateful for your feedback:
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-179+-+Change+ReassignPartitionsCommand+to+use+AdminClient
>>
>> Thanks,
>>
>> Tom
>>
>> On 2 October 2017 at 13:15, Tom Bentley <t....@gmail.com> wrote:
>>
>>> One question I have is about whether/how to scope throttling to a
>>> reassignment. Currently throttles are only loosely associated with
>>> reassignment: You can start a reassignment without any throttling, add
>>> throttling to an in-flight reassignment, and remember/forget to remove
>>> throttling after the reassignment is complete. There is great flexibility
>>> in that, but also the risk that you forget to remove the throttle(s).
>>>
>>> Just adding an API for setting the throttled rate makes this situation
>>> worse: While it's nice to be able to auto-remove the throttled rate, what
>>> about the config for the throttled replicas? Also you might add a throttle
>>> thinking a reassignment is in-flight, but it has in fact just finished:
>>> Those throttles will now hang around until reset or the end of the next
>>> reassignment. For these reasons it would be good if the throttle were more
>>> directly scoped to the reassignment.
>>>
>>> On the other hand, taking LinkedIn's Cruise Control as an example, there
>>> they seem to modify the reassignment znode directly and incrementally and
>>> so there is no notion of "the reassignment". Reassignments will be running
>>> continuously, with partitions added before all of the current partitions
>>> have completed. If there is no meaningful cluster-wide "reassignment" then
>>> it would be better to remove the throttle by changing the list of
>>> replicas as each replica catches up.
>>>
>>> I'm interested in any use cases people can share on this, as I'd like
>>> the throttle API to be useful for a broad range of use cases, rather than
>>> being too narrowly focussed on what's needed by the existing CLI tools.
>>>
>>> Thanks,
>>>
>>> Tom
>>>
>>>
>>>
>>>
>>> On 28 September 2017 at 17:22, Tom Bentley <t....@gmail.com>
>>> wrote:
>>>
>>>> I'm starting to think about KIP-179 again. In order to have more
>>>> manageably-scoped KIPs and PRs I think it might be worth factoring-out the
>>>> throttling part into a separate KIP. Wdyt?
>>>>
>>>> Keeping the throttling discussion in this thread for the moment...
>>>>
>>>> The throttling behaviour is currently spread across the
>>>> `(leader|follower).replication.throttled.replicas` topic config and
>>>> the `(leader|follower).replication.throttled.rate` dynamic broker
>>>> config. It's not really clear to me exactly what "removing the throttle" is
>>>> supposed to mean. I mean we could reset the rate to Long.MAX_VALUE or we
>>>> could change the list of replicas to an empty list. The
>>>> ReassignPartitionsCommand does both, but there is some small utility in
>>>> leaving the rate, but clearing the list, if you've discovered the "right"
>>>> rate for your cluster/workload and want it to be sticky for next time.
>>>> Does anyone do this in practice?
>>>>
>>>>>> With regards to throttling, it would be
>>>>>> worth thinking about a way where the throttling configs can be
>>>>>> automatically removed without the user having to re-run the tool.
>>>>>>
>>>>>
>>>>> Isn't that just a matter of updating the topic configs for
>>>>> (leader|follower).replication.throttled.replicas at the same time we
>>>>> remove the reassignment znode? That leaves open the question about whether
>>>>> to reset the rates at the same time.
>>>>>
>>>>
>>>> Thinking some more about my "update the configs at the same time we
>>>> remove the reassignment znode" suggestion. The reassignment znode is
>>>> persistent, so the reassignment will survive a zookeeper restart. If there
>>>> was a flag for the auto-removal of the throttle it would likewise need to
>>>> be persistent. Otherwise a ZK restart would remember the reassignment, but
>>>> forget about the preference for auto removal of throttles. So, we would use
>>>> a persistent znode (a child of the reassignment path, perhaps) to store a
>>>> flag for throttle removal.
>>>>
>>>> Thoughts?
>>>>
>>>> Cheers,
>>>>
>>>> Tom
>>>>
>>>
>>>
>>
>

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

Posted by Tom Bentley <t....@gmail.com>.
Hi all,

I've been thinking about the proposed changes in KIP-179 and, on reflection,
I don't think the API presented is really ideal. Some of the limitations it
has include:

1. It sticks to the current, batch-oriented (i.e. a single set of
reassignments at a time) model.
2. It still doesn't really provide a nice way of knowing that a
reassignment is complete.
3. As presented, the automatic removal of throttles only happens at the end
of the reassignment batch. But individual brokers could be unthrottled before
then.

As an illustration of this, https://issues.apache.org/jira/browse/KAFKA-6304
provides a use case for wanting to cancel a reassignment because one of the
brokers in the new assignment has failed. With the proposed API:

1. We can't identify the subset of the reassignment batch which we want to
cancel.
2. All we could do would be to revise the proposed API to allow calling
   reassignPartitions() while a reassignment was in progress. This second
   call could revert the subset of reassignments involving the failed broker.
3. But the API has no way to express that the original reassignment was
cancelled.

Another illustration of the problem: An advanced cluster balancer
(such as LinkedIn's Cruise Control) has to batch large reassignments
(partly so as to make cancellation easier). This batching itself leads to
inefficiency because some of the partitions in the batch will finish before
others, so time is wasted with the cluster only moving a small number of
partitions (when most in the batch have finished).
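
The cost of batching can be illustrated with a toy model. This sketch assumes that each partition move has a fixed duration that does not depend on how many other moves are running, which is a simplification, since replication bandwidth is shared in practice. The class and method names are hypothetical. It compares running moves in fixed batches against keeping the same number of moves in flight continuously.

```java
import java.util.PriorityQueue;

/** Toy model of the two scheduling styles; not code from Kafka or Cruise Control. */
final class ReassignmentScheduling {

    /** Batch model: the next batch starts only once every move in the current batch is done. */
    static long batchedTime(long[] durations, int batchSize) {
        long total = 0;
        for (int i = 0; i < durations.length; i += batchSize) {
            long slowest = 0;
            int end = Math.min(i + batchSize, durations.length);
            for (int j = i; j < end; j++) {
                slowest = Math.max(slowest, durations[j]);
            }
            // The whole batch takes as long as its slowest move.
            total += slowest;
        }
        return total;
    }

    /** Fine-grained model: at most maxInFlight moves run; a new one starts when any finishes. */
    static long continuousTime(long[] durations, int maxInFlight) {
        PriorityQueue<Long> finishTimes = new PriorityQueue<>();
        long makespan = 0;
        for (long duration : durations) {
            // Start at time 0 if a slot is free, otherwise when the earliest running move finishes.
            long start = finishTimes.size() < maxInFlight ? 0L : finishTimes.poll();
            long finish = start + duration;
            finishTimes.add(finish);
            makespan = Math.max(makespan, finish);
        }
        return makespan;
    }
}
```

With durations {10, 1, 1, 10, 1, 1} and three moves allowed at once, batchedTime gives 20 time units and continuousTime gives 11, because the short moves no longer wait behind the slow one in their batch.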

In hindsight, I think I was too influenced by reproducing what the
kafka-reassign-partitions.sh tool does today. I think what's actually
needed (for things like Cruise Control) is an API that's more fine-grained
and less batch-oriented. I am therefore withdrawing KIP-179 and
intend to start a new KIP to propose a different API for partition
reassignment.

I'm still interested in hearing about other deficiencies of the KIP-179
proposal, so I can avoid them in the new proposal. Similarly, if there are
features you'd like to see in the API, please let me know.

I won't go into details of the new API here, but the basic idea I'd like
to use is to give the reassignment of each partition an identity (though
this wouldn't be exposed directly in the API). This is necessary to allow
new reassignments to be added while some are already running. API methods
would then be provided to discover all the currently running reassignments,
determine if a reassignment is still running, etc.
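
As a rough illustration of that idea only, here is a minimal in-memory sketch. The class ReassignmentTracker and its methods are hypothetical and are not the API of the new KIP, which has not been written yet. Each partition's reassignment is keyed individually, so new ones can be added, finished, or cancelled while others are still running.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of per-partition reassignments; not a Kafka API. */
final class ReassignmentTracker {
    // Key: "topic-partition" (e.g. "orders-0"); value: the target replica list (broker ids).
    private final Map<String, List<Integer>> running = new LinkedHashMap<>();

    /** Starts tracking one partition's reassignment; allowed while others are running. */
    boolean start(String topic, int partition, List<Integer> targetReplicas) {
        // putIfAbsent returns null only when nothing was already running for this partition.
        return running.putIfAbsent(key(topic, partition), List.copyOf(targetReplicas)) == null;
    }

    /** Ends one partition's reassignment (completed or cancelled) without touching the rest. */
    boolean finish(String topic, int partition) {
        return running.remove(key(topic, partition)) != null;
    }

    boolean isRunning(String topic, int partition) {
        return running.containsKey(key(topic, partition));
    }

    /** Snapshot of everything currently running, in the order it was started. */
    List<String> listRunning() {
        return new ArrayList<>(running.keySet());
    }

    private static String key(String topic, int partition) {
        return topic + "-" + partition;
    }
}
```

Because each entry stands on its own, cancelling the moves that involve a failed broker (the KAFKA-6304 case) becomes a matter of finishing just those entries, rather than replacing a whole batch.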

Cheers,

Tom
