You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by David Jacot <dj...@confluent.io> on 2020/02/24 10:16:03 UTC

[VOTE] KIP-570: Add leader epoch in StopReplicaRequest

Hi all,

I would like to start a vote on KIP-570: Add leader epoch in
StopReplicaRequest

The KIP is here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-570%3A+Add+leader+epoch+in+StopReplicaRequest

Thanks,
David

Re: [VOTE] KIP-570: Add leader epoch in StopReplicaRequest

Posted by Jason Gustafson <ja...@confluent.io>.
+1

On Mon, Feb 24, 2020 at 2:16 AM David Jacot <dj...@confluent.io> wrote:

> Hi all,
>
> I would like to start a vote on KIP-570: Add leader epoch in
> StopReplicaRequest
>
> The KIP is here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-570%3A+Add+leader+epoch+in+StopReplicaRequest
>
> Thanks,
> David
>

Re: [VOTE] KIP-570: Add leader epoch in StopReplicaRequest

Posted by Jason Gustafson <ja...@confluent.io>.
I'm +1 for the change. I think the main advantage is that it simplifies the
batching. More like the LeaderAndIsr/UpdateMetadata requests, the request
grouping becomes less significant. I'm not sure how much of the benefit can
be realized given the need to keep compatibility, but it still seems like a
good improvement to the protocol.

-Jason

On Thu, Mar 26, 2020 at 6:25 AM David Jacot <dj...@confluent.io> wrote:

> I shouldn't have said "always", sorry.
>
> When a replica is deleted, either due to a topic deletion or a
> reassignment,
> the controller transitions the replica to the `OfflineReplica` state and
> then to
> the `ReplicaDeletionStarted` state. The first transition issues a
> StopReplicaRequest
> with DeletePartitions=false and the second transition issues a
> StopReplicaRequest
> with DeleatePartitions=true. The two operations are actually doing the same
> except that the latter one deletes the replica.
>
> At the moment, the `ControllerRequestBatch` partitions the accumulated stop
> replica requests and sends two requests.
>
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/ControllerChannelManager.scala#L554
>
> Having a per-partition flag allows us to combine everything into one
> request
> here if the requests are sent within the same batch obviously. This won't
> guarantee that all requests will be combined though because requests are
> batched optimistically in the controller but it opens the door to improve
> it in
> the future.
>
> Best,
> David
>
>
> One side note: The batching of the two requests depends on how the
>
> On Wed, Mar 25, 2020 at 6:12 PM Ismael Juma <is...@juma.me.uk> wrote:
>
> > Is it really true that the controller always sends two requests? Aren't
> the
> > operations different (stop replica with delete versus stop replica
> > without)?
> >
> > On Wed, Mar 25, 2020, 9:59 AM David Jacot <dj...@confluent.io> wrote:
> >
> > > Hi all,
> > >
> > > I'd like to inform you that I have slightly changed the schema which
> was
> > > proposed
> > > in the KIP. During the implementation, I have realized that the
> proposed
> > > schema
> > > did not work. The new one reorganises how topics/partitions are stored.
> > >
> > > I'd like to amend the current KIP with the following:
> > >
> > > At the moment, the StopReplicaRequest has a top level field named
> > > `DeletePartitions`
> > > which indicates whether the partitions present in the request must be
> > > deleted or not.
> > > The downside of this is that the controller always ends up sending two
> > > StopReplica
> > > requests, one with DeletePartitions=true and one with
> > > DeletePartitions=false.
> > >
> > > Instead, I'd like to add a per-partition DeletePartition field to
> combine
> > > everything in
> > > one request. This will reduce the number of requests sent to each
> broker
> > > and also
> > > increase the batching. I've already implemented it.
> > >
> > > I've already updated the schema in the KIP if you want to see it. I
> will
> > > update the
> > > KIP itself if you agree with the amendment.
> > >
> > > What do you think? Does it sound reasonable?
> > >
> > > Best,
> > > David
> > >
> > > On Fri, Mar 6, 2020 at 3:37 PM David Jacot <dj...@confluent.io>
> wrote:
> > >
> > > > Hi all,
> > > >
> > > > The vote has passed with +3 binding votes (Jason Gustafson, Gwen
> > Shapira,
> > > > Jun Rao).
> > > >
> > > > Thanks to everyone!
> > > >
> > > > Best,
> > > > David
> > > >
> > > > On Wed, Mar 4, 2020 at 9:02 AM David Jacot <dj...@confluent.io>
> > wrote:
> > > >
> > > >> Hi Jun,
> > > >>
> > > >> You're right. I have noticed it while implementing it. I plan to
> use a
> > > >> default
> > > >> value as a sentinel in the protocol (e.g. -2) to cover this case.
> > > >>
> > > >> David
> > > >>
> > > >> On Wed, Mar 4, 2020 at 3:18 AM Jun Rao <ju...@confluent.io> wrote:
> > > >>
> > > >>> Hi, David,
> > > >>>
> > > >>> Thanks for the KIP. +1 from me too. Just one comment below.
> > > >>>
> > > >>> 1. Regarding the sentinel leader epoch to indicate topic deletion,
> it
> > > >>> seems
> > > >>> that we need to use a different sentinel value to indicate that the
> > > >>> leader
> > > >>> epoch is not present when the controller is still on the old
> version
> > > >>> during
> > > >>> upgrade.
> > > >>>
> > > >>> Jun
> > > >>>
> > > >>> On Mon, Mar 2, 2020 at 11:20 AM Gwen Shapira <gw...@confluent.io>
> > > wrote:
> > > >>>
> > > >>> > +1
> > > >>> >
> > > >>> > On Mon, Feb 24, 2020, 2:16 AM David Jacot <dj...@confluent.io>
> > > wrote:
> > > >>> >
> > > >>> > > Hi all,
> > > >>> > >
> > > >>> > > I would like to start a vote on KIP-570: Add leader epoch in
> > > >>> > > StopReplicaRequest
> > > >>> > >
> > > >>> > > The KIP is here:
> > > >>> > >
> > > >>> > >
> > > >>> >
> > > >>>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-570%3A+Add+leader+epoch+in+StopReplicaRequest
> > > >>> > >
> > > >>> > > Thanks,
> > > >>> > > David
> > > >>> > >
> > > >>> >
> > > >>>
> > > >>
> > >
> >
>

Re: [VOTE] KIP-570: Add leader epoch in StopReplicaRequest

Posted by David Jacot <dj...@confluent.io>.
I shouldn't have said "always", sorry.

When a replica is deleted, either due to a topic deletion or a reassignment,
the controller transitions the replica to the `OfflineReplica` state and
then to
the `ReplicaDeletionStarted` state. The first transition issues a
StopReplicaRequest
with DeletePartitions=false and the second transition issues a
StopReplicaRequest
with DeleatePartitions=true. The two operations are actually doing the same
except that the latter one deletes the replica.

At the moment, the `ControllerRequestBatch` partitions the accumulated stop
replica requests and sends two requests.
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/ControllerChannelManager.scala#L554

Having a per-partition flag allows us to combine everything into one request
here if the requests are sent within the same batch obviously. This won't
guarantee that all requests will be combined though because requests are
batched optimistically in the controller but it opens the door to improve
it in
the future.

Best,
David


One side note: The batching of the two requests depends on how the

On Wed, Mar 25, 2020 at 6:12 PM Ismael Juma <is...@juma.me.uk> wrote:

> Is it really true that the controller always sends two requests? Aren't the
> operations different (stop replica with delete versus stop replica
> without)?
>
> On Wed, Mar 25, 2020, 9:59 AM David Jacot <dj...@confluent.io> wrote:
>
> > Hi all,
> >
> > I'd like to inform you that I have slightly changed the schema which was
> > proposed
> > in the KIP. During the implementation, I have realized that the proposed
> > schema
> > did not work. The new one reorganises how topics/partitions are stored.
> >
> > I'd like to amend the current KIP with the following:
> >
> > At the moment, the StopReplicaRequest has a top level field named
> > `DeletePartitions`
> > which indicates whether the partitions present in the request must be
> > deleted or not.
> > The downside of this is that the controller always ends up sending two
> > StopReplica
> > requests, one with DeletePartitions=true and one with
> > DeletePartitions=false.
> >
> > Instead, I'd like to add a per-partition DeletePartition field to combine
> > everything in
> > one request. This will reduce the number of requests sent to each broker
> > and also
> > increase the batching. I've already implemented it.
> >
> > I've already updated the schema in the KIP if you want to see it. I will
> > update the
> > KIP itself if you agree with the amendment.
> >
> > What do you think? Does it sound reasonable?
> >
> > Best,
> > David
> >
> > On Fri, Mar 6, 2020 at 3:37 PM David Jacot <dj...@confluent.io> wrote:
> >
> > > Hi all,
> > >
> > > The vote has passed with +3 binding votes (Jason Gustafson, Gwen
> Shapira,
> > > Jun Rao).
> > >
> > > Thanks to everyone!
> > >
> > > Best,
> > > David
> > >
> > > On Wed, Mar 4, 2020 at 9:02 AM David Jacot <dj...@confluent.io>
> wrote:
> > >
> > >> Hi Jun,
> > >>
> > >> You're right. I have noticed it while implementing it. I plan to use a
> > >> default
> > >> value as a sentinel in the protocol (e.g. -2) to cover this case.
> > >>
> > >> David
> > >>
> > >> On Wed, Mar 4, 2020 at 3:18 AM Jun Rao <ju...@confluent.io> wrote:
> > >>
> > >>> Hi, David,
> > >>>
> > >>> Thanks for the KIP. +1 from me too. Just one comment below.
> > >>>
> > >>> 1. Regarding the sentinel leader epoch to indicate topic deletion, it
> > >>> seems
> > >>> that we need to use a different sentinel value to indicate that the
> > >>> leader
> > >>> epoch is not present when the controller is still on the old version
> > >>> during
> > >>> upgrade.
> > >>>
> > >>> Jun
> > >>>
> > >>> On Mon, Mar 2, 2020 at 11:20 AM Gwen Shapira <gw...@confluent.io>
> > wrote:
> > >>>
> > >>> > +1
> > >>> >
> > >>> > On Mon, Feb 24, 2020, 2:16 AM David Jacot <dj...@confluent.io>
> > wrote:
> > >>> >
> > >>> > > Hi all,
> > >>> > >
> > >>> > > I would like to start a vote on KIP-570: Add leader epoch in
> > >>> > > StopReplicaRequest
> > >>> > >
> > >>> > > The KIP is here:
> > >>> > >
> > >>> > >
> > >>> >
> > >>>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-570%3A+Add+leader+epoch+in+StopReplicaRequest
> > >>> > >
> > >>> > > Thanks,
> > >>> > > David
> > >>> > >
> > >>> >
> > >>>
> > >>
> >
>

Re: [VOTE] KIP-570: Add leader epoch in StopReplicaRequest

Posted by Ismael Juma <is...@juma.me.uk>.
Is it really true that the controller always sends two requests? Aren't the
operations different (stop replica with delete versus stop replica without)?

On Wed, Mar 25, 2020, 9:59 AM David Jacot <dj...@confluent.io> wrote:

> Hi all,
>
> I'd like to inform you that I have slightly changed the schema which was
> proposed
> in the KIP. During the implementation, I have realized that the proposed
> schema
> did not work. The new one reorganises how topics/partitions are stored.
>
> I'd like to amend the current KIP with the following:
>
> At the moment, the StopReplicaRequest has a top level field named
> `DeletePartitions`
> which indicates whether the partitions present in the request must be
> deleted or not.
> The downside of this is that the controller always ends up sending two
> StopReplica
> requests, one with DeletePartitions=true and one with
> DeletePartitions=false.
>
> Instead, I'd like to add a per-partition DeletePartition field to combine
> everything in
> one request. This will reduce the number of requests sent to each broker
> and also
> increase the batching. I've already implemented it.
>
> I've already updated the schema in the KIP if you want to see it. I will
> update the
> KIP itself if you agree with the amendment.
>
> What do you think? Does it sound reasonable?
>
> Best,
> David
>
> On Fri, Mar 6, 2020 at 3:37 PM David Jacot <dj...@confluent.io> wrote:
>
> > Hi all,
> >
> > The vote has passed with +3 binding votes (Jason Gustafson, Gwen Shapira,
> > Jun Rao).
> >
> > Thanks to everyone!
> >
> > Best,
> > David
> >
> > On Wed, Mar 4, 2020 at 9:02 AM David Jacot <dj...@confluent.io> wrote:
> >
> >> Hi Jun,
> >>
> >> You're right. I have noticed it while implementing it. I plan to use a
> >> default
> >> value as a sentinel in the protocol (e.g. -2) to cover this case.
> >>
> >> David
> >>
> >> On Wed, Mar 4, 2020 at 3:18 AM Jun Rao <ju...@confluent.io> wrote:
> >>
> >>> Hi, David,
> >>>
> >>> Thanks for the KIP. +1 from me too. Just one comment below.
> >>>
> >>> 1. Regarding the sentinel leader epoch to indicate topic deletion, it
> >>> seems
> >>> that we need to use a different sentinel value to indicate that the
> >>> leader
> >>> epoch is not present when the controller is still on the old version
> >>> during
> >>> upgrade.
> >>>
> >>> Jun
> >>>
> >>> On Mon, Mar 2, 2020 at 11:20 AM Gwen Shapira <gw...@confluent.io>
> wrote:
> >>>
> >>> > +1
> >>> >
> >>> > On Mon, Feb 24, 2020, 2:16 AM David Jacot <dj...@confluent.io>
> wrote:
> >>> >
> >>> > > Hi all,
> >>> > >
> >>> > > I would like to start a vote on KIP-570: Add leader epoch in
> >>> > > StopReplicaRequest
> >>> > >
> >>> > > The KIP is here:
> >>> > >
> >>> > >
> >>> >
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-570%3A+Add+leader+epoch+in+StopReplicaRequest
> >>> > >
> >>> > > Thanks,
> >>> > > David
> >>> > >
> >>> >
> >>>
> >>
>

Re: [VOTE] KIP-570: Add leader epoch in StopReplicaRequest

Posted by David Jacot <dj...@confluent.io>.
Hi all,

I'd like to inform you that I have slightly changed the schema which was
proposed
in the KIP. During the implementation, I have realized that the proposed
schema
did not work. The new one reorganises how topics/partitions are stored.

I'd like to amend the current KIP with the following:

At the moment, the StopReplicaRequest has a top level field named
`DeletePartitions`
which indicates whether the partitions present in the request must be
deleted or not.
The downside of this is that the controller always ends up sending two
StopReplica
requests, one with DeletePartitions=true and one with
DeletePartitions=false.

Instead, I'd like to add a per-partition DeletePartition field to combine
everything in
one request. This will reduce the number of requests sent to each broker
and also
increase the batching. I've already implemented it.

I've already updated the schema in the KIP if you want to see it. I will
update the
KIP itself if you agree with the amendment.

What do you think? Does it sound reasonable?

Best,
David

On Fri, Mar 6, 2020 at 3:37 PM David Jacot <dj...@confluent.io> wrote:

> Hi all,
>
> The vote has passed with +3 binding votes (Jason Gustafson, Gwen Shapira,
> Jun Rao).
>
> Thanks to everyone!
>
> Best,
> David
>
> On Wed, Mar 4, 2020 at 9:02 AM David Jacot <dj...@confluent.io> wrote:
>
>> Hi Jun,
>>
>> You're right. I have noticed it while implementing it. I plan to use a
>> default
>> value as a sentinel in the protocol (e.g. -2) to cover this case.
>>
>> David
>>
>> On Wed, Mar 4, 2020 at 3:18 AM Jun Rao <ju...@confluent.io> wrote:
>>
>>> Hi, David,
>>>
>>> Thanks for the KIP. +1 from me too. Just one comment below.
>>>
>>> 1. Regarding the sentinel leader epoch to indicate topic deletion, it
>>> seems
>>> that we need to use a different sentinel value to indicate that the
>>> leader
>>> epoch is not present when the controller is still on the old version
>>> during
>>> upgrade.
>>>
>>> Jun
>>>
>>> On Mon, Mar 2, 2020 at 11:20 AM Gwen Shapira <gw...@confluent.io> wrote:
>>>
>>> > +1
>>> >
>>> > On Mon, Feb 24, 2020, 2:16 AM David Jacot <dj...@confluent.io> wrote:
>>> >
>>> > > Hi all,
>>> > >
>>> > > I would like to start a vote on KIP-570: Add leader epoch in
>>> > > StopReplicaRequest
>>> > >
>>> > > The KIP is here:
>>> > >
>>> > >
>>> >
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-570%3A+Add+leader+epoch+in+StopReplicaRequest
>>> > >
>>> > > Thanks,
>>> > > David
>>> > >
>>> >
>>>
>>

Re: [VOTE] KIP-570: Add leader epoch in StopReplicaRequest

Posted by David Jacot <dj...@confluent.io>.
Hi all,

The vote has passed with +3 binding votes (Jason Gustafson, Gwen Shapira,
Jun Rao).

Thanks to everyone!

Best,
David

On Wed, Mar 4, 2020 at 9:02 AM David Jacot <dj...@confluent.io> wrote:

> Hi Jun,
>
> You're right. I have noticed it while implementing it. I plan to use a
> default
> value as a sentinel in the protocol (e.g. -2) to cover this case.
>
> David
>
> On Wed, Mar 4, 2020 at 3:18 AM Jun Rao <ju...@confluent.io> wrote:
>
>> Hi, David,
>>
>> Thanks for the KIP. +1 from me too. Just one comment below.
>>
>> 1. Regarding the sentinel leader epoch to indicate topic deletion, it
>> seems
>> that we need to use a different sentinel value to indicate that the leader
>> epoch is not present when the controller is still on the old version
>> during
>> upgrade.
>>
>> Jun
>>
>> On Mon, Mar 2, 2020 at 11:20 AM Gwen Shapira <gw...@confluent.io> wrote:
>>
>> > +1
>> >
>> > On Mon, Feb 24, 2020, 2:16 AM David Jacot <dj...@confluent.io> wrote:
>> >
>> > > Hi all,
>> > >
>> > > I would like to start a vote on KIP-570: Add leader epoch in
>> > > StopReplicaRequest
>> > >
>> > > The KIP is here:
>> > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-570%3A+Add+leader+epoch+in+StopReplicaRequest
>> > >
>> > > Thanks,
>> > > David
>> > >
>> >
>>
>

Re: [VOTE] KIP-570: Add leader epoch in StopReplicaRequest

Posted by David Jacot <dj...@confluent.io>.
Hi Jun,

You're right. I have noticed it while implementing it. I plan to use a
default
value as a sentinel in the protocol (e.g. -2) to cover this case.

David

On Wed, Mar 4, 2020 at 3:18 AM Jun Rao <ju...@confluent.io> wrote:

> Hi, David,
>
> Thanks for the KIP. +1 from me too. Just one comment below.
>
> 1. Regarding the sentinel leader epoch to indicate topic deletion, it seems
> that we need to use a different sentinel value to indicate that the leader
> epoch is not present when the controller is still on the old version during
> upgrade.
>
> Jun
>
> On Mon, Mar 2, 2020 at 11:20 AM Gwen Shapira <gw...@confluent.io> wrote:
>
> > +1
> >
> > On Mon, Feb 24, 2020, 2:16 AM David Jacot <dj...@confluent.io> wrote:
> >
> > > Hi all,
> > >
> > > I would like to start a vote on KIP-570: Add leader epoch in
> > > StopReplicaRequest
> > >
> > > The KIP is here:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-570%3A+Add+leader+epoch+in+StopReplicaRequest
> > >
> > > Thanks,
> > > David
> > >
> >
>

Re: [VOTE] KIP-570: Add leader epoch in StopReplicaRequest

Posted by Jun Rao <ju...@confluent.io>.
Hi, David,

Thanks for the KIP. +1 from me too. Just one comment below.

1. Regarding the sentinel leader epoch to indicate topic deletion, it seems
that we need to use a different sentinel value to indicate that the leader
epoch is not present when the controller is still on the old version during
upgrade.

Jun

On Mon, Mar 2, 2020 at 11:20 AM Gwen Shapira <gw...@confluent.io> wrote:

> +1
>
> On Mon, Feb 24, 2020, 2:16 AM David Jacot <dj...@confluent.io> wrote:
>
> > Hi all,
> >
> > I would like to start a vote on KIP-570: Add leader epoch in
> > StopReplicaRequest
> >
> > The KIP is here:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-570%3A+Add+leader+epoch+in+StopReplicaRequest
> >
> > Thanks,
> > David
> >
>

Re: [VOTE] KIP-570: Add leader epoch in StopReplicaRequest

Posted by Gwen Shapira <gw...@confluent.io>.
+1

On Mon, Feb 24, 2020, 2:16 AM David Jacot <dj...@confluent.io> wrote:

> Hi all,
>
> I would like to start a vote on KIP-570: Add leader epoch in
> StopReplicaRequest
>
> The KIP is here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-570%3A+Add+leader+epoch+in+StopReplicaRequest
>
> Thanks,
> David
>