Posted to dev@kafka.apache.org by Edoardo Comar <ed...@gmail.com> on 2019/02/04 17:45:20 UTC

Re: [DISCUSS] KIP-391: Allow Producing with Offsets for Cluster Replication

Hi Radai,
thanks for the observation on the KIP-320 conflict.

I would not have the destination broker treat the __consumer_offsets
topic as a special case (if this is what you suggested).

Rather, the replicator could treat the __consumer_offsets topic as a
special case: instead of just replicating the value as-is, it would edit
it by stripping the leader epoch.

As previously mentioned, the __consumer_offsets topic does not need to be
replicated by producing-with-offsets to it.
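
To illustrate the idea, here is a rough sketch only, not actual replicator
code. The CommittedOffset type and the rewrite_for_destination function are
made-up names for this example, and the real __consumer_offsets values are
binary-encoded rather than plain objects like this:

```python
from dataclasses import dataclass, replace
from typing import Any, Optional

CONSUMER_OFFSETS_TOPIC = "__consumer_offsets"


@dataclass(frozen=True)
class CommittedOffset:
    """Hypothetical decoded offset-commit value (not Kafka's wire format)."""
    offset: int
    leader_epoch: Optional[int]  # None stands for "no epoch recorded"
    metadata: str
    commit_timestamp_ms: int


def rewrite_for_destination(topic: str, value: Any) -> Any:
    """Return the value the replicator would send to the destination.

    Values from ordinary topics pass through untouched; values from
    __consumer_offsets have the source cluster's leader epoch stripped,
    since that epoch has no meaning on the destination cluster.
    """
    if topic != CONSUMER_OFFSETS_TOPIC:
        return value
    return replace(value, leader_epoch=None)
```

Everything except the leader epoch (offset, metadata, timestamp) would be
carried over unchanged.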

--------------------------------------------------
Edoardo Comar
IBM Event Streams

On Wed, 23 Jan 2019 at 03:18, radai <ra...@gmail.com> wrote:

> the kip-320 conflict can be resolved by saying that the leader broker
> on the destination "stamps" its own local leader epoch on the incoming
> msgs - meaning the offsets "transfer" but leader epochs do not.
>
> On Mon, Jan 7, 2019 at 1:38 PM Edoardo Comar <EC...@uk.ibm.com> wrote:
> >
> > Hi,
> > I delayed starting the voting thread due to the festive period. I would
> > like to start it this week.
> > Has anyone any more feedback ?
> >
> > --------------------------------------------------
> >
> > Edoardo Comar
> >
> > IBM Event Streams
> >
> >
> > Edoardo Comar <EC...@uk.ibm.com> wrote on 13/12/2018 17:50:30:
> >
> > > From: Edoardo Comar <EC...@uk.ibm.com>
> > > To: dev@kafka.apache.org
> > > Date: 13/12/2018 17:50
> > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for
> > > Cluster Replication
> > >
> > > Hi,
> > > as we haven't got any more feedback, we'd like to start a vote on
> > > KIP-391 on Monday
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-391%3A+Allow+Producing+with+Offsets+for+Cluster+Replication
> > >
> > > --------------------------------------------------
> > >
> > > Edoardo Comar
> > >
> > > IBM Event Streams
> > > IBM UK Ltd, Hursley Park, SO21 2JN
> > >
> > >
> > > Edoardo Comar/UK/IBM wrote on 10/12/2018 10:20:06:
> > >
> > > > From: Edoardo Comar/UK/IBM
> > > > To: dev@kafka.apache.org
> > > > Date: 10/12/2018 10:20
> > > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for
> > > > Cluster Replication
> > > >
> > > > (shameless bump) any additional feedback is welcome ... thanks!
> > > >
> > > > Edoardo Comar <EC...@uk.ibm.com> wrote on 27/11/2018 15:35:09:
> > > >
> > > > > From: Edoardo Comar <EC...@uk.ibm.com>
> > > > > To: dev@kafka.apache.org
> > > > > Date: 27/11/2018 15:35
> > > > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for
> > > > > Cluster Replication
> > > > >
> > > > > Hi Jason
> > > > >
> > > > > > we envisioned the replicator to replicate the __consumer_offsets
> > > > > > topic too (although without producing-with-offsets to it!).
> > > > >
> > > > > > As there is no client-side implementation yet using the leader
> > > > > > epoch, we could not yet see the impact of writing to the
> > > > > > destination cluster __consumer_offsets records with an invalid
> > > > > > leader epoch.
> > > > >
> > > > > > Also, applications might still use an external storage mechanism
> > > > > > for consumer offsets where the leader_epoch is missing.
> > > > >
> > > > > > Perhaps the replicator could - for the __consumer_offsets topic -
> > > > > > just omit the leader_epoch field in the data sent to destination.
> > > > >
> > > > > What do you think ?
> > > > >
> > > > >
> > > > > Jason Gustafson <ja...@confluent.io> wrote on 27/11/2018 00:09:56:
> > > > >
> > > > > > > Another wrinkle to consider is KIP-320. If you are planning to
> > > > > > > replicate __consumer_offsets directly, then you will have to
> > > > > > > account for leader epoch information which is stored with the
> > > > > > > committed offsets. But I cannot think how it would be possible
> > > > > > > to replicate the leader epoch information in messages even if
> > > > > > > you can preserve offsets.
> > > > > >
> > > > > > -Jason
> > > > > >
> > > > > > > On Mon, Nov 26, 2018 at 1:16 PM Mayuresh Gharat <gh...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Edoardo,
> > > > > > >
> > > > > > > Thanks a lot for the KIP.
> > > > > > >  I have a few questions/suggestions in addition to what Radai
> > has
> > > > > mentioned
> > > > > > > above :
> > > > > > >
> > > > > > >    1. Is this meant only for 1:1 replication, for example one
> > > Kafka
> > > > > cluster
> > > > > > >    replicating to other, instead of having multiple Kafka
> > clusters
> > > > > > > mirroring
> > > > > > >    into one Kafka cluster?
> > > > > > >    2. Are we relying on exactly once produce in the replicator?
> > If
> > >
> > > > > not, how
> > > > > > >    are retries handled in the replicator ?
> > > > > > >    3. What is the recommended value for inflight requests,
> here.
> >
> > > Is it
> > > > > > >    suppose to be strictly 1, if yes, it would be great to
> > mention
> > > that
> > > > > in
> > > > > > > the
> > > > > > >    KIP.
> > > > > > >    4. How is unclean Leader election between source cluster and
> > > > > destination
> > > > > > >    cluster handled?
> > > > > > >    5. How are offsets resets in case of the replicator's
> > consumer
> > > > > handled?
> > > > > > >    6. It would be good to explain the workflow in the KIP, with
> > an
> > > > > > >    example,  regarding how this KIP will change the replication
> > > > > scenario
> > > > > > > and
> > > > > > >    how it will benefit the consumer apps.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Mayuresh
> > > > > > >
> > > > > > > > On Mon, Nov 26, 2018 at 8:08 AM radai <ra...@gmail.com> wrote:
> > > > > > >
> > > > > > > > a few questions:
> > > > > > > >
> > > > > > > > > 1. how do you handle possible duplications caused by the
> > > > > > > > > "special" producer timing-out/retrying? are you explicitly
> > > > > > > > > relying on the "exactly once" sequencing?
> > > > > > > > > 2. what about the combination of log compacted topics +
> > > > > > > > > replicator downtime? by the time the replicator comes back
> > > > > > > > > up there might be "holes" in the source offsets (some msgs
> > > > > > > > > might have been compacted out)? how is that recoverable?
> > > > > > > > > 3. similarly, what if you try and fire up replication on a
> > > > > > > > > non-empty source topic? does the kip allow for offsets
> > > > > > > > > starting at some arbitrary X > 0 ? or would this have to be
> > > > > > > > > designed from the start.
> > > > > > > > >
> > > > > > > > > and lastly, since this KIP seems to be designed for
> > > > > > > > > active-passive failover (there can be no produce traffic
> > > > > > > > > except the replicator) wouldn't a solution based on seeking
> > > > > > > > > to a time offset be more generic? your producers could
> > > > > > > > > checkpoint the last (say log append) timestamp of records
> > > > > > > > > they've seen, and when restoring in the remote site seek to
> > > > > > > > > those timestamps (which will be metadata in their committed
> > > > > > > > > offsets) - assuming replication takes > 0 time you'd need to
> > > > > > > > > handle some dups, but every kafka consumer setup needs to
> > > > > > > > > know how to handle those anyway.
> > > > > > > > > On Fri, Nov 23, 2018 at 2:27 AM Edoardo Comar <EC...@uk.ibm.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi Stanislav
> > > > > > > > >
> > > > > > > > > > > > The flag is needed to distinguish a batch with a
> > > > > > > > > > > > desired base offset of 0, from a regular batch for
> > > > > > > > > > > > which offsets need to be generated.
> > > > > > > > > > > If the producer can provide offsets, why not provide a
> > > > > > > > > > > base offset of 0?
> > > > > > > > >
> > > > > > > > > > a regular batch (for which offsets are generated by the
> > > > > > > > > > broker on write) is sent with a base offset of 0.
> > > > > > > > > > How could you distinguish it from a batch where you *want*
> > > > > > > > > > the first record to be written at offset 0 (i.e. be the
> > > > > > > > > > first in the partition and be rejected if there are
> > > > > > > > > > records on the log already) ?
> > > > > > > > > > We wanted to avoid a "deep" inspection (and potentially
> > > > > > > > > > decompression) of the records.
> > > > > > > > >
> > > > > > > > > > For the replicator use case, a single produce request in
> > > > > > > > > > which either all the data carries offsets or none of it
> > > > > > > > > > does seems to suffice, so we added only a top-level flag,
> > > > > > > > > > not a per-topic-partition one.
> > > > > > > > > Thanks for your interest !
> > > > > > > > > cheers
> > > > > > > > > Edo
> > > > > > > > > --------------------------------------------------
> > > > > > > > >
> > > > > > > > > Edoardo Comar
> > > > > > > > >
> > > > > > > > > IBM Event Streams
> > > > > > > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Stanislav Kozlovski <st...@confluent.io> wrote on 22/11/2018 22:32:42:
> > > > > > > > >
> > > > > > > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > > > > > > To: dev@kafka.apache.org
> > > > > > > > > > Date: 22/11/2018 22:33
> > > > > > > > > > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for
> > > > > > > > > > > Cluster Replication
> > > > > > > > > >
> > > > > > > > > > Hey Edo & Mickael,
> > > > > > > > > >
> > > > > > > > > > > > The flag is needed to distinguish a batch with a
> > > > > > > > > > > > desired base offset of 0, from a regular batch for
> > > > > > > > > > > > which offsets need to be generated.
> > > > > > > > > > > If the producer can provide offsets, why not provide a
> > > > > > > > > > > base offset of 0?
> > > > > > > > > > >
> > > > > > > > > > > > (I am reading your post thinking about partitions
> > > > > > > > > > > > rather than topics).
> > > > > > > > > > > Yes, I meant partitions. Sorry about that.
> > > > > > > > > >
> > > > > > > > > > Thanks for answering my questions :)
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Stanislav
> > > > > > > > > >
> > > > > > > > > > > On Thu, Nov 22, 2018 at 5:28 PM Edoardo Comar <EC...@uk.ibm.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Stanislav,
> > > > > > > > > > >
> > > > > > > > > > > > you're right we envision the replicator use case to
> > > > > > > > > > > > have a single producer with offsets per partition (I
> > > > > > > > > > > > am reading your post thinking about partitions rather
> > > > > > > > > > > > than topics).
> > > > > > > > > > >
> > > > > > > > > > > > If a regular producer were to send its own records at
> > > > > > > > > > > > the same time, it's very likely that the one sending
> > > > > > > > > > > > with an offset will fail because of invalid offsets.
> > > > > > > > > > > > Same if two producers were sending with offsets,
> > > > > > > > > > > > likely both would then fail.
> > > > > > > > > > >
> > > > > > > > > > > > > Does it make sense to *lock* the topic from other
> > > > > > > > > > > > > producers while there is one that uses offsets?
> > > > > > > > > > >
> > > > > > > > > > > > You could do that with ACL permissions if you wanted,
> > > > > > > > > > > > I don't think it needs to be mandated by changing the
> > > > > > > > > > > > broker logic.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > > Since we are tying the produce-with-offset request
> > > > > > > > > > > > > to the ACL, do we need the `use_offset` field in
> > > > > > > > > > > > > the produce request? Maybe we make it mandatory for
> > > > > > > > > > > > > produce requests with that ACL to have offsets.
> > > > > > > > > > >
> > > > > > > > > > > > The flag is needed to distinguish a batch with a
> > > > > > > > > > > > desired base offset of 0, from a regular batch for
> > > > > > > > > > > > which offsets need to be generated.
> > > > > > > > > > > > I would not restrict a principal to only
> > > > > > > > > > > > send-with-offsets (by making that mandatory via the
> > > > > > > > > > > > ACL).
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > > Edo & Mickael
> > > > > > > > > > >
> > > > > > > > > > > --------------------------------------------------
> > > > > > > > > > >
> > > > > > > > > > > Edoardo Comar
> > > > > > > > > > >
> > > > > > > > > > > IBM Event Streams
> > > > > > > > > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > Stanislav Kozlovski <st...@confluent.io> wrote on 22/11/2018 16:17:11:
> > > > > > > > > > >
> > > > > > > > > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > > > > > > > > To: dev@kafka.apache.org
> > > > > > > > > > > > Date: 22/11/2018 16:17
> > > > > > > > > > > > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for
> > > > > > > > > > > > > Cluster Replication
> > > > > > > > > > > >
> > > > > > > > > > > > > Hey Edoardo, thanks for the KIP!
> > > > > > > > > > > > >
> > > > > > > > > > > > > I have some questions, apologies if they are naive:
> > > > > > > > > > > > > Is this intended to work for a single producer use
> > > > > > > > > > > > > case only?
> > > > > > > > > > > > > How would it work if two producers were producing to
> > > > > > > > > > > > > the same topic with offsets?
> > > > > > > > > > > > > How would it work if two producers, one with offsets
> > > > > > > > > > > > > and one without, were producing to a topic?
> > > > > > > > > > > > > Does it make sense to *lock* the topic from other
> > > > > > > > > > > > > producers while there is one that uses offsets?
> > > > > > > > > > > >
> > > > > > > > > > > > > Since we are tying the produce-with-offset request
> > > > > > > > > > > > > to the ACL, do we need the `use_offset` field in
> > > > > > > > > > > > > the produce request? Maybe we make it mandatory for
> > > > > > > > > > > > > produce requests with that ACL to have offsets.
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Stanislav
> > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Nov 21, 2018 at 5:14 PM Edoardo Comar <ECOMAR@uk.ibm.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > we've opened a KIP to improve data replication
> > > > > > > > > > > > > > between Kafka clusters:
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-391%3A+Allow+Producing+with+Offsets+for+Cluster+Replication
> > > > > > > > > > > > >
> > > > > > > > > > > > > > We'd like to start a discussion, please post your
> > > > > > > > > > > > > > feedback in this thread.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thank you
> > > > > > > > > > > > > Edo and Mickael
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --------------------------------------------------
> > > > > > > > > > > > >
> > > > > > > > > > > > > Edoardo Comar
> > > > > > > > > > > > >
> > > > > > > > > > > > > IBM Event Streams
> > > > > > > > > > > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Unless stated otherwise above:
> > > > > > > > > > > > > > IBM United Kingdom Limited - Registered in England
> > > > > > > > > > > > > > and Wales with number 741598.
> > > > > > > > > > > > > > Registered office: PO Box 41, North Harbour,
> > > > > > > > > > > > > > Portsmouth, Hampshire PO6 3AU
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Stanislav
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best,
> > > > > > > > > > Stanislav
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > -Regards,
> > > > > > > Mayuresh R. Gharat
> > > > > > > (862) 250-7125
> > > > > > >
> > > > >
> > > >
> > >
> >
>


-- 
"When the people fear their government, there is tyranny; when the
government fears the people, there is liberty." [Thomas Jefferson]