Posted to dev@kafka.apache.org by Edoardo Comar <EC...@uk.ibm.com> on 2018/11/21 17:07:39 UTC

[DISCUSS] KIP-391: Allow Producing with Offsets for Cluster Replication

Hi,
we've opened a KIP to improve data replication between Kafka clusters:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-391%3A+Allow+Producing+with+Offsets+for+Cluster+Replication

We'd like to start a discussion; please post your feedback in this thread.

Thank you
Edo and Mickael


--------------------------------------------------

Edoardo Comar

IBM Event Streams
IBM UK Ltd, Hursley Park, SO21 2JN

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

Re: [DISCUSS] KIP-391: Allow Producing with Offsets for Cluster Replication

Posted by Edoardo Comar <EC...@uk.ibm.com>.
(shameless bump) any additional feedback is welcome ... thanks!


Edoardo Comar <EC...@uk.ibm.com> wrote on 27/11/2018 15:35:09:

> From: Edoardo Comar <EC...@uk.ibm.com>
> To: dev@kafka.apache.org
> Date: 27/11/2018 15:35
> Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for 
> Cluster Replication
> 
> Hi Jason
> 
> we envisioned the replicator replicating the __consumer_offsets topic too
> (although without producing-with-offsets to it!).
> 
> As there is no client-side implementation using the leader epoch yet,
> we could not yet see the impact of writing __consumer_offsets records
> with an invalid leader epoch to the destination cluster.
> 
> Also, applications might still use an external storage mechanism for
> consumer offsets where the leader_epoch is missing.
> 
> Perhaps the replicator could - for the __consumer_offsets topic - just
> omit the leader_epoch field in the data sent to the destination.
> 
> What do you think?
> 
> 
> Jason Gustafson <ja...@confluent.io> wrote on 27/11/2018 00:09:56:
> 
> > Another wrinkle to consider is KIP-320. If you are planning to replicate
> > __consumer_offsets directly, then you will have to account for leader epoch
> > information which is stored with the committed offsets. But I cannot think
> > how it would be possible to replicate the leader epoch information in
> > messages even if you can preserve offsets.
> > 
> > -Jason
> > 
> > On Mon, Nov 26, 2018 at 1:16 PM Mayuresh Gharat <gh...@gmail.com> wrote:
> > 
> > > Hi Edoardo,
> > >
> > > Thanks a lot for the KIP.
> > > I have a few questions/suggestions in addition to what Radai has mentioned
> > > above:
> > >
> > >    1. Is this meant only for 1:1 replication, for example one Kafka cluster
> > >    replicating to another, instead of having multiple Kafka clusters
> > >    mirroring into one Kafka cluster?
> > >    2. Are we relying on exactly-once produce in the replicator? If not, how
> > >    are retries handled in the replicator?
> > >    3. What is the recommended value for in-flight requests here? Is it
> > >    supposed to be strictly 1? If yes, it would be great to mention that in
> > >    the KIP.
> > >    4. How is unclean leader election between source cluster and destination
> > >    cluster handled?
> > >    5. How are offset resets of the replicator's consumer handled?
> > >    6. It would be good to explain the workflow in the KIP, with an
> > >    example, regarding how this KIP will change the replication scenario
> > >    and how it will benefit the consumer apps.
> > >
> > > Thanks,
> > >
> > > Mayuresh
> > >
> > > On Mon, Nov 26, 2018 at 8:08 AM radai <ra...@gmail.com> wrote:
> > >
> > > > a few questions:
> > > >
> > > > 1. how do you handle possible duplications caused by the "special"
> > > > producer timing out/retrying? are you explicitly relying on the
> > > > "exactly once" sequencing?
> > > > 2. what about the combination of log-compacted topics + replicator
> > > > downtime? by the time the replicator comes back up there might be
> > > > "holes" in the source offsets (some msgs might have been compacted
> > > > out). how is that recoverable?
> > > > 3. similarly, what if you try to fire up replication on a non-empty
> > > > source topic? does the KIP allow for offsets starting at some
> > > > arbitrary X > 0, or would this have to be designed in from the start?
> > > >
> > > > and lastly, since this KIP seems to be designed for active-passive
> > > > failover (there can be no produce traffic except the replicator),
> > > > wouldn't a solution based on seeking to a time offset be more generic?
> > > > your producers could checkpoint the last (say log append) timestamp of
> > > > records they've seen, and when restoring in the remote site seek to
> > > > those timestamps (which will be metadata in their committed offsets) -
> > > > assuming replication takes > 0 time you'd need to handle some dups,
> > > > but every kafka consumer setup needs to know how to handle those
> > > > anyway.
> > > > On Fri, Nov 23, 2018 at 2:27 AM Edoardo Comar <EC...@uk.ibm.com> wrote:
> > > > >
> > > > > Hi Stanislav
> > > > >
> > > > > > > > The flag is needed to distinguish a batch with a desired base offset of 0,
> > > > > > > > from a regular batch for which offsets need to be generated.
> > > > > > > If the producer can provide offsets, why not provide a base offset of 0?
> > > > > >
> > > > > > A regular batch (for which offsets are generated by the broker on write)
> > > > > > is sent with a base offset of 0.
> > > > > > How could you distinguish it from a batch where you *want* the first
> > > > > > record to be written at offset 0 (i.e. be the first in the partition and
> > > > > > be rejected if there are records on the log already)?
> > > > > > We wanted to avoid a "deep" inspection (and potentially decompression) of
> > > > > > the records.
> > > > > >
> > > > > > For the replicator use case, a single produce request where all the data
> > > > > > is assumed to be either with offsets or without offsets seems to suffice,
> > > > > > so we added only a top-level flag, not a per-topic-partition one.
> > > > >
> > > > > Thanks for your interest !
> > > > > cheers
> > > > > Edo
> > > > > --------------------------------------------------
> > > > >
> > > > > Edoardo Comar
> > > > >
> > > > > IBM Event Streams
> > > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > > >
> > > > >
> > > > > > Stanislav Kozlovski <st...@confluent.io> wrote on 22/11/2018 22:32:42:
> > > > > >
> > > > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > > > To: dev@kafka.apache.org
> > > > > > > Date: 22/11/2018 22:33
> > > > > > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for
> > > > > > > Cluster Replication
> > > > > > >
> > > > > > > Hey Edo & Mickael,
> > > > > > >
> > > > > > > > The flag is needed to distinguish a batch with a desired base offset of 0,
> > > > > > > > from a regular batch for which offsets need to be generated.
> > > > > > > If the producer can provide offsets, why not provide a base offset of 0?
> > > > > > >
> > > > > > > > (I am reading your post thinking about
> > > > > > > > partitions rather than topics).
> > > > > > > Yes, I meant partitions. Sorry about that.
> > > > > >
> > > > > > Thanks for answering my questions :)
> > > > > >
> > > > > > Best,
> > > > > > Stanislav
> > > > > >
> > > > > > > On Thu, Nov 22, 2018 at 5:28 PM Edoardo Comar <EC...@uk.ibm.com> wrote:
> > > > > > >
> > > > > > > > Hi Stanislav,
> > > > > > > >
> > > > > > > > you're right, we envision the replicator use case to have a single
> > > > > > > > producer with offsets per partition (I am reading your post thinking
> > > > > > > > about partitions rather than topics).
> > > > > > > >
> > > > > > > > If a regular producer were to send its own records at the same time,
> > > > > > > > it's very likely that the one sending with offsets would fail because
> > > > > > > > of invalid offsets.
> > > > > > > > The same applies if two producers were sending with offsets: likely
> > > > > > > > both would then fail.
> > > > > > > >
> > > > > > > > > Does it make sense to *lock* the topic from other producers while
> > > > > > > > > there is one that uses offsets?
> > > > > > > >
> > > > > > > > You could do that with ACL permissions if you wanted; I don't think
> > > > > > > > it needs to be mandated by changing the broker logic.
> > > > > > > >
> > > > > > > >
> > > > > > > > > Since we are tying the produce-with-offset request to the ACL, do
> > > > > > > > > we need the `use_offset` field in the produce request? Maybe we make
> > > > > > > > > it mandatory for produce requests with that ACL to have offsets.
> > > > > > > >
> > > > > > > > The flag is needed to distinguish a batch with a desired base offset of 0,
> > > > > > > > from a regular batch for which offsets need to be generated.
> > > > > > > > I would not restrict a principal to only send-with-offsets (by making
> > > > > > > > that mandatory via the ACL).
> > > > > > >
> > > > > > > Thanks
> > > > > > > Edo & Mickael
> > > > > > >
> > > > > > > --------------------------------------------------
> > > > > > >
> > > > > > > Edoardo Comar
> > > > > > >
> > > > > > > IBM Event Streams
> > > > > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > > > > >
> > > > > > >
> > > > > > > > Stanislav Kozlovski <st...@confluent.io> wrote on 22/11/2018 16:17:11:
> > > > > > > >
> > > > > > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > > > > > To: dev@kafka.apache.org
> > > > > > > > > Date: 22/11/2018 16:17
> > > > > > > > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for
> > > > > > > > > Cluster Replication
> > > > > > > > >
> > > > > > > > > Hey Edoardo, thanks for the KIP!
> > > > > > > > >
> > > > > > > > > I have some questions, apologies if they are naive:
> > > > > > > > > Is this intended to work for a single producer use case only?
> > > > > > > > > How would it work if two producers were producing to the same topic
> > > > > > > > > with offsets?
> > > > > > > > > How would it work if two producers, one with offsets and one without,
> > > > > > > > > were producing to a topic?
> > > > > > > > > Does it make sense to *lock* the topic from other producers while
> > > > > > > > > there is one that uses offsets?
> > > > > > > > >
> > > > > > > > > Since we are tying the produce-with-offset request to the ACL, do
> > > > > > > > > we need the `use_offset` field in the produce request? Maybe we make
> > > > > > > > > it mandatory for produce requests with that ACL to have offsets.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Stanislav
> > > > > > > >
> > > > > > > > > On Wed, Nov 21, 2018 at 5:14 PM Edoardo Comar <EC...@uk.ibm.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > > we've opened a KIP to improve data replication between Kafka clusters:
> > > > > > > > > >
> > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-391%3A+Allow+Producing+with+Offsets+for+Cluster+Replication
> > > > > > > > > >
> > > > > > > > > > We'd like to start a discussion, please post your feedback in this thread.
> > > > > > > > >
> > > > > > > > > Thank you
> > > > > > > > > Edo and Mickael
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --------------------------------------------------
> > > > > > > > >
> > > > > > > > > Edoardo Comar
> > > > > > > > >
> > > > > > > > > IBM Event Streams
> > > > > > > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best,
> > > > > > > > Stanislav
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best,
> > > > > > Stanislav
> > > > >
> > > >
> > >
> > >
> > > --
> > > -Regards,
> > > Mayuresh R. Gharat
> > > (862) 250-7125
> > >
> 


Re: [DISCUSS] KIP-391: Allow Producing with Offsets for Cluster Replication

Posted by Edoardo Comar <ed...@gmail.com>.
Hi Radai,
thanks for the observation on the KIP-320 conflict.

I would not have the destination broker treat the __consumer_offsets topic
as a special case (if this is what you suggested).

Rather, in the replicator the __consumer_offsets topic could be treated as a
special case: instead of just replicating the value as-is, the replicator
would edit it by stripping the epoch.

As previously mentioned, the __consumer_offsets topic does not need to be
replicated by producing-with-offsets to it.
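
To make the idea concrete, here is a minimal sketch in Python of the
special-casing in the replicator. It is only an illustration under stated
assumptions: `OffsetCommitValue`, `replicate_value` and the field names are
hypothetical, and the real __consumer_offsets value is a versioned binary
record rather than a dataclass.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional

OFFSETS_TOPIC = "__consumer_offsets"


# Hypothetical, simplified stand-in for a committed-offset value.
# This is NOT Kafka's on-disk or wire format.
@dataclass(frozen=True)
class OffsetCommitValue:
    offset: int
    leader_epoch: Optional[int]  # None models "no epoch recorded"
    metadata: str


def replicate_value(topic: str, value: Any) -> Any:
    """Return the value the replicator would send to the destination."""
    if topic == OFFSETS_TOPIC and isinstance(value, OffsetCommitValue):
        # The source cluster's leader epoch has no meaning on the
        # destination, so strip it and keep offset and metadata as they are.
        return replace(value, leader_epoch=None)
    # Every other topic is replicated as-is.
    return value
```

With this shape the destination broker needs no special handling for
__consumer_offsets; the edit happens entirely in the replicator.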

--------------------------------------------------
Edoardo Comar
IBM Event Streams

On Wed, 23 Jan 2019 at 03:18, radai <ra...@gmail.com> wrote:

> the kip-320 conflict can be resolved by saying that the leader broker
> on the destination "stamps" its own local leader epoch on the incoming
> msgs - meaning the offsets "transfer" but leader epochs do not.
>
> On Mon, Jan 7, 2019 at 1:38 PM Edoardo Comar <EC...@uk.ibm.com> wrote:
> >
> > Hi,
> > I delayed starting the voting thread due to the festive period. I would
> > like to start it this week.
> > Has anyone any more feedback ?
> >
> > --------------------------------------------------
> >
> > Edoardo Comar
> >
> > IBM Event Streams
> >
> >
> > Edoardo Comar <EC...@uk.ibm.com> wrote on 13/12/2018 17:50:30:
> >
> > > From: Edoardo Comar <EC...@uk.ibm.com>
> > > To: dev@kafka.apache.org
> > > Date: 13/12/2018 17:50
> > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for
> > > Cluster Replication
> > >
> > > Hi,
> > > as we haven't got any more feedback, we'd like to start a vote on KIP-391
> > > on Monday
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-391%3A+Allow+Producing+with+Offsets+for+Cluster+Replication
> > >
> > > --------------------------------------------------
> > >
> > > Edoardo Comar
> > >
> > > IBM Event Streams
> > > IBM UK Ltd, Hursley Park, SO21 2JN
> > >
> > >
> > > Edoardo Comar/UK/IBM wrote on 10/12/2018 10:20:06:
> > >
> > > > From: Edoardo Comar/UK/IBM
> > > > To: dev@kafka.apache.org
> > > > Date: 10/12/2018 10:20
> > > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for
> > > > Cluster Replication
> > > >
> > > > (shameless bump) any additional feedback is welcome ... thanks!
> > > >
> > > > Edoardo Comar <EC...@uk.ibm.com> wrote on 27/11/2018 15:35:09:
> > > >
> > > > > From: Edoardo Comar <EC...@uk.ibm.com>
> > > > > To: dev@kafka.apache.org
> > > > > Date: 27/11/2018 15:35
> > > > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for
> > > > > Cluster Replication
> > > > >
> > > > > Hi Jason
> > > > >
> > > > > we envisioned the replicator to replicate the __consumer_offsets
> > topic
> > > too
> > > > > (although without producing-with-offsets to it!).
> > > > >
> > > > > As there is no client-side implementation yet using the leader
> > epoch,
> > > > > we could not yet see the impact of writing to the destination
> > cluster
> > > > > __consumer_offsets records with an invalid leader epoch.
> > > > >
> > > > > Also, applications might still use external storage mechanism for
> > > consumer
> > > > > offsets where the leader_epoch is missing.
> > > > >
> > > > > Perhaps the replicator could - for the __consumer_offsets topic -
> > just
> > >
> > > > > omit the leader_epoch field in the data sent to destination.
> > > > >
> > > > > What do you think ?
> > > > >
> > > > >
> > > > > Jason Gustafson <ja...@confluent.io> wrote on 27/11/2018 00:09:56:
> > > > >
> > > > > > Another wrinkle to consider is KIP-320. If you are planning to
> > > replicate
> > > > > > __consumer_offsets directly, then you will have to account for
> > > leader
> > > > > epoch
> > > > > > information which is stored with the committed offsets. But I
> > cannot
> > >
> > > > > think
> > > > > > how it would be possible to replicate the leader epoch
> information
> >
> > > in
> > > > > > messages even if you can preserve offsets.
> > > > > >
> > > > > > -Jason
> > > > > >
> > > > > > On Mon, Nov 26, 2018 at 1:16 PM Mayuresh Gharat
> > > > > <gh...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Edoardo,
> > > > > > >
> > > > > > > Thanks a lot for the KIP.
> > > > > > >  I have a few questions/suggestions in addition to what Radai
> > has
> > > > > mentioned
> > > > > > > above :
> > > > > > >
> > > > > > >    1. Is this meant only for 1:1 replication, for example one
> > > Kafka
> > > > > cluster
> > > > > > >    replicating to other, instead of having multiple Kafka
> > clusters
> > > > > > > mirroring
> > > > > > >    into one Kafka cluster?
> > > > > > >    2. Are we relying on exactly once produce in the replicator?
> > If
> > >
> > > > > not, how
> > > > > > >    are retries handled in the replicator ?
> > > > > > >    3. What is the recommended value for inflight requests,
> here.
> >
> > > Is it
> > > > > > >    suppose to be strictly 1, if yes, it would be great to
> > mention
> > > that
> > > > > in
> > > > > > > the
> > > > > > >    KIP.
> > > > > > >    4. How is unclean Leader election between source cluster and
> > > > > destination
> > > > > > >    cluster handled?
> > > > > > >    5. How are offsets resets in case of the replicator's
> > consumer
> > > > > handled?
> > > > > > >    6. It would be good to explain the workflow in the KIP, with
> > an
> > > > > > >    example,  regarding how this KIP will change the replication
> > > > > scenario
> > > > > > > and
> > > > > > >    how it will benefit the consumer apps.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Mayuresh
> > > > > > >
> > > > > > > On Mon, Nov 26, 2018 at 8:08 AM radai
> > <ra...@gmail.com>
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > a few questions:
> > > > > > > >
> > > > > > > > 1. how do you handle possible duplications caused by the
> > > "special"
> > > > > > > > producer timing-out/retrying? are you explicitely relying on
> > the
> > > > > > > > "exactly once" sequencing?
> > > > > > > > 2. what about the combination of log compacted topics +
> > > replicator
> > > > > > > > downtime? by the time the replicator comes back up there
> might
> >
> > > be
> > > > > > > > "holes" in the source offsets (some msgs might have been
> > > compacted
> > > > > > > > out)? how is that recoverable?
> > > > > > > > 3. similarly, what if you try and fire up replication on a
> > > non-empty
> > > > > > > > source topic? does the kip allow for offsets starting at some
> > > > > > > > arbitrary X > 0 ? or would this have to be designed from the
> > > start.
> > > > > > > >
> > > > > > > > and lastly, since this KIP seems to be designed fro
> > > active-passive
> > > > > > > > failover (there can be no produce traffic except the
> > replicator)
> > > > > > > > wouldnt a solution based on seeking to a time offset be more
> > > > > generic?
> > > > > > > > your producers could checkpoint the last (say log append)
> > > timestamp
> > > > > of
> > > > > > > > records theyve seen, and when restoring in the remote site
> > seek
> > > to
> > > > > > > > those timestamps (which will be metadata in their committed
> > > offsets)
> > > > > -
> > > > > > > > assumming replication takes > 0 time you'd need to handle
> some
> >
> > > dups,
> > > > > > > > but every kafka consumer setup needs to know how to handle
> > those
> > > > > > > > anyway.
> > > > > > > > On Fri, Nov 23, 2018 at 2:27 AM Edoardo Comar
> > > <EC...@uk.ibm.com>
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > Hi Stanislav
> > > > > > > > >
> > > > > > > > > > > The flag is needed to distinguish a batch with a
> desired
> >
> > > base
> > > > > > > offset
> > > > > > > > > of
> > > > > > > > > > 0,
> > > > > > > > > > from a regular batch for which offsets need to be
> > generated.
> > > > > > > > > > If the producer can provide offsets, why not provide a
> > base
> > > > > offset of
> > > > > > > > 0?
> > > > > > > > >
> > > > > > > > > a regular batch (for which offsets are generated by the
> > broker
> > > on
> > > > > > > write)
> > > > > > > > > is sent with a base offset of 0.
> > > > > > > > > How could you distinguish it from a batch where you *want*
> > the
> > >
> > > > > first
> > > > > > > > > record to be written at offset 0 (i.e. be the first in the
> > > > > partition
> > > > > > > and
> > > > > > > > > be rejected if there are records on the log already) ?
> > > > > > > > > We wanted to avoid a "deep" inspection (and potentially
> > > > > decompression)
> > > > > > > of
> > > > > > > > > the records.
> > > > > > > > >
> > > > > > > > > For the replicator use case, a single produce request where
> > > all
> > > > > the
> > > > > > > data
> > > > > > > > > is to be assumed with offset,
> > > > > > > > > or all without offsets, seems to suffice,
> > > > > > > > > So we added only a toplevel flag, not a per-topic-partition
> > > one.
> > > > > > > > >
> > > > > > > > > Thanks for your interest !
> > > > > > > > > cheers
> > > > > > > > > Edo
> > > > > > > > > --------------------------------------------------
> > > > > > > > >
> > > > > > > > > Edoardo Comar
> > > > > > > > >
> > > > > > > > > IBM Event Streams
> > > > > > > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Stanislav Kozlovski <st...@confluent.io> wrote on
> > > 22/11/2018
> > > > > > > > 22:32:42:
> > > > > > > > >
> > > > > > > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > > > > > > To: dev@kafka.apache.org
> > > > > > > > > > Date: 22/11/2018 22:33
> > > > > > > > > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with
> > Offsets
> > > for
> > > > > > > > > > Cluster Replication
> > > > > > > > > >
> > > > > > > > > > Hey Edo & Mickael,
> > > > > > > > > >
> > > > > > > > > > > The flag is needed to distinguish a batch with a
> desired
> >
> > > base
> > > > > > > offset
> > > > > > > > > of
> > > > > > > > > > 0,
> > > > > > > > > > from a regular batch for which offsets need to be
> > generated.
> > > > > > > > > > If the producer can provide offsets, why not provide a
> > base
> > > > > offset of
> > > > > > > > 0?
> > > > > > > > > >
> > > > > > > > > > > (I am reading your post thinking about
> > > > > > > > > > partitions rather than topics).
> > > > > > > > > > Yes, I meant partitions. Sorry about that.
> > > > > > > > > >
> > > > > > > > > > Thanks for answering my questions :)
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Stanislav
> > > > > > > > > >
> > > > > > > > > > On Thu, Nov 22, 2018 at 5:28 PM Edoardo Comar
> > > > > <EC...@uk.ibm.com>
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Stanislav,
> > > > > > > > > > >
> > > > > > > > > > > you're right we envision the replicator use case to
> have
> > a
> > >
> > > > > single
> > > > > > > > > producer
> > > > > > > > > > > with offsets per partition (I am reading your post
> > > thinking
> > > > > about
> > > > > > > > > > > partitions rather than topics).
> > > > > > > > > > >
> > > > > > > > > > > If a regular producer was to send its own records at
> the
> >
> > > same
> > > > > time,
> > > > > > > > > it's
> > > > > > > > > > > very likely that the one sending with an offset will
> > fail
> > > > > because
> > > > > > > of
> > > > > > > > > > > invalid offsets.
> > > > > > > > > > > Same if two producers were sending with offsets, likely
> > > both
> > > > > would
> > > > > > > > > then
> > > > > > > > > > > fail.
> > > > > > > > > > >
> > > > > > > > > > > > Does it make sense to *lock* the topic from other
> > > producers
> > > > > while
> > > > > > > > > there
> > > > > > > > > > > is
> > > > > > > > > > > > one that uses offsets?
> > > > > > > > > > >
> > > > > > > > > > > You could do that with ACL permissions if you wanted, I
> > > don't
> > > > > think
> > > > > > > > it
> > > > > > > > > > > needs to be mandated by changing the broker logic.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > Since we are tying the produce-with-offset request to
> > > the
> > > > > ACL, do
> > > > > > > > we
> > > > > > > > > > > need
> > > > > > > > > > > > the `use_offset` field in the produce request? Maybe
> > we
> > > make
> > > > > it
> > > > > > > > > > > mandatory
> > > > > > > > > > > > for produce requests with that ACL to have offsets.
> > > > > > > > > > >
> > > > > > > > > > > The flag is needed to distinguish a batch with a
> desired
> >
> > > base
> > > > > > > offset
> > > > > > > > > of 0,
> > > > > > > > > > > from a regular batch for which offsets need to be
> > > generated.
> > > > > > > > > > > I would not restrict a principal to only
> > send-with-offsets
> > > (by
> > > > > > > making
> > > > > > > > > that
> > > > > > > > > > > mandatory via the ACL).
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > > Edo & Mickael
> > > > > > > > > > >
> > > > > > > > > > > --------------------------------------------------
> > > > > > > > > > >
> > > > > > > > > > > Edoardo Comar
> > > > > > > > > > >
> > > > > > > > > > > IBM Event Streams
> > > > > > > > > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Stanislav Kozlovski <st...@confluent.io> wrote on
> > > > > 22/11/2018
> > > > > > > > > 16:17:11:
> > > > > > > > > > >
> > > > > > > > > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > > > > > > > > To: dev@kafka.apache.org
> > > > > > > > > > > > Date: 22/11/2018 16:17
> > > > > > > > > > > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with
> > > Offsets
> > > > > for
> > > > > > > > > > > > Cluster Replication
> > > > > > > > > > > >
> > > > > > > > > > > > Hey Edurdo, thanks for the KIP!
> > > > > > > > > > > >
> > > > > > > > > > > > I have some questions, apologies if they are naive:
> > > > > > > > > > > > Is this intended to work for a single producer use
> > case
> > > > > only?
> > > > > > > > > > > > How would it work if two producers were producing to
> > the
> > >
> > > > > same
> > > > > > > topic
> > > > > > > > > with
> > > > > > > > > > > > offsets?
> > > > > > > > > > > > How would it work if two producers, one with offsets
> > and
> > > one
> > > > > > > > without
> > > > > > > > > > > were
> > > > > > > > > > > > producing to a topic?
> > > > > > > > > > > > Does it make sense to *lock* the topic from other
> > > producers
> > > > > while
> > > > > > > > > there
> > > > > > > > > > > is
> > > > > > > > > > > > one that uses offsets?
> > > > > > > > > > > >
> > > > > > > > > > > > Since we are tying the produce-with-offset request to
> > > the
> > > > > ACL, do
> > > > > > > > we
> > > > > > > > > > > need
> > > > > > > > > > > > the `use_offset` field in the produce request? Maybe
> > we
> > > make
> > > > > it
> > > > > > > > > > > mandatory
> > > > > > > > > > > > for produce requests with that ACL to have offsets.
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Stanislav
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Nov 21, 2018 at 5:14 PM Edoardo Comar
> > > > > <ECOMAR@uk.ibm.com
> > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > we've opened a KIP to improve data replication
> > between
> > >
> > > > > Kafka
> > > > > > > > > clusters
> > > > > > > > > > > :
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > INVALID URI REMOVED
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> u=https-3A__cwiki.apache.org_confluence_display_KAFKA_KIP-2D391-253A-2BAllow-2BProducing-2Bwith-2BOffsets-2Bfor-2BCluster-2BReplication&d=DwIBaQ&c=jf_iaSHvJObTbx-
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> > siA1ZOg&r=EzRhmSah4IHsUZVekRUIINhltZK7U0OaeRo7hgW4_tQ&m=uUj9C3BdbYz0dDNA-
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> E6iXreg1M5hWiWgG6ClS86VIPI&s=Vav8_-N7_OpfYEW33yGOf_or8ESMUJ4S45t2g-EUWKg&e=
> > > > > > > > > > > > >
> > > > > > > > > > > > > We'd like to start a discussion, please post your
> > > feedback
> > > > > in
> > > > > > > > this
> > > > > > > > > > > thread.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thank you
> > > > > > > > > > > > > Edo and Mickael
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --------------------------------------------------
> > > > > > > > > > > > >
> > > > > > > > > > > > > Edoardo Comar
> > > > > > > > > > > > >
> > > > > > > > > > > > > IBM Event Streams
> > > > > > > > > > > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > > > > > > > > > > >
> > > > > > > > > > > > > Unless stated otherwise above:
> > > > > > > > > > > > > IBM United Kingdom Limited - Registered in England
> > and
> > >
> > > > > Wales
> > > > > > > with
> > > > > > > > > > > number
> > > > > > > > > > > > > 741598.
> > > > > > > > > > > > > Registered office: PO Box 41, North Harbour,
> > > Portsmouth,
> > > > > > > > Hampshire
> > > > > > > > > PO6
> > > > > > > > > > > 3AU
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Stanislav
> > > > > > > > > > >
> > > > > > > > > > > Unless stated otherwise above:
> > > > > > > > > > > IBM United Kingdom Limited - Registered in England and
> > > Wales
> > > > > with
> > > > > > > > > number
> > > > > > > > > > > 741598.
> > > > > > > > > > > Registered office: PO Box 41, North Harbour,
> Portsmouth,
> >
> > > > > Hampshire
> > > > > > > > PO6
> > > > > > > > > 3AU
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best,
> > > > > > > > > > Stanislav
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > -Regards,
> > > > > > > Mayuresh R. Gharat
> > > > > > > (862) 250-7125
> > > > > > >
> > > > >
>


-- 
"When the people fear their government, there is tyranny; when the
government fears the people, there is liberty." [Thomas Jefferson]

Re: [DISCUSS] KIP-391: Allow Producing with Offsets for Cluster Replication

Posted by radai <ra...@gmail.com>.
the kip-320 conflict can be resolved by saying that the leader broker
on the destination "stamps" its own local leader epoch on the incoming
msgs - meaning the offsets "transfer" but leader epochs do not.
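
To make the idea concrete, here is a minimal sketch of what "stamping" could look like. It is only an illustration: the Batch and DestinationLeader types are hypothetical stand-ins, not Kafka's internal classes, and the assumption that offsets may skip ahead but never go backwards is mine, not something the KIP states.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical types, not Kafka's broker or record classes.
final class EpochStampingSketch {

    /** A simplified single-record batch carrying a source-chosen offset and a leader epoch. */
    static final class Batch {
        final long baseOffset;
        final int leaderEpoch;
        final String payload;

        Batch(long baseOffset, int leaderEpoch, String payload) {
            this.baseOffset = baseOffset;
            this.leaderEpoch = leaderEpoch;
            this.payload = payload;
        }
    }

    /** A simplified destination partition leader. */
    static final class DestinationLeader {
        private final int localLeaderEpoch;
        private final List<Batch> log = new ArrayList<>();
        private long nextOffset = 0L;

        DestinationLeader(int localLeaderEpoch) {
            this.localLeaderEpoch = localLeaderEpoch;
        }

        /** Appends a replicated batch: the offset is kept, the epoch is replaced. */
        Batch appendWithOffset(Batch incoming) {
            // Assumption: offsets may skip ahead (gaps) but must never go backwards.
            if (incoming.baseOffset < nextOffset) {
                throw new IllegalArgumentException(
                    "offset " + incoming.baseOffset + " is below next offset " + nextOffset);
            }
            // The source cluster's epoch is dropped; the local one is stamped instead.
            Batch stamped = new Batch(incoming.baseOffset, localLeaderEpoch, incoming.payload);
            log.add(stamped);
            nextOffset = incoming.baseOffset + 1; // one record per batch in this sketch
            return stamped;
        }

        long nextOffset() {
            return nextOffset;
        }
    }

    public static void main(String[] args) {
        DestinationLeader leader = new DestinationLeader(3);
        // A batch written at offset 42 under leader epoch 17 on the source cluster.
        Batch stamped = leader.appendWithOffset(new Batch(42L, 17, "value"));
        System.out.println("offset=" + stamped.baseOffset + " epoch=" + stamped.leaderEpoch);
    }
}
```

With a destination leader at epoch 3, the batch above lands at offset 42 with epoch 3, so the source epoch 17 never reaches the destination log.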

On Mon, Jan 7, 2019 at 1:38 PM Edoardo Comar <EC...@uk.ibm.com> wrote:
>
> Hi,
> I delayed starting the voting thread due to the festive period. I would
> like to start it this week.
> Has anyone any more feedback ?
>
> --------------------------------------------------
>
> Edoardo Comar
>
> IBM Event Streams
>
>
> Edoardo Comar <EC...@uk.ibm.com> wrote on 13/12/2018 17:50:30:
>
> > From: Edoardo Comar <EC...@uk.ibm.com>
> > To: dev@kafka.apache.org
> > Date: 13/12/2018 17:50
> > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for
> > Cluster Replication
> >
> > Hi,
> > as we haven't got any more feedback, we'd like to start a vote on
> KIP-391
> > on Monday
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-391%3A+Allow+Producing+with+Offsets+for+Cluster+Replication
> >
> > --------------------------------------------------
> >
> > Edoardo Comar
> >
> > IBM Event Streams
> > IBM UK Ltd, Hursley Park, SO21 2JN
> >
> >
> > Edoardo Comar/UK/IBM wrote on 10/12/2018 10:20:06:
> >
> > > From: Edoardo Comar/UK/IBM
> > > To: dev@kafka.apache.org
> > > Date: 10/12/2018 10:20
> > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for
> > > Cluster Replication
> > >
> > > (shameless bump) any additional feedback is welcome ... thanks!
> > >
> > > Edoardo Comar <EC...@uk.ibm.com> wrote on 27/11/2018 15:35:09:
> > >
> > > > From: Edoardo Comar <EC...@uk.ibm.com>
> > > > To: dev@kafka.apache.org
> > > > Date: 27/11/2018 15:35
> > > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for
> > > > Cluster Replication
> > > >
> > > > Hi Jason
> > > >
> > > > we envisioned the replicator replicating the __consumer_offsets topic
> > > > too (although without producing-with-offsets to it!).
> > > >
> > > > As there is no client-side implementation using the leader epoch yet,
> > > > we could not yet see the impact of writing __consumer_offsets records
> > > > with an invalid leader epoch to the destination cluster.
> > > >
> > > > Also, applications might still use an external storage mechanism for
> > > > consumer offsets, where the leader_epoch is missing.
> > > >
> > > > Perhaps the replicator could - for the __consumer_offsets topic - just
> > > > omit the leader_epoch field in the data sent to the destination.
> > > >
> > > > What do you think?
> > > >
> > > >
> > > > Jason Gustafson <ja...@confluent.io> wrote on 27/11/2018 00:09:56:
> > > >
> > > > > Another wrinkle to consider is KIP-320. If you are planning to
> > > > > replicate __consumer_offsets directly, then you will have to account
> > > > > for leader epoch information which is stored with the committed
> > > > > offsets. But I cannot think how it would be possible to replicate the
> > > > > leader epoch information in messages even if you can preserve offsets.
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Mon, Nov 26, 2018 at 1:16 PM Mayuresh Gharat
> > > > <gh...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Edoardo,
> > > > > >
> > > > > > Thanks a lot for the KIP.
> > > > > > I have a few questions/suggestions in addition to what Radai has
> > > > > > mentioned above:
> > > > > >
> > > > > >    1. Is this meant only for 1:1 replication, for example one Kafka
> > > > > >    cluster replicating to another, instead of having multiple Kafka
> > > > > >    clusters mirroring into one Kafka cluster?
> > > > > >    2. Are we relying on exactly-once produce in the replicator? If
> > > > > >    not, how are retries handled in the replicator?
> > > > > >    3. What is the recommended value for in-flight requests here? Is
> > > > > >    it supposed to be strictly 1? If yes, it would be great to mention
> > > > > >    that in the KIP.
> > > > > >    4. How is unclean leader election between the source cluster and
> > > > > >    the destination cluster handled?
> > > > > >    5. How are offset resets of the replicator's consumer handled?
> > > > > >    6. It would be good to explain the workflow in the KIP, with an
> > > > > >    example, regarding how this KIP will change the replication
> > > > > >    scenario and how it will benefit the consumer apps.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Mayuresh
> > > > > >
> > > > > > On Mon, Nov 26, 2018 at 8:08 AM radai
> <ra...@gmail.com>
> >
> > > > wrote:
> > > > > >
> > > > > > > a few questions:
> > > > > > >
> > > > > > > 1. how do you handle possible duplications caused by the "special"
> > > > > > > producer timing-out/retrying? are you explicitly relying on the
> > > > > > > "exactly once" sequencing?
> > > > > > > 2. what about the combination of log compacted topics + replicator
> > > > > > > downtime? by the time the replicator comes back up there might be
> > > > > > > "holes" in the source offsets (some msgs might have been compacted
> > > > > > > out)? how is that recoverable?
> > > > > > > 3. similarly, what if you try and fire up replication on a non-empty
> > > > > > > source topic? does the kip allow for offsets starting at some
> > > > > > > arbitrary X > 0 ? or would this have to be designed from the start.
> > > > > > >
> > > > > > > and lastly, since this KIP seems to be designed for active-passive
> > > > > > > failover (there can be no produce traffic except the replicator)
> > > > > > > wouldn't a solution based on seeking to a time offset be more
> > > > > > > generic? your producers could checkpoint the last (say log append)
> > > > > > > timestamp of records they've seen, and when restoring in the remote
> > > > > > > site seek to those timestamps (which will be metadata in their
> > > > > > > committed offsets) - assuming replication takes > 0 time you'd need
> > > > > > > to handle some dups, but every kafka consumer setup needs to know
> > > > > > > how to handle those anyway.
> > > > > > > On Fri, Nov 23, 2018 at 2:27 AM Edoardo Comar <EC...@uk.ibm.com> wrote:
> > > > > > > >
> > > > > > > > Hi Stanislav
> > > > > > > >
> > > > > > > > > > The flag is needed to distinguish a batch with a desired base
> > > > > > > > > > offset of 0, from a regular batch for which offsets need to be
> > > > > > > > > > generated.
> > > > > > > > > If the producer can provide offsets, why not provide a base
> > > > > > > > > offset of 0?
> > > > > > > >
> > > > > > > > A regular batch (for which offsets are generated by the broker on
> > > > > > > > write) is sent with a base offset of 0.
> > > > > > > > How could you distinguish it from a batch where you *want* the
> > > > > > > > first record to be written at offset 0 (i.e. be the first in the
> > > > > > > > partition and be rejected if there are records on the log already)?
> > > > > > > > We wanted to avoid a "deep" inspection (and potentially
> > > > > > > > decompression) of the records.
> > > > > > > >
> > > > > > > > For the replicator use case, a single produce request where all the
> > > > > > > > data is assumed to be either with offsets or without offsets seems
> > > > > > > > to suffice, so we added only a top-level flag, not a
> > > > > > > > per-topic-partition one.
> > > > > > > >
> > > > > > > > Thanks for your interest !
> > > > > > > > cheers
> > > > > > > > Edo
> > > > > > > > --------------------------------------------------
> > > > > > > >
> > > > > > > > Edoardo Comar
> > > > > > > >
> > > > > > > > IBM Event Streams
> > > > > > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > > > > > >
> > > > > > > >
> > > > > > > > Stanislav Kozlovski <st...@confluent.io> wrote on
> > 22/11/2018
> > > > > > > 22:32:42:
> > > > > > > >
> > > > > > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > > > > > To: dev@kafka.apache.org
> > > > > > > > > Date: 22/11/2018 22:33
> > > > > > > > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with
> Offsets
> > for
> > > > > > > > > Cluster Replication
> > > > > > > > >
> > > > > > > > > Hey Edo & Mickael,
> > > > > > > > >
> > > > > > > > > > The flag is needed to distinguish a batch with a desired
>
> > base
> > > > > > offset
> > > > > > > > of
> > > > > > > > > 0,
> > > > > > > > > from a regular batch for which offsets need to be
> generated.
> > > > > > > > > If the producer can provide offsets, why not provide a
> base
> > > > offset of
> > > > > > > 0?
> > > > > > > > >
> > > > > > > > > > (I am reading your post thinking about
> > > > > > > > > partitions rather than topics).
> > > > > > > > > Yes, I meant partitions. Sorry about that.
> > > > > > > > >
> > > > > > > > > Thanks for answering my questions :)
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Stanislav
> > > > > > > > >
> > > > > > > > > On Thu, Nov 22, 2018 at 5:28 PM Edoardo Comar
> > > > <EC...@uk.ibm.com>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Stanislav,
> > > > > > > > > >
> > > > > > > > > > you're right we envision the replicator use case to have
> a
> >
> > > > single
> > > > > > > > producer
> > > > > > > > > > with offsets per partition (I am reading your post
> > thinking
> > > > about
> > > > > > > > > > partitions rather than topics).
> > > > > > > > > >
> > > > > > > > > > If a regular producer was to send its own records at the
>
> > same
> > > > time,
> > > > > > > > it's
> > > > > > > > > > very likely that the one sending with an offset will
> fail
> > > > because
> > > > > > of
> > > > > > > > > > invalid offsets.
> > > > > > > > > > Same if two producers were sending with offsets, likely
> > both
> > > > would
> > > > > > > > then
> > > > > > > > > > fail.
> > > > > > > > > >
> > > > > > > > > > > Does it make sense to *lock* the topic from other
> > producers
> > > > while
> > > > > > > > there
> > > > > > > > > > is
> > > > > > > > > > > one that uses offsets?
> > > > > > > > > >
> > > > > > > > > > You could do that with ACL permissions if you wanted, I
> > don't
> > > > think
> > > > > > > it
> > > > > > > > > > needs to be mandated by changing the broker logic.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > Since we are tying the produce-with-offset request to
> > the
> > > > ACL, do
> > > > > > > we
> > > > > > > > > > need
> > > > > > > > > > > the `use_offset` field in the produce request? Maybe
> we
> > make
> > > > it
> > > > > > > > > > mandatory
> > > > > > > > > > > for produce requests with that ACL to have offsets.
> > > > > > > > > >
> > > > > > > > > > The flag is needed to distinguish a batch with a desired
>
> > base
> > > > > > offset
> > > > > > > > of 0,
> > > > > > > > > > from a regular batch for which offsets need to be
> > generated.
> > > > > > > > > > I would not restrict a principal to only
> send-with-offsets
> > (by
> > > > > > making
> > > > > > > > that
> > > > > > > > > > mandatory via the ACL).
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > > Edo & Mickael
> > > > > > > > > >
> > > > > > > > > > --------------------------------------------------
> > > > > > > > > >
> > > > > > > > > > Edoardo Comar
> > > > > > > > > >
> > > > > > > > > > IBM Event Streams
> > > > > > > > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Stanislav Kozlovski <st...@confluent.io> wrote on
> > > > 22/11/2018
> > > > > > > > 16:17:11:
> > > > > > > > > >
> > > > > > > > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > > > > > > > To: dev@kafka.apache.org
> > > > > > > > > > > Date: 22/11/2018 16:17
> > > > > > > > > > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with
> > Offsets
> > > > for
> > > > > > > > > > > Cluster Replication
> > > > > > > > > > >
> > > > > > > > > > > Hey Edurdo, thanks for the KIP!
> > > > > > > > > > >
> > > > > > > > > > > I have some questions, apologies if they are naive:
> > > > > > > > > > > Is this intended to work for a single producer use
> case
> > > > only?
> > > > > > > > > > > How would it work if two producers were producing to
> the
> >
> > > > same
> > > > > > topic
> > > > > > > > with
> > > > > > > > > > > offsets?
> > > > > > > > > > > How would it work if two producers, one with offsets
> and
> > one
> > > > > > > without
> > > > > > > > > > were
> > > > > > > > > > > producing to a topic?
> > > > > > > > > > > Does it make sense to *lock* the topic from other
> > producers
> > > > while
> > > > > > > > there
> > > > > > > > > > is
> > > > > > > > > > > one that uses offsets?
> > > > > > > > > > >
> > > > > > > > > > > Since we are tying the produce-with-offset request to
> > the
> > > > ACL, do
> > > > > > > we
> > > > > > > > > > need
> > > > > > > > > > > the `use_offset` field in the produce request? Maybe
> we
> > make
> > > > it
> > > > > > > > > > mandatory
> > > > > > > > > > > for produce requests with that ACL to have offsets.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Stanislav
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Nov 21, 2018 at 5:14 PM Edoardo Comar
> > > > <ECOMAR@uk.ibm.com
> > > > > > >
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi,
> > > > > > > > > > > > we've opened a KIP to improve data replication
> between
> >
> > > > Kafka
> > > > > > > > clusters
> > > > > > > > > > :
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-391%3A+Allow+Producing+with+Offsets+for+Cluster+Replication
> > > > > > > > > > > >
> > > > > > > > > > > > We'd like to start a discussion, please post your
> > > > > > > > > > > > feedback in this thread.
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you
> > > > > > > > > > > > Edo and Mickael
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --------------------------------------------------
> > > > > > > > > > > >
> > > > > > > > > > > > Edoardo Comar
> > > > > > > > > > > >
> > > > > > > > > > > > IBM Event Streams
> > > > > > > > > > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Best,
> > > > > > > > > > > Stanislav
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best,
> > > > > > > > > Stanislav
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -Regards,
> > > > > > Mayuresh R. Gharat
> > > > > > (862) 250-7125
> > > > > >
> > > >

Re: [DISCUSS] KIP-391: Allow Producing with Offsets for Cluster Replication

Posted by Edoardo Comar <EC...@uk.ibm.com>.
Hi,
I delayed starting the voting thread due to the festive period. I would 
like to start it this week.
Has anyone any more feedback ?

--------------------------------------------------

Edoardo Comar

IBM Event Streams


Edoardo Comar <EC...@uk.ibm.com> wrote on 13/12/2018 17:50:30:

> From: Edoardo Comar <EC...@uk.ibm.com>
> To: dev@kafka.apache.org
> Date: 13/12/2018 17:50
> Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for 
> Cluster Replication
> 
> Hi,
> as we haven't got any more feedback, we'd like to start a vote on 
KIP-391 
> on Monday
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-391%3A+Allow+Producing+with+Offsets+for+Cluster+Replication
> 
> --------------------------------------------------
> 
> Edoardo Comar
> 
> IBM Event Streams
> IBM UK Ltd, Hursley Park, SO21 2JN
> 
> 
> Edoardo Comar/UK/IBM wrote on 10/12/2018 10:20:06:
> 
> > From: Edoardo Comar/UK/IBM
> > To: dev@kafka.apache.org
> > Date: 10/12/2018 10:20
> > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for 
> > Cluster Replication
> > 
> > (shameless bump) any additional feedback is welcome ... thanks!
> > 
> > Edoardo Comar <EC...@uk.ibm.com> wrote on 27/11/2018 15:35:09:
> > 
> > > From: Edoardo Comar <EC...@uk.ibm.com>
> > > To: dev@kafka.apache.org
> > > Date: 27/11/2018 15:35
> > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for 
> > > Cluster Replication
> > > 
> > > Hi Jason
> > >
> > > we envisioned the replicator replicating the __consumer_offsets topic
> > > too (although without producing-with-offsets to it!).
> > >
> > > As there is no client-side implementation using the leader epoch yet,
> > > we could not yet see the impact of writing __consumer_offsets records
> > > with an invalid leader epoch to the destination cluster.
> > >
> > > Also, applications might still use an external storage mechanism for
> > > consumer offsets, where the leader_epoch is missing.
> > >
> > > Perhaps the replicator could - for the __consumer_offsets topic - just
> > > omit the leader_epoch field in the data sent to the destination.
> > >
> > > What do you think?
> > > 
> > > 
> > > Jason Gustafson <ja...@confluent.io> wrote on 27/11/2018 00:09:56:
> > > 
> > > > Another wrinkle to consider is KIP-320. If you are planning to
> > > > replicate __consumer_offsets directly, then you will have to account
> > > > for leader epoch information which is stored with the committed
> > > > offsets. But I cannot think how it would be possible to replicate the
> > > > leader epoch information in messages even if you can preserve offsets.
> > > > 
> > > > -Jason
> > > > 
> > > > On Mon, Nov 26, 2018 at 1:16 PM Mayuresh Gharat 
> > > <gh...@gmail.com>
> > > > wrote:
> > > > 
> > > > > Hi Edoardo,
> > > > >
> > > > > Thanks a lot for the KIP.
> > > > > I have a few questions/suggestions in addition to what Radai has
> > > > > mentioned above:
> > > > >
> > > > >    1. Is this meant only for 1:1 replication, for example one Kafka
> > > > >    cluster replicating to another, instead of having multiple Kafka
> > > > >    clusters mirroring into one Kafka cluster?
> > > > >    2. Are we relying on exactly-once produce in the replicator? If
> > > > >    not, how are retries handled in the replicator?
> > > > >    3. What is the recommended value for in-flight requests here? Is
> > > > >    it supposed to be strictly 1? If yes, it would be great to mention
> > > > >    that in the KIP.
> > > > >    4. How is unclean leader election between the source cluster and
> > > > >    the destination cluster handled?
> > > > >    5. How are offset resets of the replicator's consumer handled?
> > > > >    6. It would be good to explain the workflow in the KIP, with an
> > > > >    example, regarding how this KIP will change the replication
> > > > >    scenario and how it will benefit the consumer apps.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Mayuresh
> > > > >
> > > > > On Mon, Nov 26, 2018 at 8:08 AM radai 
<ra...@gmail.com> 
> 
> > > wrote:
> > > > >
> > > > > > a few questions:
> > > > > >
> > > > > > 1. how do you handle possible duplications caused by the "special"
> > > > > > producer timing-out/retrying? are you explicitly relying on the
> > > > > > "exactly once" sequencing?
> > > > > > 2. what about the combination of log compacted topics + replicator
> > > > > > downtime? by the time the replicator comes back up there might be
> > > > > > "holes" in the source offsets (some msgs might have been compacted
> > > > > > out)? how is that recoverable?
> > > > > > 3. similarly, what if you try and fire up replication on a non-empty
> > > > > > source topic? does the kip allow for offsets starting at some
> > > > > > arbitrary X > 0 ? or would this have to be designed from the start.
> > > > > >
> > > > > > and lastly, since this KIP seems to be designed for active-passive
> > > > > > failover (there can be no produce traffic except the replicator)
> > > > > > wouldn't a solution based on seeking to a time offset be more
> > > > > > generic? your producers could checkpoint the last (say log append)
> > > > > > timestamp of records they've seen, and when restoring in the remote
> > > > > > site seek to those timestamps (which will be metadata in their
> > > > > > committed offsets) - assuming replication takes > 0 time you'd need
> > > > > > to handle some dups, but every kafka consumer setup needs to know
> > > > > > how to handle those anyway.
> > > > > > On Fri, Nov 23, 2018 at 2:27 AM Edoardo Comar <EC...@uk.ibm.com> wrote:
> > > > > > >
> > > > > > > Hi Stanislav
> > > > > > >
> > > > > > > > > The flag is needed to distinguish a batch with a desired base
> > > > > > > > > offset of 0, from a regular batch for which offsets need to be
> > > > > > > > > generated.
> > > > > > > > If the producer can provide offsets, why not provide a base
> > > > > > > > offset of 0?
> > > > > > >
> > > > > > > A regular batch (for which offsets are generated by the broker on
> > > > > > > write) is sent with a base offset of 0.
> > > > > > > How could you distinguish it from a batch where you *want* the
> > > > > > > first record to be written at offset 0 (i.e. be the first in the
> > > > > > > partition and be rejected if there are records on the log already)?
> > > > > > > We wanted to avoid a "deep" inspection (and potentially
> > > > > > > decompression) of the records.
> > > > > > >
> > > > > > > For the replicator use case, a single produce request where all the
> > > > > > > data is assumed to be either with offsets or without offsets seems
> > > > > > > to suffice, so we added only a top-level flag, not a
> > > > > > > per-topic-partition one.
> > > > > > >
> > > > > > > Thanks for your interest !
> > > > > > > cheers
> > > > > > > Edo
> > > > > > > --------------------------------------------------
> > > > > > >
> > > > > > > Edoardo Comar
> > > > > > >
> > > > > > > IBM Event Streams
> > > > > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > > > > >
> > > > > > >
> > > > > > > Stanislav Kozlovski <st...@confluent.io> wrote on 
> 22/11/2018
> > > > > > 22:32:42:
> > > > > > >
> > > > > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > > > > To: dev@kafka.apache.org
> > > > > > > > Date: 22/11/2018 22:33
> > > > > > > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with 
Offsets 
> for
> > > > > > > > Cluster Replication
> > > > > > > >
> > > > > > > > Hey Edo & Mickael,
> > > > > > > >
> > > > > > > > > The flag is needed to distinguish a batch with a desired 

> base
> > > > > offset
> > > > > > > of
> > > > > > > > 0,
> > > > > > > > from a regular batch for which offsets need to be 
generated.
> > > > > > > > If the producer can provide offsets, why not provide a 
base 
> > > offset of
> > > > > > 0?
> > > > > > > >
> > > > > > > > > (I am reading your post thinking about
> > > > > > > > partitions rather than topics).
> > > > > > > > Yes, I meant partitions. Sorry about that.
> > > > > > > >
> > > > > > > > Thanks for answering my questions :)
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Stanislav
> > > > > > > >
> > > > > > > > On Thu, Nov 22, 2018 at 5:28 PM Edoardo Comar 
> > > <EC...@uk.ibm.com>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Stanislav,
> > > > > > > > >
> > > > > > > > > you're right we envision the replicator use case to have 
a 
> 
> > > single
> > > > > > > producer
> > > > > > > > > with offsets per partition (I am reading your post 
> thinking 
> > > about
> > > > > > > > > partitions rather than topics).
> > > > > > > > >
> > > > > > > > > If a regular producer was to send its own records at the 

> same 
> > > time,
> > > > > > > it's
> > > > > > > > > very likely that the one sending with an offset will 
fail 
> > > because
> > > > > of
> > > > > > > > > invalid offsets.
> > > > > > > > > Same if two producers were sending with offsets, likely 
> both 
> > > would
> > > > > > > then
> > > > > > > > > fail.
> > > > > > > > >
> > > > > > > > > > Does it make sense to *lock* the topic from other 
> producers 
> > > while
> > > > > > > there
> > > > > > > > > is
> > > > > > > > > > one that uses offsets?
> > > > > > > > >
> > > > > > > > > You could do that with ACL permissions if you wanted, I 
> don't 
> > > think
> > > > > > it
> > > > > > > > > needs to be mandated by changing the broker logic.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Since we are tying the produce-with-offset request to 
> the 
> > > ACL, do
> > > > > > we
> > > > > > > > > need
> > > > > > > > > > the `use_offset` field in the produce request? Maybe 
we 
> make 
> > > it
> > > > > > > > > mandatory
> > > > > > > > > > for produce requests with that ACL to have offsets.
> > > > > > > > >
> > > > > > > > > The flag is needed to distinguish a batch with a desired 

> base
> > > > > offset
> > > > > > > of 0,
> > > > > > > > > from a regular batch for which offsets need to be 
> generated.
> > > > > > > > > I would not restrict a principal to only 
send-with-offsets 
> (by
> > > > > making
> > > > > > > that
> > > > > > > > > mandatory via the ACL).
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > Edo & Mickael
> > > > > > > > >

Re: [DISCUSS] KIP-391: Allow Producing with Offsets for Cluster Replication

Posted by Edoardo Comar <EC...@uk.ibm.com>.
Hi,
as we haven't received any further feedback, we'd like to start a vote on
KIP-391 on Monday.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-391%3A+Allow+Producing+with+Offsets+for+Cluster+Replication

--------------------------------------------------

Edoardo Comar

IBM Event Streams
IBM UK Ltd, Hursley Park, SO21 2JN


Edoardo Comar/UK/IBM wrote on 10/12/2018 10:20:06:

> From: Edoardo Comar/UK/IBM
> To: dev@kafka.apache.org
> Date: 10/12/2018 10:20
> Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for 
> Cluster Replication
> 
> (shameless bump) any additional feedback is welcome ... thanks!

Re: [DISCUSS] KIP-391: Allow Producing with Offsets for Cluster Replication

Posted by Edoardo Comar <EC...@uk.ibm.com>.
Hi Jason

we envisioned the replicator replicating the __consumer_offsets topic too
(although without producing-with-offsets to it!).

As there is no client-side implementation using the leader epoch yet,
we could not yet assess the impact of writing __consumer_offsets records
with an invalid leader epoch to the destination cluster.

Also, applications might still use an external storage mechanism for
consumer offsets, in which the leader_epoch is missing.

Perhaps the replicator could, for the __consumer_offsets topic, just
omit the leader_epoch field in the data sent to the destination.

What do you think?
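
To make the last suggestion concrete, here is a minimal sketch of dropping
the epoch when a committed offset is copied to the destination. The
CommittedOffset type and the for_destination function are hypothetical names
used only for illustration; real __consumer_offsets records are binary and
versioned, which this sketch ignores.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CommittedOffset:
    """Hypothetical, simplified view of one committed-offset entry."""
    group: str
    topic: str
    partition: int
    offset: int
    leader_epoch: Optional[int] = None  # None means "no epoch recorded"


def for_destination(entry: CommittedOffset) -> CommittedOffset:
    """Return a copy of the entry without the source cluster's leader epoch."""
    return replace(entry, leader_epoch=None)


# The offset is preserved; only the epoch, which is meaningless on the
# destination cluster, is dropped.
source_entry = CommittedOffset("billing", "orders", 0, offset=42, leader_epoch=7)
dest_entry = for_destination(source_entry)
```

The source entry is left untouched, so the replicator can still use the
original epoch on the source side if it needs to.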


Jason Gustafson <ja...@confluent.io> wrote on 27/11/2018 00:09:56:

> Another wrinkle to consider is KIP-320. If you are planning to replicate
> __consumer_offsets directly, then you will have to account for leader epoch
> information which is stored with the committed offsets. But I cannot think
> how it would be possible to replicate the leader epoch information in
> messages even if you can preserve offsets.
> 
> -Jason
> 
> On Mon, Nov 26, 2018 at 1:16 PM Mayuresh Gharat <gh...@gmail.com>
> wrote:
> 
> > Hi Edoardo,
> >
> > Thanks a lot for the KIP.
> > I have a few questions/suggestions in addition to what Radai has mentioned
> > above:
> >
> >    1. Is this meant only for 1:1 replication, for example one Kafka cluster
> >    replicating to another, instead of having multiple Kafka clusters
> >    mirroring into one Kafka cluster?
> >    2. Are we relying on exactly-once produce in the replicator? If not, how
> >    are retries handled in the replicator?
> >    3. What is the recommended value for in-flight requests here? Is it
> >    supposed to be strictly 1? If yes, it would be great to mention that in
> >    the KIP.
> >    4. How is unclean leader election between the source cluster and the
> >    destination cluster handled?
> >    5. How are offset resets of the replicator's consumer handled?
> >    6. It would be good to explain the workflow in the KIP, with an
> >    example, regarding how this KIP will change the replication scenario
> >    and how it will benefit the consumer apps.
> >
> > Thanks,
> >
> > Mayuresh
> >
> > On Mon, Nov 26, 2018 at 8:08 AM radai <ra...@gmail.com> wrote:
> >
> > > a few questions:
> > >
> > > 1. how do you handle possible duplications caused by the "special"
> > > producer timing-out/retrying? are you explicitly relying on the
> > > "exactly once" sequencing?
> > > 2. what about the combination of log compacted topics + replicator
> > > downtime? by the time the replicator comes back up there might be
> > > "holes" in the source offsets (some msgs might have been compacted
> > > out)? how is that recoverable?
> > > 3. similarly, what if you try and fire up replication on a non-empty
> > > source topic? does the kip allow for offsets starting at some
> > > arbitrary X > 0 ? or would this have to be designed from the start.
> > >
> > > and lastly, since this KIP seems to be designed for active-passive
> > > failover (there can be no produce traffic except the replicator)
> > > wouldn't a solution based on seeking to a time offset be more generic?
> > > your producers could checkpoint the last (say log append) timestamp of
> > > records they've seen, and when restoring in the remote site seek to
> > > those timestamps (which will be metadata in their committed offsets) -
> > > assuming replication takes > 0 time you'd need to handle some dups,
> > > but every kafka consumer setup needs to know how to handle those
> > > anyway.
> > > On Fri, Nov 23, 2018 at 2:27 AM Edoardo Comar <EC...@uk.ibm.com> wrote:
> > > >
> > > > Hi Stanislav
> > > >
> > > > > > The flag is needed to distinguish a batch with a desired base offset of 0,
> > > > > from a regular batch for which offsets need to be generated.
> > > > > If the producer can provide offsets, why not provide a base offset of 0?
> > > >
> > > > a regular batch (for which offsets are generated by the broker on write)
> > > > is sent with a base offset of 0.
> > > > How could you distinguish it from a batch where you *want* the first
> > > > record to be written at offset 0 (i.e. be the first in the partition and
> > > > be rejected if there are records on the log already)?
> > > > We wanted to avoid a "deep" inspection (and potentially decompression) of
> > > > the records.
> > > >
> > > > For the replicator use case, a single produce request in which either all
> > > > the data carries offsets or none of it does seems to suffice,
> > > > so we added only a top-level flag, not a per-topic-partition one.
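
The distinction made in the reply above can be sketched as follows. This only
illustrates the intent of the flag and is not the broker code: append_batch
and OffsetRejected are hypothetical names, and the rule that a supplied base
offset must not be below the log end offset is an assumption made for the
example, not something stated in this thread.

```python
class OffsetRejected(Exception):
    """Raised when a batch asks for an offset the log cannot honour."""


def append_batch(log_end_offset, base_offset, record_count, use_offset):
    """Return (assigned_base_offset, new_log_end_offset) for one batch.

    Without the flag, base_offset is ignored (regular producers send 0) and
    the broker assigns the next free offset. With the flag, base_offset is
    the offset the sender wants, so a value of 0 is a real request.
    """
    if use_offset:
        if base_offset < log_end_offset:  # assumed validation rule
            raise OffsetRejected(
                f"wanted {base_offset}, log already at {log_end_offset}")
        assigned = base_offset
    else:
        assigned = log_end_offset
    return assigned, assigned + record_count


# Same wire value (base offset 0), two different meanings:
regular = append_batch(log_end_offset=5, base_offset=0, record_count=3,
                       use_offset=False)
first_in_log = append_batch(log_end_offset=0, base_offset=0, record_count=3,
                            use_offset=True)
```

With the flag set and records already on the log, the same call with base
offset 0 raises OffsetRejected instead of silently appending at offset 5.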
> > > >
> > > > Thanks for your interest !
> > > > cheers
> > > > Edo
> > > > --------------------------------------------------
> > > >
> > > > Edoardo Comar
> > > >
> > > > IBM Event Streams
> > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > >
> > > >
> > > > Stanislav Kozlovski <st...@confluent.io> wrote on 22/11/2018 22:32:42:
> > > >
> > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > To: dev@kafka.apache.org
> > > > > Date: 22/11/2018 22:33
> > > > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for
> > > > > Cluster Replication
> > > > >
> > > > > Hey Edo & Mickael,
> > > > >
> > > > > > The flag is needed to distinguish a batch with a desired base offset of 0,
> > > > > from a regular batch for which offsets need to be generated.
> > > > > If the producer can provide offsets, why not provide a base offset of 0?
> > > > >
> > > > > > (I am reading your post thinking about
> > > > > partitions rather than topics).
> > > > > Yes, I meant partitions. Sorry about that.
> > > > >
> > > > > Thanks for answering my questions :)
> > > > >
> > > > > Best,
> > > > > Stanislav
> > > > >
> > > > > On Thu, Nov 22, 2018 at 5:28 PM Edoardo Comar <EC...@uk.ibm.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Stanislav,
> > > > > >
> > > > > > you're right, we envision the replicator use case to have a single
> > > > > > producer with offsets per partition (I am reading your post thinking
> > > > > > about partitions rather than topics).
> > > > > >
> > > > > > If a regular producer were to send its own records at the same time,
> > > > > > it's very likely that the one sending with offsets would fail because
> > > > > > of invalid offsets.
> > > > > > Likewise, if two producers were sending with offsets, both would
> > > > > > likely fail.
> > > > > >
> > > > > > > Does it make sense to *lock* the topic from other producers while
> > > > > > > there is one that uses offsets?
> > > > > >
> > > > > > You could do that with ACL permissions if you wanted; I don't think
> > > > > > it needs to be mandated by changing the broker logic.
> > > > > >
> > > > > >
> > > > > > > Since we are tying the produce-with-offset request to the ACL, do
> > > > > > > we need the `use_offset` field in the produce request? Maybe we make
> > > > > > > it mandatory for produce requests with that ACL to have offsets.
> > > > > >
> > > > > > The flag is needed to distinguish a batch with a desired base offset
> > > > > > of 0 from a regular batch for which offsets need to be generated.
> > > > > > I would not restrict a principal to only send-with-offsets (by making
> > > > > > that mandatory via the ACL).
> > > > > >
> > > > > > Thanks
> > > > > > Edo & Mickael
> > > > > >
> > > > > > --------------------------------------------------
> > > > > >
> > > > > > Edoardo Comar
> > > > > >
> > > > > > IBM Event Streams
> > > > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > > > >
> > > > > >
> > > > > > Stanislav Kozlovski <st...@confluent.io> wrote on 22/11/2018 16:17:11:
> > > > > >
> > > > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > > > To: dev@kafka.apache.org
> > > > > > > Date: 22/11/2018 16:17
> > > > > > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for
> > > > > > > Cluster Replication
> > > > > > >
> > > > > > > Hey Edoardo, thanks for the KIP!
> > > > > > >
> > > > > > > I have some questions, apologies if they are naive:
> > > > > > > Is this intended to work for a single producer use case only?
> > > > > > > How would it work if two producers were producing to the same topic
> > > > > > > with offsets?
> > > > > > > How would it work if two producers, one with offsets and one without,
> > > > > > > were producing to a topic?
> > > > > > > Does it make sense to *lock* the topic from other producers while
> > > > > > > there is one that uses offsets?
> > > > > > >
> > > > > > > Since we are tying the produce-with-offset request to the ACL, do we
> > > > > > > need the `use_offset` field in the produce request? Maybe we make it
> > > > > > > mandatory for produce requests with that ACL to have offsets.
> > > > > > >
> > > > > > > Best,
> > > > > > > Stanislav
> > > > > > >
> > > > > > > On Wed, Nov 21, 2018 at 5:14 PM Edoardo Comar <ECOMAR@uk.ibm.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > > we've opened a KIP to improve data replication between Kafka
> > > > > > > > clusters:
> > > > > > > >
> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-391%3A+Allow+Producing+with+Offsets+for+Cluster+Replication
> > > > > > > >
> > > > > > > > We'd like to start a discussion, please post your feedback in
> > > > > > > > this thread.
> > > > > > > >
> > > > > > > > Thank you
> > > > > > > > Edo and Mickael
> > > > > > > >
> > > > > > > >
> > > > > > > > --------------------------------------------------
> > > > > > > >
> > > > > > > > Edoardo Comar
> > > > > > > >
> > > > > > > > IBM Event Streams
> > > > > > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best,
> > > > > > > Stanislav
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best,
> > > > > Stanislav
> > > >
> > >
> >
> >
> > --
> > -Regards,
> > Mayuresh R. Gharat
> > (862) 250-7125
> >

Re: [DISCUSS] KIP-391: Allow Producing with Offsets for Cluster Replication

Posted by Jason Gustafson <ja...@confluent.io>.
Another wrinkle to consider is KIP-320. If you are planning to replicate
__consumer_offsets directly, then you will have to account for leader epoch
information which is stored with the committed offsets. But I cannot think
how it would be possible to replicate the leader epoch information in
messages even if you can preserve offsets.

-Jason
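
A small sketch of why the epoch cannot simply be carried over, even when
offsets match. The data and the epoch_at helper are hypothetical; each
cluster is modelled only by the offsets at which its own leader epochs
started, which depend on that cluster's election history.

```python
from bisect import bisect_right


def epoch_at(epoch_starts, offset):
    """Return the leader epoch under which `offset` was written.

    epoch_starts is a list of (start_offset, epoch) pairs sorted by
    start_offset, one pair per leader epoch of the partition.
    """
    starts = [start for start, _ in epoch_starts]
    i = bisect_right(starts, offset) - 1
    if i < 0:
        raise ValueError(f"offset {offset} precedes the first epoch")
    return epoch_starts[i][1]


# Same partition, same offsets, but each cluster had its own leader changes.
source_epochs = [(0, 0), (100, 1), (250, 2)]
destination_epochs = [(0, 0), (180, 1)]

src = epoch_at(source_epochs, 300)        # epoch recorded with the commit
dst = epoch_at(destination_epochs, 300)   # epoch the destination would expect
```

Here the committed offset 300 is identical on both sides, yet the epoch
stored with it on the source does not match what the destination has.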

On Mon, Nov 26, 2018 at 1:16 PM Mayuresh Gharat <gh...@gmail.com>
wrote:

> Hi Edoardo,
>
> Thanks a lot for the KIP.
> I have a few questions/suggestions in addition to what Radai has mentioned
> above:
>
>    1. Is this meant only for 1:1 replication, for example one Kafka cluster
>    replicating to another, instead of having multiple Kafka clusters
>    mirroring into one Kafka cluster?
>    2. Are we relying on exactly-once produce in the replicator? If not, how
>    are retries handled in the replicator?
>    3. What is the recommended value for in-flight requests here? Is it
>    supposed to be strictly 1? If yes, it would be great to mention that in
>    the KIP.
>    4. How is unclean leader election between the source cluster and the
>    destination cluster handled?
>    5. How are offset resets of the replicator's consumer handled?
>    6. It would be good to explain the workflow in the KIP, with an
>    example, regarding how this KIP will change the replication scenario
>    and how it will benefit the consumer apps.
>
> Thanks,
>
> Mayuresh
>
> On Mon, Nov 26, 2018 at 8:08 AM radai <ra...@gmail.com> wrote:
>
> > a few questions:
> >
> > 1. how do you handle possible duplications caused by the "special"
> > producer timing-out/retrying? are you explicitly relying on the
> > "exactly once" sequencing?
> > 2. what about the combination of log compacted topics + replicator
> > downtime? by the time the replicator comes back up there might be
> > "holes" in the source offsets (some msgs might have been compacted
> > out)? how is that recoverable?
> > 3. similarly, what if you try and fire up replication on a non-empty
> > source topic? does the kip allow for offsets starting at some
> > arbitrary X > 0 ? or would this have to be designed from the start.
> >
> > and lastly, since this KIP seems to be designed for active-passive
> > failover (there can be no produce traffic except the replicator)
> > wouldn't a solution based on seeking to a time offset be more generic?
> > your producers could checkpoint the last (say log append) timestamp of
> > records they've seen, and when restoring in the remote site seek to
> > those timestamps (which will be metadata in their committed offsets) -
> > assuming replication takes > 0 time you'd need to handle some dups,
> > but every kafka consumer setup needs to know how to handle those
> > anyway.
> > On Fri, Nov 23, 2018 at 2:27 AM Edoardo Comar <EC...@uk.ibm.com> wrote:
> > >
> > > Hi Stanislav
> > >
> > > > > > The flag is needed to distinguish a batch with a desired base offset of 0,
> > > > > from a regular batch for which offsets need to be generated.
> > > > > If the producer can provide offsets, why not provide a base offset of 0?
> > > >
> > > > a regular batch (for which offsets are generated by the broker on write)
> > > > is sent with a base offset of 0.
> > > > How could you distinguish it from a batch where you *want* the first
> > > > record to be written at offset 0 (i.e. be the first in the partition and
> > > > be rejected if there are records on the log already) ?
> > > > We wanted to avoid a "deep" inspection (and potentially decompression) of
> > > > the records.
> > > >
> > > > For the replicator use case, a single produce request where all the data
> > > > is to be assumed with offset,
> > > > or all without offsets, seems to suffice,
> > > > So we added only a toplevel flag, not a per-topic-partition one.
> > >
> > > Thanks for your interest !
> > > cheers
> > > Edo
> > > --------------------------------------------------
> > >
> > > Edoardo Comar
> > >
> > > IBM Event Streams
> > > IBM UK Ltd, Hursley Park, SO21 2JN
> > >
> > >
> > > Stanislav Kozlovski <st...@confluent.io> wrote on 22/11/2018
> > 22:32:42:
> > >
> > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > To: dev@kafka.apache.org
> > > > Date: 22/11/2018 22:33
> > > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for
> > > > Cluster Replication
> > > >
> > > > Hey Edo & Mickael,
> > > >
> > > > > The flag is needed to distinguish a batch with a desired base
> offset
> > > of
> > > > 0,
> > > > from a regular batch for which offsets need to be generated.
> > > > If the producer can provide offsets, why not provide a base offset of
> > 0?
> > > >
> > > > > (I am reading your post thinking about
> > > > partitions rather than topics).
> > > > Yes, I meant partitions. Sorry about that.
> > > >
> > > > Thanks for answering my questions :)
> > > >
> > > > Best,
> > > > Stanislav
> > > >
> > > > On Thu, Nov 22, 2018 at 5:28 PM Edoardo Comar <EC...@uk.ibm.com>
> > wrote:
> > > >
> > > > > Hi Stanislav,
> > > > >
> > > > > you're right we envision the replicator use case to have a single
> > > producer
> > > > > with offsets per partition (I am reading your post thinking about
> > > > > partitions rather than topics).
> > > > >
> > > > > If a regular producer was to send its own records at the same time,
> > > it's
> > > > > very likely that the one sending with an offset will fail because
> of
> > > > > invalid offsets.
> > > > > Same if two producers were sending with offsets, likely both would
> > > then
> > > > > fail.
> > > > >
> > > > > > Does it make sense to *lock* the topic from other producers while
> > > there
> > > > > is
> > > > > > one that uses offsets?
> > > > >
> > > > > You could do that with ACL permissions if you wanted, I don't think
> > it
> > > > > needs to be mandated by changing the broker logic.
> > > > >
> > > > >
> > > > > > Since we are tying the produce-with-offset request to the ACL, do
> > we
> > > > > need
> > > > > > the `use_offset` field in the produce request? Maybe we make it
> > > > > mandatory
> > > > > > for produce requests with that ACL to have offsets.
> > > > >
> > > > > The flag is needed to distinguish a batch with a desired base
> offset
> > > of 0,
> > > > > from a regular batch for which offsets need to be generated.
> > > > > I would not restrict a principal to only send-with-offsets (by
> making
> > > that
> > > > > mandatory via the ACL).
> > > > >
> > > > > Thanks
> > > > > Edo & Mickael
> > > > >
> > > > > --------------------------------------------------
> > > > >
> > > > > Edoardo Comar
> > > > >
> > > > > IBM Event Streams
> > > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > > >
> > > > >
> > > > > Stanislav Kozlovski <st...@confluent.io> wrote on 22/11/2018
> > > 16:17:11:
> > > > >
> > > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > > To: dev@kafka.apache.org
> > > > > > Date: 22/11/2018 16:17
> > > > > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for
> > > > > > Cluster Replication
> > > > > >
> > > > > > Hey Edurdo, thanks for the KIP!
> > > > > >
> > > > > > I have some questions, apologies if they are naive:
> > > > > > Is this intended to work for a single producer use case only?
> > > > > > > How would it work if two producers were producing to the same topic with
> > > > > > > offsets?
> > > > > > > How would it work if two producers, one with offsets and one without were
> > > > > > > producing to a topic?
> > > > > > > Does it make sense to *lock* the topic from other producers while there is
> > > > > > > one that uses offsets?
> > > > > > >
> > > > > > > Since we are tying the produce-with-offset request to the ACL, do we need
> > > > > > > the `use_offset` field in the produce request? Maybe we make it mandatory
> > > > > > > for produce requests with that ACL to have offsets.
> > > > > >
> > > > > > Best,
> > > > > > Stanislav
> > > > > >
> > > > > > On Wed, Nov 21, 2018 at 5:14 PM Edoardo Comar <ECOMAR@uk.ibm.com
> >
> > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > > we've opened a KIP to improve data replication between Kafka
> > > clusters
> > > > > :
> > > > > > >
> > > > > > >
> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-391%3A+Allow+Producing+with+Offsets+for+Cluster+Replication
> > > > > > >
> > > > > > > We'd like to start a discussion, please post your feedback in
> > this
> > > > > thread.
> > > > > > >
> > > > > > > Thank you
> > > > > > > Edo and Mickael
> > > > > > >
> > > > > > >
> > > > > > > --------------------------------------------------
> > > > > > >
> > > > > > > Edoardo Comar
> > > > > > >
> > > > > > > IBM Event Streams
> > > > > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best,
> > > > > > Stanislav
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best,
> > > > Stanislav
> > >
> >
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>

Re: [DISCUSS] KIP-391: Allow Producing with Offsets for Cluster Replication

Posted by Edoardo Comar <EC...@uk.ibm.com>.
Hi Mayuresh

1. We were envisioning the 1:1 case. However, as long as topic names do not
clash, you could replicate multiple clusters into a single replica,
or use topic prefixes on the destination.

2. Using an idempotent producer in the replicator would be recommended.

3. Why would you force max.in.flight.requests.per.connection to 1?
The idempotent producer can work with a value <= 5.

4. If truncation occurred in a source topic-partition, the replicator
could encounter INVALID_PRODUCE_OFFSET, and at the moment it could only
delete the destination topic and restart replicating to it.
There is no mechanism for truncating the newest records in the
destination.
Note that unclean leader election is now disabled by default.

5. Can you please clarify the question?

6. Consumers can use their saved offsets - either stored in Kafka's
__consumer_offsets or in an external store -
on the records of the replicated cluster, without any translation and
without relying on timestamps.

This allows the replicator to replicate the committed offsets without
translation too.
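
As a rough illustration of points 2 and 3 above, the producer settings for a
replicator could look like the sketch below. The class and method names are
hypothetical and the bootstrap address is a placeholder; only the
configuration keys are standard Kafka producer settings. It uses plain
java.util.Properties and does not assume any produce-with-offsets client API.

```java
import java.util.Properties;

// Hypothetical helper, for illustration only: collects the producer settings
// a replicator might use when writing to the destination cluster.
class ReplicatorProducerSettings {

    static Properties idempotentProducerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Idempotence lets the broker discard retried duplicates of a batch.
        props.setProperty("enable.idempotence", "true");
        // The idempotent producer requires acks=all.
        props.setProperty("acks", "all");
        // No need to force this to 1: idempotence works with values up to 5.
        props.setProperty("max.in.flight.requests.per.connection", "5");
        return props;
    }

    public static void main(String[] args) {
        // "destination-cluster:9092" is a placeholder address.
        Properties props = idempotentProducerProps("destination-cluster:9092");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

These properties could then be passed to a regular KafkaProducer constructor
together with the serializer settings.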

HTH
Edo & Mickael

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

Re: [DISCUSS] KIP-391: Allow Producing with Offsets for Cluster Replication

Posted by Mayuresh Gharat <gh...@gmail.com>.
Hi Edoardo,

Thanks a lot for the KIP.
I have a few questions/suggestions in addition to what Radai has mentioned
above:

   1. Is this meant only for 1:1 replication, for example one Kafka cluster
   replicating to another, as opposed to multiple Kafka clusters mirroring
   into one Kafka cluster?
   2. Are we relying on exactly-once produce in the replicator? If not, how
   are retries handled in the replicator?
   3. What is the recommended value for in-flight requests here? Is it
   supposed to be strictly 1? If yes, it would be great to mention that in the
   KIP.
   4. How is unclean leader election between source cluster and destination
   cluster handled?
   5. How are offset resets of the replicator's consumer handled?
   6. It would be good to explain the workflow in the KIP, with an
   example, regarding how this KIP will change the replication scenario and
   how it will benefit the consumer apps.

Thanks,

Mayuresh


-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125

Re: [DISCUSS] KIP-391: Allow Producing with Offsets for Cluster Replication

Posted by Edoardo Comar <EC...@uk.ibm.com>.
Hi Radai

> 1. how do you handle possible duplications caused by the "special"
> producer timing-out/retrying? are you explicitely relying on the
> "exactly once" sequencing?

A duplicate ProduceRequest would be rejected with an 
INVALID_PRODUCE_OFFSET error.

We envision using an idempotent producer for cluster replication, but we do
not require it.


> 2. what about the combination of log compacted topics + replicator
> downtime? by the time the replicator comes back up there might be
> "holes" in the source offsets (some msgs might have been compacted
> out)? how is that recoverable?
> 3. similarly, what if you try and fire up replication on a non-empty
> source topic? does the kip allow for offsets starting at some
> arbitrary X > 0 ? or would this have to be designed from the start.

Neither of these cases poses a problem.
As mentioned in the KIP, each producer batch must not contain offset gaps,
but gaps can exist between batches.
The companion PR has an implementation with tests that cover these cases.
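
To make the rules above concrete, here is a toy in-memory model of a single
partition log. It is only a sketch and not the code in the companion PR: the
class, its methods and the reading of "no gaps" as strictly consecutive
offsets within a batch are assumptions made for illustration. Only the
INVALID_PRODUCE_OFFSET error name and the request-level flag come from this
discussion.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of one partition log, for illustration only (not broker code).
class PartitionLogModel {
    enum Result { OK, INVALID_PRODUCE_OFFSET }

    private final List<Long> storedOffsets = new ArrayList<>();
    private long logEndOffset = 0L; // next offset the log would assign itself

    // useOffsets stands in for the request-level flag discussed in this thread.
    Result append(boolean useOffsets, long[] batchOffsets) {
        if (batchOffsets.length == 0) {
            return Result.OK;
        }
        if (!useOffsets) {
            // Regular batch: supplied offsets are ignored, the log assigns them.
            for (int i = 0; i < batchOffsets.length; i++) {
                storedOffsets.add(logEndOffset++);
            }
            return Result.OK;
        }
        // A batch starting below the log end offset is a duplicate or a conflict.
        if (batchOffsets[0] < logEndOffset) {
            return Result.INVALID_PRODUCE_OFFSET;
        }
        // Assumption: "no gaps" means consecutive offsets within one batch.
        for (int i = 1; i < batchOffsets.length; i++) {
            if (batchOffsets[i] != batchOffsets[i - 1] + 1) {
                return Result.INVALID_PRODUCE_OFFSET;
            }
        }
        for (long offset : batchOffsets) {
            storedOffsets.add(offset);
        }
        logEndOffset = batchOffsets[batchOffsets.length - 1] + 1;
        return Result.OK;
    }

    long logEndOffset() {
        return logEndOffset;
    }

    List<Long> storedOffsets() {
        return storedOffsets;
    }

    public static void main(String[] args) {
        PartitionLogModel log = new PartitionLogModel();
        // Replication may start at an arbitrary offset X > 0 on an empty log.
        System.out.println(log.append(true, new long[] {5, 6, 7}));
        // Retrying the same batch is rejected as a duplicate.
        System.out.println(log.append(true, new long[] {5, 6, 7}));
        // A gap between batches (offsets 8 and 9 compacted away) is accepted.
        System.out.println(log.append(true, new long[] {10, 11}));
        // A gap inside a batch is rejected.
        System.out.println(log.append(true, new long[] {20, 22}));
    }
}
```

In this model a retried batch fails on the base offset check alone, which is
why an idempotent producer is helpful but not strictly needed to avoid
duplicates.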

> and lastly, since this KIP seems to be designed fro active-passive
> failover (there can be no produce traffic except the replicator)
> wouldnt a solution based on seeking to a time offset be more generic?
> your producers could checkpoint the last (say log append) timestamp of
> records theyve seen, and when restoring in the remote site seek to
> those timestamps (which will be metadata in their committed offsets) -
> assumming replication takes > 0 time you'd need to handle some dups,
> but every kafka consumer setup needs to know how to handle those
> anyway.

Can you please clarify?
We do not expect any cooperation from users' applications.

thanks!
E&M

> On Fri, Nov 23, 2018 at 2:27 AM Edoardo Comar <EC...@uk.ibm.com> wrote:
> >
> > Hi Stanislav
> >
> > > > The flag is needed to distinguish a batch with a desired base 
offset
> > of
> > > 0,
> > > from a regular batch for which offsets need to be generated.
> > > If the producer can provide offsets, why not provide a base offset 
of 0?
> >
> > a regular batch (for which offsets are generated by the broker on 
write)
> > is sent with a base offset of 0.
> > How could you distinguish it from a batch where you *want* the first
> > record to be written at offset 0 (i.e. be the first in the partition 
and
> > be rejected if there are records on the log already) ?
> > We wanted to avoid a "deep" inspection (and potentially decompression) 
of
> > the records.
> >
> > For the replicator use case, a single produce request where all the 
data
> > is to be assumed with offset,
> > or all without offsets, seems to suffice,
> > So we added only a toplevel flag, not a per-topic-partition one.
> >
> > Thanks for your interest !
> > cheers
> > Edo
> > --------------------------------------------------
> >
> > Edoardo Comar
> >
> > IBM Event Streams
> > IBM UK Ltd, Hursley Park, SO21 2JN
> >
> >
> > Stanislav Kozlovski <st...@confluent.io> wrote on 22/11/2018 
22:32:42:
> >
> > > From: Stanislav Kozlovski <st...@confluent.io>
> > > To: dev@kafka.apache.org
> > > Date: 22/11/2018 22:33
> > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for
> > > Cluster Replication
> > >
> > > Hey Edo & Mickael,
> > >
> > > > The flag is needed to distinguish a batch with a desired base 
offset
> > of
> > > 0,
> > > from a regular batch for which offsets need to be generated.
> > > If the producer can provide offsets, why not provide a base offset 
of 0?
> > >
> > > > (I am reading your post thinking about
> > > partitions rather than topics).
> > > Yes, I meant partitions. Sorry about that.
> > >
> > > Thanks for answering my questions :)
> > >
> > > Best,
> > > Stanislav
> > >
> > > On Thu, Nov 22, 2018 at 5:28 PM Edoardo Comar <EC...@uk.ibm.com> 
wrote:
> > >
> > > > Hi Stanislav,
> > > >
> > > > you're right we envision the replicator use case to have a single
> > producer
> > > > with offsets per partition (I am reading your post thinking about
> > > > partitions rather than topics).
> > > >
> > > > If a regular producer was to send its own records at the same 
time,
> > it's
> > > > very likely that the one sending with an offset will fail because 
of
> > > > invalid offsets.
> > > > Same if two producers were sending with offsets, likely both would
> > then
> > > > fail.
> > > >
> > > > > Does it make sense to *lock* the topic from other producers 
while
> > there
> > > > is
> > > > > one that uses offsets?
> > > >
> > > > You could do that with ACL permissions if you wanted, I don't 
think it
> > > > needs to be mandated by changing the broker logic.
> > > >
> > > >
> > > > > Since we are tying the produce-with-offset request to the ACL, 
do we
> > > > need
> > > > > the `use_offset` field in the produce request? Maybe we make it
> > > > mandatory
> > > > > for produce requests with that ACL to have offsets.
> > > >
> > > > The flag is needed to distinguish a batch with a desired base 
offset
> > of 0,
> > > > from a regular batch for which offsets need to be generated.
> > > > I would not restrict a principal to only send-with-offsets (by 
making
> > that
> > > > mandatory via the ACL).
> > > >
> > > > Thanks
> > > > Edo & Mickael
> > > >
> > > > --------------------------------------------------
> > > >
> > > > Edoardo Comar
> > > >
> > > > IBM Event Streams
> > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > >
> > > >
> > > > Stanislav Kozlovski <st...@confluent.io> wrote on 22/11/2018
> > 16:17:11:
> > > >
> > > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > > To: dev@kafka.apache.org
> > > > > Date: 22/11/2018 16:17
> > > > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for
> > > > > Cluster Replication
> > > > >
> > > > > > Hey Edoardo, thanks for the KIP!
> > > > >
> > > > > I have some questions, apologies if they are naive:
> > > > > Is this intended to work for a single producer use case only?
> > > > > How would it work if two producers were producing to the same 
topic
> > with
> > > > > offsets?
> > > > > How would it work if two producers, one with offsets and one 
without
> > > > were
> > > > > producing to a topic?
> > > > > Does it make sense to *lock* the topic from other producers 
while
> > there
> > > > is
> > > > > one that uses offsets?
> > > > >
> > > > > Since we are tying the produce-with-offset request to the ACL, 
do we
> > > > need
> > > > > the `use_offset` field in the produce request? Maybe we make it
> > > > mandatory
> > > > > for produce requests with that ACL to have offsets.
> > > > >
> > > > > Best,
> > > > > Stanislav
> > > > >
> > > > > On Wed, Nov 21, 2018 at 5:14 PM Edoardo Comar 
<EC...@uk.ibm.com>
> > wrote:
> > > > >
> > > > > > Hi,
> > > > > > we've opened a KIP to improve data replication between Kafka
> > clusters
> > > > :
> > > > > >
> > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-391%3A+Allow+Producing+with+Offsets+for+Cluster+Replication
> > > > > >
> > > > > > We'd like to start a discussion, please post your feedback in 
this
> > > > thread.
> > > > > >
> > > > > > Thank you
> > > > > > Edo and Mickael
> > > > > >
> > > > > >
> > > > > > --------------------------------------------------
> > > > > >
> > > > > > Edoardo Comar
> > > > > >
> > > > > > IBM Event Streams
> > > > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > > > >
> > > > > > Unless stated otherwise above:
> > > > > > IBM United Kingdom Limited - Registered in England and Wales 
with
> > > > number
> > > > > > 741598.
> > > > > > Registered office: PO Box 41, North Harbour, Portsmouth, 
Hampshire
> > PO6
> > > > 3AU
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best,
> > > > > Stanislav
> > > >
> > > > Unless stated otherwise above:
> > > > IBM United Kingdom Limited - Registered in England and Wales with
> > number
> > > > 741598.
> > > > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire 
PO6
> > 3AU
> > > >
> > >
> > >
> > > --
> > > Best,
> > > Stanislav
> >
> > Unless stated otherwise above:
> > IBM United Kingdom Limited - Registered in England and Wales with 
number
> > 741598.
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 
3AU
> 

Re: [DISCUSS] KIP-391: Allow Producing with Offsets for Cluster Replication

Posted by radai <ra...@gmail.com>.
a few questions:

1. how do you handle possible duplications caused by the "special"
producer timing out/retrying? are you explicitly relying on the
"exactly once" sequencing?
2. what about the combination of log compacted topics + replicator
downtime? by the time the replicator comes back up there might be
"holes" in the source offsets (some msgs might have been compacted
out). how is that recoverable?
3. similarly, what if you try and fire up replication on a non-empty
source topic? does the KIP allow for offsets starting at some
arbitrary X > 0? or would this have to be designed from the start?

and lastly, since this KIP seems to be designed for active-passive
failover (there can be no produce traffic except the replicator),
wouldn't a solution based on seeking to a time offset be more generic?
your producers could checkpoint the last (say log append) timestamp of
records they've seen, and when restoring in the remote site seek to
those timestamps (which will be metadata in their committed offsets) -
assuming replication takes > 0 time you'd need to handle some dups,
but every kafka consumer setup needs to know how to handle those
anyway.
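
The timestamp-based alternative described above can be sketched as follows. This is only an illustration under stated assumptions: the class name, the method name and the in-memory arrays are hypothetical stand-ins for a destination partition, and timestamps are assumed to be non-decreasing (as with log-append time). With the real consumer client, `KafkaConsumer.offsetsForTimes` would play the role of the lookup.

```java
// Illustrative only: a tiny in-memory stand-in for a destination partition.
final class TimestampSeekSketch {

    /**
     * Returns the offset of the first record whose timestamp is >= checkpointMs,
     * or -1 if no such record exists. Assumes timestamps are non-decreasing.
     */
    static long firstOffsetAtOrAfter(long[] offsets, long[] timestampsMs, long checkpointMs) {
        for (int i = 0; i < offsets.length; i++) {
            if (timestampsMs[i] >= checkpointMs) {
                return offsets[i];
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        // Hypothetical destination log: offsets differ from the source cluster.
        long[] offsets = {100L, 101L, 102L, 103L};
        long[] timestampsMs = {1000L, 1010L, 1020L, 1030L};

        // Timestamp of the last record the client saw before failing over.
        long checkpointMs = 1020L;

        long resume = firstOffsetAtOrAfter(offsets, timestampsMs, checkpointMs);
        System.out.println("resume at offset " + resume);
    }
}
```

In this sketch the client resumes at the record it had already seen (timestamp 1020), which is the kind of duplicate the message says consumers would need to tolerate.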
On Fri, Nov 23, 2018 at 2:27 AM Edoardo Comar <EC...@uk.ibm.com> wrote:
>
> Hi Stanislav
>
> > > The flag is needed to distinguish a batch with a desired base offset of 0,
> > > from a regular batch for which offsets need to be generated.
> > If the producer can provide offsets, why not provide a base offset of 0?
>
> a regular batch (for which offsets are generated by the broker on write)
> is sent with a base offset of 0.
> How could you distinguish it from a batch where you *want* the first
> record to be written at offset 0 (i.e. be the first in the partition and
> be rejected if there are records on the log already) ?
> We wanted to avoid a "deep" inspection (and potentially decompression) of
> the records.
>
> For the replicator use case, a single produce request where all the data
> is to be assumed with offset,
> or all without offsets, seems to suffice,
> So we added only a toplevel flag, not a per-topic-partition one.
>
> Thanks for your interest !
> cheers
> Edo
> --------------------------------------------------
>
> Edoardo Comar
>
> IBM Event Streams
> IBM UK Ltd, Hursley Park, SO21 2JN
>
>
> Stanislav Kozlovski <st...@confluent.io> wrote on 22/11/2018 22:32:42:
>
> > From: Stanislav Kozlovski <st...@confluent.io>
> > To: dev@kafka.apache.org
> > Date: 22/11/2018 22:33
> > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for
> > Cluster Replication
> >
> > Hey Edo & Mickael,
> >
> > > The flag is needed to distinguish a batch with a desired base offset
> of
> > 0,
> > from a regular batch for which offsets need to be generated.
> > If the producer can provide offsets, why not provide a base offset of 0?
> >
> > > (I am reading your post thinking about
> > partitions rather than topics).
> > Yes, I meant partitions. Sorry about that.
> >
> > Thanks for answering my questions :)
> >
> > Best,
> > Stanislav
> >
> > On Thu, Nov 22, 2018 at 5:28 PM Edoardo Comar <EC...@uk.ibm.com> wrote:
> >
> > > Hi Stanislav,
> > >
> > > you're right we envision the replicator use case to have a single
> producer
> > > with offsets per partition (I am reading your post thinking about
> > > partitions rather than topics).
> > >
> > > If a regular producer was to send its own records at the same time,
> it's
> > > very likely that the one sending with an offset will fail because of
> > > invalid offsets.
> > > Same if two producers were sending with offsets, likely both would
> then
> > > fail.
> > >
> > > > Does it make sense to *lock* the topic from other producers while
> there
> > > is
> > > > one that uses offsets?
> > >
> > > You could do that with ACL permissions if you wanted, I don't think it
> > > needs to be mandated by changing the broker logic.
> > >
> > >
> > > > Since we are tying the produce-with-offset request to the ACL, do we
> > > need
> > > > the `use_offset` field in the produce request? Maybe we make it
> > > mandatory
> > > > for produce requests with that ACL to have offsets.
> > >
> > > The flag is needed to distinguish a batch with a desired base offset
> of 0,
> > > from a regular batch for which offsets need to be generated.
> > > I would not restrict a principal to only send-with-offsets (by making
> that
> > > mandatory via the ACL).
> > >
> > > Thanks
> > > Edo & Mickael
> > >
> > > --------------------------------------------------
> > >
> > > Edoardo Comar
> > >
> > > IBM Event Streams
> > > IBM UK Ltd, Hursley Park, SO21 2JN
> > >
> > >
> > > Stanislav Kozlovski <st...@confluent.io> wrote on 22/11/2018
> 16:17:11:
> > >
> > > > From: Stanislav Kozlovski <st...@confluent.io>
> > > > To: dev@kafka.apache.org
> > > > Date: 22/11/2018 16:17
> > > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for
> > > > Cluster Replication
> > > >
> > > > Hey Edoardo, thanks for the KIP!
> > > >
> > > > I have some questions, apologies if they are naive:
> > > > Is this intended to work for a single producer use case only?
> > > > How would it work if two producers were producing to the same topic
> with
> > > > offsets?
> > > > How would it work if two producers, one with offsets and one without
> > > were
> > > > producing to a topic?
> > > > Does it make sense to *lock* the topic from other producers while
> there
> > > is
> > > > one that uses offsets?
> > > >
> > > > Since we are tying the produce-with-offset request to the ACL, do we
> > > need
> > > > the `use_offset` field in the produce request? Maybe we make it
> > > mandatory
> > > > for produce requests with that ACL to have offsets.
> > > >
> > > > Best,
> > > > Stanislav
> > > >
> > > > On Wed, Nov 21, 2018 at 5:14 PM Edoardo Comar <EC...@uk.ibm.com>
> wrote:
> > > >
> > > > > Hi,
> > > > > we've opened a KIP to improve data replication between Kafka
> clusters
> > > :
> > > > >
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-391%3A+Allow+Producing+with+Offsets+for+Cluster+Replication
> > > > >
> > > > > We'd like to start a discussion, please post your feedback in this
> > > thread.
> > > > >
> > > > > Thank you
> > > > > Edo and Mickael
> > > > >
> > > > >
> > > > > --------------------------------------------------
> > > > >
> > > > > Edoardo Comar
> > > > >
> > > > > IBM Event Streams
> > > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > > >
> > > > > Unless stated otherwise above:
> > > > > IBM United Kingdom Limited - Registered in England and Wales with
> > > number
> > > > > 741598.
> > > > > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire
> PO6
> > > 3AU
> > > > >
> > > >
> > > >
> > > > --
> > > > Best,
> > > > Stanislav
> > >
> > > Unless stated otherwise above:
> > > IBM United Kingdom Limited - Registered in England and Wales with
> number
> > > 741598.
> > > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
> 3AU
> > >
> >
> >
> > --
> > Best,
> > Stanislav
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

Re: [DISCUSS] KIP-391: Allow Producing with Offsets for Cluster Replication

Posted by Edoardo Comar <EC...@uk.ibm.com>.
Hi Stanislav

> > The flag is needed to distinguish a batch with a desired base offset of 0,
> > from a regular batch for which offsets need to be generated.
> If the producer can provide offsets, why not provide a base offset of 0?

A regular batch (for which offsets are generated by the broker on write)
is sent with a base offset of 0.
How could you distinguish it from a batch where you *want* the first
record to be written at offset 0 (i.e. to be the first in the partition,
and to be rejected if there are already records in the log)?
We wanted to avoid a "deep" inspection (and potentially decompression) of
the records.

For the replicator use case, it seems sufficient for a single produce
request to carry either all of its data with offsets or all of it without
offsets, so we added only a top-level flag, not a per-topic-partition one.
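
To illustrate why a base offset of 0 alone is ambiguous, here is a minimal sketch of how a broker-side check might use a top-level flag. It is not Kafka's actual broker code: the class and method names are hypothetical, and the acceptance rule (the requested base offset must not be below the log end offset) is an assumption made for the example, not something this thread specifies.

```java
import java.util.OptionalLong;

// Hypothetical sketch; not Kafka's actual broker code.
final class ProduceOffsetFlagSketch {

    /**
     * Decides where a batch should be written.
     *
     * useOffsets      the top-level request flag (called `use_offset` in this thread)
     * batchBaseOffset base offset in the batch header (0 for regular producers)
     * logEndOffset    next offset the broker would assign on this partition
     *
     * Returns the offset to write at, or empty if the batch must be rejected.
     */
    static OptionalLong resolveBaseOffset(boolean useOffsets, long batchBaseOffset, long logEndOffset) {
        if (!useOffsets) {
            // Regular produce: ignore the header value, the broker assigns offsets.
            return OptionalLong.of(logEndOffset);
        }
        // Produce with offsets (assumed rule): never write below the log end.
        if (batchBaseOffset < logEndOffset) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(batchBaseOffset);
    }

    public static void main(String[] args) {
        // Same header value (0), same log state, different meaning depending on the flag.
        System.out.println(resolveBaseOffset(false, 0L, 42L)); // broker assigns the offset
        System.out.println(resolveBaseOffset(true, 0L, 42L));  // rejected: log is not empty
        System.out.println(resolveBaseOffset(true, 0L, 0L));   // accepted as first record
    }
}
```

Because the decision depends only on the flag and the batch header, the sketch never needs to look inside (or decompress) the records themselves.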

Thanks for your interest !
cheers
Edo
--------------------------------------------------

Edoardo Comar

IBM Event Streams
IBM UK Ltd, Hursley Park, SO21 2JN


Stanislav Kozlovski <st...@confluent.io> wrote on 22/11/2018 22:32:42:

> From: Stanislav Kozlovski <st...@confluent.io>
> To: dev@kafka.apache.org
> Date: 22/11/2018 22:33
> Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for 
> Cluster Replication
> 
> Hey Edo & Mickael,
> 
> > The flag is needed to distinguish a batch with a desired base offset 
of
> 0,
> from a regular batch for which offsets need to be generated.
> If the producer can provide offsets, why not provide a base offset of 0?
> 
> > (I am reading your post thinking about
> partitions rather than topics).
> Yes, I meant partitions. Sorry about that.
> 
> Thanks for answering my questions :)
> 
> Best,
> Stanislav
> 
> On Thu, Nov 22, 2018 at 5:28 PM Edoardo Comar <EC...@uk.ibm.com> wrote:
> 
> > Hi Stanislav,
> >
> > you're right we envision the replicator use case to have a single 
producer
> > with offsets per partition (I am reading your post thinking about
> > partitions rather than topics).
> >
> > If a regular producer was to send its own records at the same time, 
it's
> > very likely that the one sending with an offset will fail because of
> > invalid offsets.
> > Same if two producers were sending with offsets, likely both would 
then
> > fail.
> >
> > > Does it make sense to *lock* the topic from other producers while 
there
> > is
> > > one that uses offsets?
> >
> > You could do that with ACL permissions if you wanted, I don't think it
> > needs to be mandated by changing the broker logic.
> >
> >
> > > Since we are tying the produce-with-offset request to the ACL, do we
> > need
> > > the `use_offset` field in the produce request? Maybe we make it
> > mandatory
> > > for produce requests with that ACL to have offsets.
> >
> > The flag is needed to distinguish a batch with a desired base offset 
of 0,
> > from a regular batch for which offsets need to be generated.
> > I would not restrict a principal to only send-with-offsets (by making 
that
> > mandatory via the ACL).
> >
> > Thanks
> > Edo & Mickael
> >
> > --------------------------------------------------
> >
> > Edoardo Comar
> >
> > IBM Event Streams
> > IBM UK Ltd, Hursley Park, SO21 2JN
> >
> >
> > Stanislav Kozlovski <st...@confluent.io> wrote on 22/11/2018 
16:17:11:
> >
> > > From: Stanislav Kozlovski <st...@confluent.io>
> > > To: dev@kafka.apache.org
> > > Date: 22/11/2018 16:17
> > > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for
> > > Cluster Replication
> > >
> > > Hey Edoardo, thanks for the KIP!
> > >
> > > I have some questions, apologies if they are naive:
> > > Is this intended to work for a single producer use case only?
> > > How would it work if two producers were producing to the same topic 
with
> > > offsets?
> > > How would it work if two producers, one with offsets and one without
> > were
> > > producing to a topic?
> > > Does it make sense to *lock* the topic from other producers while 
there
> > is
> > > one that uses offsets?
> > >
> > > Since we are tying the produce-with-offset request to the ACL, do we
> > need
> > > the `use_offset` field in the produce request? Maybe we make it
> > mandatory
> > > for produce requests with that ACL to have offsets.
> > >
> > > Best,
> > > Stanislav
> > >
> > > On Wed, Nov 21, 2018 at 5:14 PM Edoardo Comar <EC...@uk.ibm.com> 
wrote:
> > >
> > > > Hi,
> > > > we've opened a KIP to improve data replication between Kafka 
clusters
> > :
> > > >
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-391%3A+Allow+Producing+with+Offsets+for+Cluster+Replication
> > > >
> > > > We'd like to start a discussion, please post your feedback in this
> > thread.
> > > >
> > > > Thank you
> > > > Edo and Mickael
> > > >
> > > >
> > > > --------------------------------------------------
> > > >
> > > > Edoardo Comar
> > > >
> > > > IBM Event Streams
> > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > >
> > > > Unless stated otherwise above:
> > > > IBM United Kingdom Limited - Registered in England and Wales with
> > number
> > > > 741598.
> > > > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire 
PO6
> > 3AU
> > > >
> > >
> > >
> > > --
> > > Best,
> > > Stanislav
> >
> > Unless stated otherwise above:
> > IBM United Kingdom Limited - Registered in England and Wales with 
number
> > 741598.
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 
3AU
> >
> 
> 
> -- 
> Best,
> Stanislav

Re: [DISCUSS] KIP-391: Allow Producing with Offsets for Cluster Replication

Posted by Stanislav Kozlovski <st...@confluent.io>.
Hey Edo & Mickael,

> The flag is needed to distinguish a batch with a desired base offset of 0,
> from a regular batch for which offsets need to be generated.

If the producer can provide offsets, why not provide a base offset of 0?

> (I am reading your post thinking about
> partitions rather than topics).

Yes, I meant partitions. Sorry about that.

Thanks for answering my questions :)

Best,
Stanislav

On Thu, Nov 22, 2018 at 5:28 PM Edoardo Comar <EC...@uk.ibm.com> wrote:

> Hi Stanislav,
>
> you're right we envision the replicator use case to have a single producer
> with offsets per partition (I am reading your post thinking about
> partitions rather than topics).
>
> If a regular producer was to send its own records at the same time, it's
> very likely that the one sending with an offset will fail because of
> invalid offsets.
> Same if two producers were sending with offsets, likely both would then
> fail.
>
> > Does it make sense to *lock* the topic from other producers while there
> is
> > one that uses offsets?
>
> You could do that with ACL permissions if you wanted, I don't think it
> needs to be mandated by changing the broker logic.
>
>
> > Since we are tying the produce-with-offset request to the ACL, do we
> need
> > the `use_offset` field in the produce request? Maybe we make it
> mandatory
> > for produce requests with that ACL to have offsets.
>
> The flag is needed to distinguish a batch with a desired base offset of 0,
> from a regular batch for which offsets need to be generated.
> I would not restrict a principal to only send-with-offsets (by making that
> mandatory via the ACL).
>
> Thanks
> Edo & Mickael
>
> --------------------------------------------------
>
> Edoardo Comar
>
> IBM Event Streams
> IBM UK Ltd, Hursley Park, SO21 2JN
>
>
> Stanislav Kozlovski <st...@confluent.io> wrote on 22/11/2018 16:17:11:
>
> > From: Stanislav Kozlovski <st...@confluent.io>
> > To: dev@kafka.apache.org
> > Date: 22/11/2018 16:17
> > Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for
> > Cluster Replication
> >
> > Hey Edoardo, thanks for the KIP!
> >
> > I have some questions, apologies if they are naive:
> > Is this intended to work for a single producer use case only?
> > How would it work if two producers were producing to the same topic with
> > offsets?
> > How would it work if two producers, one with offsets and one without
> were
> > producing to a topic?
> > Does it make sense to *lock* the topic from other producers while there
> is
> > one that uses offsets?
> >
> > Since we are tying the produce-with-offset request to the ACL, do we
> need
> > the `use_offset` field in the produce request? Maybe we make it
> mandatory
> > for produce requests with that ACL to have offsets.
> >
> > Best,
> > Stanislav
> >
> > On Wed, Nov 21, 2018 at 5:14 PM Edoardo Comar <EC...@uk.ibm.com> wrote:
> >
> > > Hi,
> > > we've opened a KIP to improve data replication between Kafka clusters
> :
> > >
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-391%3A+Allow+Producing+with+Offsets+for+Cluster+Replication
> > >
> > > We'd like to start a discussion, please post your feedback in this
> thread.
> > >
> > > Thank you
> > > Edo and Mickael
> > >
> > >
> > > --------------------------------------------------
> > >
> > > Edoardo Comar
> > >
> > > IBM Event Streams
> > > IBM UK Ltd, Hursley Park, SO21 2JN
> > >
> > > Unless stated otherwise above:
> > > IBM United Kingdom Limited - Registered in England and Wales with
> number
> > > 741598.
> > > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
> 3AU
> > >
> >
> >
> > --
> > Best,
> > Stanislav
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>


-- 
Best,
Stanislav

Re: [DISCUSS] KIP-391: Allow Producing with Offsets for Cluster Replication

Posted by Edoardo Comar <EC...@uk.ibm.com>.
Hi Stanislav,

You're right, we envision the replicator use case as having a single
producer with offsets per partition (I am reading your post thinking about
partitions rather than topics).

If a regular producer were to send its own records at the same time, it's
very likely that the one sending with offsets would fail because of
invalid offsets.
The same applies if two producers were sending with offsets: likely both
would then fail.
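
The interference between producers can be sketched with a toy partition log. This is a hypothetical model, not Kafka code: the class and method names are invented, and the validation rule (a batch with offsets is rejected when its base offset is below the current log end offset) is an assumption chosen for illustration.

```java
// Hypothetical toy model of one partition; not Kafka's actual log implementation.
final class ConcurrentProducersSketch {
    private long logEndOffset = 0L;

    /** Regular produce: the broker assigns offsets and returns the base offset used. */
    synchronized long appendRegular(int recordCount) {
        long base = logEndOffset;
        logEndOffset += recordCount;
        return base;
    }

    /** Produce with offsets: assumed to be rejected if the base offset is below the log end. */
    synchronized boolean appendWithOffsets(long baseOffset, int recordCount) {
        if (baseOffset < logEndOffset) {
            return false;
        }
        logEndOffset = baseOffset + recordCount;
        return true;
    }

    synchronized long logEndOffset() {
        return logEndOffset;
    }

    public static void main(String[] args) {
        ConcurrentProducersSketch log = new ConcurrentProducersSketch();

        // The replicator mirrors source offsets 0..2.
        System.out.println("replicator batch 1 accepted: " + log.appendWithOffsets(0L, 3));

        // A regular producer sneaks in two records; the broker gives them offsets 3 and 4.
        System.out.println("regular batch written at: " + log.appendRegular(2));

        // The replicator's next batch (source offsets 3..5) now collides and fails.
        System.out.println("replicator batch 2 accepted: " + log.appendWithOffsets(3L, 3));
    }
}
```

Two producers sending with offsets would hit the same check against each other, since each successful append moves the log end past offsets the other still intends to use.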

> Does it make sense to *lock* the topic from other producers while there is
> one that uses offsets?

You could do that with ACL permissions if you wanted, I don't think it 
needs to be mandated by changing the broker logic.


> Since we are tying the produce-with-offset request to the ACL, do we need
> the `use_offset` field in the produce request? Maybe we make it mandatory
> for produce requests with that ACL to have offsets.

The flag is needed to distinguish a batch with a desired base offset of 0, 
from a regular batch for which offsets need to be generated.
I would not restrict a principal to only send-with-offsets (by making that 
mandatory via the ACL).

Thanks
Edo & Mickael

--------------------------------------------------

Edoardo Comar

IBM Event Streams
IBM UK Ltd, Hursley Park, SO21 2JN


Stanislav Kozlovski <st...@confluent.io> wrote on 22/11/2018 16:17:11:

> From: Stanislav Kozlovski <st...@confluent.io>
> To: dev@kafka.apache.org
> Date: 22/11/2018 16:17
> Subject: Re: [DISCUSS] KIP-391: Allow Producing with Offsets for 
> Cluster Replication
> 
> Hey Edoardo, thanks for the KIP!
> 
> I have some questions, apologies if they are naive:
> Is this intended to work for a single producer use case only?
> How would it work if two producers were producing to the same topic with
> offsets?
> How would it work if two producers, one with offsets and one without 
were
> producing to a topic?
> Does it make sense to *lock* the topic from other producers while there 
is
> one that uses offsets?
> 
> Since we are tying the produce-with-offset request to the ACL, do we 
need
> the `use_offset` field in the produce request? Maybe we make it 
mandatory
> for produce requests with that ACL to have offsets.
> 
> Best,
> Stanislav
> 
> On Wed, Nov 21, 2018 at 5:14 PM Edoardo Comar <EC...@uk.ibm.com> wrote:
> 
> > Hi,
> > we've opened a KIP to improve data replication between Kafka clusters 
:
> >
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-391%3A+Allow+Producing+with+Offsets+for+Cluster+Replication
> >
> > We'd like to start a discussion, please post your feedback in this 
thread.
> >
> > Thank you
> > Edo and Mickael
> >
> >
> > --------------------------------------------------
> >
> > Edoardo Comar
> >
> > IBM Event Streams
> > IBM UK Ltd, Hursley Park, SO21 2JN
> >
> > Unless stated otherwise above:
> > IBM United Kingdom Limited - Registered in England and Wales with 
number
> > 741598.
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 
3AU
> >
> 
> 
> -- 
> Best,
> Stanislav

Re: [DISCUSS] KIP-391: Allow Producing with Offsets for Cluster Replication

Posted by Stanislav Kozlovski <st...@confluent.io>.
Hey Edoardo, thanks for the KIP!

I have some questions, apologies if they are naive:
Is this intended to work for a single producer use case only?
How would it work if two producers were producing to the same topic with
offsets?
How would it work if two producers, one with offsets and one without were
producing to a topic?
Does it make sense to *lock* the topic from other producers while there is
one that uses offsets?

Since we are tying the produce-with-offset request to the ACL, do we need
the `use_offset` field in the produce request? Maybe we make it mandatory
for produce requests with that ACL to have offsets.

Best,
Stanislav

On Wed, Nov 21, 2018 at 5:14 PM Edoardo Comar <EC...@uk.ibm.com> wrote:

> Hi,
> we've opened a KIP to improve data replication between Kafka clusters :
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-391%3A+Allow+Producing+with+Offsets+for+Cluster+Replication
>
> We'd like to start a discussion, please post your feedback in this thread.
>
> Thank you
> Edo and Mickael
>
>
> --------------------------------------------------
>
> Edoardo Comar
>
> IBM Event Streams
> IBM UK Ltd, Hursley Park, SO21 2JN
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>


-- 
Best,
Stanislav