Posted to dev@kafka.apache.org by Richard Yu <yo...@gmail.com> on 2019/10/14 23:42:19 UTC

[VOTE] KIP-534: Retain tombstones for approximately delete.retention.ms milliseconds

Hi all,

The discussion for KIP-534 seems to have concluded.
So I wish to vote this in so that we can get it done. It's a small bug fix.
:)

Below is the KIP link:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+for+approximately+delete.retention.ms+milliseconds

Cheers,
Richard

Re: [VOTE] KIP-534: Retain tombstones for approximately delete.retention.ms milliseconds

Posted by Jun Rao <ju...@confluent.io>.
It's true that the size of the tombstone or the control marker could
increase after a round of cleaning. This can already happen with
re-compression. We already have logic in the log cleaner to auto-grow
the read/write buffer during cleaning, so it can accommodate an oversized
message. The only potential issue is that the size estimate for the output
segment assumes that a batch doesn't grow after cleaning, which can
cause the output segment to overflow the 2GB size limit. However, since the
default log segment size is 1GB, this is unlikely to be an issue in
practice.
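
As a rough back-of-the-envelope sketch of that overflow risk (the batch
size and the per-batch growth below are illustrative assumptions, not
measurements), in Java:

  long inputBytes = 2_100_000_000L;    // grouped input sized near the 2GB (Integer.MAX_VALUE) limit
  long batches = inputBytes / 16_384;  // assume ~16 KB batches
  long growth = batches * 1_024;       // assume ~1 KB of extra encoding bytes per batch
  System.out.println((inputBytes + growth) > Integer.MAX_VALUE);  // true: past the limit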

Thanks,

Jun

On Thu, Oct 17, 2019 at 3:55 PM Guozhang Wang <wa...@gmail.com> wrote:

> Thanks for the reply Jason.
>
> On Thu, Oct 17, 2019 at 10:40 AM Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > Hi Guozhang,
> >
> > It's a fair point. For control records, I think it's a non-issue since
> they
> > are tiny and not batched. So the only case where this might matter is
> large
> > batch deletions. I think the size difference is not a major issue itself,
> > but I think it's worth mentioning in the KIP the risk of exceeding the
> max
> > message size. I think the code should probably make this more of a soft
> > limit when cleaning. We have run into scenarios in the past as well where
> > recompression has actually increased message size. We may also want to be
> > able to upconvert messages to the new format in the future in the
> cleaner.
> >
> > -Jason
> >
> >
> >
> > On Thu, Oct 17, 2019 at 9:08 AM Guozhang Wang <wa...@gmail.com>
> wrote:
> >
> > > Here's my understanding: when log compaction kicks in, the system time
> at
> > > the moment would be larger than the message timestamp to be compacted,
> so
> > > the modification on the batch timestamp would practically be increasing
> > its
> > > value, and hence the deltas for each inner message would be negative to
> > > maintain their actual timestamp. Depending on the time diff between the
> > > actual timestamp of the message and the time when log compaction
> happens,
> > > this negative delta can be large or small since it not only depends on
> > the
> > > cleaner thread wakeup frequency but also dirty ratio etc.
> > >
> > > With varInt encoding, the number of bytes needed to encode an int varies
> from
> > 1
> > > to 5 bytes; before compaction, the deltas should be relatively small
> > > positive values compared with the base timestamp, and hence most
> likely 1
> > > or 2 bytes needed to encode, after compaction, the deltas could be
> > > relatively large negative values that may take more bytes to encode.
> > With a
> > > record batch of 512 records in practice, and suppose after compaction each
> record
> > > would take 2 more bytes for encoding deltas, that would be 1K more per
> > > batch. Usually it would not be too big of an issue with reasonable
> sized
> > > message, but I just wanted to point out this as a potential regression.
> > >
> > >
> > > Guozhang
> > >
> > > On Wed, Oct 16, 2019 at 9:36 PM Richard Yu <yohan.richard.yu@gmail.com
> >
> > > wrote:
> > >
> > > > Hi Guozhang,
> > > >
> > > > Your understanding basically is on point.
> > > >
> > > > I haven't looked into the details for what happens if we change the
> > base
> > > > timestamp and how it's calculated, so I'm not clear on how small the
> > delta
> > > > (or big) is.
> > > > To be fair, would the delta size pose a big problem if it takes
> up
> > > more
> > > > bytes to encode?
> > > >
> > > > Cheers,
> > > > Richard
> > > >
> > > > On Wed, Oct 16, 2019 at 7:36 PM Guozhang Wang <wa...@gmail.com>
> > > wrote:
> > > >
> > > > > Hello Richard,
> > > > >
> > > > > Thanks for the KIP, I just have one clarification regarding "So the
> > > idea
> > > > is
> > > > > to set the base timestamp to the delete horizon and adjust the
> deltas
> > > > > accordingly." My understanding is that during compaction, for each
> > > > > compacted new segment, we would set the base timestamp of each batch
> as
> > > the
> > > > > delete horizon, which is the "current system time that cleaner has
> > seen
> > > > so
> > > > > far", and adjust the delta timestamps of each of the inner records
> of
> > > the
> > > > > batch (and practically the deltas will be all negative)?
> > > > >
> > > > > If that's the case, could we do some back of the envelope calculation
> on
> > > > what's
> > > > > the smallest possible deltas? Note that since we use varInt
> > for
> > > > > delta values for each record, the more negative the delta, the more
> > > > > bytes it would take to encode.
> > > > >
> > > > > Guozhang
> > > > >
> > > > > On Wed, Oct 16, 2019 at 6:48 PM Jason Gustafson <
> jason@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > +1. Thanks Richard.
> > > > > >
> > > > > > On Wed, Oct 16, 2019 at 10:04 AM Richard Yu <
> > > > yohan.richard.yu@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > Want to try to get this KIP wrapped up. So it would be great if
> > we
> > > > can
> > > > > > get
> > > > > > > some votes.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Richard
> > > > > > >
> > > > > > > On Tue, Oct 15, 2019 at 12:58 PM Jun Rao <ju...@confluent.io>
> > wrote:
> > > > > > >
> > > > > > > > Hi, Richard,
> > > > > > > >
> > > > > > > > Thanks for the updated KIP. +1 from me.
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > > On Tue, Oct 15, 2019 at 12:46 PM Richard Yu <
> > > > > > yohan.richard.yu@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Jun,
> > > > > > > > >
> > > > > > > > > I've updated the link accordingly. :)
> > > > > > > > > Here is the updated KIP link:
> > > > > > > > >
> > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+and+transaction+markers+for+approximately+delete.retention.ms+milliseconds
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Richard
> > > > > > > > >
> > > > > > > > > On Mon, Oct 14, 2019 at 5:12 PM Jun Rao <ju...@confluent.io>
> > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi, Richard,
> > > > > > > > > >
> > > > > > > > > > Thanks for the KIP. Looks good to me overall. A few minor
> > > > > comments
> > > > > > > > below.
> > > > > > > > > >
> > > > > > > > > > 1. Could you change the title from "Retain tombstones" to
> > > > "Retain
> > > > > > > > > > tombstones and transaction markers" to make it more
> > general?
> > > > > > > > > >
> > > > > > > > > > 2. Could you document which bit in the batch attribute
> will
> > > be
> > > > > used
> > > > > > > for
> > > > > > > > > the
> > > > > > > > > > new flag? The current format of the batch attribute is
> the
> > > > > > following.
> > > > > > > > > >
> > > > > > > > > > -------------------------------------------------------------------------------------------------
> > > > > > > > > > | Unused (6-15) | Control (5) | Transactional (4) | Timestamp Type (3) | Compression Type (0-2) |
> > > > > > > > > > -------------------------------------------------------------------------------------------------
> > > > > > > > > >
> > > > > > > > > > 3. Could you provide the reasons for the rejected
> > proposals?
> > > > For
> > > > > > > > > > proposal 1, one reason is that it doesn't cover the
> > > transaction
> > > > > > > > > > markers. For proposal 2, one reason is that the internal
> > > record
> > > > > > > header
> > > > > > > > > > could be exposed to the clients.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Jun
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, Oct 14, 2019 at 4:42 PM Richard Yu <
> > > > > > > yohan.richard.yu@gmail.com
> > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi all,
> > > > > > > > > > >
> > > > > > > > > > > The discussion for KIP-534 seems to have concluded.
> > > > > > > > > > > So I wish to vote this in so that we can get it done.
> > It's a
> > > > > small
> > > > > > > bug
> > > > > > > > > > fix.
> > > > > > > > > > > :)
> > > > > > > > > > >
> > > > > > > > > > > Below is the KIP link:
> > > > > > > > > > >
> > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+for+approximately+delete.retention.ms+milliseconds
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Richard
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -- Guozhang
> > > > >
> > > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>
>
> --
> -- Guozhang
>
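
As context for point 2 in the quoted thread (which bit of the batch
attributes will carry the new flag), a minimal sketch of how such a flag is
typically defined and tested; the choice of bit 6 is an illustrative
assumption, not taken from the KIP:

  // Attributes layout, from point 2 above:
  // | Unused (6-15) | Control (5) | Transactional (4) | Timestamp Type (3) | Compression Type (0-2) |
  static final int HYPOTHETICAL_NEW_FLAG = 1 << 6;  // first unused bit; illustrative only

  static boolean hasNewFlag(int attributes) {
      return (attributes & HYPOTHETICAL_NEW_FLAG) != 0;
  }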

Re: [VOTE] KIP-534: Retain tombstones for approximately delete.retention.ms milliseconds

Posted by Guozhang Wang <wa...@gmail.com>.
Thanks for the reply Jason.

-- 
-- Guozhang

Re: [VOTE] KIP-534: Retain tombstones for approximately delete.retention.ms milliseconds

Posted by Richard Yu <yo...@gmail.com>.
Hi all,

Seeing that we got all the votes needed, with 3 binding votes and 0
non-binding, I consider this KIP passed.

Cheers,
Richard


Re: [VOTE] KIP-534: Retain tombstones for approximately delete.retention.ms milliseconds

Posted by Guozhang Wang <wa...@gmail.com>.
Thanks Richard, I'm +1 on the KIP


-- 
-- Guozhang

Re: [VOTE] KIP-534: Retain tombstones for approximately delete.retention.ms milliseconds

Posted by Richard Yu <yo...@gmail.com>.
Hi Guozhang, Jason,

I've updated the KIP to include this warning as well.
If there is anything else that we need for it, let me know. :)
Otherwise, we should vote this KIP in.

Cheers,
Richard


Re: [VOTE] KIP-534: Retain tombstones for approximately delete.retention.ms milliseconds

Posted by Jason Gustafson <ja...@confluent.io>.
Hi Guozhang,

It's a fair point. For control records, I think it's a non-issue since they
are tiny and not batched. So the only case where this might matter is large
batch deletions. I think the size difference is not a major issue itself,
but I think it's worth mentioning in the KIP the risk of exceeding the max
message size. I think the code should probably make this more of a soft
limit when cleaning. We have run into scenarios in the past as well where
recompression has actually increased message size. We may also want to be
able to upconvert messages to the new format in the future in the cleaner.
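
A minimal sketch of that soft-limit idea, with hypothetical names rather
than the cleaner's actual code: the produce path keeps the hard limit,
while the cleaner only logs when a rewritten batch exceeds it, since the
growth came from re-encoding rather than from a client.

  static boolean acceptCleanedBatch(int batchBytes, int maxMessageBytes) {
      if (batchBytes > maxMessageBytes)
          System.err.println("cleaned batch of " + batchBytes + " bytes exceeds max.message.bytes "
                  + maxMessageBytes + "; keeping it");
      return true;  // soft limit during cleaning: never drop the batch
  }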

-Jason




Re: [VOTE] KIP-534: Retain tombstones for approximately delete.retention.ms milliseconds

Posted by Guozhang Wang <wa...@gmail.com>.
Here's my understanding: when log compaction kicks in, the system time at
that moment would be larger than the timestamps of the messages being
compacted, so the modification of the batch timestamp would effectively
increase its value, and hence the deltas for each inner message would be
negative to maintain their actual timestamps. Depending on the time
difference between the actual timestamp of the message and the time when
log compaction happens, this negative delta can be large or small, since it
depends not only on the cleaner thread wakeup frequency but also on the
dirty ratio, etc.

With varInt encoding, the number of bytes needed to encode an int varies
from 1 to 5; before compaction, the deltas should be relatively small
positive values compared with the base timestamp, and hence most likely
need 1 or 2 bytes to encode. After compaction, the deltas could be
relatively large negative values that take more bytes to encode. With a
record batch of 512 records in practice, supposing each record takes 2 more
bytes to encode its delta after compaction, that would be 1K more per
batch. Usually this would not be too big of an issue with reasonably sized
messages, but I just wanted to point it out as a potential regression.
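
As a quick illustration of that estimate, a sketch of zigzag varint sizing
(the scheme the v2 record format uses for timestamp deltas; the example
values are assumptions):

  // Size in bytes of a zigzag-encoded varint/varlong, as used for timestamp deltas.
  static int varintSize(long v) {
      long z = (v << 1) ^ (v >> 63);   // zigzag maps small magnitudes to small encodings
      int n = 1;
      while ((z & ~0x7FL) != 0) { z >>>= 7; n++; }
      return n;
  }

  varintSize(5L);            // 1 byte: small positive delta in a freshly written batch
  varintSize(-86_400_000L);  // 4 bytes: delta one day behind a delete-horizon base timestamp

At 2-3 extra bytes per record, a 512-record batch grows by roughly 1-1.5K,
in line with the estimate above.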


Guozhang

-- 
-- Guozhang

Re: [VOTE] KIP-534: Retain tombstones for approximately delete.retention.ms milliseconds

Posted by Richard Yu <yo...@gmail.com>.
Hi Guozhang,

Your understanding is basically on point.

I haven't looked into the details of what happens when we change the base
timestamp or how it is calculated, so I'm not clear on how small (or large)
the deltas would end up being.
To be fair, would the delta size pose a big problem if it takes a few more
bytes to encode?
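
That said, to get a rough feel for it, here is a minimal sketch (plain
Java, not Kafka code; the class and method names are made up for
illustration) of how the standard zigzag varint size grows once deltas go
negative:

public class VarintSizeSketch {

    // Zigzag-map a signed value to an unsigned one, then count how many
    // 7-bit groups the standard varint encoding needs for it.
    static int varintSize(long value) {
        long zigzag = (value << 1) ^ (value >> 63);
        int bytes = 1;
        while ((zigzag & ~0x7FL) != 0) {
            zigzag >>>= 7;
            bytes++;
        }
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println(varintSize(50));          // small positive delta: 1 byte
        System.out.println(varintSize(5000));        // typical in-batch delta: 2 bytes
        System.out.println(varintSize(-86400000L));  // about -1 day: 4 bytes
        System.out.println(varintSize(-604800000L)); // about -7 days: 5 bytes
    }
}

So even a delta of several days stays within 5 bytes; the cost is bounded,
just not free.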

Cheers,
Richard

On Wed, Oct 16, 2019 at 7:36 PM Guozhang Wang <wa...@gmail.com> wrote:

> Hello Richard,
>
> Thanks for the KIP, I just have one clarification regarding "So the idea is
> to set the base timestamp to the delete horizon and adjust the deltas
> accordingly." My understanding is that during compaction, for each newly
> compacted segment, we would set the base timestamp of each batch to the
> delete horizon, which is the "current system time that the cleaner has seen
> so far", and adjust the delta timestamps of each of the inner records of
> the batch accordingly (so in practice the deltas will all be negative)?
>
> If that's the case, could we do some back-of-the-envelope calculation on
> the smallest possible delta values? Note that since we use varInt encoding
> for the per-record delta values, the smaller (i.e. more negative) the
> delta, the more bytes it takes to encode.
>
> Guozhang
>
> On Wed, Oct 16, 2019 at 6:48 PM Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > +1. Thanks Richard.
> >
> > On Wed, Oct 16, 2019 at 10:04 AM Richard Yu <yo...@gmail.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > Want to try to get this KIP wrapped up. So it would be great if we can
> > get
> > > some votes.
> > >
> > > Cheers,
> > > Richard
> > >
> > > On Tue, Oct 15, 2019 at 12:58 PM Jun Rao <ju...@confluent.io> wrote:
> > >
> > > > Hi, Richard,
> > > >
> > > > Thanks for the updated KIP. +1 from me.
> > > >
> > > > Jun
> > > >
> > > > On Tue, Oct 15, 2019 at 12:46 PM Richard Yu <
> > yohan.richard.yu@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > I've updated the link accordingly. :)
> > > > > Here is the updated KIP link:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+and+transaction+markers+for+approximately+delete.retention.ms+milliseconds
> > > > >
> > > > > Cheers,
> > > > > Richard
> > > > >
> > > > > On Mon, Oct 14, 2019 at 5:12 PM Jun Rao <ju...@confluent.io> wrote:
> > > > >
> > > > > > Hi, Richard,
> > > > > >
> > > > > > Thanks for the KIP. Looks good to me overall. A few minor
> comments
> > > > below.
> > > > > >
> > > > > > 1. Could you change the title from "Retain tombstones" to "Retain
> > > > > > tombstones and transaction markers" to make it more general?
> > > > > >
> > > > > > 2. Could you document which bit in the batch attribute will be
> used
> > > for
> > > > > the
> > > > > > new flag? The current format of the batch attribute is the
> > following.
> > > > > >
> > > > > > *  -------------------------------------------------------------------------------------------------
> > > > > > *  | Unused (6-15) | Control (5) | Transactional (4) | Timestamp Type (3) | Compression Type (0-2) |
> > > > > > *  -------------------------------------------------------------------------------------------------
> > > > > >
> > > > > > 3. Could you provide the reasons for the rejected proposals? For
> > > > > > proposal 1, one reason is that it doesn't cover the transaction
> > > > > > markers. For proposal 2, one reason is that the interval record
> > > header
> > > > > > could be exposed to the clients.
> > > > > >
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > > On Mon, Oct 14, 2019 at 4:42 PM Richard Yu <
> > > yohan.richard.yu@gmail.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > The discussion for KIP-534 seems to have concluded.
> > > > > > > So I wish to vote this in so that we can get it done. Its a
> small
> > > bug
> > > > > > fix.
> > > > > > > :)
> > > > > > >
> > > > > > > Below is the KIP link:
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+for+approximately+delete.retention.ms+milliseconds
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Richard
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
> -- Guozhang
>

Re: [VOTE] KIP-534: Retain tombstones for approximately delete.retention.ms milliseconds

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Richard,

Thanks for the KIP, I just have one clarification regarding "So the idea is
to set the base timestamp to the delete horizon and adjust the deltas
accordingly." My understanding is that during compaction, for each newly
compacted segment, we would set the base timestamp of each batch to the
delete horizon, which is the "current system time that the cleaner has seen
so far", and adjust the delta timestamps of each of the inner records of
the batch accordingly (so in practice the deltas will all be negative)?

If that's the case, could we do some back-of-the-envelope calculation on
the smallest possible delta values? Note that since we use varInt encoding
for the per-record delta values, the smaller (i.e. more negative) the
delta, the more bytes it takes to encode.
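
For concreteness, a rough sketch of the numbers I have in mind (assuming
the default delete.retention.ms of 86400000 ms, i.e. one day, and the
usual zigzag varint encoding):

  - before compaction: deltas are a few ms between records in one producer
    batch, so 1-2 bytes per record
  - after compaction: deltas become roughly -(delete.retention.ms + record
    age), e.g. about -86400000 ms with the one-day default
  - zigzag(-86400000) = 2 * 86400000 - 1 = 172799999, which needs a 4-byte
    varint, i.e. a couple of extra bytes per record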

Guozhang

On Wed, Oct 16, 2019 at 6:48 PM Jason Gustafson <ja...@confluent.io> wrote:

> +1. Thanks Richard.
>
> On Wed, Oct 16, 2019 at 10:04 AM Richard Yu <yo...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > Want to try to get this KIP wrapped up. So it would be great if we can
> get
> > some votes.
> >
> > Cheers,
> > Richard
> >
> > On Tue, Oct 15, 2019 at 12:58 PM Jun Rao <ju...@confluent.io> wrote:
> >
> > > Hi, Richard,
> > >
> > > Thanks for the updated KIP. +1 from me.
> > >
> > > Jun
> > >
> > > On Tue, Oct 15, 2019 at 12:46 PM Richard Yu <
> yohan.richard.yu@gmail.com>
> > > wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > I've updated the link accordingly. :)
> > > > Here is the updated KIP link:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+and+transaction+markers+for+approximately+delete.retention.ms+milliseconds
> > > >
> > > > Cheers,
> > > > Richard
> > > >
> > > > On Mon, Oct 14, 2019 at 5:12 PM Jun Rao <ju...@confluent.io> wrote:
> > > >
> > > > > Hi, Richard,
> > > > >
> > > > > Thanks for the KIP. Looks good to me overall. A few minor comments
> > > below.
> > > > >
> > > > > 1. Could you change the title from "Retain tombstones" to "Retain
> > > > > tombstones and transaction markers" to make it more general?
> > > > >
> > > > > 2. Could you document which bit in the batch attribute will be used
> > for
> > > > the
> > > > > new flag? The current format of the batch attribute is the
> following.
> > > > >
> > > > > *  -------------------------------------------------------------------------------------------------
> > > > > *  | Unused (6-15) | Control (5) | Transactional (4) | Timestamp Type (3) | Compression Type (0-2) |
> > > > > *  -------------------------------------------------------------------------------------------------
> > > > >
> > > > > 3. Could you provide the reasons for the rejected proposals? For
> > > > > proposal 1, one reason is that it doesn't cover the transaction
> > > > > markers. For proposal 2, one reason is that the interval record
> > header
> > > > > could be exposed to the clients.
> > > > >
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Mon, Oct 14, 2019 at 4:42 PM Richard Yu <
> > yohan.richard.yu@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > The discussion for KIP-534 seems to have concluded.
> > > > > > So I wish to vote this in so that we can get it done. Its a small
> > bug
> > > > > fix.
> > > > > > :)
> > > > > >
> > > > > > Below is the KIP link:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+for+approximately+delete.retention.ms+milliseconds
> > > > > >
> > > > > > Cheers,
> > > > > > Richard
> > > > > >
> > > > >
> > > >
> > >
> >
>


-- 
-- Guozhang

Re: [VOTE] KIP-534: Retain tombstones for approximately delete.retention.ms milliseconds

Posted by Jason Gustafson <ja...@confluent.io>.
+1. Thanks Richard.

On Wed, Oct 16, 2019 at 10:04 AM Richard Yu <yo...@gmail.com>
wrote:

> Hi all,
>
> Want to try to get this KIP wrapped up. So it would be great if we can get
> some votes.
>
> Cheers,
> Richard
>
> On Tue, Oct 15, 2019 at 12:58 PM Jun Rao <ju...@confluent.io> wrote:
>
> > Hi, Richard,
> >
> > Thanks for the updated KIP. +1 from me.
> >
> > Jun
> >
> > On Tue, Oct 15, 2019 at 12:46 PM Richard Yu <yo...@gmail.com>
> > wrote:
> >
> > > Hi Jun,
> > >
> > > I've updated the link accordingly. :)
> > > Here is the updated KIP link:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+and+transaction+markers+for+approximately+delete.retention.ms+milliseconds
> > >
> > > Cheers,
> > > Richard
> > >
> > > On Mon, Oct 14, 2019 at 5:12 PM Jun Rao <ju...@confluent.io> wrote:
> > >
> > > > Hi, Richard,
> > > >
> > > > Thanks for the KIP. Looks good to me overall. A few minor comments
> > below.
> > > >
> > > > 1. Could you change the title from "Retain tombstones" to "Retain
> > > > tombstones and transaction markers" to make it more general?
> > > >
> > > > 2. Could you document which bit in the batch attribute will be used
> for
> > > the
> > > > new flag? The current format of the batch attribute is the following.
> > > >
> > > > *  -------------------------------------------------------------------------------------------------
> > > > *  | Unused (6-15) | Control (5) | Transactional (4) | Timestamp Type (3) | Compression Type (0-2) |
> > > > *  -------------------------------------------------------------------------------------------------
> > > >
> > > > 3. Could you provide the reasons for the rejected proposals? For
> > > > proposal 1, one reason is that it doesn't cover the transaction
> > > > markers. For proposal 2, one reason is that the interval record
> header
> > > > could be exposed to the clients.
> > > >
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Mon, Oct 14, 2019 at 4:42 PM Richard Yu <
> yohan.richard.yu@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > The discussion for KIP-534 seems to have concluded.
> > > > > So I wish to vote this in so that we can get it done. Its a small
> bug
> > > > fix.
> > > > > :)
> > > > >
> > > > > Below is the KIP link:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+for+approximately+delete.retention.ms+milliseconds
> > > > >
> > > > > Cheers,
> > > > > Richard
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] KIP-534: Retain tombstones for approximately delete.retention.ms milliseconds

Posted by Richard Yu <yo...@gmail.com>.
Hi all,

Want to try to get this KIP wrapped up. So it would be great if we can get
some votes.

Cheers,
Richard

On Tue, Oct 15, 2019 at 12:58 PM Jun Rao <ju...@confluent.io> wrote:

> Hi, Richard,
>
> Thanks for the updated KIP. +1 from me.
>
> Jun
>
> On Tue, Oct 15, 2019 at 12:46 PM Richard Yu <yo...@gmail.com>
> wrote:
>
> > Hi Jun,
> >
> > I've updated the link accordingly. :)
> > Here is the updated KIP link:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+and+transaction+markers+for+approximately+delete.retention.ms+milliseconds
> >
> > Cheers,
> > Richard
> >
> > On Mon, Oct 14, 2019 at 5:12 PM Jun Rao <ju...@confluent.io> wrote:
> >
> > > Hi, Richard,
> > >
> > > Thanks for the KIP. Looks good to me overall. A few minor comments
> below.
> > >
> > > 1. Could you change the title from "Retain tombstones" to "Retain
> > > tombstones and transaction markers" to make it more general?
> > >
> > > 2. Could you document which bit in the batch attribute will be used for
> > the
> > > new flag? The current format of the batch attribute is the following.
> > >
> > > *  -------------------------------------------------------------------------------------------------
> > > *  | Unused (6-15) | Control (5) | Transactional (4) | Timestamp Type (3) | Compression Type (0-2) |
> > > *  -------------------------------------------------------------------------------------------------
> > >
> > > 3. Could you provide the reasons for the rejected proposals? For
> > > proposal 1, one reason is that it doesn't cover the transaction
> > > markers. For proposal 2, one reason is that the interval record header
> > > could be exposed to the clients.
> > >
> > >
> > > Jun
> > >
> > >
> > > On Mon, Oct 14, 2019 at 4:42 PM Richard Yu <yohan.richard.yu@gmail.com
> >
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > The discussion for KIP-534 seems to have concluded.
> > > > So I wish to vote this in so that we can get it done. Its a small bug
> > > fix.
> > > > :)
> > > >
> > > > Below is the KIP link:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+for+approximately+delete.retention.ms+milliseconds
> > > >
> > > > Cheers,
> > > > Richard
> > > >
> > >
> >
>

Re: [VOTE] KIP-534: Retain tombstones for approximately delete.retention.ms milliseconds

Posted by Jun Rao <ju...@confluent.io>.
Hi, Richard,

Thanks for the updated KIP. +1 from me.

Jun

On Tue, Oct 15, 2019 at 12:46 PM Richard Yu <yo...@gmail.com>
wrote:

> Hi Jun,
>
> I've updated the link accordingly. :)
> Here is the updated KIP link:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+and+transaction+markers+for+approximately+delete.retention.ms+milliseconds
>
> Cheers,
> Richard
>
> On Mon, Oct 14, 2019 at 5:12 PM Jun Rao <ju...@confluent.io> wrote:
>
> > Hi, Richard,
> >
> > Thanks for the KIP. Looks good to me overall. A few minor comments below.
> >
> > 1. Could you change the title from "Retain tombstones" to "Retain
> > tombstones and transaction markers" to make it more general?
> >
> > 2. Could you document which bit in the batch attribute will be used for
> the
> > new flag? The current format of the batch attribute is the following.
> >
> > *  -------------------------------------------------------------------------------------------------
> > *  | Unused (6-15) | Control (5) | Transactional (4) | Timestamp Type (3) | Compression Type (0-2) |
> > *  -------------------------------------------------------------------------------------------------
> >
> > 3. Could you provide the reasons for the rejected proposals? For
> > proposal 1, one reason is that it doesn't cover the transaction
> > markers. For proposal 2, one reason is that the interval record header
> > could be exposed to the clients.
> >
> >
> > Jun
> >
> >
> > On Mon, Oct 14, 2019 at 4:42 PM Richard Yu <yo...@gmail.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > The discussion for KIP-534 seems to have concluded.
> > > So I wish to vote this in so that we can get it done. Its a small bug
> > fix.
> > > :)
> > >
> > > Below is the KIP link:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+for+approximately+delete.retention.ms+milliseconds
> > >
> > > Cheers,
> > > Richard
> > >
> >
>

Re: [VOTE] KIP-534: Retain tombstones for approximately delete.retention.ms milliseconds

Posted by Richard Yu <yo...@gmail.com>.
Hi Jun,

I've updated the link accordingly. :)
Here is the updated KIP link:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+and+transaction+markers+for+approximately+delete.retention.ms+milliseconds

Cheers,
Richard

On Mon, Oct 14, 2019 at 5:12 PM Jun Rao <ju...@confluent.io> wrote:

> Hi, Richard,
>
> Thanks for the KIP. Looks good to me overall. A few minor comments below.
>
> 1. Could you change the title from "Retain tombstones" to "Retain
> tombstones and transaction markers" to make it more general?
>
> 2. Could you document which bit in the batch attribute will be used for the
> new flag? The current format of the batch attribute is the following.
>
> *  -------------------------------------------------------------------------------------------------
> *  | Unused (6-15) | Control (5) | Transactional (4) | Timestamp Type (3) | Compression Type (0-2) |
> *  -------------------------------------------------------------------------------------------------
>
> 3. Could you provide the reasons for the rejected proposals? For
> proposal 1, one reason is that it doesn't cover the transaction
> markers. For proposal 2, one reason is that the interval record header
> could be exposed to the clients.
>
>
> Jun
>
>
> On Mon, Oct 14, 2019 at 4:42 PM Richard Yu <yo...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > The discussion for KIP-534 seems to have concluded.
> > So I wish to vote this in so that we can get it done. Its a small bug
> fix.
> > :)
> >
> > Below is the KIP link:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+for+approximately+delete.retention.ms+milliseconds
> >
> > Cheers,
> > Richard
> >
>

Re: [VOTE] KIP-534: Retain tombstones for approximately delete.retention.ms milliseconds

Posted by Jun Rao <ju...@confluent.io>.
Hi, Richard,

Thanks for the KIP. Looks good to me overall. A few minor comments below.

1. Could you change the title from "Retain tombstones" to "Retain
tombstones and transaction markers" to make it more general?

2. Could you document which bit in the batch attribute will be used for the
new flag? The current format of the batch attribute is the following.

*  -------------------------------------------------------------------------------------------------
*  | Unused (6-15) | Control (5) | Transactional (4) | Timestamp Type (3) | Compression Type (0-2) |
*  -------------------------------------------------------------------------------------------------
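
For illustration only (the KIP should pin down the actual bit), if the new
flag took the first unused bit, bit 6, the masks could look like the
hypothetical Java sketch below. The first four masks mirror the layout in
the table above; the delete-horizon mask and all names here are assumptions
of this sketch, not existing Kafka code:

final class BatchAttributesSketch {
    static final int COMPRESSION_CODEC_MASK   = 0x07; // bits 0-2
    static final int TIMESTAMP_TYPE_MASK      = 0x08; // bit 3
    static final int TRANSACTIONAL_FLAG_MASK  = 0x10; // bit 4
    static final int CONTROL_FLAG_MASK        = 0x20; // bit 5
    static final int DELETE_HORIZON_FLAG_MASK = 0x40; // bit 6 (hypothetical)

    // True if the batch's base timestamp should be read as a delete horizon.
    static boolean hasDeleteHorizon(short attributes) {
        return (attributes & DELETE_HORIZON_FLAG_MASK) != 0;
    }

    // Returns the attributes with the hypothetical flag set.
    static short withDeleteHorizon(short attributes) {
        return (short) (attributes | DELETE_HORIZON_FLAG_MASK);
    }
}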

3. Could you provide the reasons for the rejected proposals? For
proposal 1, one reason is that it doesn't cover the transaction
markers. For proposal 2, one reason is that the internal record header
could be exposed to the clients.


Jun


On Mon, Oct 14, 2019 at 4:42 PM Richard Yu <yo...@gmail.com>
wrote:

> Hi all,
>
> The discussion for KIP-534 seems to have concluded.
> So I wish to vote this in so that we can get it done. Its a small bug fix.
> :)
>
> Below is the KIP link:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+for+approximately+delete.retention.ms+milliseconds
>
> Cheers,
> Richard
>