Posted to dev@kafka.apache.org by Becket Qin <be...@gmail.com> on 2016/10/08 01:57:21 UTC

Re: [DISCUSS] KIP-47 - Add timestamp-based log deletion policy

Hi Bill,

I saw the voting thread and think it may be better to discuss this in the
discussion thread.

It would be good for the KIP wiki to clarify the behavior when
log.retention.ms, log.retention.bytes and log.retention.min.timestamp are
all set. For example, if the size of the partition has grown beyond
log.retention.bytes but the oldest segment is not yet eligible for deletion
under log.retention.min.timestamp, do we delete that segment?
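
To make the question concrete, here is a minimal Python sketch of one possible
reading, in which each retention check is independent and any one of them is
enough to delete a segment. This is an assumption for illustration only, not
something the KIP states. The Segment class and the deletable_segments function
are hypothetical names, the size check is deliberately simplified, and the
timestamp rule (delete when the segment's largest timestamp is below
log.retention.min.timestamp) follows Jun's earlier comment in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """Hypothetical stand-in for a log segment; not Kafka's actual class."""
    name: str
    size_bytes: int
    largest_timestamp_ms: int


def deletable_segments(
    segments: List[Segment],          # ordered oldest first
    now_ms: int,
    retention_ms: Optional[int],      # log.retention.ms, None = disabled
    retention_bytes: Optional[int],   # log.retention.bytes, None = disabled
    min_timestamp_ms: Optional[int],  # log.retention.min.timestamp, None = disabled
) -> List[str]:
    """Return names of segments to delete, ASSUMING any single check suffices."""
    doomed = set()

    for seg in segments:
        # Relative time check: newest message is older than the retention window.
        if retention_ms is not None and now_ms - seg.largest_timestamp_ms > retention_ms:
            doomed.add(seg.name)
        # Absolute timestamp check (rule suggested by Jun in this thread).
        if min_timestamp_ms is not None and seg.largest_timestamp_ms < min_timestamp_ms:
            doomed.add(seg.name)

    # Size check (simplified): delete oldest segments as long as the partition
    # would still hold at least retention_bytes afterwards.
    if retention_bytes is not None:
        excess = sum(s.size_bytes for s in segments) - retention_bytes
        for seg in segments:
            if excess - seg.size_bytes < 0:
                break
            doomed.add(seg.name)
            excess -= seg.size_bytes

    return [s.name for s in segments if s.name in doomed]


segments = [
    Segment("seg-0", size_bytes=600, largest_timestamp_ms=5_000),
    Segment("seg-1", size_bytes=600, largest_timestamp_ms=9_000),
    Segment("seg-2", size_bytes=600, largest_timestamp_ms=9_500),
]

# The partition (1800 bytes) exceeds log.retention.bytes (1000), but no segment
# is older than log.retention.min.timestamp (1000).
result = deletable_segments(
    segments, now_ms=10_000, retention_ms=None,
    retention_bytes=1_000, min_timestamp_ms=1_000,
)
print(result)
```

Under this reading the size check alone removes seg-0. Under the alternative
reading, where log.retention.min.timestamp protects everything at or after that
timestamp, nothing would be deleted here. The wiki should say which one applies.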

Thanks,

Jiangjie (Becket) Qin

On Fri, Jun 3, 2016 at 11:02 AM, Magnus Edenhill <ma...@edenhill.se> wrote:

> Bumping this thread so Wes can reply to it. Ignore this mail.
>
> 2016-02-24 0:36 GMT+01:00 Joel Koshy <jj...@gmail.com>:
>
> > Great - thanks for clarifying.
> >
> > Joel
> >
> > On Tue, Feb 23, 2016 at 1:47 PM, Bill Warshaw <wd...@gmail.com> wrote:
> >
> > > Sorry that I didn't see this comment before the meeting Joel.  I'll try to
> > > clarify what I said at the meeting:
> > >
> > > - The KIP currently states that timestamp-based log deletion will only work
> > > with LogAppendTime.  I need to update the KIP to reflect that, after the
> > > work is done for KIP-33, it will work with both LogAppendTime and
> > > CreateTime.
> > > - To use the existing time-based retention mechanism to delete a precise
> > > range of messages, a client application would need to do the following:
> > >   - by default, turn off these retention mechanisms
> > >   - when the application wishes to delete a range of messages which were
> > > sent before a certain time, compute an approximate value to set
> > > "log.retention.minutes" to, to create a window of messages based on that
> > > timestamp that are ok to delete.  There is some degree of imprecision
> > > implied here.
> > >   - wait until we are confident that the log retention mechanism has been
> > > run and deleted any stale segments
> > >   - reset "log.retention.minutes" to turn off time-based log retention
> > > until the next time the client application wants to delete something
> > >
> > > - To use the proposed timestamp-based retention mechanism, there is only
> > > one step: the application just has to set "log.retention.min.timestamp" to
> > > whatever time boundary it deems fit.  It doesn't need to compute any fuzzy
> > > windows, try to wait until asynchronous processes have been completed or
> > > continually flip settings between enabled and disabled.
> > >
> > > I will update the KIP to reflect the discussion around LogAppendTime vs
> > > CreateTime and the work being done in KIP-33.
> > >
> > > Thanks,
> > > Bill
> > >
> > >
> > > On Tue, Feb 23, 2016 at 1:22 PM, Joel Koshy <jj...@gmail.com> wrote:
> > >
> > > > I'm having some trouble reconciling the current proposal with your original
> > > > requirement, which was essentially being able to purge log data up to a
> > > > precise point (an offset). The KIP currently suggests that timestamp-based
> > > > deletion would only work with LogAppendTime, so it does not seem
> > > > significantly different from time-based retention (after KIP-32/33) - IOW
> > > > to me it appears that you would need to use CreateTime and not
> > > > LogAppendTime. Also, one of the rejected alternatives observes that changing
> > > > the existing configuration settings to try to flush ranges of a given
> > > > partition's log is problematic, but it seems to me you would have to do
> > > > this with timestamp-based deletion as well, right? I think it would be
> > > > useful for me if you or anyone else can go over the exact
> > > > mechanics/workflow for accomplishing precise purges at today's KIP meeting.
> > > >
> > > > Thanks,
> > > >
> > > > Joel
> > > >
> > > > On Monday, February 22, 2016, Bill Warshaw <wd...@gmail.com> wrote:
> > > >
> > > > > Sounds good.  I'll hold off on sending out a VOTE thread until after the
> > > > > KIP meeting tomorrow.
> > > > >
> > > > > On Mon, Feb 22, 2016 at 12:56 PM, Becket Qin <becket.qin@gmail.com> wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > I think it makes sense to implement KIP-47 after KIP-33 so we can make it
> > > > > > work for both LogAppendTime and CreateTime.
> > > > > >
> > > > > > And yes, I'm actively working on KIP-33. I had a voting thread on KIP-33
> > > > > > before and I'll bump it up.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jiangjie (Becket) Qin
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Feb 22, 2016 at 9:11 AM, Jun Rao <ju...@confluent.io> wrote:
> > > > > >
> > > > > > > Becket,
> > > > > > >
> > > > > > > Since you submitted KIP-33, are you actively working on that? If so, it
> > > > > > > would make sense to implement KIP-47 after KIP-33 so that it works for both
> > > > > > > CreateTime and LogAppendTime.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Feb 19, 2016 at 6:25 PM, Bill Warshaw <wdwarshaw@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Jun,
> > > > > > > >
> > > > > > > > 1.  I thought more about Andrew's comment about LogAppendTime.  The
> > > > > > > > time-based index you are referring to is associated with KIP-33, correct?
> > > > > > > > Currently my implementation is just checking the last message in a segment,
> > > > > > > > so we're restricted to LogAppendTime.  When the work for KIP-33 is
> > > > > > > > completed, it sounds like CreateTime would also be valid.  Do you happen to
> > > > > > > > know if anyone is currently working on KIP-33?
> > > > > > > >
> > > > > > > > 2. I did update the wiki after reading your original comment, but reading
> > > > > > > > over it again I realize I could word a couple things more clearly.  I will
> > > > > > > > do that tonight.
> > > > > > > >
> > > > > > > > Bill
> > > > > > > >
> > > > > > > > On Fri, Feb 19, 2016 at 7:02 PM, Jun Rao <ju...@confluent.io> wrote:
> > > > > > > >
> > > > > > > > > Hi, Bill,
> > > > > > > > >
> > > > > > > > > I replied with the following comments earlier to the thread. Did you see
> > > > > > > > > that?
> > > > > > > > >
> > > > > > > > > Thanks for the proposal. A couple of comments.
> > > > > > > > >
> > > > > > > > > 1. It seems that this new policy should work for CreateTime as well. If a
> > > > > > > > > topic is configured with CreateTime, messages may not be added in strict
> > > > > > > > > order in the log. However, to build a time-based index, we will be
> > > > > > > > > maintaining the largest timestamp for all messages in a log segment. We can
> > > > > > > > > delete a segment if its largest timestamp is less than
> > > > > > > > > log.retention.min.timestamp. This guarantees that no messages newer than
> > > > > > > > > log.retention.min.timestamp will be deleted, which is probably what the
> > > > > > > > > user wants.
> > > > > > > > >
> > > > > > > > > 2. Right now, the user can specify "delete" as the retention policy and a
> > > > > > > > > log segment will be deleted either when the size of a partition exceeds a
> > > > > > > > > threshold or the timestamp of a segment is older than a relative period of
> > > > > > > > > time (say 7 days) from now. What you are proposing is not a new retention
> > > > > > > > > policy, but an additional check that will cause a segment to be deleted
> > > > > > > > > when the timestamp of a segment is older than an absolute timestamp? If so,
> > > > > > > > > could you update the wiki accordingly?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Jun
> > > > > > > > >
> > > > > > > > > On Fri, Feb 19, 2016 at 2:57 PM, Bill Warshaw <wdwarshaw@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hello all,
> > > > > > > > > >
> > > > > > > > > > What is the next step with this proposal?  The work for KIP-32 that it was
> > > > > > > > > > based off merged earlier today (https://github.com/apache/kafka/pull/764,
> > > > > > > > > > thank you Becket).  I have an implementation with tests, and I've confirmed
> > > > > > > > > > that it actually works in a live system.  Is there more discussion that
> > > > > > > > > > needs to be had about this KIP, or should I start a VOTE thread?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Feb 16, 2016 at 5:06 PM, Jun Rao <jun@confluent.io> wrote:
> > > > > > > > > >
> > > > > > > > > > > Bill,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the proposal. A couple of comments.
> > > > > > > > > > >
> > > > > > > > > > > 1. It seems that this new policy should work for CreateTime as well. If a
> > > > > > > > > > > topic is configured with CreateTime, messages may not be added in strict
> > > > > > > > > > > order in the log. However, to build a time-based index, we will be
> > > > > > > > > > > maintaining the largest timestamp for all messages in a log segment. We can
> > > > > > > > > > > delete a segment if its largest timestamp is less than
> > > > > > > > > > > log.retention.min.timestamp. This guarantees that no messages newer than
> > > > > > > > > > > log.retention.min.timestamp will be deleted, which is probably what the
> > > > > > > > > > > user wants.
> > > > > > > > > > >
> > > > > > > > > > > 2. Right now, the user can specify "delete" as the retention policy and a
> > > > > > > > > > > log segment will be deleted either when the size of a partition exceeds a
> > > > > > > > > > > threshold or the timestamp of a segment is older than a relative period of
> > > > > > > > > > > time (say 7 days) from now. What you are proposing is not a new retention
> > > > > > > > > > > policy, but an additional check that will cause a segment to be deleted
> > > > > > > > > > > when the timestamp of a segment is older than an absolute timestamp? If so,
> > > > > > > > > > > could you update the wiki accordingly?
> > > > > > > > > > >
> > > > > > > > > > > Jun
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Sat, Feb 13, 2016 at 3:23 PM, Bill Warshaw <wdwarshaw@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hello,
> > > > > > > > > > > >
> > > > > > > > > > > > That is a good catch, thanks for pointing it out.  If this KIP is accepted,
> > > > > > > > > > > > we'd need to document this and make the log cleaner not run timestamp-based
> > > > > > > > > > > > deletion unless message.timestamp.type=LogAppendTime.
> > > > > > > > > > > >
> > > > > > > > > > > > On Sat, Feb 13, 2016 at 5:38 AM, Andrew Schofield <
> > > > > > > > > > > > andrew_schofield_jira@outlook.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > This KIP is related to KIP-32, but it strikes me that it only makes sense
> > > > > > > > > > > > > with one of the two proposed message timestamp types. If I understand
> > > > > > > > > > > > > correctly, message timestamps are only certain to be monotonically
> > > > > > > > > > > > > increasing in the log if message.timestamp.type=LogAppendTime.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Does timestamp-based auto-expiration require use of
> > > > > > > > > > > > > message.timestamp.type=LogAppendTime?
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think this KIP is a good idea, but I think it relies on strict ordering
> > > > > > > > > > > > > of timestamps to be workable.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Andrew Schofield
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Date: Fri, 12 Feb 2016 10:38:46 -0800
> > > > > > > > > > > > > > Subject: Re: [DISCUSS] KIP-47 - Add timestamp-based log deletion policy
> > > > > > > > > > > > > > From: neha@confluent.io
> > > > > > > > > > > > > > To: dev@kafka.apache.org
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Adding a timestamp based auto-expiration is useful and this proposal makes
> > > > > > > > > > > > > > sense. Thx!
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Feb 10, 2016 at 3:35 PM, Jay Kreps wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >> I think this makes a lot of sense and won't be hard to implement and
> > > > > > > > > > > > > >> doesn't create too much in the way of new interfaces.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> -Jay
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> On Tue, Feb 9, 2016 at 8:13 AM, Bill Warshaw wrote:
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >>> Hello,
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>> I just submitted KIP-47 for adding a new log deletion policy based on a
> > > > > > > > > > > > > >>> minimum timestamp of messages to retain.
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>> I'm open to any comments or suggestions.
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>> Thanks,
> > > > > > > > > > > > > >>> Bill Warshaw
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Neha