You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Guozhang Wang <wa...@gmail.com> on 2016/05/02 00:55:52 UTC

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

Thanks Bill.

Read through the KIP, LGTM overall. One clarification question:

With this KIP the LogManager's cleanup logic would be, for each segment

1) delete the segment if its last timestamp is < current timstamp -
log.retention.time (ms, minutes, hours, etc).
2) delete the segment if its last timestamp is < specified
log.retention.min.timestamp.

And then check again for each segment

3) delete the segment if the total size is still > log.retention.bytes.

My understanding is that for condition 1) and 2), the segment will be
deleted if "EITHER ONE" of them holds, not "BOTH" of them holds. Just
asking for confirmation.


Guozhang





On Thu, Apr 28, 2016 at 8:28 AM, Bill Warshaw <wd...@gmail.com> wrote:

> I'd like to re-initiate the vote for KIP-47 now that KIP-33 has been
> accepted and is in-progress.  I've updated the KIP (
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
> ).
> I have a commit with the functionality for KIP-47 ready to go once KIP-33
> is complete; it's a fairly minor change.
>
> On Wed, Mar 9, 2016 at 8:42 PM, Gwen Shapira <gw...@confluent.io> wrote:
>
> > For convenience, the KIP is here:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
> >
> > Do you mind updating the KIP with  time formats we plan on supporting
> > in the configuration?
> >
> > On Wed, Mar 9, 2016 at 11:44 AM, Bill Warshaw <wd...@gmail.com>
> wrote:
> > > Hello,
> > >
> > > I'd like to initiate the vote for KIP-47.
> > >
> > > Thanks,
> > > Bill Warshaw
> >
>



-- 
-- Guozhang

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

Posted by Guozhang Wang <wa...@gmail.com>.
Thanks. I'm +1 on this proposal given the comment above.

On Mon, May 2, 2016 at 9:34 AM, Bill Warshaw <wd...@gmail.com> wrote:

> Yeah 1 and 2 could easily be combined into the same predicate.
>
> On Mon, May 2, 2016 at 10:27 AM, Guozhang Wang <wa...@gmail.com> wrote:
>
> > Can we do 1 and 2 in one pass, and 3 in another pass? It may result in
> > different results but semantically it should be acceptable. Arguably
> saving
> > one pass on the segment list may not be huge, but if it is
> straight-forward
> > to do I'd suggest choose this option.
> >
> >
> > Guozhang
> >
> >
> > On Mon, May 2, 2016 at 7:15 AM, Bill Warshaw <wd...@gmail.com>
> wrote:
> >
> > > Conditions 1, 2 and 3 will all be checked sequentially.  If any of the
> > > three conditions is true, that segment will be deleted.
> > >
> > > This is what it looks like in my commit:
> > >
> > >
> >
> https://github.com/apache/kafka/blob/a229462df567f91f76122668037e1bcbbbdff41b/core/src/main/scala/kafka/log/LogManager.scala#L423-L468
> > >
> > > The order of the checks is 1,3,2 (log.retention.time,
> > log.retention.bytes,
> > > log.retention.mintimestamp)
> > >
> > > Bill
> > >
> > > On Sun, May 1, 2016 at 6:55 PM, Guozhang Wang <wa...@gmail.com>
> > wrote:
> > >
> > > > Thanks Bill.
> > > >
> > > > Read through the KIP, LGTM overall. One clarification question:
> > > >
> > > > With this KIP the LogManager's cleanup logic would be, for each
> segment
> > > >
> > > > 1) delete the segment if its last timestamp is < current timstamp -
> > > > log.retention.time (ms, minutes, hours, etc).
> > > > 2) delete the segment if its last timestamp is < specified
> > > > log.retention.min.timestamp.
> > > >
> > > > And then check again for each segment
> > > >
> > > > 3) delete the segment if the total size is still >
> log.retention.bytes.
> > > >
> > > > My understanding is that for condition 1) and 2), the segment will be
> > > > deleted if "EITHER ONE" of them holds, not "BOTH" of them holds. Just
> > > > asking for confirmation.
> > > >
> > > >
> > > > Guozhang
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Apr 28, 2016 at 8:28 AM, Bill Warshaw <wd...@gmail.com>
> > > wrote:
> > > >
> > > > > I'd like to re-initiate the vote for KIP-47 now that KIP-33 has
> been
> > > > > accepted and is in-progress.  I've updated the KIP (
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
> > > > > ).
> > > > > I have a commit with the functionality for KIP-47 ready to go once
> > > KIP-33
> > > > > is complete; it's a fairly minor change.
> > > > >
> > > > > On Wed, Mar 9, 2016 at 8:42 PM, Gwen Shapira <gw...@confluent.io>
> > > wrote:
> > > > >
> > > > > > For convenience, the KIP is here:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
> > > > > >
> > > > > > Do you mind updating the KIP with  time formats we plan on
> > supporting
> > > > > > in the configuration?
> > > > > >
> > > > > > On Wed, Mar 9, 2016 at 11:44 AM, Bill Warshaw <
> wdwarshaw@gmail.com
> > >
> > > > > wrote:
> > > > > > > Hello,
> > > > > > >
> > > > > > > I'd like to initiate the vote for KIP-47.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Bill Warshaw
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

Posted by Bill Warshaw <wd...@gmail.com>.
Yeah 1 and 2 could easily be combined into the same predicate.

On Mon, May 2, 2016 at 10:27 AM, Guozhang Wang <wa...@gmail.com> wrote:

> Can we do 1 and 2 in one pass, and 3 in another pass? It may result in
> different results but semantically it should be acceptable. Arguably saving
> one pass on the segment list may not be huge, but if it is straight-forward
> to do I'd suggest choose this option.
>
>
> Guozhang
>
>
> On Mon, May 2, 2016 at 7:15 AM, Bill Warshaw <wd...@gmail.com> wrote:
>
> > Conditions 1, 2 and 3 will all be checked sequentially.  If any of the
> > three conditions is true, that segment will be deleted.
> >
> > This is what it looks like in my commit:
> >
> >
> https://github.com/apache/kafka/blob/a229462df567f91f76122668037e1bcbbbdff41b/core/src/main/scala/kafka/log/LogManager.scala#L423-L468
> >
> > The order of the checks is 1,3,2 (log.retention.time,
> log.retention.bytes,
> > log.retention.mintimestamp)
> >
> > Bill
> >
> > On Sun, May 1, 2016 at 6:55 PM, Guozhang Wang <wa...@gmail.com>
> wrote:
> >
> > > Thanks Bill.
> > >
> > > Read through the KIP, LGTM overall. One clarification question:
> > >
> > > With this KIP the LogManager's cleanup logic would be, for each segment
> > >
> > > 1) delete the segment if its last timestamp is < current timstamp -
> > > log.retention.time (ms, minutes, hours, etc).
> > > 2) delete the segment if its last timestamp is < specified
> > > log.retention.min.timestamp.
> > >
> > > And then check again for each segment
> > >
> > > 3) delete the segment if the total size is still > log.retention.bytes.
> > >
> > > My understanding is that for condition 1) and 2), the segment will be
> > > deleted if "EITHER ONE" of them holds, not "BOTH" of them holds. Just
> > > asking for confirmation.
> > >
> > >
> > > Guozhang
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Apr 28, 2016 at 8:28 AM, Bill Warshaw <wd...@gmail.com>
> > wrote:
> > >
> > > > I'd like to re-initiate the vote for KIP-47 now that KIP-33 has been
> > > > accepted and is in-progress.  I've updated the KIP (
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
> > > > ).
> > > > I have a commit with the functionality for KIP-47 ready to go once
> > KIP-33
> > > > is complete; it's a fairly minor change.
> > > >
> > > > On Wed, Mar 9, 2016 at 8:42 PM, Gwen Shapira <gw...@confluent.io>
> > wrote:
> > > >
> > > > > For convenience, the KIP is here:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
> > > > >
> > > > > Do you mind updating the KIP with  time formats we plan on
> supporting
> > > > > in the configuration?
> > > > >
> > > > > On Wed, Mar 9, 2016 at 11:44 AM, Bill Warshaw <wdwarshaw@gmail.com
> >
> > > > wrote:
> > > > > > Hello,
> > > > > >
> > > > > > I'd like to initiate the vote for KIP-47.
> > > > > >
> > > > > > Thanks,
> > > > > > Bill Warshaw
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>
>
>
> --
> -- Guozhang
>

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

Posted by Guozhang Wang <wa...@gmail.com>.
Can we do 1 and 2 in one pass, and 3 in another pass? It may result in
different results but semantically it should be acceptable. Arguably saving
one pass on the segment list may not be huge, but if it is straight-forward
to do I'd suggest choose this option.


Guozhang


On Mon, May 2, 2016 at 7:15 AM, Bill Warshaw <wd...@gmail.com> wrote:

> Conditions 1, 2 and 3 will all be checked sequentially.  If any of the
> three conditions is true, that segment will be deleted.
>
> This is what it looks like in my commit:
>
> https://github.com/apache/kafka/blob/a229462df567f91f76122668037e1bcbbbdff41b/core/src/main/scala/kafka/log/LogManager.scala#L423-L468
>
> The order of the checks is 1,3,2 (log.retention.time, log.retention.bytes,
> log.retention.mintimestamp)
>
> Bill
>
> On Sun, May 1, 2016 at 6:55 PM, Guozhang Wang <wa...@gmail.com> wrote:
>
> > Thanks Bill.
> >
> > Read through the KIP, LGTM overall. One clarification question:
> >
> > With this KIP the LogManager's cleanup logic would be, for each segment
> >
> > 1) delete the segment if its last timestamp is < current timstamp -
> > log.retention.time (ms, minutes, hours, etc).
> > 2) delete the segment if its last timestamp is < specified
> > log.retention.min.timestamp.
> >
> > And then check again for each segment
> >
> > 3) delete the segment if the total size is still > log.retention.bytes.
> >
> > My understanding is that for condition 1) and 2), the segment will be
> > deleted if "EITHER ONE" of them holds, not "BOTH" of them holds. Just
> > asking for confirmation.
> >
> >
> > Guozhang
> >
> >
> >
> >
> >
> > On Thu, Apr 28, 2016 at 8:28 AM, Bill Warshaw <wd...@gmail.com>
> wrote:
> >
> > > I'd like to re-initiate the vote for KIP-47 now that KIP-33 has been
> > > accepted and is in-progress.  I've updated the KIP (
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
> > > ).
> > > I have a commit with the functionality for KIP-47 ready to go once
> KIP-33
> > > is complete; it's a fairly minor change.
> > >
> > > On Wed, Mar 9, 2016 at 8:42 PM, Gwen Shapira <gw...@confluent.io>
> wrote:
> > >
> > > > For convenience, the KIP is here:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
> > > >
> > > > Do you mind updating the KIP with  time formats we plan on supporting
> > > > in the configuration?
> > > >
> > > > On Wed, Mar 9, 2016 at 11:44 AM, Bill Warshaw <wd...@gmail.com>
> > > wrote:
> > > > > Hello,
> > > > >
> > > > > I'd like to initiate the vote for KIP-47.
> > > > >
> > > > > Thanks,
> > > > > Bill Warshaw
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang

Re: [VOTE] KIP-47 - Add timestamp-based log deletion policy

Posted by Bill Warshaw <wd...@gmail.com>.
Conditions 1, 2 and 3 will all be checked sequentially.  If any of the
three conditions is true, that segment will be deleted.

This is what it looks like in my commit:
https://github.com/apache/kafka/blob/a229462df567f91f76122668037e1bcbbbdff41b/core/src/main/scala/kafka/log/LogManager.scala#L423-L468

The order of the checks is 1,3,2 (log.retention.time, log.retention.bytes,
log.retention.mintimestamp)

Bill

On Sun, May 1, 2016 at 6:55 PM, Guozhang Wang <wa...@gmail.com> wrote:

> Thanks Bill.
>
> Read through the KIP, LGTM overall. One clarification question:
>
> With this KIP the LogManager's cleanup logic would be, for each segment
>
> 1) delete the segment if its last timestamp is < current timstamp -
> log.retention.time (ms, minutes, hours, etc).
> 2) delete the segment if its last timestamp is < specified
> log.retention.min.timestamp.
>
> And then check again for each segment
>
> 3) delete the segment if the total size is still > log.retention.bytes.
>
> My understanding is that for condition 1) and 2), the segment will be
> deleted if "EITHER ONE" of them holds, not "BOTH" of them holds. Just
> asking for confirmation.
>
>
> Guozhang
>
>
>
>
>
> On Thu, Apr 28, 2016 at 8:28 AM, Bill Warshaw <wd...@gmail.com> wrote:
>
> > I'd like to re-initiate the vote for KIP-47 now that KIP-33 has been
> > accepted and is in-progress.  I've updated the KIP (
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
> > ).
> > I have a commit with the functionality for KIP-47 ready to go once KIP-33
> > is complete; it's a fairly minor change.
> >
> > On Wed, Mar 9, 2016 at 8:42 PM, Gwen Shapira <gw...@confluent.io> wrote:
> >
> > > For convenience, the KIP is here:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
> > >
> > > Do you mind updating the KIP with  time formats we plan on supporting
> > > in the configuration?
> > >
> > > On Wed, Mar 9, 2016 at 11:44 AM, Bill Warshaw <wd...@gmail.com>
> > wrote:
> > > > Hello,
> > > >
> > > > I'd like to initiate the vote for KIP-47.
> > > >
> > > > Thanks,
> > > > Bill Warshaw
> > >
> >
>
>
>
> --
> -- Guozhang
>