Posted to users@kafka.apache.org by Heath Ivie <hi...@AutoAnything.com> on 2016/04/01 21:21:27 UTC

Log Retention: What gets deleted

Hi,

I have some questions about log retention, specifically about what gets deleted.

I have a test app that writes 10 log messages to the topic every second.

With retention.ms set to 1000, I would expect a consumer group's lag to stay somewhere around 10.

What I am seeing is that the lag continues to grow, but then at some point all messages are gone and the lag is at 0.

I thought that the oldest messages would be deleted first.

Am I misinterpreting how log retention works?

Heath Ivie
Solutions Architect


Warning: This e-mail may contain information proprietary to AutoAnything Inc. and is intended only for the use of the intended recipient(s). If the reader of this message is not the intended recipient(s), you have received this message in error and any review, dissemination, distribution or copying of this message is strictly prohibited. If you have received this message in error, please notify the sender immediately and delete all copies.

Re: Log Retention: What gets deleted

Posted by Gwen Shapira <gw...@confluent.io>.
Agree! That's a serious problem. We are trying to fix this in the upcoming
release.

Gwen

Re: Log Retention: What gets deleted

Posted by Anandha L Ranganathan <an...@gmail.com>.
Thanks.

I have seen this in our system and would like to understand the behavior of
log segments.

How will a log segment get deleted when one of the ISR replicas moves to a
new node?
For example, say the current ISR for partition-0 is {1,2,3}. For some
reason, after 2 days the new ISR is {2,3,4}.
Brokers {2,3} will have log segments for partition-0 with creation date T1.
Broker {4} will have log segments for partition-0 with a different creation
date, T2.

Will the deletion of the log segment be based on broker {4} or on brokers
{2,3}? We noticed that the latest log segment timestamp applies, and it
sometimes requires more disk space than anticipated.

Re: Log Retention: What gets deleted

Posted by Gwen Shapira <gw...@confluent.io>.
Yes. It is whichever is shorter :)

Another clarification:
A segment is deleted as a whole, based on the newest event in the segment.
So if the newest event is too recent to delete, the older events in the
segment will also be kept around.
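
To make the "whichever is shorter" rule concrete, here is a minimal Python sketch. It is not Kafka's code: the Segment class, the should_roll function and their field names are hypothetical, and the two limits are simply the defaults mentioned in this thread (1 GB for log.segment.bytes, 168 hours for the roll interval). It also simplifies by comparing the current segment size with the limit, rather than checking whether the next append would exceed it.

```python
from dataclasses import dataclass

# Defaults quoted in this thread, expressed in bytes and milliseconds.
SEGMENT_BYTES_LIMIT = 1024 * 1024 * 1024      # 1 GB
ROLL_MS_LIMIT = 168 * 60 * 60 * 1000          # 168 hours


@dataclass
class Segment:
    """Hypothetical stand-in for one log segment file."""
    size_bytes: int   # bytes written to the segment so far
    created_ms: int   # time the segment was created, in epoch milliseconds


def should_roll(segment: Segment, now_ms: int,
                bytes_limit: int = SEGMENT_BYTES_LIMIT,
                roll_ms_limit: int = ROLL_MS_LIMIT) -> bool:
    """Roll a new segment as soon as either limit is reached."""
    too_big = segment.size_bytes >= bytes_limit
    too_old = now_ms - segment.created_ms >= roll_ms_limit
    return too_big or too_old
```

Under these assumptions, a segment that reaches 1 GB after an hour rolls on size, while a nearly empty segment on a quiet topic rolls only once the 168 hours have passed.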

Re: Log Retention: What gets deleted

Posted by Anandha L Ranganathan <an...@gmail.com>.
Just a clarification based on Gwen's reply:

log.segment.bytes - by default this property is set to 1 GB.
If we haven't set any value for log.roll.ms, it again defaults to 168 hours.
In that case, will it roll out a new log segment file after every 1 GB?

RE: Log Retention: What gets deleted

Posted by Heath Ivie <hi...@AutoAnything.com>.
Gwen,

Thanks for the detailed reply.

That makes it more clear for me.

Heath

Re: Log Retention: What gets deleted

Posted by Gwen Shapira <gw...@confluent.io>.
I think you got it almost right. The missing part is that we only delete
whole partition segments, not individual messages.

As you are writing messages, every X bytes or Y milliseconds, a new file
gets created for the partition to store new messages in. Those files are
called segments.
The segment you are currently writing to is an active segment.

We will never delete an active segment, so in order to delete old messages
we will look for an inactive segment where the newest message is older than
our retention and delete the entire segment.

So there are several parameters controlling when data will get deleted (I'm
looking at just the time-based ones, not the size-based ones):
1. log.retention.ms - how old messages should be before we consider them
for deletion
2. log.roll.ms - how frequently we roll new segments. Messages will not get
deleted before a new segment is rolled
3. log.retention.check.interval.ms - how frequently we check for segments
that we can delete.

A message will be deleted if all 3 are true:
1. It is older than log.retention.ms
2. It is in an inactive segment, meaning enough time passed since the
message was written to roll a new segment
3. Kafka checked for segments that can be deleted, meaning that more than
log.retention.check.interval.ms time passed since the segment was rolled.

Hope this helps,

Gwen
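
The conditions above can be sketched in a few lines of Python. This is only an illustration of the rule as described in this message, not Kafka's implementation: the Segment class, its fields and the deletable_segments function are hypothetical names, and the sketch assumes it is called at the moment a retention check runs (it does not model log.retention.check.interval.ms itself).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical stand-in for one log segment of a partition."""
    newest_message_ms: int   # timestamp of the newest message in the segment
    active: bool = False     # True only for the segment currently written to


def deletable_segments(segments: List[Segment], now_ms: int,
                       retention_ms: int) -> List[Segment]:
    """Return the segments a retention check at now_ms would remove.

    A segment qualifies only if it is inactive and its newest message is
    older than the retention period; it is then removed as a whole.
    """
    return [
        s for s in segments
        if not s.active and now_ms - s.newest_message_ms > retention_ms
    ]


# Example: two rolled segments and the active one, checked at t = 10000 ms.
segments = [Segment(1000), Segment(5000), Segment(9000, active=True)]
expired = deletable_segments(segments, now_ms=10000, retention_ms=6000)
```

Read this way, the behaviour in the original question is plausible: with retention.ms at 1000, nothing is removed until a segment has been rolled and a retention check has run, so the lag keeps growing, and when the check does run, whole segments disappear at once.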


