Posted to users@kafka.apache.org by Yuheng Du <yu...@gmail.com> on 2015/07/24 21:49:31 UTC

deleting data automatically

Hi,

I am testing Kafka producer performance, so I created a queue and am
writing a large amount of data to it.

Is there a way to delete the data automatically, say whenever the data
size reaches 50GB or the retention time exceeds 10 seconds, so my disk
won't fill up and block new writes?

Thanks!

Re: deleting data automatically

Posted by gh...@gmail.com.
You can configure that in the broker configs by setting the log retention properties:

http://kafka.apache.org/07/configuration.html
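
For example, a minimal sketch using the retention property names discussed
elsewhere in this thread (the values below are just placeholders for a test
setup, not recommendations):

    # server.properties
    # delete closed log segments older than one hour
    log.retention.hours=1
    # or delete the oldest segments once a partition's log exceeds ~50GB
    log.retention.bytes=53687091200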

Thanks,

Mayuresh

Sent from my iPhone

> On Jul 24, 2015, at 12:49 PM, Yuheng Du <yu...@gmail.com> wrote:
> 
> Hi,
> 
> I am testing the kafka producer performance. So I created a queue and
> writes a large amount of data to that queue.
> 
> Is there a way to delete the data automatically after some time, say
> whenever the data size reaches 50GB or the retention time exceeds 10
> seconds, it will be deleted so my disk won't get filled and new data can't
> be written in?
> 
> Thanks.!

Re: deleting data automatically

Posted by Yuheng Du <yu...@gmail.com>.
Thank you!

On Mon, Jul 27, 2015 at 1:43 PM, Ewen Cheslack-Postava <ew...@confluent.io>
wrote:

> As I mentioned, adjusting any settings such that files are small enough
> that you don't get the benefits of append-only writes or file
> creation/deletion become a bottleneck might affect performance. It looks
> like the default setting for log.segment.bytes is 1GB, so given fast enough
> cleanup of old logs, you may not need to adjust that setting -- assuming
> you have a reasonable amount of storage, you'll easily fit many dozen log
> files of that size.
>
> -Ewen
>
> On Mon, Jul 27, 2015 at 10:36 AM, Yuheng Du <yu...@gmail.com>
> wrote:
>
> > Thank you! what performance impacts will it be if I change
> > log.segment.bytes? Thanks.
> >
> > On Mon, Jul 27, 2015 at 1:25 PM, Ewen Cheslack-Postava <
> ewen@confluent.io>
> > wrote:
> >
> > > I think log.cleanup.interval.mins was removed in the first 0.8 release.
> > It
> > > sounds like you're looking at outdated docs. Search for
> > > log.retention.check.interval.ms here:
> > > http://kafka.apache.org/documentation.html
> > >
> > > As for setting the values too low hurting performance, I'd guess it's
> > > probably only an issue if you set them extremely small, such that file
> > > creation and cleanup become a bottleneck.
> > >
> > > -Ewen
> > >
> > > On Mon, Jul 27, 2015 at 10:03 AM, Yuheng Du <yu...@gmail.com>
> > > wrote:
> > >
> > > > If I want to get higher throughput, should I increase the
> > > > log.segment.bytes?
> > > >
> > > > I don't see log.retention.check.interval.ms, but there is
> > > > log.cleanup.interval.mins, is that what you mean?
> > > >
> > > > If I set log.roll.ms or log.cleanup.interval.mins too small, will it
> > > hurt
> > > > the throughput? Thanks.
> > > >
> > > > On Fri, Jul 24, 2015 at 11:03 PM, Ewen Cheslack-Postava <
> > > ewen@confluent.io
> > > > >
> > > > wrote:
> > > >
> > > > > You'll want to set the log retention policy via
> > > > > log.retention.{ms,minutes,hours} or log.retention.bytes. If you
> want
> > > > really
> > > > > aggressive collection (e.g., on the order of seconds, as you
> > > specified),
> > > > > you might also need to adjust log.segment.bytes/log.roll.{ms,hours}
> > and
> > > > > log.retention.check.interval.ms.
> > > > >
> > > > > On Fri, Jul 24, 2015 at 12:49 PM, Yuheng Du <
> > yuheng.du.hust@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I am testing the kafka producer performance. So I created a queue
> > and
> > > > > > writes a large amount of data to that queue.
> > > > > >
> > > > > > Is there a way to delete the data automatically after some time,
> > say
> > > > > > whenever the data size reaches 50GB or the retention time exceeds
> > 10
> > > > > > seconds, it will be deleted so my disk won't get filled and new
> > data
> > > > > can't
> > > > > > be written in?
> > > > > >
> > > > > > Thanks.!
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Thanks,
> > > > > Ewen
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks,
> > > Ewen
> > >
> >
>
>
>
> --
> Thanks,
> Ewen
>

Re: deleting data automatically

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
As I mentioned, adjusting any settings such that files are small enough
that you lose the benefits of append-only writes, or such that file
creation/deletion becomes a bottleneck, might affect performance. It looks
like the default setting for log.segment.bytes is 1GB, so given fast enough
cleanup of old logs, you may not need to adjust that setting -- assuming
you have a reasonable amount of storage, you'll easily fit many dozens of
log files of that size.
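
For a rough sense of scale (back-of-the-envelope, using the 50GB figure
from the original question):

    53687091200 bytes retention / 1073741824 bytes per segment = 50
    closed segments per partition

Retention only ever deletes whole closed segments, so segment turnover
stays infrequent at that size.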

-Ewen

On Mon, Jul 27, 2015 at 10:36 AM, Yuheng Du <yu...@gmail.com>
wrote:

> Thank you! what performance impacts will it be if I change
> log.segment.bytes? Thanks.
>
> On Mon, Jul 27, 2015 at 1:25 PM, Ewen Cheslack-Postava <ew...@confluent.io>
> wrote:
>
> > I think log.cleanup.interval.mins was removed in the first 0.8 release.
> It
> > sounds like you're looking at outdated docs. Search for
> > log.retention.check.interval.ms here:
> > http://kafka.apache.org/documentation.html
> >
> > As for setting the values too low hurting performance, I'd guess it's
> > probably only an issue if you set them extremely small, such that file
> > creation and cleanup become a bottleneck.
> >
> > -Ewen
> >
> > On Mon, Jul 27, 2015 at 10:03 AM, Yuheng Du <yu...@gmail.com>
> > wrote:
> >
> > > If I want to get higher throughput, should I increase the
> > > log.segment.bytes?
> > >
> > > I don't see log.retention.check.interval.ms, but there is
> > > log.cleanup.interval.mins, is that what you mean?
> > >
> > > If I set log.roll.ms or log.cleanup.interval.mins too small, will it
> > hurt
> > > the throughput? Thanks.
> > >
> > > On Fri, Jul 24, 2015 at 11:03 PM, Ewen Cheslack-Postava <
> > ewen@confluent.io
> > > >
> > > wrote:
> > >
> > > > You'll want to set the log retention policy via
> > > > log.retention.{ms,minutes,hours} or log.retention.bytes. If you want
> > > really
> > > > aggressive collection (e.g., on the order of seconds, as you
> > specified),
> > > > you might also need to adjust log.segment.bytes/log.roll.{ms,hours}
> and
> > > > log.retention.check.interval.ms.
> > > >
> > > > On Fri, Jul 24, 2015 at 12:49 PM, Yuheng Du <
> yuheng.du.hust@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am testing the kafka producer performance. So I created a queue
> and
> > > > > writes a large amount of data to that queue.
> > > > >
> > > > > Is there a way to delete the data automatically after some time,
> say
> > > > > whenever the data size reaches 50GB or the retention time exceeds
> 10
> > > > > seconds, it will be deleted so my disk won't get filled and new
> data
> > > > can't
> > > > > be written in?
> > > > >
> > > > > Thanks.!
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks,
> > > > Ewen
> > > >
> > >
> >
> >
> >
> > --
> > Thanks,
> > Ewen
> >
>



-- 
Thanks,
Ewen

Re: deleting data automatically

Posted by Yuheng Du <yu...@gmail.com>.
Thank you! What performance impact would there be if I change
log.segment.bytes? Thanks.

On Mon, Jul 27, 2015 at 1:25 PM, Ewen Cheslack-Postava <ew...@confluent.io>
wrote:

> I think log.cleanup.interval.mins was removed in the first 0.8 release. It
> sounds like you're looking at outdated docs. Search for
> log.retention.check.interval.ms here:
> http://kafka.apache.org/documentation.html
>
> As for setting the values too low hurting performance, I'd guess it's
> probably only an issue if you set them extremely small, such that file
> creation and cleanup become a bottleneck.
>
> -Ewen
>
> On Mon, Jul 27, 2015 at 10:03 AM, Yuheng Du <yu...@gmail.com>
> wrote:
>
> > If I want to get higher throughput, should I increase the
> > log.segment.bytes?
> >
> > I don't see log.retention.check.interval.ms, but there is
> > log.cleanup.interval.mins, is that what you mean?
> >
> > If I set log.roll.ms or log.cleanup.interval.mins too small, will it
> hurt
> > the throughput? Thanks.
> >
> > On Fri, Jul 24, 2015 at 11:03 PM, Ewen Cheslack-Postava <
> ewen@confluent.io
> > >
> > wrote:
> >
> > > You'll want to set the log retention policy via
> > > log.retention.{ms,minutes,hours} or log.retention.bytes. If you want
> > really
> > > aggressive collection (e.g., on the order of seconds, as you
> specified),
> > > you might also need to adjust log.segment.bytes/log.roll.{ms,hours} and
> > > log.retention.check.interval.ms.
> > >
> > > On Fri, Jul 24, 2015 at 12:49 PM, Yuheng Du <yu...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I am testing the kafka producer performance. So I created a queue and
> > > > writes a large amount of data to that queue.
> > > >
> > > > Is there a way to delete the data automatically after some time, say
> > > > whenever the data size reaches 50GB or the retention time exceeds 10
> > > > seconds, it will be deleted so my disk won't get filled and new data
> > > can't
> > > > be written in?
> > > >
> > > > Thanks.!
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks,
> > > Ewen
> > >
> >
>
>
>
> --
> Thanks,
> Ewen
>

Re: deleting data automatically

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
I think log.cleanup.interval.mins was removed in the first 0.8 release. It
sounds like you're looking at outdated docs. Search for
log.retention.check.interval.ms here:
http://kafka.apache.org/documentation.html

As for setting the values too low hurting performance, I'd guess it's
probably only an issue if you set them extremely small, such that file
creation and cleanup become a bottleneck.
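
As a sketch of the two knobs involved (illustrative values only, not
recommendations):

    # server.properties
    # how often the broker checks whether any segment is eligible for deletion
    log.retention.check.interval.ms=30000
    # roll a new segment every 5 seconds; at this scale, constant file
    # creation/deletion is where a bottleneck could start to appear
    log.roll.ms=5000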

-Ewen

On Mon, Jul 27, 2015 at 10:03 AM, Yuheng Du <yu...@gmail.com>
wrote:

> If I want to get higher throughput, should I increase the
> log.segment.bytes?
>
> I don't see log.retention.check.interval.ms, but there is
> log.cleanup.interval.mins, is that what you mean?
>
> If I set log.roll.ms or log.cleanup.interval.mins too small, will it hurt
> the throughput? Thanks.
>
> On Fri, Jul 24, 2015 at 11:03 PM, Ewen Cheslack-Postava <ewen@confluent.io
> >
> wrote:
>
> > You'll want to set the log retention policy via
> > log.retention.{ms,minutes,hours} or log.retention.bytes. If you want
> really
> > aggressive collection (e.g., on the order of seconds, as you specified),
> > you might also need to adjust log.segment.bytes/log.roll.{ms,hours} and
> > log.retention.check.interval.ms.
> >
> > On Fri, Jul 24, 2015 at 12:49 PM, Yuheng Du <yu...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I am testing the kafka producer performance. So I created a queue and
> > > writes a large amount of data to that queue.
> > >
> > > Is there a way to delete the data automatically after some time, say
> > > whenever the data size reaches 50GB or the retention time exceeds 10
> > > seconds, it will be deleted so my disk won't get filled and new data
> > can't
> > > be written in?
> > >
> > > Thanks.!
> > >
> >
> >
> >
> > --
> > Thanks,
> > Ewen
> >
>



-- 
Thanks,
Ewen

Re: deleting data automatically

Posted by Yuheng Du <yu...@gmail.com>.
If I want to get higher throughput, should I increase the
log.segment.bytes?

I don't see log.retention.check.interval.ms, but there is
log.cleanup.interval.mins; is that what you mean?

If I set log.roll.ms or log.cleanup.interval.mins too small, will it hurt
the throughput? Thanks.

On Fri, Jul 24, 2015 at 11:03 PM, Ewen Cheslack-Postava <ew...@confluent.io>
wrote:

> You'll want to set the log retention policy via
> log.retention.{ms,minutes,hours} or log.retention.bytes. If you want really
> aggressive collection (e.g., on the order of seconds, as you specified),
> you might also need to adjust log.segment.bytes/log.roll.{ms,hours} and
> log.retention.check.interval.ms.
>
> On Fri, Jul 24, 2015 at 12:49 PM, Yuheng Du <yu...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I am testing the kafka producer performance. So I created a queue and
> > writes a large amount of data to that queue.
> >
> > Is there a way to delete the data automatically after some time, say
> > whenever the data size reaches 50GB or the retention time exceeds 10
> > seconds, it will be deleted so my disk won't get filled and new data
> can't
> > be written in?
> >
> > Thanks.!
> >
>
>
>
> --
> Thanks,
> Ewen
>

Re: deleting data automatically

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
You'll want to set the log retention policy via
log.retention.{ms,minutes,hours} or log.retention.bytes. If you want really
aggressive collection (e.g., on the order of seconds, as you specified),
you might also need to adjust log.segment.bytes/log.roll.{ms,hours} and
log.retention.check.interval.ms.
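
A rough server.properties sketch for the aggressive cleanup you described
might look like the following (the values are only illustrative; retention
on the order of seconds is unusual, so treat it as a starting point):

    # delete closed segments roughly 10 seconds after their last write
    log.retention.ms=10000
    # also cap each partition's log at ~50GB
    log.retention.bytes=53687091200
    # roll segments quickly so old data actually lands in closed segments
    log.roll.ms=10000
    # check for deletable segments frequently
    log.retention.check.interval.ms=5000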

On Fri, Jul 24, 2015 at 12:49 PM, Yuheng Du <yu...@gmail.com>
wrote:

> Hi,
>
> I am testing the kafka producer performance. So I created a queue and
> writes a large amount of data to that queue.
>
> Is there a way to delete the data automatically after some time, say
> whenever the data size reaches 50GB or the retention time exceeds 10
> seconds, it will be deleted so my disk won't get filled and new data can't
> be written in?
>
> Thanks.!
>



-- 
Thanks,
Ewen