Posted to users@kafka.apache.org by Ivan Balashov <ib...@gmail.com> on 2015/03/01 16:19:44 UTC

Re: Kafka 0.8.2 log cleaner

Hi,

Do I understand correctly that compaction and deletion are currently
mutually exclusive?

Is it possible to compact recent segments and delete older ones,
according to general deletion policies?

Thanks,


2014-11-30 15:10 GMT+03:00 Manikumar Reddy <ku...@nmsworks.co.in>:
> Log cleaner does not support topics with compressed messages.
>
> https://issues.apache.org/jira/browse/KAFKA-1374
>
> On Sun, Nov 30, 2014 at 5:33 PM, Mathias Söderberg <
> mathias.soederberg@gmail.com> wrote:
>
>> Does the log cleaner in 0.8.2 support topics with compressed messages? IIRC
>> that wasn't supported in 0.8.1.1.
>>
>> On 29 November 2014 at 17:23, Jun Rao <ju...@gmail.com> wrote:
>>
>> > Yes, log cleaner is in 0.8.2. You just need to set the retention policy of
>> > a topic to "compact".
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > On Thu, Nov 27, 2014 at 5:20 AM, Khandygo, Evgeny (EXT) <
>> > evgeny.khandygo.ext@siemens.com> wrote:
>> >
>> > > I’m wondering if you could tell me whether log cleaner implemented in
>> > > 0.8.2 because it seems like it didn’t.
>> > >
>> > > Thanks
>> > > John
>> > >
>> > >
>> >
>>

Re: Kafka 0.8.2 log cleaner

Posted by Mayuresh Gharat <gh...@gmail.com>.
This would be a good feature to add to the log cleaner.

Thanks,

Mayuresh

On Mon, Mar 2, 2015 at 8:57 AM, Ivan Balashov <ib...@gmail.com> wrote:

> Svante,
>
> Not sure if I understand your suggestion correctly, but I do think
> that enabling retention for deleted values would make a useful
> addition to the "compact" policy. Otherwise some data is bound to be
> hanging around not used.
>
> Guozhang, could this potentially deserve a feature request?
>
> Thanks,
>
>
> 2015-03-02 19:40 GMT+03:00 svante karlsson <sa...@csi.se>:
> > Wouldn't it be rather simple to add a retention time on "deleted" items ie
> > keys with null value for topics that are compacted?
> >
> > The retention time would then be set to some "large" time to allow all
> > consumers to understand that a previous k/v is being deleted.
>



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125

Re: Kafka 0.8.2 log cleaner

Posted by Ivan Balashov <ib...@gmail.com>.
James,

Indeed, it does exactly what is needed.

Thanks for noticing!


2015-03-02 22:34 GMT+03:00 James Cheng <jc...@tivo.com>:
> Ivan,
>
> I think log.cleaner.delete.retention.ms does just that?
>
> "The amount of time to retain delete tombstone markers for log compacted topics. This setting also gives a bound on the time in which a consumer must complete a read if they begin from offset 0 to ensure that they get a valid snapshot of the final stage (otherwise delete tombstones may be collected before they complete their scan). This setting can be overridden on a per-topic basis (see the per-topic configuration section)."
>
> http://kafka.apache.org/documentation.html#brokerconfigs
>
> -James
>
>> On Mar 2, 2015, at 8:57 AM, Ivan Balashov <ib...@gmail.com> wrote:
>>
>> Svante,
>>
>> Not sure if I understand your suggestion correctly, but I do think
>> that enabling retention for deleted values would make a useful
>> addition to the "compact" policy. Otherwise some data is bound to be
>> hanging around not used.
>>
>> Guozhang, could this potentially deserve a feature request?
>>
>> Thanks,
>>
>>
>> 2015-03-02 19:40 GMT+03:00 svante karlsson <sa...@csi.se>:
>>> Wouldn't it be rather simple to add a retention time on "deleted" items ie
>>> keys with null value for topics that are compacted?
>>>
>>> The retention time would then be set to some "large" time to allow all
>>> consumers to understand that a previous k/v is being deleted.
>

Re: Kafka 0.8.2 log cleaner

Posted by James Cheng <jc...@tivo.com>.
Ivan,

I think log.cleaner.delete.retention.ms does just that?

"The amount of time to retain delete tombstone markers for log compacted topics. This setting also gives a bound on the time in which a consumer must complete a read if they begin from offset 0 to ensure that they get a valid snapshot of the final stage (otherwise delete tombstones may be collected before they complete their scan). This setting can be overridden on a per-topic basis (see the per-topic configuration section)."

http://kafka.apache.org/documentation.html#brokerconfigs

-James
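
The effect of a tombstone retention window can be illustrated with a small model. This is only a sketch of the semantics described in the quoted documentation, not the broker's cleaner: the `compact` function, the record layout, and the per-record timestamp are all assumptions made for illustration (the real cleaner works on log segments and decides tombstone age differently).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout: (offset, timestamp_ms, key, value).
# A value of None stands for a delete tombstone.
Record = Tuple[int, int, str, Optional[str]]


def compact(log: List[Record], now_ms: int, delete_retention_ms: int) -> List[Record]:
    """Keep the latest record per key; drop tombstones older than the retention window."""
    latest: Dict[str, Record] = {}
    for record in log:  # the log is in offset order, so later records overwrite earlier ones
        latest[record[2]] = record

    kept = []
    for record in latest.values():
        _, timestamp_ms, _, value = record
        if value is None and now_ms - timestamp_ms > delete_retention_ms:
            continue  # tombstone outlived the window, so the key vanishes from the log
        kept.append(record)
    return sorted(kept, key=lambda r: r[0])  # restore offset order


log = [
    (0, 1_000, "a", "v1"),
    (1, 2_000, "b", "v1"),
    (2, 3_000, "a", "v2"),
    (3, 4_000, "b", None),  # tombstone for key "b"
]

# Inside the retention window the tombstone is still visible to consumers.
print(compact(log, now_ms=5_000, delete_retention_ms=10_000))
# After the window has passed, key "b" is gone entirely.
print(compact(log, now_ms=20_000, delete_retention_ms=10_000))
```

In this model a consumer that starts reading within the window still sees the delete for "b"; one that starts later sees no trace of the key, which is the behaviour asked about earlier in the thread.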

> On Mar 2, 2015, at 8:57 AM, Ivan Balashov <ib...@gmail.com> wrote:
> 
> Svante,
> 
> Not sure if I understand your suggestion correctly, but I do think
> that enabling retention for deleted values would make a useful
> addition to the "compact" policy. Otherwise some data is bound to be
> hanging around not used.
> 
> Guozhang, could this potentially deserve a feature request?
> 
> Thanks,
> 
> 
> 2015-03-02 19:40 GMT+03:00 svante karlsson <sa...@csi.se>:
>> Wouldn't it be rather simple to add a retention time on "deleted" items ie
>> keys with null value for topics that are compacted?
>> 
>> The retention time would then be set to some "large" time to allow all
>> consumers to understand that a previous k/v is being deleted.


Re: Kafka 0.8.2 log cleaner

Posted by Ivan Balashov <ib...@gmail.com>.
Svante,

Not sure if I understand your suggestion correctly, but I do think
that enabling retention for deleted values would make a useful
addition to the "compact" policy. Otherwise some data is bound to be
hanging around not used.

Guozhang, could this potentially deserve a feature request?

Thanks,


2015-03-02 19:40 GMT+03:00 svante karlsson <sa...@csi.se>:
> Wouldn't it be rather simple to add a retention time on "deleted" items ie
> keys with null value for topics that are compacted?
>
> The retention time would then be set to some "large" time to allow all
> consumers to understand that a previous k/v is being deleted.

Re: Kafka 0.8.2 log cleaner

Posted by svante karlsson <sa...@csi.se>.
Wouldn't it be rather simple to add a retention time on "deleted" items, i.e.
keys with a null value, for topics that are compacted?

The retention time would then be set to some "large" time to allow all
consumers to understand that a previous k/v is being deleted.



2015-03-02 17:30 GMT+01:00 Ivan Balashov <ib...@gmail.com>:

> Guozhang,
>
> I agree, but upon restart the application still needs to init
> KV-storage. And even though values are empty, keys will generate
> traffic (delaying app startup time).
> Besides, the idea of keeping needless data in kafka forever, even keys
> only, sounds rather unsettling.
>
> I guess we could try to reduce the key update count, and adjust
> retention of KV topic.
>
> Thanks,
>
> 2015-03-02 19:14 GMT+03:00 Guozhang Wang <wa...@gmail.com>:
> > Currently Kafka log compaction does not support removing keys, but as long
> > as you also have log cleaning done at the app level #.keys will not
> > increase indefinitely.
>

Re: Kafka 0.8.2 log cleaner

Posted by Ivan Balashov <ib...@gmail.com>.
Guozhang,

I agree, but upon restart the application still needs to init
KV-storage. And even though values are empty, keys will generate
traffic (delaying app startup time).
Besides, the idea of keeping needless data in kafka forever, even keys
only, sounds rather unsettling.

I guess we could try to reduce the key update count, and adjust
retention of KV topic.

Thanks,

2015-03-02 19:14 GMT+03:00 Guozhang Wang <wa...@gmail.com>:
> Currently Kafka log compaction does not support removing keys, but as long
> as you also have log cleaning done at the app level #.keys will not
> increase indefinitely.

Re: Kafka 0.8.2 log cleaner

Posted by Guozhang Wang <wa...@gmail.com>.
Currently Kafka log compaction does not support removing keys, but as long
as you also have log cleaning done at the app level, the number of keys will
not increase indefinitely.

On Mon, Mar 2, 2015 at 2:08 AM, Ivan Balashov <ib...@gmail.com> wrote:

> Guozhang,
>
> Thanks for the suggestion, however, I'm afraid cardinality of keys
> will grow indefinitely and AFAIU keys are permanent with log
> compaction. Any chance keys could also be removed during compaction?
>
> Thanks,
>
> 2015-03-02 5:27 GMT+03:00 Guozhang Wang <wa...@gmail.com>:
> >
> > From your description it seems Kafka stores "source of truth" of the data
> > and the k-v store is constructed via consuming from Kafka, right? In that
> > case time/size-based data retention policy is usually not preferred as it
> > may delete data out of expectation while people are querying the k-v store.
> > If you have to enforce some retention policy, then I would suggest use log
> > compaction at the Kafka layer and use an app-level thread that cleans up
> > the data in both kafka / kv stored according to your policy.
>



-- 
-- Guozhang

Re: Kafka 0.8.2 log cleaner

Posted by Ivan Balashov <ib...@gmail.com>.
Guozhang,

Thanks for the suggestion, however, I'm afraid cardinality of keys
will grow indefinitely and AFAIU keys are permanent with log
compaction. Any chance keys could also be removed during compaction?

Thanks,

2015-03-02 5:27 GMT+03:00 Guozhang Wang <wa...@gmail.com>:
>
> From your description it seems Kafka stores "source of truth" of the data
> and the k-v store is constructed via consuming from Kafka, right? In that
> case time/size-based data retention policy is usually not preferred as it
> may delete data out of expectation while people are querying the k-v store.
> If you have to enforce some retention policy, then I would suggest use log
> compaction at the Kafka layer and use an app-level thread that cleans up
> the data in both kafka / kv stored according to your policy.

Re: Kafka 0.8.2 log cleaner

Posted by Guozhang Wang <wa...@gmail.com>.
Ivan,

From your description it seems Kafka stores the "source of truth" of the data
and the k-v store is constructed by consuming from Kafka, right? In that
case a time/size-based data retention policy is usually not preferred, as it
may delete data unexpectedly while people are querying the k-v store.
If you have to enforce some retention policy, then I would suggest using log
compaction at the Kafka layer and an app-level thread that cleans up
the data in both Kafka and the k-v store according to your policy.

Guozhang
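
One pass of such an app-level cleaner might look like the sketch below. It is a minimal illustration under stated assumptions, not an existing Kafka API: `cleanup_pass`, the in-memory `kv_store` layout, the age-based policy, and the `send` callback (standing in for a producer writing to the compacted topic) are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple


def cleanup_pass(
    kv_store: Dict[str, Tuple[int, str]],        # key -> (last_update_ms, value)
    now_ms: int,
    max_age_ms: int,
    send: Callable[[str, Optional[str]], None],  # hypothetical producer hook
) -> List[str]:
    """Remove entries older than max_age_ms and publish a delete marker for each."""
    expired = [
        key
        for key, (last_update_ms, _) in kv_store.items()
        if now_ms - last_update_ms > max_age_ms
    ]
    for key in expired:
        send(key, None)    # a null value marks the key as deleted on the compacted topic
        del kv_store[key]  # keep the local store in step with the topic
    return expired


# Usage with a list standing in for the topic:
published = []
store = {"a": (1_000, "x"), "b": (9_000, "y")}
removed = cleanup_pass(store, now_ms=10_000, max_age_ms=5_000,
                       send=lambda key, value: published.append((key, value)))
print(removed, published, store)
```

In a real application the pass would run periodically on its own thread, and `send` would wrap whatever producer client is in use; error handling and ordering against concurrent updates are left out here.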


On Sun, Mar 1, 2015 at 8:07 AM, Ivan Balashov <ib...@gmail.com> wrote:

> 2015-03-01 18:41 GMT+03:00 Jay Kreps <ja...@gmail.com>:
> > They are mutually exclusive. Can you expand on the motivation/use for
> > combining them?
>
> Thanks, Jay
>
> Let's say we need to build key-value storage semantically connected to
> the data that also stored in kafka.
> Once the particular pieces of data are gone due to retention
> expiration there might be no need to keep relevant pieces in the
> kv-storage.
> On the other hand, kv-storage most likely will benefit from
> compaction, since its keys receive multiple updates.
>
> If this is not available oob, looks like the same can now be achieved
> by manually scanning compacted topic and issuing "delete" markers.
>



-- 
-- Guozhang

Re: Kafka 0.8.2 log cleaner

Posted by Mayuresh Gharat <gh...@gmail.com>.
I think currently you can issue delete markers (tombstones) for the keys.
That will delete the data associated with the respective keys during
compaction. But the keys still will exist in the log.

Thanks,

Mayuresh

On Sun, Mar 1, 2015 at 8:07 AM, Ivan Balashov <ib...@gmail.com> wrote:

> 2015-03-01 18:41 GMT+03:00 Jay Kreps <ja...@gmail.com>:
> > They are mutually exclusive. Can you expand on the motivation/use for
> > combining them?
>
> Thanks, Jay
>
> Let's say we need to build key-value storage semantically connected to
> the data that also stored in kafka.
> Once the particular pieces of data are gone due to retention
> expiration there might be no need to keep relevant pieces in the
> kv-storage.
> On the other hand, kv-storage most likely will benefit from
> compaction, since its keys receive multiple updates.
>
> If this is not available oob, looks like the same can now be achieved
> by manually scanning compacted topic and issuing "delete" markers.
>



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125

Re: Kafka 0.8.2 log cleaner

Posted by Ivan Balashov <ib...@gmail.com>.
2015-03-01 18:41 GMT+03:00 Jay Kreps <ja...@gmail.com>:
> They are mutually exclusive. Can you expand on the motivation/use for
> combining them?

Thanks, Jay

Let's say we need to build key-value storage semantically connected to
the data that is also stored in Kafka.
Once the particular pieces of data are gone due to retention
expiration there might be no need to keep relevant pieces in the
kv-storage.
On the other hand, kv-storage most likely will benefit from
compaction, since its keys receive multiple updates.

If this is not available out of the box, it looks like the same can now be
achieved by manually scanning the compacted topic and issuing "delete" markers.

Re: Kafka 0.8.2 log cleaner

Posted by Jay Kreps <ja...@gmail.com>.
They are mutually exclusive. Can you expand on the motivation/use for
combining them?

-Jay

On Sunday, March 1, 2015, Ivan Balashov <ib...@gmail.com> wrote:

> Hi,
>
> Do I understand correctly that compaction and deletion are currently
> mutually exclusive?
>
> Is it possible to compact recent segments and delete older ones,
> according to general deletion policies?
>
> Thanks,
>
>
> 2014-11-30 15:10 GMT+03:00 Manikumar Reddy <kumar@nmsworks.co.in>:
> > Log cleaner does not support topics with compressed messages.
> >
> > https://issues.apache.org/jira/browse/KAFKA-1374
> >
> > On Sun, Nov 30, 2014 at 5:33 PM, Mathias Söderberg <
> > mathias.soederberg@gmail.com> wrote:
> >
> >> Does the log cleaner in 0.8.2 support topics with compressed messages? IIRC
> >> that wasn't supported in 0.8.1.1.
> >>
> >> On 29 November 2014 at 17:23, Jun Rao <junrao@gmail.com> wrote:
> >>
> >> > Yes, log cleaner is in 0.8.2. You just need to set the retention policy of
> >> > a topic to "compact".
> >> >
> >> > Thanks,
> >> >
> >> > Jun
> >> >
> >> > On Thu, Nov 27, 2014 at 5:20 AM, Khandygo, Evgeny (EXT) <
> >> > evgeny.khandygo.ext@siemens.com> wrote:
> >> >
> >> > > I’m wondering if you could tell me whether log cleaner implemented in
> >> > > 0.8.2 because it seems like it didn’t.
> >> > >
> >> > > Thanks
> >> > > John
> >> > >
> >> > >
> >> >
> >>
>