Posted to dev@kafka.apache.org by Koushik Chitta <kc...@microsoft.com.INVALID> on 2020/02/23 21:34:48 UTC

Issue in retention with compact,delete cleanup policy

Hi,

I have a topic with the following config:

cleanup.policy  =  compact,delete
segment.bytes = 52428800 (50 MiB, ~52 MB)
min.compaction.lag.ms = 1800000 (30 min)
delete.retention.ms = 86400000 (1 day)
retention.ms = 259200000 (3 days)

Ideally, I want records older than 3 days to be deleted without having to produce an explicit delete (a null value for the key) for each record.
However, with continuous compaction a segment can end up holding both a very old record (e.g. > 30 days) and a recent record (e.g. 1 hour old), which makes the whole segment ineligible for retention-based deletion.
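
To make the scenario concrete, here is a minimal Python sketch of time-based retention under one stated assumption: the broker judges a segment as a whole by its newest record timestamp, so a single recent record keeps every older record in the same segment alive. `Segment` and `eligible_for_retention_delete` are hypothetical names used for illustration only, not Kafka APIs.

```python
from dataclasses import dataclass
from typing import List

RETENTION_MS = 259_200_000  # retention.ms from the config above (3 days)
HOUR_MS = 3_600_000
DAY_MS = 24 * HOUR_MS


@dataclass
class Segment:
    """Hypothetical stand-in for a log segment; only record timestamps matter here."""
    record_timestamps_ms: List[int]

    @property
    def largest_timestamp_ms(self) -> int:
        return max(self.record_timestamps_ms)


def eligible_for_retention_delete(segment: Segment, now_ms: int,
                                  retention_ms: int = RETENTION_MS) -> bool:
    # Assumption: a segment is deleted as a whole, and only once its
    # newest record is older than retention.ms.
    return now_ms - segment.largest_timestamp_ms > retention_ms


now_ms = 100 * DAY_MS
old_only = Segment([now_ms - 30 * DAY_MS, now_ms - 10 * DAY_MS])
# What continuous compaction can produce: a 30-day-old record next to a 1-hour-old one.
mixed = Segment([now_ms - 30 * DAY_MS, now_ms - HOUR_MS])

print(eligible_for_retention_delete(old_only, now_ms))  # True
print(eligible_for_retention_delete(mixed, now_ms))     # False
```

In the `mixed` case the 30-day-old record outlives retention.ms only because it shares a segment with a fresh record.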

Currently I don't see a workaround for this. Please suggest one.
I plan to start a KIP to address this use case.

Thanks,
Koushik


RE: [EXTERNAL] Re: Issue in retention with compact,delete cleanup policy

Posted by Koushik Chitta <kc...@microsoft.com.INVALID>.
Right. Let me start a KIP with a few ideas to discuss.

Thanks,
Koushik

-----Original Message-----
From: Matthias J. Sax <mj...@apache.org> 
Sent: Tuesday, March 3, 2020 8:35 PM
To: users@kafka.apache.org
Subject: [EXTERNAL] Re: Issue in retention with compact,delete cleanup policy

If I understand the issue correctly, the problem is that some data, even if it has not been updated for a long time, will never be deleted because it always sits in a segment with newer data?

As your segment size is already fairly small, I don't see any solution other than writing the corresponding tombstones.
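
A minimal sketch of the tombstone approach, assuming the application keeps its own index of the latest update time per key (that index, the key names, and `tombstones_for_expired_keys` are hypothetical). It only computes which null-value records to write; actually sending them through a producer client is left out.

```python
from typing import Dict, List, Optional, Tuple

RETENTION_MS = 259_200_000  # retention.ms from the thread (3 days)
DAY_MS = 86_400_000


def tombstones_for_expired_keys(
    last_update_ms: Dict[str, int], now_ms: int, retention_ms: int = RETENTION_MS
) -> List[Tuple[str, Optional[bytes]]]:
    """Return (key, None) records for keys whose latest value is older than retention_ms.

    A record with a null value is a tombstone: compaction removes the key's
    older values, and the tombstone itself is dropped after delete.retention.ms.
    """
    return [
        (key, None)
        for key, ts in sorted(last_update_ms.items())
        if now_ms - ts > retention_ms
    ]


now_ms = 100 * DAY_MS
# Hypothetical application-side index: key -> timestamp of its latest record.
last_update_ms = {
    "user-1": now_ms - 30 * DAY_MS,  # stale, needs a tombstone
    "user-2": now_ms - 3_600_000,    # updated an hour ago, keep
}

print(tombstones_for_expired_keys(last_update_ms, now_ms))  # [('user-1', None)]
```

The cost of this workaround is that the application has to track per-key ages itself, which is the bookkeeping the original question hoped to avoid.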

Changing the behavior would certainly require a KIP. One could basically change compaction to leave data that is older than the retention time in its own segments, so that those segments become eligible for deletion. I am not familiar with this part of the code and thus cannot say how complex this would be to implement.
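
The idea can be sketched roughly as follows. This is not how the log cleaner works today and not a proposed implementation; `Record` and `split_by_retention` are hypothetical names, and the sketch ignores the fact that real segments cover contiguous offset ranges, which a real change would have to respect.

```python
from typing import Iterable, List, NamedTuple, Tuple

RETENTION_MS = 259_200_000  # retention.ms from the thread (3 days)
DAY_MS = 86_400_000


class Record(NamedTuple):
    key: str
    timestamp_ms: int


def split_by_retention(
    records: Iterable[Record], now_ms: int, retention_ms: int = RETENTION_MS
) -> Tuple[List[Record], List[Record]]:
    """Split records surviving compaction into (expired, live) groups.

    Writing the expired group to its own segment would let time-based
    retention drop that segment without any tombstones.
    """
    expired: List[Record] = []
    live: List[Record] = []
    for record in records:
        if now_ms - record.timestamp_ms > retention_ms:
            expired.append(record)
        else:
            live.append(record)
    return expired, live


now_ms = 100 * DAY_MS
survivors = [
    Record("a", now_ms - 30 * DAY_MS),
    Record("b", now_ms - 10 * DAY_MS),
    Record("c", now_ms - 3_600_000),
]

expired, live = split_by_retention(survivors, now_ms)
print([r.key for r in expired])  # ['a', 'b']
print([r.key for r in live])     # ['c']
```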


-Matthias


On 3/3/20 7:02 PM, Koushik Chitta wrote:
> Bubbling this up to see whether anyone else has a similar use case.

RE: Issue in retention with compact,delete cleanup policy

Posted by Koushik Chitta <kc...@microsoft.com.INVALID>.
Bubbling this up to see whether anyone else has a similar use case.


-----Original Message-----
From: Koushik Chitta <kc...@microsoft.com.INVALID> 
Sent: Sunday, February 23, 2020 1:35 PM
To: users@kafka.apache.org; dev@kafka.apache.org
Subject: [EXTERNAL] Issue in retention with compact,delete cleanup policy

Hi,

I have a topic with the following config:

cleanup.policy  =  compact,delete
segment.bytes = 52428800 (~52 mb)
min.compaction.lag.ms = 1800000 (30 min)
delete.retention.ms = 86400000 (1 day)
retention.ms = 259200000 (3 days)

Ideally, I want records older than 3 days to be deleted without having to produce an explicit delete (a null value for the key) for each record.
However, with continuous compaction a segment can end up holding both a very old record (e.g. > 30 days) and a recent record (e.g. 1 hour old), which makes the whole segment ineligible for retention-based deletion.

Currently I don't see a workaround for this. Please suggest one.
I plan to start a KIP to address this use case.

Thanks,
Koushik

