Posted to users@kafka.apache.org by Upesh Desai <ud...@itrsgroup.com> on 2021/04/18 16:18:51 UTC

State Store Data Retention

Hello, I have not been able to find a concrete answer on whether, and how, state stores on a running Kafka Streams instance remove data once it has passed the configured retention.ms. A couple of clarifying questions:


  1.  If the stores are configured with cleanup.policy=compact,delete and retention.ms=N, will the running Kafka Streams instance remove data from its stores automatically over time?
  2.  Is this behavior the same for in-memory stores and persistent RocksDB stores?
  3.  If the stores do not remove data that has passed the retention.ms period, is there a different way to periodically remove old data from them?

I’m using Kafka 2.7.0 components across the board (broker, connect, etc.).

Thanks in advance,
Upesh

Upesh Desai
Senior Software Developer
udesai@itrsgroup.com
www.itrsgroup.com
Internet communications are not secure and therefore the ITRS Group does not accept legal responsibility for the contents of this message. Any view or opinions presented are solely those of the author and do not necessarily represent those of the ITRS Group unless otherwise specifically stated.

Re: State Store Data Retention

Posted by Bruno Cadonna <ca...@apache.org>.
Hi Navneeth,

I wrote that the *local state stores* are not affected when the topic 
configs cleanup.policy and retention.ms are passed to the state store. 
The *changelog topics* do honor these configs and remove data as 
specified in them.

In the case of a state migration to another instance, it depends on 
whether the other instance already has some local state for the given 
state store.

- If the most recent offset on the instance is still within the range of 
offsets of the changelog topic on the brokers, the state will be 
replayed from the local offset to the most recent offset on the brokers. 
Data removed on the brokers might still exist locally.

- If the most recent offset on the instance is before the range of 
offsets of the changelog topic on the brokers, the state will be 
replayed from the beginning of the changelog on the brokers. Data 
removed from the changelog topic cannot be replayed, because the 
beginning of the changelog has moved past this data.

- If the state does not exist on the instance, the state will be 
replayed from the beginning of the changelog, and removed data is not 
replayed, as in the previous case.
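
To make the three cases above concrete, here is a minimal sketch in plain Java. It is a simplified model of the decision described in this message, not the actual Kafka Streams restoration code; the class name RestoreStart, the method restoreStartOffset, and the example offsets are hypothetical and chosen only for illustration.

```java
import java.util.OptionalLong;

public class RestoreStart {

    /**
     * Simplified model of where changelog replay starts.
     * localOffset is empty when the instance has no local state (assumption).
     * logStartOffset and logEndOffset bound the offsets still on the brokers.
     */
    static long restoreStartOffset(OptionalLong localOffset,
                                   long logStartOffset,
                                   long logEndOffset) {
        // Case 1: the local offset is still inside the changelog's offset range,
        // so only the tail of the changelog needs to be replayed.
        if (localOffset.isPresent()
                && localOffset.getAsLong() >= logStartOffset
                && localOffset.getAsLong() <= logEndOffset) {
            return localOffset.getAsLong();
        }
        // Cases 2 and 3: the local offset is before the range, or there is no
        // local state at all. Replay starts at the beginning of the changelog,
        // so data already removed on the brokers is never seen.
        return logStartOffset;
    }

    public static void main(String[] args) {
        long logStart = 1_000L;
        long logEnd = 5_000L;
        System.out.println(restoreStartOffset(OptionalLong.of(3_000L), logStart, logEnd)); // 3000
        System.out.println(restoreStartOffset(OptionalLong.of(200L), logStart, logEnd));   // 1000
        System.out.println(restoreStartOffset(OptionalLong.empty(), logStart, logEnd));    // 1000
    }
}
```

In the first call the instance keeps whatever it already holds locally, which is why data removed on the brokers might still exist there.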

I hope that helps.

Best,
Bruno

On 08.05.21 00:59, Navneeth Krishnan wrote:
> Hi Bruno/All,
> 
> I have a follow-up question on the same topic. As you mentioned, key-value
> stores are not affected even when retention.ms and a cleanup policy are
> provided. Does that mean the changelog topic will not clear the data on
> the broker even after the retention period is over?
> 
> I agree the local state stores will not be able to delete the data, but
> when there is a reallocation, the state replay would only have to replay
> the data for the given retention time. Is this understanding correct?
> 
> Thanks

Re: State Store Data Retention

Posted by Navneeth Krishnan <re...@gmail.com>.
Hi Bruno/All,

I have a follow-up question on the same topic. As you mentioned, key-value
stores are not affected even when retention.ms and a cleanup policy are
provided. Does that mean the changelog topic will not clear the data on
the broker even after the retention period is over?

I agree the local state stores will not be able to delete the data, but
when there is a reallocation, the state replay would only have to replay
the data for the given retention time. Is this understanding correct?

Thanks

On Mon, Apr 19, 2021 at 1:57 AM Bruno Cadonna <ca...@apache.org> wrote:

> Hi Upesh,
>
> The answers to your questions are:
>
> 1.
> The configs cleanup.policy and retention.ms are topic configs. Hence,
> they only affect the changelog of a state store, not the local state
> store in a Kafka Streams client.
>
> Locally, window and session stores remove data they no longer need.
> Window and session stores are segmented stores. That means they consist
> of segments that are ordered by the windows they contain. Once the
> segment that contains the oldest windows is no longer needed, i.e.,
> its data has exceeded the retention time of the state store, the
> segment is removed.
>
> Non-windowed state stores do not remove data.
>
> Worth noting here: If you change retention.ms directly on the brokers,
> it will not affect the behavior of local state stores.
>
> 2.
> Yes, this behavior is the same for in-memory state stores and persistent
> state stores.
>
> 3.
> Window and session state stores do remove data.
>
>
> Best,
> Bruno

Re: State Store Data Retention

Posted by Bruno Cadonna <ca...@apache.org>.
Hi Upesh,

The answers to your questions are:

1.
The configs cleanup.policy and retention.ms are topic configs. Hence, 
they only affect the changelog of a state store, not the local state 
store in a Kafka Streams client.

Locally, window and session stores remove data they no longer need. 
Window and session stores are segmented stores. That means they consist 
of segments that are ordered by the windows they contain. Once the 
segment that contains the oldest windows is no longer needed, i.e., 
its data has exceeded the retention time of the state store, the 
segment is removed.

Non-windowed state stores do not remove data.

Worth noting here: If you change retention.ms directly on the brokers, 
it will not affect the behavior of local state stores.
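
The segment expiry described in answer 1 can be pictured with a small, self-contained Java model. This is only a sketch under stated assumptions, not the Kafka Streams implementation: the class SegmentedStoreSketch, its fields, and the rule "segment id = window start / segment interval" are hypothetical simplifications, and the retention and interval values are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SegmentedStoreSketch {
    private final long retentionMs;
    private final long segmentIntervalMs;
    // Segments ordered by id, so the oldest segment is always first.
    private final TreeMap<Long, Map<String, String>> segments = new TreeMap<>();
    private long observedStreamTimeMs = Long.MIN_VALUE;

    SegmentedStoreSketch(long retentionMs, long segmentIntervalMs) {
        this.retentionMs = retentionMs;
        this.segmentIntervalMs = segmentIntervalMs;
    }

    void put(String key, String value, long windowStartMs) {
        // Stream time only moves forward, driven by the timestamps seen so far.
        observedStreamTimeMs = Math.max(observedStreamTimeMs, windowStartMs);
        long segmentId = Math.floorDiv(windowStartMs, segmentIntervalMs);
        long minLiveSegment =
                Math.floorDiv(observedStreamTimeMs - retentionMs, segmentIntervalMs);
        if (segmentId >= minLiveSegment) {
            segments.computeIfAbsent(segmentId, id -> new HashMap<>())
                    .put(key + "@" + windowStartMs, value);
        }
        // Expiry drops whole segments at once instead of deleting single records.
        segments.headMap(minLiveSegment, false).clear();
    }

    List<Long> segmentIds() {
        return new ArrayList<>(segments.keySet());
    }

    public static void main(String[] args) {
        SegmentedStoreSketch store = new SegmentedStoreSketch(60_000L, 30_000L);
        store.put("a", "1", 0L);
        store.put("b", "2", 45_000L);
        store.put("c", "3", 100_000L); // segment 0 is now older than the retention
        store.put("d", "4", 130_000L); // segment 1 is now older than the retention
        System.out.println(store.segmentIds()); // [3, 4]
    }
}
```

Note that nothing here consults retention.ms on the brokers: in this model, as in the answer above, local expiry is driven only by the store's own retention setting and the timestamps it has observed.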

2.
Yes, this behavior is the same for in-memory state stores and persistent 
state stores.

3.
Window and session state stores do remove data.
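
For non-windowed stores, which do not expire data on their own, one approach sometimes used is to keep a timestamp next to each value and delete stale entries on a schedule. The sketch below is plain Java and only models that idea; it is not taken from this thread and does not use the Kafka Streams API. The names TtlPurgeSketch, Stamped, and purgeOlderThan are hypothetical. In a real application the scan would presumably run from a punctuation scheduled on the processor and call delete on the actual state store, which is an assumption about how you would wire it up.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class TtlPurgeSketch {

    /** A value together with the time it was last updated (hypothetical type). */
    static final class Stamped {
        final String value;
        final long updatedAtMs;

        Stamped(String value, long updatedAtMs) {
            this.value = value;
            this.updatedAtMs = updatedAtMs;
        }
    }

    /** Removes every entry whose age exceeds ttlMs and returns how many were removed. */
    static int purgeOlderThan(Map<String, Stamped> store, long nowMs, long ttlMs) {
        int removed = 0;
        Iterator<Map.Entry<String, Stamped>> it = store.entrySet().iterator();
        while (it.hasNext()) {
            Stamped stamped = it.next().getValue();
            if (nowMs - stamped.updatedAtMs > ttlMs) {
                it.remove(); // safe removal while iterating
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, Stamped> store = new LinkedHashMap<>();
        store.put("a", new Stamped("old", 0L));
        store.put("b", new Stamped("fresh", 50_000L));
        int removed = purgeOlderThan(store, 70_000L, 60_000L);
        System.out.println(removed + " removed, remaining keys: " + store.keySet());
    }
}
```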


Best,
Bruno


