Posted to users@kafka.apache.org by Chris Baumgartner <ch...@fujifilm.com> on 2019/07/17 20:11:18 UTC

Best practices for compacting topics with tombstones

Hello,

I'm wondering if anyone has advice on configuring compaction. Here is my
scenario:

A producer writes raw data to topic #1. A stream app reads the data from
topic #1, processes it, writes the processed data to topic #2, and then
writes a tombstone record to topic #1.

So, I don't intend for data to be retained very long in topic #1.
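
To make the flow concrete, here is a minimal sketch of that pattern using a
plain Java consumer and producer (the topic names, serializers, and the
process() step are placeholders, not our actual application code):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class RawDataProcessor {
        public static void main(String[] args) {
            Properties consumerProps = new Properties();
            consumerProps.put("bootstrap.servers", "localhost:9092");
            consumerProps.put("group.id", "raw-data-processor");
            consumerProps.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            consumerProps.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            Properties producerProps = new Properties();
            producerProps.put("bootstrap.servers", "localhost:9092");
            producerProps.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            producerProps.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
                 KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
                consumer.subscribe(Collections.singletonList("topic1"));
                while (true) {
                    for (ConsumerRecord<String, String> raw : consumer.poll(Duration.ofMillis(500))) {
                        // Write the processed result to topic #2.
                        producer.send(new ProducerRecord<>("topic2", raw.key(), process(raw.value())));
                        // Tombstone the raw record in topic #1: same key, null value,
                        // so compaction can eventually remove it.
                        producer.send(new ProducerRecord<>("topic1", raw.key(), null));
                    }
                    consumer.commitSync();
                }
            }
        }

        // Placeholder for the real processing step.
        private static String process(String value) {
            return value.trim();
        }
    }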

Are there any best practices for configuring compaction on topic #1 in this
case? I don't want to keep the data around very long after it has been
processed, but I also don't want to cause performance issues by compacting
too often.

Thanks.

- Chris


Re: Best practices for compacting topics with tombstones

Posted by Omar Al-Safi <om...@gmail.com>.
If I recall correctly, you can set 'delete.retention.ms' in the topic-level
configuration to control how long tombstones are retained in the topic. By
default it is set to 86400000 (24 hours); you can set it lower than that.
Regarding performance, I am not really sure why compaction would cause a
performance hit on your broker, but the questions would be how much data you
hold there, how often you have updates to your topic (records with the same
key), and how often you have tombstones for records.
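
For example, assuming a client and broker new enough to support
incrementalAlterConfigs (2.3+), the topic-level override could be applied
roughly like this (the topic name and the one-hour value are just
placeholders):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class LowerTombstoneRetention {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "topic1");
                // Keep tombstones for 1 hour instead of the 24-hour default.
                AlterConfigOp op = new AlterConfigOp(
                        new ConfigEntry("delete.retention.ms", "3600000"),
                        AlterConfigOp.OpType.SET);
                admin.incrementalAlterConfigs(
                        Collections.singletonMap(topic, Collections.singletonList(op)))
                     .all().get();
            }
        }
    }

The same override can also be set with kafka-configs.sh (--alter
--entity-type topics --entity-name topic1 --add-config
delete.retention.ms=3600000, plus the usual connection option).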
