Posted to dev@kafka.apache.org by Tom Crayford <tc...@heroku.com> on 2016/05/20 13:44:31 UTC

Default log.roll.ms to the default retention or the retention in that topic

Hi,

Kafka has a configuration property, log.roll.ms (and log.roll.hours because
Kafka likes having lots of conflicting settings). By default it's set at
168 hours, which exactly matches the default retention of 168 hours.

# Reminder of what the setting does, please skip if you know
This setting controls when the broker will force a new log segment file to
be created. Retention works by looking at the older log segments and
deleting those whose file modification time is past the window specified.

The default of 168 hours becomes confusing when users configure different
retention windows themselves. If you set a retention window *lower* than
that limit but have a low volume topic, a new segment is only rolled once
every 7 days, so retention is effectively only applied once every 7 days.
This is often surprising, and may lead to nasty consequences: if e.g. you
had a compliance reason to keep only, say, 4 days of data, you now have to
know to tune log.roll.ms or log.roll.hours as well.
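
To make the interaction concrete, here is a rough sketch in Python. It is a
simplified model of the behaviour described above, not Kafka's actual code:
it assumes a segment only becomes eligible for deletion once it has been
rolled *and* its last modification time is older than the retention window.
The function name and parameters are made up for illustration, and it
ignores size-based rolling and the retention check interval.

```python
HOUR_MS = 60 * 60 * 1000
DAY_MS = 24 * HOUR_MS


def earliest_deletion_ms(segment_created_ms, last_write_ms, roll_ms, retention_ms):
    """Earliest time a segment's data can be deleted in this simplified model."""
    # Assumption: the segment has to be rolled before retention looks at it.
    rolled_at_ms = segment_created_ms + roll_ms
    # Assumption: it must then be unmodified for longer than the retention window.
    expired_at_ms = last_write_ms + retention_ms
    return max(rolled_at_ms, expired_at_ms)


# 4 day retention with the default 7 day roll, segment created at t=0.
# Last write at t=0: the data still waits for the roll at day 7.
print(earliest_deletion_ms(0, 0, 7 * DAY_MS, 4 * DAY_MS) / DAY_MS)  # 7.0
# Trickle of writes up to day 6: the first message lives for 10 days.
print(earliest_deletion_ms(0, 6 * DAY_MS, 7 * DAY_MS, 4 * DAY_MS) / DAY_MS)  # 10.0
# Roll window matching retention, last write at t=0: deleted after 4 days.
print(earliest_deletion_ms(0, 0, 4 * DAY_MS, 4 * DAY_MS) / DAY_MS)  # 4.0
```

In this model the data written at t=0 is never deleted before the roll
window has passed, whatever the retention window says.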

Instead, we could default `log.roll.ms` to:

a) the default retention window if no per topic one is set
b) the per topic retention setting if it's set

Clearly if `log.roll.ms` or `log.roll.hours` *are* explicitly set, we can
use them still, which avoids breaking backwards compatibility for the most
part.
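
As a sketch of the resolution order I have in mind (Python for brevity, not
the broker's real config code; the function and parameter names are
hypothetical): explicit roll settings win, with log.roll.ms taking
precedence over log.roll.hours as it does today, then the per topic
retention, then the broker default retention. It assumes retention values
are positive, so an unlimited retention setting would need separate
handling.

```python
HOUR_MS = 60 * 60 * 1000
DEFAULT_RETENTION_MS = 168 * HOUR_MS  # the 168 hour default mentioned above


def effective_roll_ms(roll_ms=None, roll_hours=None,
                      topic_retention_ms=None,
                      broker_retention_ms=DEFAULT_RETENTION_MS):
    """Pick the roll interval under the proposed defaulting (illustrative only)."""
    # Explicit settings are honoured first, preserving existing behaviour.
    if roll_ms is not None:
        return roll_ms
    if roll_hours is not None:
        return roll_hours * HOUR_MS
    # b) otherwise follow the per topic retention if one is set
    if topic_retention_ms is not None:
        return topic_retention_ms
    # a) otherwise follow the broker's default retention window
    return broker_retention_ms


# A topic with 4 days of retention and no explicit roll setting.
print(effective_roll_ms(topic_retention_ms=4 * 24 * HOUR_MS) // HOUR_MS)  # 96
```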

There's only one complication here: if you set your retention super low
(say, 100ms), Kafka will now roll the log file that often, which would lead
to performance problems and an excessive number of files. I think we can
and maybe should reject such a low retention setting anyway (either in the
topic creation command, or at broker bootup), but finding a good default
lower bound there might be tricky. An alternative would be to limit
`log.roll.ms` to a certain lower bound, even with this defaulting
behaviour.
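
The lower bound alternative could look roughly like this (again a Python
sketch, not a proposed patch). The 10 minute floor is a placeholder I made
up purely for illustration; choosing the real value is the tricky part.

```python
MINUTE_MS = 60 * 1000
# Hypothetical floor, chosen only for this example.
HYPOTHETICAL_MIN_ROLL_MS = 10 * MINUTE_MS


def bounded_roll_ms(retention_ms, min_roll_ms=HYPOTHETICAL_MIN_ROLL_MS):
    """Default the roll interval to the retention window, but never below the floor."""
    return max(retention_ms, min_roll_ms)


# A 100ms retention no longer produces a 100ms roll interval.
print(bounded_roll_ms(100))  # 600000
# A 4 day retention is well above the floor and is used unchanged.
print(bounded_roll_ms(4 * 24 * 60 * MINUTE_MS))  # 345600000
```

With a floor like this, data on a topic with a very low retention could
still outlive its retention window by up to the floor, which seems like a
reasonable trade against creating a new file every 100ms.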

I think making log retention behave as intended, without users having to
understand log.roll.ms, would be a notable improvement for most users of
Kafka, and has few drawbacks except "a small matter of coding".

I'd be happy to write up a KIP if y'all think this warrants it, and/or
write a pull request for this change.

Thanks

Tom Crayford
Heroku Kafka