Posted to dev@kafka.apache.org by "Fabien LD (JIRA)" <ji...@apache.org> on 2018/05/07 10:12:00 UTC

[jira] [Created] (KAFKA-6872) Doc for log.roll.* is wrong

Fabien LD created KAFKA-6872:
--------------------------------

             Summary: Doc for log.roll.* is wrong
                 Key: KAFKA-6872
                 URL: https://issues.apache.org/jira/browse/KAFKA-6872
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 1.0.0
            Reporter: Fabien LD


For {{log.roll.ms}}, the doc says, for example:
{quote}The maximum time before a new log segment is rolled out (in milliseconds). If not set, the value in log.roll.hours is used
{quote}
In other parts of the documentation (see [https://kafka.apache.org/10/documentation.html#upgrade_10_1_breaking]), it says:
{quote}The log rolling time is no longer depending on log segment create time. Instead it is now based on the timestamp in the messages. More specifically. if the timestamp of the first message in the segment is T, the log will be rolled out when a new message has a timestamp greater than or equal to T + log.roll.ms
{quote}
which is wrong. More specifically, the wrong part is:
{quote}if the timestamp of the +first+ message in the segment is T
{quote}
Indeed, the actual behavior is:
{quote}if the timestamp of the +last+ message in the segment is T
{quote}
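To make the difference concrete, here is an illustrative sketch of the two roll conditions (this is not Kafka's actual code; the class, method, and parameter names are made up):
{code:java}
// Illustrative only: not Kafka's actual code; names are made up.
public class RollCondition {
    // Documented condition: roll relative to the FIRST message in the segment.
    static boolean shouldRollPerDoc(long firstTsInSegment, long newRecordTs, long rollMs) {
        return newRecordTs >= firstTsInSegment + rollMs;
    }

    // Observed condition: roll relative to the LAST message in the segment.
    static boolean shouldRollObserved(long lastTsInSegment, long newRecordTs, long rollMs) {
        return newRecordTs >= lastTsInSegment + rollMs;
    }
}
{code}
With one message per minute, the observed condition never fires: each new record is only one minute newer than the previous one, so the gap never exceeds {{log.roll.ms}}.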
 

A simple use case to reproduce this is to configure a single broker with:
{code:java}
# One partition ... or any small number should be fine
num.partitions=1
# 1GB segment (the default; large enough that size never triggers a roll in this test)
log.segment.bytes=1073741824
# Delete old segments when their last addition is 24h old
log.retention.hours=24
# Check age of segments every 5 minutes
log.retention.check.interval.ms=300000
# Roll a new segment every hour (or so the doc claims)
log.roll.hours=1
{code}
and then loop, sending a small message (a few bytes, so that the 1GB segment size is never reached during the test) to one topic every minute. A minimal sketch of such a producer loop is shown below.
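This sketch assumes a local broker on localhost:9092; the topic name {{roll-test}} and the class name are made up for illustration:
{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class RollTestProducer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            while (true) {
                // a few bytes per record, far below the 1GB segment size
                producer.send(new ProducerRecord<>("roll-test", "ping"));
                producer.flush();
                Thread.sleep(60_000L); // one message per minute
            }
        }
    }
}
{code}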

After running for at least 24h, according to what is described in the doc, one would expect to see ~24 segments (one new segment rolled every hour).
But in reality there is only one log segment, containing all the records you sent. Stop the producer for a bit more than one hour and restart it: a second segment is created per partition, because at the moment the new record was appended, the previous record (the last one in what was then the current segment) was more than 1h old.

This proves that the doc should say:
{quote}if the timestamp of the +last+ message in the segment is T, the log will be rolled out when a new message has a timestamp greater than or equal to T + log.roll.ms
{quote}
 

Notes:
 * as a DevOps engineer, I would prefer the doc to be accurate and Kafka's behavior to be changed. I think both should be done: update the doc first, so that users of current versions know what to expect (and avoid running into the problem we faced), and change Kafka's behavior later. Indeed, with the default configuration ({{log.roll.hours=168}} and {{log.segment.bytes=1073741824}}), Kafka can keep very old records: pushing one small (~1k) record a day, about 1M records can fit in that segment, so it is never rotated (see the back-of-the-envelope sketch after this list)
 * I detected this on version 1.0.0 but assume it impacts many more versions than that one (very likely 1.1.0 too)
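As a back-of-the-envelope check for the first note (the ~1k average record size is an assumption):
{code:java}
public class SegmentFillEstimate {
    public static void main(String[] args) {
        long segmentBytes = 1_073_741_824L; // log.segment.bytes default (1GB)
        long recordBytes = 1_024L;          // assumed small record size (~1k)
        long recordsPerSegment = segmentBytes / recordBytes;
        // ~1,048,576 records fit; at one record per day the segment
        // never fills up, so a size-based roll never happens either.
        System.out.println("records per segment: " + recordsPerSegment);
    }
}
{code}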


