Posted to jira@kafka.apache.org by "John Crowley (JIRA)" <ji...@apache.org> on 2017/11/15 14:54:01 UTC

[jira] [Commented] (KAFKA-3806) Adjust default values of log.retention.hours and offsets.retention.minutes

    [ https://issues.apache.org/jira/browse/KAFKA-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16253555#comment-16253555 ] 

John Crowley commented on KAFKA-3806:
-------------------------------------

I am working on a PubSub project where the source is reference data - some of which changes very slowly. (And, yes, this is not exactly the standard Kafka use-case: Kafka is being used as a reliable persistent store supporting multiple subscribers.) As a very simplistic example, assume one of the sources is a company's holiday calendar for the following year. This is probably published in November and, unless an error is discovered, will not be updated until the following November. The PubSub logic must still monitor the topic so that any change is picked up rapidly, but usually none occur - and even though Kafka can handle the load, performing an artificial commit every N seconds would seem to be a pure waste of resources. (And if the example is extended to possibly hundreds of topics, with multiple groupIds consuming each topic, the overall load might be significant.)

On the topic side, the retention.ms can be set to a very large value per topic so that the "log" is never deleted. Would it be possible to allow the offsets.retention.minutes value to be set on a per groupId basis so that it could be adjusted based on the use-case expectations of a particular consumer group?
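For reference, a per-topic retention override of the kind described above might look like the following sketch. The topic name `holiday-calendar` and the bootstrap address are illustrative assumptions, and older broker versions take `--zookeeper` rather than `--bootstrap-server`; `offsets.retention.minutes`, by contrast, can currently only be set broker-wide in `server.properties`, which is exactly the limitation this comment asks about.

```shell
# Keep the topic's log forever: retention.ms=-1 disables time-based deletion.
# Topic name and bootstrap address below are illustrative.
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name holiday-calendar \
  --alter --add-config retention.ms=-1

# Offset retention, however, is broker-wide in server.properties, e.g.:
#   offsets.retention.minutes=20160   # 14 days, applied to ALL consumer groups
```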

> Adjust default values of log.retention.hours and offsets.retention.minutes
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-3806
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3806
>             Project: Kafka
>          Issue Type: Improvement
>          Components: config
>    Affects Versions: 0.9.0.1, 0.10.0.0
>            Reporter: Michal Turek
>            Priority: Minor
>             Fix For: 1.1.0
>
>
> Combination of the default values of log.retention.hours (168 hours = 7 days) and offsets.retention.minutes (1440 minutes = 1 day) may be dangerous in special cases. Offset retention should always be greater than log retention.
> We have observed the following scenario and issue:
> - Producing of data to a topic was disabled two days ago by producer update, topic wasn't deleted.
> - Consumer consumed all data and properly committed offsets to Kafka.
> - Consumer made no more offset commits for that topic because there was no more incoming data and there was nothing to confirm. (We have auto-commit disabled; I'm not sure how it behaves with auto-commit enabled.)
> - After one day: Kafka cleared too old offsets according to offsets.retention.minutes.
> - After two days: The long-running consumer was restarted after an update. It didn't find any committed offsets for that topic, since they had been deleted per offsets.retention.minutes, so it started consuming from the beginning.
> - The messages were still in Kafka due to larger log.retention.hours, about 5 days of messages were read again.
> Known workaround to solve this issue:
> - Explicitly configure log.retention.hours and offsets.retention.minutes, don't use defaults.
> Proposals:
> - Increase the default value of offsets.retention.minutes so that the offsets retention period is at least twice the log retention period.
> - Check these values during Kafka startup and log a warning if the offsets retention period is shorter than the log retention period.
> - Add a note to migration guide about differences between storing of offsets in ZooKeeper and Kafka (http://kafka.apache.org/documentation.html#upgrade).
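The timeline in the quoted scenario can be checked with a little arithmetic; a minimal sketch (plain Python, no broker needed) using the default values cited in this issue:

```python
# Default broker settings described in this issue.
LOG_RETENTION_HOURS = 168         # log.retention.hours: 7 days
OFFSETS_RETENTION_MINUTES = 1440  # offsets.retention.minutes: 1 day

def offset_expired(minutes_since_last_commit):
    """True once the broker would have purged the group's committed offset."""
    return minutes_since_last_commit > OFFSETS_RETENTION_MINUTES

def log_expired(minutes_since_last_append):
    """True once time-based retention would have deleted the messages."""
    return minutes_since_last_append > LOG_RETENTION_HOURS * 60

# Two days of silence, as in the scenario: the committed offset is gone,
# but the messages are still on disk, so a restarted consumer re-reads them.
two_days = 2 * 24 * 60
print(offset_expired(two_days), log_expired(two_days))  # True False
```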



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)