Posted to dev@kafka.apache.org by "Eric Wasserman (JIRA)" <ji...@apache.org> on 2016/06/07 00:56:21 UTC

[jira] [Comment Edited] (KAFKA-1981) Make log compaction point configurable

    [ https://issues.apache.org/jira/browse/KAFKA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317588#comment-15317588 ] 

Eric Wasserman edited comment on KAFKA-1981 at 6/7/16 12:56 AM:
----------------------------------------------------------------

During the KIP-58 vote it was [suggested](http://mail-archives.apache.org/mod_mbox/kafka-dev/201605.mbox/%3cCABtAgwEBxsRvEOK-unuPTJtSdF+D+pq8FuAaHQL+u9bGaZ3A_A@mail.gmail.com%3e) the name of the sole remaining property be changed from:

    log.cleaner.min.compaction.lag.ms

to

    log.cleaner.compaction.delay.ms

The feature guarantees that _at minimum_ _*x*_ milliseconds elapse between a message being added and that message becoming subject to compaction. This setting specifies _*x*_.

In particular, this guarantee does not affect *when* a compaction will or will not happen. It only controls which messages are protected from compaction in the event one occurs.
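
To make the distinction concrete, here is a rough sketch of the guarantee in Python. It is not the log cleaner's actual code: the `Message` type and the `first_uncleanable_offset` function are hypothetical names made up for illustration, and the sketch assumes messages arrive in append order with non-decreasing timestamps. It also ignores segment boundaries and the active segment, which the real cleaner has to respect.

```python
from dataclasses import dataclass


@dataclass
class Message:
    offset: int
    timestamp_ms: int  # assumed: time the message was appended to the log


def first_uncleanable_offset(messages, now_ms, min_compaction_lag_ms):
    """Return the offset of the first message still protected from compaction.

    Messages before the returned offset are merely *eligible* for compaction;
    whether and when a compaction pass actually runs is decided elsewhere.
    """
    for m in messages:
        # A message younger than the lag is protected, and so is everything after it.
        if now_ms - m.timestamp_ms < min_compaction_lag_ms:
            return m.offset
    # Every message is at least as old as the lag, so none is protected.
    return messages[-1].offset + 1 if messages else 0


log = [Message(0, 1_000), Message(1, 2_000), Message(2, 9_500)]
boundary = first_uncleanable_offset(log, now_ms=10_000, min_compaction_lag_ms=5_000)
print(boundary)  # offsets 0 and 1 are eligible; offset 2 is protected
```

Note that the function only computes a boundary. Nothing in it triggers a compaction, which is the sense in which the setting describes a lag between two events rather than a postponement of compaction itself.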

New Oxford American Dictionary defines:

**Lag** n. (also time lag) a period of time between one event or phenomenon and another: there was a time lag between the commission of the crime and its reporting to the police.

**Delay** n. a period of time by which something is late or postponed: a two-hour delay | long delays in obtaining passports.

Seems to me "lag" is closer than "delay" to the meaning we are after.

When weighing alternative phrasing, we may want to keep in mind that the other parameters (cumulative message size or message count) may later be added back into this feature.





> Make log compaction point configurable
> --------------------------------------
>
>                 Key: KAFKA-1981
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1981
>             Project: Kafka
>          Issue Type: Improvement
>    Affects Versions: 0.8.2.0
>            Reporter: Jay Kreps
>              Labels: newbie++
>         Attachments: KIP for Kafka Compaction Patch.md
>
>
> Currently if you enable log compaction the compactor will kick in whenever you hit a certain "dirty ratio", i.e. when 50% of your data is uncompacted. Other than this we don't give you fine-grained control over when compaction occurs. In addition we never compact the active segment (since it is still being written to).
> Other than this we don't really give you much control over when compaction will happen. The result is that you can't really guarantee that a consumer will get every update to a compacted topic--if the consumer falls behind a bit it might just get the compacted version.
> This is usually fine, but it would be nice to make this more configurable so you could set either a # messages, size, or time bound for compaction.
> This would let you say, for example, "any consumer that is no more than 1 hour behind will get every message."
> This should be relatively easy to implement since it just impacts the end-point the compactor considers available for compaction. I think we already have that concept, so this would just be some other overrides to add in when calculating that.


