Posted to dev@kafka.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/05/09 06:34:13 UTC

[jira] [Commented] (KAFKA-1489) Global threshold on data retention size

    [ https://issues.apache.org/jira/browse/KAFKA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276004#comment-15276004 ] 

ASF GitHub Bot commented on KAFKA-1489:
---------------------------------------

GitHub user bendrees opened a pull request:

    https://github.com/apache/kafka/pull/1348

    KAFKA-1489: Global threshold on data retention size

    Implemented a "log retention policy" based on keeping a certain
    percentage of disk space free. In dynamic situations where topics
    are added in unpredictable ways, the other log retention
    parameters are not entirely sufficient to prevent out-of-disk
    conditions from occurring. The new log.retention.disk.usage.percent
    parameter provides this guarantee. It is applied after all the
    other retention parameters are applied, at the end of each log
    cleanup cycle. Oldest segments (across all topics) are pruned
    until usage falls below this percentage of each disk's capacity.
    The default value is 100, which effectively disables the feature.
    
    This is my original work and I license the work to the project under
    the project's open source license.
    
    @junrao, @jkreps, @gwenshap

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bendrees/kafka KAFKA-1489

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/1348.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1348
    
----
commit 26ef1c5e4a432421f9c1dbdac84d19de1d0ccf54
Author: Ben Drees <be...@zenti.com>
Date:   2016-05-09T06:29:48Z

    Implemented a "log retention policy" based on keeping a certain
    percentage of disk space free. In dynamic situations where topics
    are added in unpredictable ways, the other log retention
    parameters are not entirely sufficient to prevent out-of-disk
    conditions from occurring. The new log.retention.disk.usage.percent
    parameter provides this guarantee. It is applied after all the
    other retention parameters are applied, at the end of each log
    cleanup cycle. Oldest segments (across all topics) are pruned
    until usage falls below this percentage of each disk's capacity.
    The default value is 100, which effectively disables the feature.

----
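
For readers skimming the description above, the pruning pass it outlines (after the other retention settings have run, delete the oldest segments across all topics until disk usage is back under the configured percentage) might look roughly like the sketch below. This is not the code from the patch. The class, the Segment model, and the selectForDeletion method are hypothetical names made up for illustration, and the sketch assumes that "oldest" means smallest last-modified time and that pruning stops once usage is at or below the threshold, so that a value of 100 never prunes anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: none of these names come from the Kafka code base or the patch.
final class DiskUsageRetention {

    // Minimal stand-in for a log segment stored on one disk.
    static final class Segment {
        final String topic;
        final long lastModifiedMs;
        final long sizeBytes;

        Segment(String topic, long lastModifiedMs, long sizeBytes) {
            this.topic = topic;
            this.lastModifiedMs = lastModifiedMs;
            this.sizeBytes = sizeBytes;
        }
    }

    // Picks segments to delete, oldest first and regardless of topic, until the
    // projected usage is at or below maxUsagePercent of the disk's capacity.
    static List<Segment> selectForDeletion(List<Segment> segments,
                                           long capacityBytes,
                                           long usedBytes,
                                           int maxUsagePercent) {
        // Assumption: "oldest" is decided by last-modified time.
        List<Segment> oldestFirst = new ArrayList<>(segments);
        oldestFirst.sort(Comparator.comparingLong((Segment s) -> s.lastModifiedMs));

        // Bytes that may remain in use; with maxUsagePercent = 100 this equals capacity.
        long allowedBytes = Math.multiplyExact(capacityBytes, (long) maxUsagePercent) / 100;

        List<Segment> toDelete = new ArrayList<>();
        long projectedUsed = usedBytes;
        for (Segment s : oldestFirst) {
            if (projectedUsed <= allowedBytes) {
                break; // usage is back within the limit
            }
            toDelete.add(s);
            projectedUsed -= s.sizeBytes;
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(
            new Segment("a", 3L, 100L),
            new Segment("b", 1L, 100L),
            new Segment("a", 2L, 100L),
            new Segment("b", 4L, 100L));

        // A 1000-byte disk with 950 bytes in use and an 80 percent limit.
        for (Segment s : selectForDeletion(segments, 1000L, 950L, 80)) {
            System.out.println("delete " + s.topic + " @ " + s.lastModifiedMs);
        }
    }
}
```

In a real broker such a pass would presumably run once per data directory, since the description speaks of each disk's capacity, and would need to skip active segments. The sketch ignores both concerns.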


> Global threshold on data retention size
> ---------------------------------------
>
>                 Key: KAFKA-1489
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1489
>             Project: Kafka
>          Issue Type: New Feature
>          Components: log
>    Affects Versions: 0.8.1.1
>            Reporter: Andras Sereny
>
> Currently, Kafka has per-topic settings that control the size of a single log (log.retention.bytes). As topics of differing volume grow in number, maintaining topic-level settings that each apply to a single log can become tedious.
> Often, a fixed chunk of disk space is dedicated to Kafka and hosts all of its logs, so it would make sense to have a configurable threshold that controls how much space *all* data in one Kafka log data directory can take up.
> See also:
> http://mail-archives.apache.org/mod_mbox/kafka-users/201406.mbox/browser
> http://mail-archives.apache.org/mod_mbox/kafka-users/201311.mbox/%3C20131107015125.GC9718@jkoshy-ld.linkedin.biz%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)