Posted to users@kafka.apache.org by Shayne S <sh...@gmail.com> on 2015/06/12 20:08:13 UTC

Log compaction not working as expected

Hi, I'm new to Kafka and having trouble with log compaction.

I'm trying to set up topics that compact aggressively, but so far I can't
get them to compact fully.  The topic is configured like so:

Topic:beer_archive PartitionCount:20 ReplicationFactor:1
Configs:min.cleanable.dirty.ratio=0.01,delete.retention.ms=60000,segment.ms=1800000,cleanup.policy=compact

I changed the dirty ratio and segment.ms after duplicate records showed up,
in an attempt to get compaction to work. My test for success is a dump of
keys, comparing the total count to the unique count. The list of keys is
produced like so:

kafka-console-consumer.sh ..... --from-beginning --property print.key=true | cut -f1 > id_file

This gives me 535,480 unique keys out of a total of 2,230,784 entries. After
tweaking segment.ms to make the last segment eligible for compaction, SOME
compaction did occur a couple of times.  A sample compaction from the log:

[2015-06-12 15:51:31,440] INFO Cleaner 0: Beginning cleaning of log beer_archive-5. (kafka.log.LogCleaner)
[2015-06-12 15:51:31,441] INFO Cleaner 0: Building offset map for beer_archive-5... (kafka.log.LogCleaner)
[2015-06-12 15:51:31,580] INFO Cleaner 0: Building offset map for log beer_archive-5 for 1 segments in offset range [123847, 126857). (kafka.log.LogCleaner)
[2015-06-12 15:51:31,583] INFO Cleaner 0: Offset map for log beer_archive-5 complete. (kafka.log.LogCleaner)
[2015-06-12 15:51:31,583] INFO Cleaner 0: Cleaning log beer_archive-5 (discarding tombstones prior to Fri Jun 12 14:41:42 UTC 2015)... (kafka.log.LogCleaner)
[2015-06-12 15:51:31,583] INFO Cleaner 0: Cleaning segment 0 in log beer_archive-5 (last modified Fri Jun 12 14:42:42 UTC 2015) into 0, retaining deletes. (kafka.log.LogCleaner)
[2015-06-12 15:51:32,319] INFO Cleaner 0: Cleaning segment 123847 in log beer_archive-5 (last modified Fri Jun 12 15:26:00 UTC 2015) into 0, retaining deletes. (kafka.log.LogCleaner)
[2015-06-12 15:51:35,094] INFO Cleaner 0: Swapping in cleaned segment 0 for segment(s) 0,123847 in log beer_archive-5. (kafka.log.LogCleaner)
[2015-06-12 15:51:35,095] INFO [kafka-log-cleaner-thread-0],
        Log cleaner thread 0 cleaned log beer_archive-5 (dirty section = [123847, 126857])
        116.5 MB of log processed in 3.7 seconds (31.9 MB/sec).
        Indexed 2.5 MB in 0.1 seconds (17.2 Mb/sec, 3.9% of total time)
        Buffer utilization: 0.0%
        Cleaned 116.5 MB in 3.5 seconds (33.2 Mb/sec, 96.1% of total time)
        Start size: 116.5 MB (111,662 messages)
        End size: 115.0 MB (109,893 messages)
        1.2% size reduction (1.6% fewer messages)
 (kafka.log.LogCleaner)
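
For reference, the total-versus-unique key check described above can be
sketched roughly as follows. This is only an illustration, assuming id_file
holds one key per line as written by the cut command; the function names
summarize_keys and summarize_key_file are made up for this example and are
not part of any Kafka tooling.

```python
from collections import Counter


def summarize_keys(lines):
    """Return (total, unique, redundant) for an iterable of key lines.

    'redundant' is the number of entries whose key also appears in
    another entry, i.e. total minus unique.
    """
    # Strip only the trailing newline so keys containing spaces survive.
    counts = Counter(line.rstrip("\n") for line in lines)
    total = sum(counts.values())
    unique = len(counts)
    return total, unique, total - unique


def summarize_key_file(path="id_file"):
    """Run the same summary over a file of keys (hypothetical helper)."""
    with open(path, encoding="utf-8") as fh:
        return summarize_keys(fh)
```

Calling summarize_key_file("id_file") returns the three counts as a tuple.
With the figures reported above, the difference is 2,230,784 - 535,480 =
1,695,304 entries whose key is repeated elsewhere in the dump.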

Any ideas where I'm going wrong?

Thanks!
Shayne