Posted to users@kafka.apache.org by Harsha <ka...@harsha.io> on 2015/04/03 02:42:37 UTC

Re: Per-topic retention.bytes uses kilobytes not bytes?

Hi Willy,
          retention.bytes is checked against the total log size: the cleaner takes the diff between the total log size and retention.bytes and only deletes log segment files that fit within that diff. So deletion is effectively rounded to segment.bytes, and the check really is in bytes, not KBs. Since your segment.bytes is 1024, you are seeing behavior that makes it look like the check is in KBs.
  

For example, say you set retention.bytes to 150 and segment.bytes to 100, and the topic has two log segment files of 100 bytes each:
1) The total log size is 200 bytes and retention.bytes is 150 bytes.
2) The diff between the two is 50 bytes, which is smaller than any individual log segment file, so nothing can be deleted.
3) If you produce more data so that there are 3 segment files totaling 250 bytes,
4) the diff becomes 100 bytes, and the cleaner can delete 1 segment file, bringing the total log size down to 150 bytes.

retention.bytes won’t exactly match the total log size; it only keeps the total log size close to retention.bytes. In the same example, if we have 3 log segment files totaling 300 bytes, the cleanup can still only delete 1 segment file, leaving the total log size at 200 bytes.
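
A minimal Scala sketch of this rule, just to make it concrete (illustrative only, not Kafka's actual code; segmentsToDelete and the other names here are mine):

object RetentionSketch {
  // Segment sizes are ordered oldest first. A segment is only deleted
  // while the whole segment still fits within (total size - retention.bytes),
  // so the retained size is effectively rounded up to whole segments.
  def segmentsToDelete(segmentSizes: List[Long], retentionBytes: Long): List[Long] = {
    var diff = segmentSizes.sum - retentionBytes
    segmentSizes.takeWhile { size =>
      val deletable = size <= diff
      if (deletable) diff -= size
      deletable
    }
  }

  def main(args: Array[String]): Unit = {
    println(segmentsToDelete(List(100L, 100L), 150L))       // List()    : diff 50, no segment fits
    println(segmentsToDelete(List(100L, 100L, 50L), 150L))  // List(100) : diff 100, 150 bytes remain
    println(segmentsToDelete(List(100L, 100L, 100L), 150L)) // List(100) : diff 150, 200 bytes remain
  }
}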

-- 
Harsha


On April 2, 2015 at 2:29:45 PM, Willy Hoang (wh@knewton.com) wrote:

Hello,  

I’ve been having trouble using the retention.bytes per-topic configuration (Kafka version 0.8.2.1). I ran into the same issue that users described in these two threads, where logs grew to sizes larger than retention.bytes, but I couldn’t find an explanation in either thread.
http://search-hadoop.com/m/4TaT4Y2YRD1
http://search-hadoop.com/m/4TaT4A94w9

After a bit of exploring I came up with a hypothesis: retention.bytes uses kilobytes, not bytes, as its unit of measurement.  

Below are reproducible steps that support my findings.

# Create a new topic with retention.bytes = 1 and segment.bytes = 1024  
./kafka-topics.sh --create --zookeeper `kafka-zookeeper` --replication-factor 2 --partitions 1 --topic test-topic-wh --config retention.ms=604800000 --config retention.bytes=1 --config segment.bytes=1024  
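
# Optionally verify that the per-topic overrides were applied
# (this assumes the same `kafka-zookeeper` alias as above)
./kafka-topics.sh --describe --zookeeper `kafka-zookeeper` --topic test-topic-wh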

# Produce a message that will add 1024 bytes to the log (26 bytes of metadata and 998 bytes from the message string)  
# [2015-04-01 21:31:30,192]  
./kafka-console-producer.sh --broker-list localhost:9092 --topic test-topic-wh  
48511592621585064912153832133745068851354167277338568723801212367882940512382099547077656452011868167062280671787644034983697360468153738320733530248963074919916340211639682996497736197584019505594305204918092844365775522508769053709992262705578058943319678767004341493111503613353102924979561571028366773343124814043716584730147544725607450538227253470831289390680687225547253363513232291750196998204510607040879259384601451167183178896571219320889861706587525006032028098059014382213355803535550612056296013517434057006192416475524344248518557786455850822677869343421138195772284656076117000648020242375211903419500185954902765027000903916410762342630905680728543902271883661840640596483915010329616341194914110460126269112972976548329834183117816884560790416331259138123086341037733285781009676617847368669318437423457236162645890525200414080351181649588421908379380799396957194784506503965311272014255330651454364327607848972940341663812345678085583832958639819357061848511592621585064912153832  

ls -r -l /mnt*/spool/kafka/test-topic-wh*  
total 4  
-rw-r--r-- 1 isaak isaak 1024 Apr 1 21:31 00000000000000000018.log  
-rw-r--r-- 1 isaak isaak 10485760 Apr 1 21:27 00000000000000000018.index  

# Wait about 10 minutes (longer than the 5 minute retention check interval)
# Note that no changes occurred
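# (I believe the 5 minute interval is the broker default for
# log.retention.check.interval.ms, i.e. 300000 ms)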

# Produce a message of any size to push the log over the 1024 byte (1 KB) retention limit
# [2015-04-01 21:40:04,851]  
./kafka-console-producer.sh --broker-list localhost:9092 --topic test-topic-wh  

ls -r -l /mnt*/spool/kafka/test-topic-wh*  
total 8  
-rw-r--r-- 1 isaak isaak 26 Apr 1 21:40 00000000000000000020.log  
-rw-r--r-- 1 isaak isaak 10485760 Apr 1 21:40 00000000000000000020.index  
-rw-r--r-- 1 isaak isaak 1024 Apr 1 21:31 00000000000000000018.log  
-rw-r--r-- 1 isaak isaak 0 Apr 1 21:40 00000000000000000018.index  

# Note from /var/log/kafka/server.log that the older segment is deleted now that we have exceeded the retention.bytes limit  
[2015-04-01 21:40:10,114] INFO Rolled new log segment for 'test-topic-wh-0' in 0 ms. (kafka.log.Log)  
[2015-04-01 21:42:16,214] INFO Scheduling log segment 18 for log test-topic-wh-0 for deletion. (kafka.log.Log)  
[2015-04-01 21:43:16,217] INFO Deleting segment 18 from log test-topic-wh-0. (kafka.log.Log)  
[2015-04-01 21:43:16,217] INFO Deleting index /mnt/spool/kafka/test-topic-wh-0/00000000000000000018.index.deleted (kafka.log.OffsetIndex)  
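# (Side note: the ~1 minute gap between "Scheduling" and "Deleting" appears
# to match the file.delete.delay.ms broker default of 60000 ms)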

ls -r -l /mnt*/spool/kafka/test-topic-wh*  
total 4  
-rw-r--r-- 1 isaak isaak 26 Apr 1 21:40 00000000000000000020.log  
-rw-r--r-- 1 isaak isaak 10485760 Apr 1 21:40 00000000000000000020.index  

I did a similar experiment with segment.bytes=2 and the results were consistent. Has anyone else discovered the same thing?

Regards,  
Willy