Posted to users@kafka.apache.org by Jaroslav Libák <ja...@seznam.cz> on 2017/05/30 14:46:54 UTC

log purge - only processed records after certain period

Hello

I'm thinking about using Kafka for a messaging use case where records are 
entity change events, e.g. "orderStateChange". There can be multiple 
consumers of these events, and I do not want to lose any Kafka log records 
until they have been processed by all consumers (records could otherwise be 
missed e.g. due to some consumers temporarily not working). I would prefer 
to block producers rather than lose records (blocking forces the problem to 
be fixed so that the records do get processed).

I have read the Kafka documentation, and the above of course doesn't work 
out of the box: a Kafka topic doesn't know about its consumers, and log 
purging is either size based or time based, or done via log compaction.
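
For concreteness, the topic configuration I have in mind looks roughly like 
this (a minimal sketch using the Java AdminClient; the topic name, partition 
count and replication factor are placeholders):

    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateCompactedTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                // Compacted topic with size and time based purging disabled
                // (retention.ms=-1 and retention.bytes=-1 mean "no limit").
                NewTopic topic = new NewTopic("order-events", 3, (short) 1)
                        .configs(Map.of(
                                "cleanup.policy", "compact",
                                "retention.ms", "-1",
                                "retention.bytes", "-1"));
                admin.createTopics(Collections.singleton(topic)).all().get();
            }
        }
    }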

But it seems the above scenario could be implemented using log compaction. 
Each record key would be something like "eventType/entityId/someUniqueHash", 
so that records do not get compacted away out of the box. Topics would have 
no size or time based purging. There would be a dedicated consumer that 
would discover the committed offsets per partition for all other consumers 
and consume up to the lowest common offset, producing a new record (key, 
null) for each key it sees. According to the documentation, such a tombstone 
means log compaction will delete records with that key (so both the original 
record and the null-value record eventually get deleted). This unfortunately 
means consumers would see (key, null) records they would have to ignore.
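
To make it concrete, such a purging consumer could look roughly like this (a 
sketch only; it assumes a client new enough to have 
AdminClient.listConsumerGroupOffsets, and the topic name and group ids are 
made-up placeholders):

    import java.time.Duration;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class Purger {
        public static void main(String[] args) throws Exception {
            String topic = "order-events";                         // placeholder
            List<String> groups = List.of("billing", "shipping");  // placeholder

            Properties adminProps = new Properties();
            adminProps.put("bootstrap.servers", "localhost:9092");

            // 1. Lowest committed offset per partition across all groups.
            Map<TopicPartition, Long> minOffsets = new HashMap<>();
            try (AdminClient admin = AdminClient.create(adminProps)) {
                for (String group : groups) {
                    Map<TopicPartition, OffsetAndMetadata> committed = admin
                            .listConsumerGroupOffsets(group)
                            .partitionsToOffsetAndMetadata().get();
                    committed.forEach((tp, om) ->
                            minOffsets.merge(tp, om.offset(), Math::min));
                }
            }
            minOffsets.values().removeIf(o -> o == 0); // nothing processed yet

            Properties cons = new Properties();
            cons.put("bootstrap.servers", "localhost:9092");
            cons.put("key.deserializer", StringDeserializer.class.getName());
            cons.put("value.deserializer", StringDeserializer.class.getName());

            Properties prod = new Properties();
            prod.put("bootstrap.servers", "localhost:9092");
            prod.put("key.serializer", StringSerializer.class.getName());
            prod.put("value.serializer", StringSerializer.class.getName());

            // 2. Re-read everything below the lowest common offset and write
            //    a tombstone (key, null) so compaction removes those records.
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cons);
                 KafkaProducer<String, String> producer = new KafkaProducer<>(prod)) {
                consumer.assign(minOffsets.keySet());
                consumer.seekToBeginning(minOffsets.keySet());
                Map<TopicPartition, Long> remaining = new HashMap<>(minOffsets);
                while (!remaining.isEmpty()) {
                    for (ConsumerRecord<String, String> rec :
                            consumer.poll(Duration.ofSeconds(1))) {
                        TopicPartition tp =
                                new TopicPartition(rec.topic(), rec.partition());
                        Long limit = remaining.get(tp);
                        if (limit == null) continue;  // partition already done
                        if (rec.offset() < limit && rec.value() != null) {
                            producer.send(new ProducerRecord<>(
                                    topic, rec.partition(), rec.key(), null));
                        }
                        if (rec.offset() >= limit - 1) {
                            remaining.remove(tp);
                            consumer.pause(List.of(tp));
                        }
                    }
                }
                producer.flush();
            }
        }
    }

The data consumers would then simply have to skip records with a null value, 
which is exactly the (key, null) noise mentioned above.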

It is also not clear to me how Kafka handles running low on disk space - 
does it keep filling the disk until the OS returns a file write error? I 
want to block producers before that happens, but I haven't found any limit 
that would make Kafka refuse new records in a topic.
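
The only workaround I can think of is to check broker disk usage from the 
client side and pause my producers myself. A rough sketch (it assumes an 
AdminClient/broker version whose DescribeLogDirs result exposes usable space 
via LogDirDescription.usableBytes(); older versions don't report it):

    import java.util.Collection;
    import java.util.Map;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.LogDirDescription;

    public class DiskGuard {
        // True if every log dir on every broker has minFreeBytes usable.
        static boolean enoughDiskSpace(Admin admin, Collection<Integer> brokerIds,
                                       long minFreeBytes) throws Exception {
            Map<Integer, Map<String, LogDirDescription>> dirs =
                    admin.describeLogDirs(brokerIds).allDescriptions().get();
            for (Map<String, LogDirDescription> byDir : dirs.values()) {
                for (LogDirDescription d : byDir.values()) {
                    // usableBytes() is empty when the broker doesn't report
                    // it; treat that as "unknown" and don't block on it.
                    if (d.usableBytes().orElse(Long.MAX_VALUE) < minFreeBytes) {
                        return false;
                    }
                }
            }
            return true;
        }
    }

A producer could call this periodically and stop sending when it returns 
false - crude, but it would block before the OS-level write error.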

Due to the above problems it seems it might be best not to use Kafka. One 
alternative is RabbitMQ, but Kafka has the advantage of speed (a single 
topic per event type instead of a queue per consumer, so less IO) and of 
keeping messages persisted even after they are consumed (consumers do not 
need to be known at the time the event is produced).

Jaroslav