Posted to users@kafka.apache.org by Jaroslav Libák <ja...@seznam.cz> on 2017/05/30 14:46:54 UTC
log purge - only processed records after a certain period
Hello
I'm thinking about using Kafka for a messaging use case where the records will be
entity change events, e.g. "orderStateChange". There can be multiple
consumers of these events, and I do not want to lose any Kafka log records unless
they have been processed by all consumers (e.g. when some consumers are
temporarily not working). I prefer to block producers rather than lose
records (that forces the problem to be fixed so that the records get processed).
I read the Kafka documentation, and of course the above doesn't work out of the box:
a Kafka topic doesn't know about its consumers, and log purging is either size-based
or time-based, or we use compaction instead.
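To make the two purge modes concrete, here is a simplified Python model (not Kafka code) of which log segments a purge pass would drop under time-based retention (the `retention.ms` topic setting) and size-based retention (`retention.bytes`, which applies per partition). The `Segment` class and `deletable_segments` function are hypothetical, and the model ignores details of the real log cleaner. The point it illustrates is the one above: neither rule ever looks at consumer offsets.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    base_offset: int       # first offset stored in the segment
    max_timestamp_ms: int  # timestamp of the newest record in the segment
    size_bytes: int


def deletable_segments(
    segments: List[Segment],
    now_ms: int,
    retention_ms: Optional[int] = None,
    retention_bytes: Optional[int] = None,
) -> List[int]:
    """Base offsets of segments one purge pass would drop (simplified model).

    `segments` is ordered oldest first. Whole segments are removed from the
    head of the log only. Consumer progress is never consulted.
    """
    doomed = set()
    # Time-based rule: a segment expires once its newest record is older than retention_ms.
    if retention_ms is not None:
        for seg in segments:
            if now_ms - seg.max_timestamp_ms <= retention_ms:
                break  # stop at the first segment still within retention
            doomed.add(seg.base_offset)
    # Size-based rule: drop the oldest segments while at least retention_bytes would remain.
    if retention_bytes is not None:
        excess = sum(s.size_bytes for s in segments) - retention_bytes
        for seg in segments:
            if excess < seg.size_bytes:
                break
            doomed.add(seg.base_offset)
            excess -= seg.size_bytes
    return sorted(doomed)


segs = [Segment(0, 1_000, 100), Segment(50, 5_000, 100), Segment(90, 9_000, 100)]
print(deletable_segments(segs, now_ms=10_000, retention_ms=6_000))  # [0]
print(deletable_segments(segs, now_ms=10_000, retention_bytes=50))  # [0, 50]
```

A record in segment 0 is dropped in both cases whether or not any consumer has read it.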
But it seems the above scenario could be implemented with log compaction.
Each record key would be something like "eventType/entityId/someUniqueHash",
so that no two records share a key and compaction never removes them on its own.
Topics would have no size- or time-based purging. A dedicated consumer would
discover the committed offset per partition for all other consumers, consume up
to the lowest common offset, and produce a new record (key, null) for each record
it read. According to the documentation, log compaction then deletes the records
with that key (so both the original record and the null-value record eventually
get deleted). Unfortunately, this means consumers would see (key, null) records
that they would have to ignore.
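The compaction-based workaround can be modelled in a few lines of plain Python. This is a sketch under stated assumptions, not Kafka client code: `Record`, `make_key`, `lowest_common_offset`, `tombstones_for`, `compact` and `process` are hypothetical names, the committed offsets would in practice be fetched for each consumer group, and `compact` collapses the log cleaner into a single step. It relies on two documented behaviours: a committed offset is the next offset a group will read, and compaction keeps only the latest record per key, with a null value marking the key for deletion. Because each tombstone reuses the original key, Kafka's default partitioner sends it to the same partition as the record it deletes (as long as the partition count does not change).

```python
import uuid
from typing import Dict, List, NamedTuple, Optional


class Record(NamedTuple):
    offset: int
    key: str
    value: Optional[str]  # None models a tombstone, i.e. (key, null)


def make_key(event_type: str, entity_id: str) -> str:
    # A random suffix makes every key unique, so compaction never merges two events.
    return f"{event_type}/{entity_id}/{uuid.uuid4().hex}"


def lowest_common_offset(committed: Dict[str, Dict[int, int]], partition: int) -> int:
    """Lowest committed offset for `partition` across all consumer groups.

    A committed offset is the next offset the group will read, so every record
    below the result has been processed by every group. A group with no commit
    for this partition counts as 0, i.e. nothing is safe to delete yet.
    """
    return min((offsets.get(partition, 0) for offsets in committed.values()), default=0)


def tombstones_for(partition_log: List[Record], safe_offset: int) -> List[str]:
    """Keys that the cleaner should now write as (key, None)."""
    already_deleted = {r.key for r in partition_log if r.value is None}
    return [
        r.key
        for r in partition_log
        if r.offset < safe_offset and r.value is not None and r.key not in already_deleted
    ]


def compact(partition_log: List[Record]) -> List[Record]:
    """Keep the latest record per key; drop keys whose latest record is a tombstone.

    Simplification: Kafka keeps tombstones for delete.retention.ms before removing them.
    """
    latest: Dict[str, Record] = {}
    for r in partition_log:
        latest[r.key] = r
    return sorted((r for r in latest.values() if r.value is not None), key=lambda r: r.offset)


def process(record: Record) -> Optional[str]:
    # Regular consumers must skip the tombstones written by the cleaner.
    if record.value is None:
        return None
    return record.value


if __name__ == "__main__":
    log = [
        Record(0, "orderStateChange/42/a", "PAID"),
        Record(1, "orderStateChange/42/b", "SHIPPED"),
        Record(2, "orderStateChange/7/c", "PAID"),
    ]
    committed = {"billing": {0: 2}, "shipping": {0: 1}}
    safe = lowest_common_offset(committed, partition=0)  # 1
    keys = tombstones_for(log, safe)                     # ["orderStateChange/42/a"]
    log += [Record(len(log) + i, k, None) for i, k in enumerate(keys)]
    print([r.offset for r in compact(log)])              # [1, 2]
```

Treating a group without a commit as having read nothing means a stalled or not-yet-started consumer blocks deletion instead of losing records, at the cost of unbounded growth until it catches up.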
It is not clear to me how Kafka handles running low on disk space: does it
keep filling the disk until the OS returns a file write error? I want to block
producers before that happens, but I haven't found any limit that would make
Kafka refuse new records in a topic.
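One possible application-level approach to blocking producers (an assumption, not a Kafka feature) is to cap the backlog: producers pause while the slowest consumer group is too many records behind the end of the log, with the bound chosen so the backlog fits on disk. The `LagGate` class is hypothetical; it tracks a single partition for brevity, and the end offset and committed offsets would have to be supplied by the application.

```python
import threading
from typing import Dict, Optional


class LagGate:
    """Blocks producers while the slowest consumer group is max_lag or more records behind.

    Hypothetical helper, one partition for brevity. The application must call
    record_produced() after each send and update_committed() with fresh
    committed offsets (e.g. polled periodically).
    """

    def __init__(self, max_lag: int) -> None:
        self.max_lag = max_lag
        self.end_offset = 0         # next offset the partition will assign
        self.slowest_committed = 0  # lowest committed offset across all groups
        self._cond = threading.Condition()

    def lag(self) -> int:
        return self.end_offset - self.slowest_committed

    def record_produced(self, count: int = 1) -> None:
        with self._cond:
            self.end_offset += count

    def update_committed(self, committed_by_group: Dict[str, int]) -> None:
        with self._cond:
            # No known groups counts as nothing consumed, the conservative choice.
            self.slowest_committed = min(committed_by_group.values(), default=0)
            self._cond.notify_all()  # wake producers that may now be allowed to send

    def wait_for_capacity(self, timeout: Optional[float] = None) -> bool:
        """Block until lag < max_lag; return False if the timeout expires first."""
        with self._cond:
            return self._cond.wait_for(lambda: self.lag() < self.max_lag, timeout=timeout)


gate = LagGate(max_lag=3)
gate.record_produced(3)
print(gate.wait_for_capacity(timeout=0.1))  # False: lag is 3
gate.update_committed({"billing": 2, "shipping": 1})
print(gate.wait_for_capacity(timeout=0.1))  # True: lag is 2
```

Counting records is only a proxy for disk usage; a byte-based bound would follow the same pattern.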
Because of the above problems, it seems it would be best not to use Kafka. One
alternative is RabbitMQ, but Kafka has two advantages: speed (a single topic
per event type instead of one per consumer, so less IO) and keeping
messages persisted even after they have been consumed (listeners do not need to be
known at the time an event is produced).
Jaroslav