Posted to users@kafka.apache.org by Bart Wyatt <ba...@dsvolition.com> on 2015/09/15 18:38:21 UTC

Durability and integrity for a long-standing key-compacted log

Hello,

We have a set of processing jobs (in Samza) that use key-compacted Kafka logs as a durable key-value store.  Recently, after some network trouble that caused various parts of the infrastructure to reboot, we discovered that a key we expected to be "alive" had been compacted out of the log.

Because of the nature of the outage and our current level of logging, it is impossible to know whether the application layer was at fault and sent an erroneous tombstone to Kafka, or whether Kafka's cleaner was at fault.  Either way, it got me thinking about whether it is good practice to use Kafka as the long-term backing for a key-value store.
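
To make the failure mode concrete: in a compacted topic, a delete is just a record with a non-null key and a null value (a tombstone), so a single buggy send is enough to eventually erase a key.  A minimal sketch with the Java producer (broker and topic names here are placeholders, not our real ones):

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class TombstoneSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker:9092"); // placeholder broker
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Normal upsert: the latest value for this key survives compaction.
                producer.send(new ProducerRecord<>("kv-topic", "key-42", "{\"state\":\"alive\"}"));

                // Tombstone: a null value marks the key for deletion. Once the
                // cleaner runs and delete.retention.ms elapses, the old value and
                // the tombstone itself are both gone from the log for good.
                producer.send(new ProducerRecord<>("kv-topic", "key-42", null));
            }
        }
    }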

Are there best practices concerning data loss and integrity when certain messages are expected to live "forever" and never be reaped or compacted?  With the basic log abstraction, a message only has to survive for its contracted amount of time or space; with a key-compacted log, that bound is defeated perpetually, because the latest record for each key must outlive every cleaning pass.
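
For reference, these are the topic-level knobs that, as I understand it, govern when the cleaner may remove data (the config names are real Kafka topic configs; the values are only illustrative):

    import java.util.Properties;

    public class CompactedTopicConfig {
        // Settings we would attach to a compacted topic. A live key's latest
        // value is kept indefinitely, but a tombstone eventually erases the
        // key entirely; nothing here distinguishes the two cases.
        public static Properties sketch() {
            Properties config = new Properties();
            config.setProperty("cleanup.policy", "compact");        // keep the latest record per key
            config.setProperty("delete.retention.ms", "86400000");  // tombstones stay readable for 1 day
            config.setProperty("min.cleanable.dirty.ratio", "0.5"); // how eagerly the cleaner runs
            config.setProperty("segment.ms", "604800000");          // the active segment is never cleaned,
                                                                    // so this bounds how long data waits
            return config;
        }
    }

Once a tombstone's delete.retention.ms window has passed and the cleaner has run, the key is, as far as I can tell, unrecoverable from the log, which is consistent with what we observed.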

FWIW, we are deployed on top of ZFS in mirrored mode.

-Bart

