Posted to users@kafka.apache.org by Avi Flax <av...@aviflax.com> on 2015/10/16 00:18:40 UTC

How are people dealing with time and detecting delayed messages?

I hope this question makes sense; I’m kind of a newbie when it comes
to reasoning about distributed systems.

Let’s say I have a consumer that needs to be able to detect when a
given message was delayed by some period of time (whether due to a
network partition, producer errors, or something else). By “delayed” I
mean that, e.g., a producer learned of some real-world event that
happened at time A and ideally would have sent a message to a Kafka
topic within, say, 5 seconds of learning about that event, but because
of an error actually produced the message to Kafka 10 minutes
later. I might have a consumer that needs to detect that delay, so it
knows that the real-world event actually happened 10 minutes ago.
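
To make the scenario concrete, here is a rough sketch (in Python) of
the naive check a consumer could do. The names `detect_delay` and
`event_time_ms`, and the 5-second threshold, are assumptions made up
for illustration; none of this is a Kafka API. It assumes the producer
puts its own wall-clock time for the event into the message payload.

```python
import time

# Hypothetical threshold: how late a message may arrive before it
# counts as delayed (the "5 seconds" from the scenario above).
MAX_EXPECTED_DELAY_MS = 5_000


def detect_delay(event_time_ms, now_ms=None, max_delay_ms=MAX_EXPECTED_DELAY_MS):
    """Return the observed delay in ms if the message is late, else None.

    event_time_ms is an assumed payload field: the producer's wall-clock
    time for the real-world event. The result is only as trustworthy as
    the agreement between the producer's and the consumer's clocks.
    """
    if now_ms is None:
        # Consumer's own wall clock, in milliseconds since the epoch.
        now_ms = int(time.time() * 1000)
    delay_ms = now_ms - event_time_ms
    return delay_ms if delay_ms > max_delay_ms else None
```

With this sketch, a message whose event time is 10 minutes behind the
consumer's clock would be reported as 600,000 ms late, while one that
is 3 seconds behind would not be flagged. Any skew between the two
clocks is added to or subtracted from that number unnoticed.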

Is there a best practice or a common pattern that is employed by the
Kafka community for dealing with this sort of thing, something more
sophisticated and robust than just comparing timestamps and hoping
that the clocks of the producer(s) and consumer(s) are more-or-less in
sync? Vector clocks, for example? (Something hopefully more accessible
than atomic clocks and GPS.)

I guess what I’m concerned about is clock drift… some things I’ve read
lately have led me to think that perhaps I can’t really trust
timestamps naively attached to messages by producers, because the
various producers and consumers in a system could have clocks that are
significantly divergent. (I’m working with a client that uses NTP to
try to keep all node clocks in sync, but has experienced many problems
with this approach.)

It’s entirely possible that I’m thinking about this all wrong, but if
that’s the case I’d greatly appreciate being pointed in the right
direction.

Thank you!
Avi