Posted to users@kafka.apache.org by Neeraj Vaidya <ne...@yahoo.co.in.INVALID> on 2021/04/20 23:21:56 UTC

Apache Kafka Streams : Out-of-Order messages & uses of TimeStamp extractor

Hi,
I have asked this on StackOverflow, but will ask it here as well.

I have an Apache Kafka 2.6 producer which writes to topic-A (TA). I also have a Kafka Streams application which consumes from TA and writes to topic-B (TB). In the Streams application, I have a custom timestamp extractor which extracts the timestamp from the message payload.
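
For illustration only, here is a minimal sketch of the payload-parsing part of such an extractor. The payload layout is not given in this thread, so the JSON field name eventTime, the ISO-8601 format, and the class and method names are all assumptions. In Kafka Streams this logic would normally live inside an implementation of org.apache.kafka.streams.processor.TimestampExtractor, in its extract(ConsumerRecord<Object, Object> record, long partitionTime) method; the sketch keeps to the JDK so that it runs without the Kafka libraries.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls an event time out of a payload such as
// {"id":"m1","eventTime":"2021-04-20T23:21:56Z"}.
final class PayloadTimestamps {
    // Assumed field name; the real payload layout is not known.
    private static final Pattern EVENT_TIME =
        Pattern.compile("\"eventTime\"\\s*:\\s*\"([^\"]+)\"");

    // Returns epoch milliseconds taken from the payload, or fallbackMillis
    // when the field is missing or cannot be parsed. Inside a real
    // TimestampExtractor the fallback could be the partitionTime argument.
    static long extractMillis(String payload, long fallbackMillis) {
        if (payload == null) {
            return fallbackMillis;
        }
        Matcher m = EVENT_TIME.matcher(payload);
        if (!m.find()) {
            return fallbackMillis;
        }
        try {
            return Instant.parse(m.group(1)).toEpochMilli();
        } catch (DateTimeParseException e) {
            return fallbackMillis;
        }
    }
}
```

A call such as PayloadTimestamps.extractMillis(payload, partitionTime) only attaches a time to the record; it does not change the order in which records are read from the topic.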

For one of my failure-handling test cases, I shut down the Kafka cluster while my applications are running.

When the producer application tries to write messages to TA, it cannot because the cluster is down, and hence (I assume) it buffers the messages. Let's say it receives 4 messages m1, m2, m3, m4 in increasing time order (i.e. m1 is first and m4 is last).

When I bring the Kafka cluster back online, the producer sends the buffered messages to the topic, but they are not in order. For example, I receive m2, then m3, then m1, and then m4.

Why is that? Is it because the buffering in the producer is multi-threaded, with each thread producing to the topic at the same time?
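
As a point of reference, and without claiming that it is the cause in this case: one known way a producer can reorder records within a single partition is when retries are enabled, more than one request may be in flight, and idempotence is off. Ordering is also only defined per partition, so records that land in different partitions of TA have no total order. The sketch below builds a producer configuration that keeps per-partition order across retries. The property keys are standard Kafka producer settings; the class and method names are hypothetical, and serializer settings are left out.

```java
import java.util.Properties;

// Hypothetical helper that builds producer settings aimed at preserving
// per-partition ordering when sends are retried.
final class OrderedProducerConfig {
    static Properties orderedProducerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Idempotence lets the broker reject duplicate or out-of-sequence
        // batches, so retries do not reorder records within a partition.
        props.put("enable.idempotence", "true");
        // Idempotence requires acknowledgement from all in-sync replicas.
        props.put("acks", "all");
        // With idempotence enabled this value must not exceed 5.
        props.put("max.in.flight.requests.per.connection", "5");
        // Key and value serializers are omitted; a real producer needs them.
        return props;
    }
}
```

Without idempotence, setting max.in.flight.requests.per.connection to 1 is the other common way to avoid reordering on retry, at some cost in throughput.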

I assumed that the custom timestamp extractor would help in ordering messages when consuming them, but it does not. Or maybe my understanding of the timestamp extractor is wrong.

If not, then what are the specific uses of the timestamp extractor? Just to associate a timestamp with an event?

I got one suggestion on SO: stream all events from TA to an intermediate topic (say TA'), which would then be read using the timestamp extractor and written to another topic. But I am not sure if this will cause the events to get reordered based on the extracted timestamp.
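
To make the distinction concrete: a timestamp extractor only assigns a timestamp to each record, while Kafka Streams still reads the records of a partition in offset order, so copying through an intermediate topic does not by itself sort anything. Reordering needs an explicit buffer that holds records for a while and releases them by timestamp. The following is a minimal JDK-only sketch of that idea, not Kafka Streams code; the class names, the grace period, and the example timestamps are all assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical reorder buffer: holds events and releases them in timestamp
// order once they are at least graceMs older than the newest event seen.
final class ReorderBuffer {
    static final class Event {
        final String id;
        final long timestampMs;

        Event(String id, long timestampMs) {
            this.id = id;
            this.timestampMs = timestampMs;
        }
    }

    private final PriorityQueue<Event> pending =
        new PriorityQueue<>(Comparator.comparingLong((Event e) -> e.timestampMs));
    private final long graceMs;
    private long maxSeenMs = Long.MIN_VALUE;

    ReorderBuffer(long graceMs) {
        this.graceMs = graceMs;
    }

    // Buffers the event and returns those now old enough to emit, oldest first.
    List<Event> add(Event e) {
        pending.add(e);
        maxSeenMs = Math.max(maxSeenMs, e.timestampMs);
        List<Event> ready = new ArrayList<>();
        while (!pending.isEmpty() && pending.peek().timestampMs <= maxSeenMs - graceMs) {
            ready.add(pending.poll());
        }
        return ready;
    }

    // Emits everything still buffered, oldest first (for example at shutdown).
    List<Event> flush() {
        List<Event> rest = new ArrayList<>();
        while (!pending.isEmpty()) {
            rest.add(pending.poll());
        }
        return rest;
    }
}
```

The grace period is the trade-off: an event that arrives later than graceMs behind the newest timestamp already seen is still emitted, but after events that should have followed it.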

Regards,
Neeraj

Re: Apache Kafka Streams : Out-of-Order messages & uses of TimeStamp extractor

Posted by "Matthias J. Sax" <mj...@apache.org>.
Replied on StackOverflow:
https://stackoverflow.com/questions/67158317/apache-kafka-streams-out-of-order-messages


-Matthias


