Posted to jira@kafka.apache.org by "Nicholas Telford (Jira)" <ji...@apache.org> on 2021/08/13 09:44:00 UTC

[jira] [Commented] (KAFKA-10493) KTable out-of-order updates are not being ignored

    [ https://issues.apache.org/jira/browse/KAFKA-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398550#comment-17398550 ] 

Nicholas Telford commented on KAFKA-10493:
------------------------------------------

This issue is quite serious, because it is easy to process out-of-order records inadvertently. In my testing, if you process a backlog (consumer lag) of data from input topics with multiple consumer instances, you're almost guaranteed to get wildly out-of-order records. This is because, while Kafka Streams guarantees that records are processed in timestamp order within a consumer, it can't guarantee that _across_ consumers.

For example, in a simple app like {{builder.stream("events").repartition().toTable(Materialized.as("latest-events"))}}, the {{latest-events}} table is highly likely to show incorrect results for many keys when processing a backlog from the {{events}} topic.

!out-of-order-table.png!
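A minimal, Kafka-free simulation of the interleaving described above. It assumes that records for the same key reach the repartition topic from two different upstream instances (for example because {{events}} is not partitioned by that key); the class and method names ({{InterleavingDemo}}, {{interleave}}, {{naiveTable}}) are hypothetical. Each instance forwards its own records in timestamp order, yet a table that keeps whatever value arrives last for each key ends up holding a stale value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation: no Kafka APIs are used.
class InterleavingDemo {

    // A record as it lands in the repartition topic.
    static final class Event {
        final String key;
        final String value;
        final long timestamp;

        Event(String key, String value, long timestamp) {
            this.key = key;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Merge two per-instance streams (each already in timestamp order)
    // in the order their writes happen to reach the repartition topic.
    // arrivalOrder[i] says which instance (0 or 1) writes the i-th record.
    static List<Event> interleave(List<Event> a, List<Event> b, int[] arrivalOrder) {
        List<Event> out = new ArrayList<>();
        int ia = 0, ib = 0;
        for (int source : arrivalOrder) {
            out.add(source == 0 ? a.get(ia++) : b.get(ib++));
        }
        return out;
    }

    // A table that keeps whatever value arrived last for each key,
    // ignoring timestamps entirely.
    static Map<String, Event> naiveTable(List<Event> arrivals) {
        Map<String, Event> table = new HashMap<>();
        for (Event e : arrivals) {
            table.put(e.key, e);
        }
        return table;
    }

    public static void main(String[] args) {
        // Instance 0 is catching up on an old backlog for key "k";
        // instance 1 is already processing recent data for the same key.
        List<Event> instance0 = List.of(new Event("k", "old-1", 100L), new Event("k", "old-2", 200L));
        List<Event> instance1 = List.of(new Event("k", "new-1", 900L));

        // Instance 1 happens to write first; instance 0's backlog lands afterwards.
        List<Event> arrivals = interleave(instance0, instance1, new int[] {1, 0, 0});
        Event latest = naiveTable(arrivals).get("k");
        System.out.println(latest.value + " @ " + latest.timestamp); // prints "old-2 @ 200"
    }
}
```

Running {{main}} prints {{old-2 @ 200}}: the update with timestamp 900 was overwritten by the backlog record with timestamp 200, even though each instance emitted its own records in order.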

For this reason, I think it's worth solving this issue in advance of KIP-280, and coming up with a stop-gap solution for optimized source topics.

> KTable out-of-order updates are not being ignored
> -------------------------------------------------
>
>                 Key: KAFKA-10493
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10493
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.6.0
>            Reporter: Pedro Gontijo
>            Assignee: Matthias J. Sax
>            Priority: Blocker
>             Fix For: 4.0.0
>
>         Attachments: KTableOutOfOrderBug.java, out-of-order-table.png
>
>
> On a materialized KTable, out-of-order records for a given key (records whose timestamps are older than the current value in the store) are not being ignored; instead, they are used to update the local store value and are also forwarded.
> I believe the bug is here: [https://github.com/apache/kafka/blob/2.6.0/streams/src/main/java/org/apache/kafka/streams/state/internals/ValueAndTimestampSerializer.java#L77]. It should return true, not false (see the javadoc).
> The bug surfaces here: [https://github.com/apache/kafka/blob/2.6.0/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KTableSource.java#L142-L148]
> I have attached a simple stream app that shows the issue happening.
> Thank you!
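The behaviour the report above expects can be sketched as a timestamp-guarded put: an update is stored and forwarded only if its timestamp is not older than the one already in the store. This is a hypothetical illustration ({{TimestampedTable}} and {{putIfNotOlder}} are made-up names), not the actual {{KTableSource}} or {{ValueAndTimestampSerializer}} code, and treating an equal timestamp as in-order is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of "ignore out-of-order updates"; not Kafka's real code.
class TimestampedTable<K, V> {

    // Stored value together with the timestamp of the record that wrote it.
    static final class ValueAndTs<T> {
        final T value;
        final long timestamp;

        ValueAndTs(T value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<K, ValueAndTs<V>> store = new HashMap<>();

    // Returns true if the update was applied (and would be forwarded downstream),
    // false if it was dropped because it is older than the stored value.
    // Equal timestamps are treated as in-order (an assumption).
    boolean putIfNotOlder(K key, V value, long timestamp) {
        ValueAndTs<V> current = store.get(key);
        if (current != null && timestamp < current.timestamp) {
            return false; // out-of-order: keep the newer stored value, forward nothing
        }
        store.put(key, new ValueAndTs<>(value, timestamp));
        return true;
    }

    Optional<V> get(K key) {
        ValueAndTs<V> current = store.get(key);
        return current == null ? Optional.empty() : Optional.of(current.value);
    }

    public static void main(String[] args) {
        TimestampedTable<String, String> table = new TimestampedTable<>();
        table.putIfNotOlder("k", "new-1", 900L);                   // applied
        boolean applied = table.putIfNotOlder("k", "old-2", 200L); // dropped
        System.out.println(applied + " " + table.get("k").orElse("none")); // prints "false new-1"
    }
}
```

Here {{main}} prints {{false new-1}}: the older update is dropped and the newer stored value survives.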


