Posted to jira@kafka.apache.org by "John Roesler (Jira)" <ji...@apache.org> on 2021/03/23 02:51:00 UTC

[jira] [Commented] (KAFKA-12366) Performance regression in stream-table joins on trunk

    [ https://issues.apache.org/jira/browse/KAFKA-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306708#comment-17306708 ] 

John Roesler commented on KAFKA-12366:
--------------------------------------

Thanks for the report, [~vcrfxia]. I have reviewed the benchmark results offline, and I believe this was caused by the task idling improvements from KIP-695.

[https://cwiki.apache.org/confluence/display/KAFKA/KIP-695%3A+Further+Improve+Kafka+Streams+Timestamp+Synchronization]

That feature was actually pulled from 2.8, so I adjusted the report to target 3.0, just to avoid confusion.

Also, the benchmark in question shows performance for the last few release branches of about 68K +/- 7K, and for trunk of around 62K +/- 5K. That is certainly a drop, but it is still in the ballpark.
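
To make "still in the ballpark" concrete, here is a quick back-of-the-envelope check using only the figures quoted above. It assumes the +/- values describe a symmetric spread around each mean; the benchmark's unit and whether +/- means a standard deviation or a min/max range are not stated, so treat this as a rough reading, not a significance test.

```python
def interval(mean, spread):
    # Assumption: "+/- spread" is a symmetric band around the mean.
    return (mean - spread, mean + spread)

release = interval(68_000, 7_000)  # last few release branches
trunk = interval(62_000, 5_000)    # trunk

relative_drop = (68_000 - 62_000) / 68_000
overlap = (max(release[0], trunk[0]), min(release[1], trunk[1]))

print(f"relative drop: {relative_drop:.1%}")  # relative drop: 8.8%
print(f"overlap: {overlap}")                  # overlap: (61000, 67000)
```

The mean drops by roughly 9%, but the two bands overlap between about 61K and 67K.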

In contrast, the change is a significant improvement to the semantics of that join. The reason it's slightly slower now is that it joins the stream and table records in the correct (timestamp) order. This fixes a pretty bad past behavior of producing many missed join results (i.e., dropping stream records because there's no table record to join with, but only because we hadn't ingested the table record yet). We were able to compute those results faster, but only because we were missing a lot of the output one would expect.
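
A toy model can illustrate why processing order matters for an inner stream-table join. This is a simplified sketch, not Kafka Streams code: the `Record` type and `stream_table_join` function are hypothetical, the "unordered" case stands in for draining buffered stream data before the table input has been fetched, and a plain sort by timestamp stands in for Streams' timestamp synchronization across inputs.

```python
from typing import NamedTuple


class Record(NamedTuple):
    ts: int       # event timestamp
    key: str
    value: str
    source: str   # "stream" or "table" (hypothetical tag for the input topic)


def stream_table_join(records):
    """Inner stream-table join over records in the given processing order.

    Table records update a key -> value map; each stream record joins with the
    current table value for its key, or is dropped if there is none yet.
    """
    table = {}
    results = []
    for r in records:
        if r.source == "table":
            table[r.key] = r.value
        elif r.key in table:
            results.append((r.key, r.value, table[r.key]))
        # else: no table value ingested yet, so this stream record emits nothing
    return results


table_records = [Record(1, "a", "A1", "table"), Record(2, "b", "B1", "table")]
stream_records = [Record(3, "a", "click", "stream"), Record(4, "b", "view", "stream")]

# Stream data processed before the table has been ingested: every join is missed.
unordered = stream_table_join(stream_records + table_records)

# Records from both inputs processed in timestamp order: every join succeeds.
ordered = stream_table_join(sorted(stream_records + table_records, key=lambda r: r.ts))

print(unordered)  # []
print(ordered)    # [('a', 'click', 'A1'), ('b', 'view', 'B1')]
```

The unordered run does less work per stream record (it emits nothing), which mirrors how the older behavior could look faster while producing less of the expected output.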

Since the perf is still pretty close, and because it seems like the slight drop is well worth the improved results, I'll go ahead and close this. Thanks for raising it for review!

> Performance regression in stream-table joins on trunk
> -----------------------------------------------------
>
>                 Key: KAFKA-12366
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12366
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Victoria Xia
>            Assignee: John Roesler
>            Priority: Blocker
>             Fix For: 3.0.0
>
>
> Stream-table join benchmarks have revealed a significant performance regression on trunk as compared to the latest release version. We should investigate as a blocker prior to the 2.8 release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)