Posted to jira@kafka.apache.org by "Sophie Blee-Goldman (Jira)" <ji...@apache.org> on 2020/05/08 20:22:00 UTC

[jira] [Commented] (KAFKA-9923) Join window store duplicates can be compacted in changelog

    [ https://issues.apache.org/jira/browse/KAFKA-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102911#comment-17102911 ] 

Sophie Blee-Goldman commented on KAFKA-9923:
--------------------------------------------

This might be a good opportunity to clean up the duplicates interface in general. One idea discussed on KAFKA-9921 was to deprecate (and eventually remove) the window store supplier methods that accept a retainDuplicates flag, and instead split the duplicates case into its own layer in the store hierarchy, similar to CachingWindowStore. If this layer did the key wrapping and sat above the ChangeLoggingWindowStore layer, that would also fix the bug in this ticket. This would need a KIP, of course.
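A minimal sketch of the proposed layering follows. It assumes a hypothetical DuplicatesStore layer (not an existing Kafka Streams class) that appends an incrementing sequence number to each key before passing it to the changelogging layer. InnerStore, ChangeLoggingStore, and compact are also simplified stand-ins: window timestamps are omitted, and log compaction is modeled as "keep the latest value per key". Because the changelog now receives distinct keys, both duplicates survive compaction.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposed layering (not Kafka Streams classes):
// DuplicatesStore (outer, adds the sequence number) -> ChangeLoggingStore -> InnerStore.
public class DuplicatesLayerSketch {

    interface Store {
        void put(String key, String value);
    }

    // Innermost unique-key store: a later put with the same key overwrites.
    static class InnerStore implements Store {
        final Map<String, String> entries = new LinkedHashMap<>();

        public void put(String key, String value) {
            entries.put(key, value);
        }
    }

    // Changelogging layer: logs exactly the key it receives, then delegates.
    static class ChangeLoggingStore implements Store {
        private final Store inner;
        final List<Map.Entry<String, String>> changelog = new ArrayList<>();

        ChangeLoggingStore(Store inner) {
            this.inner = inner;
        }

        public void put(String key, String value) {
            changelog.add(Map.entry(key, value));
            inner.put(key, value);
        }
    }

    // Hypothetical duplicates layer: makes each key unique before it reaches
    // the changelogging layer, so the changelog sees distinct keys.
    static class DuplicatesStore implements Store {
        private final Store inner;
        private int seqnum = 0;

        DuplicatesStore(Store inner) {
            this.inner = inner;
        }

        public void put(String key, String value) {
            inner.put(key + "#" + (seqnum++), value);
        }
    }

    // Models log compaction: only the latest value per key is retained.
    static Map<String, String> compact(List<Map.Entry<String, String>> log) {
        Map<String, String> compacted = new LinkedHashMap<>();
        for (Map.Entry<String, String> record : log) {
            compacted.put(record.getKey(), record.getValue());
        }
        return compacted;
    }

    public static void main(String[] args) {
        InnerStore inner = new InnerStore();
        ChangeLoggingStore logging = new ChangeLoggingStore(inner);
        Store store = new DuplicatesStore(logging); // duplicates layer on top

        // Two join records that share the same key.
        store.put("k1", "left-a");
        store.put("k1", "left-b");

        System.out.println(compact(logging.changelog)); // {k1#0=left-a, k1#1=left-b}
    }
}
```

The "key#seqnum" string format is purely illustrative; a real store would encode the sequence number into the serialized binary key alongside the window timestamp.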

> Join window store duplicates can be compacted in changelog 
> -----------------------------------------------------------
>
>                 Key: KAFKA-9923
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9923
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Sophie Blee-Goldman
>            Priority: Blocker
>             Fix For: 2.6.0
>
>
> Stream-stream joins use the regular `WindowStore` implementation but with `retainDuplicates` set to true. To allow duplicates while still using the same unique-key underlying stores, we just wrap the key with an incrementing sequence number before inserting it.
> This wrapping occurs at the innermost layer of the store hierarchy, which means the duplicates must first pass through the changelogging layer, where their keys are still identical. As a result, the records are sent to the changelog without distinct keys, and log compaction may discard the older of the duplicates.
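To make the failure mode concrete, here is a minimal simulation under simplifying assumptions: the class names are hypothetical stand-ins (not Kafka Streams classes), window timestamps are omitted, and log compaction is modeled as keeping only the latest value per key. The sequence number is appended only in the innermost store, so the local store keeps both duplicates while the changelog receives the same key twice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the current layering (not Kafka Streams classes):
// ChangeLoggingStore (outer) -> InnerStore (adds the sequence number).
public class DuplicateCompactionDemo {

    // Innermost layer: appends an incrementing sequence number to each key so
    // that duplicates do not overwrite each other in the unique-key map.
    static class InnerStore {
        final Map<String, String> entries = new LinkedHashMap<>();
        private int seqnum = 0;

        void put(String key, String value) {
            entries.put(key + "#" + (seqnum++), value);
        }
    }

    // Changelogging layer: logs the key it receives, which is still unwrapped,
    // then delegates to the inner store.
    static class ChangeLoggingStore {
        private final InnerStore inner;
        final List<Map.Entry<String, String>> changelog = new ArrayList<>();

        ChangeLoggingStore(InnerStore inner) {
            this.inner = inner;
        }

        void put(String key, String value) {
            changelog.add(Map.entry(key, value));
            inner.put(key, value);
        }
    }

    // Models log compaction: only the latest value per key is retained.
    static Map<String, String> compact(List<Map.Entry<String, String>> log) {
        Map<String, String> compacted = new LinkedHashMap<>();
        for (Map.Entry<String, String> record : log) {
            compacted.put(record.getKey(), record.getValue());
        }
        return compacted;
    }

    public static void main(String[] args) {
        InnerStore inner = new InnerStore();
        ChangeLoggingStore store = new ChangeLoggingStore(inner);

        // Two join records that share the same key.
        store.put("k1", "left-a");
        store.put("k1", "left-b");

        System.out.println(inner.entries);             // {k1#0=left-a, k1#1=left-b}
        System.out.println(compact(store.changelog));  // {k1=left-b} (the "left-a" duplicate is gone)
    }
}
```

Restoring the store from the compacted changelog would therefore rebuild only one of the two join records, even though the local store held both before the failure.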
