Posted to jira@kafka.apache.org by "Sophie Blee-Goldman (Jira)" <ji...@apache.org> on 2020/07/30 21:29:00 UTC

[jira] [Commented] (KAFKA-10137) Clean-up retain Duplicate logic in Window Stores

    [ https://issues.apache.org/jira/browse/KAFKA-10137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168234#comment-17168234 ] 

Sophie Blee-Goldman commented on KAFKA-10137:
---------------------------------------------

I think there's actually a real bug lurking here: I was looking at the ChangeLoggingWindowBytesStore and noticed we seem to insert the sequence number into the changelogged bytes regardless of `retainDuplicates`. 

We peel off the unnecessary seqnum during restoration, so it doesn't seem to cause any correctness issues. But we're obviously storing an extra 4 bytes per window store changelog record for no reason. Unfortunately, I'm not sure how this can be fixed in a backwards-compatible way.
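To make the overhead concrete, here is a minimal Java sketch of a changelog key that always carries a sequence number, whether or not duplicates are retained. The layout (key bytes, then an 8-byte window start timestamp, then a 4-byte seqnum) and the names `ChangelogKeyLayoutSketch`, `toChangelogKey`, and `stripSeqnum` are hypothetical, chosen for illustration; they are not the actual Kafka Streams classes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, not Kafka Streams code.
final class ChangelogKeyLayoutSketch {
    // Assumed sizes: 8-byte window start timestamp, 4-byte sequence number.
    static final int TIMESTAMP_SIZE = 8;
    static final int SEQNUM_SIZE = 4;

    // Builds a changelog key that always carries a seqnum, mirroring the
    // behavior described in the comment (appended regardless of retainDuplicates).
    static byte[] toChangelogKey(byte[] key, long windowStart, int seqnum) {
        return ByteBuffer.allocate(key.length + TIMESTAMP_SIZE + SEQNUM_SIZE)
                .put(key)
                .putLong(windowStart)
                .putInt(seqnum)
                .array();
    }

    // Restore step when duplicates are not retained: the trailing seqnum
    // carries no information, so it is simply dropped.
    static byte[] stripSeqnum(byte[] changelogKey) {
        return Arrays.copyOf(changelogKey, changelogKey.length - SEQNUM_SIZE);
    }

    public static void main(String[] args) {
        byte[] key = "user-42".getBytes(StandardCharsets.UTF_8);
        byte[] logged = toChangelogKey(key, 1_000L, 0);
        byte[] restored = stripSeqnum(logged);
        // Bytes written to the changelog but discarded on restore.
        System.out.println(logged.length - restored.length); // prints 4
    }
}
```

Restoration recovers the same key and timestamp bytes either way, which is why the extra seqnum is harmless for correctness but still costs 4 bytes in every record.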

> Clean-up retain Duplicate logic in Window Stores
> ------------------------------------------------
>
>                 Key: KAFKA-10137
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10137
>             Project: Kafka
>          Issue Type: Task
>          Components: streams
>    Affects Versions: 2.5.0
>            Reporter: Bruno Cadonna
>            Priority: Minor
>
> Stream-stream joins use the regular `WindowStore` implementation but with `retainDuplicates` set to true. To allow for duplicates while using the same unique-key underlying stores, we just wrap the key with an incrementing sequence number before inserting it.
> The logic to maintain and append the sequence number is present in multiple locations, namely in the changelogging window store and in its underlying window stores. We should consolidate this code to one single location.  
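One way the consolidation could look is a single hypothetical `SeqnumKeyWrapper` class that owns the counter and appends it only when `retainDuplicates` is true, so the changelogging store and the underlying stores would share one implementation. This is a sketch under assumptions: the class name, the 4-byte big-endian seqnum, and the wrap-around back to 0 after `Integer.MAX_VALUE` are illustrative choices, and the window timestamp portion of the key is omitted for brevity.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical single home for the duplicate-handling logic (not Kafka Streams code).
final class SeqnumKeyWrapper {
    private static final int SEQNUM_SIZE = 4;

    private final boolean retainDuplicates;
    private int seqnum = 0;

    SeqnumKeyWrapper(boolean retainDuplicates) {
        this.retainDuplicates = retainDuplicates;
    }

    // Appends an incrementing seqnum only when duplicates are retained, so
    // repeated puts of the same key produce distinct store keys.
    byte[] wrap(byte[] key) {
        if (!retainDuplicates) {
            return key; // no extra bytes when duplicates are not retained
        }
        seqnum = (seqnum + 1) & 0x7FFFFFFF; // stays non-negative, wraps to 0 on overflow
        return ByteBuffer.allocate(key.length + SEQNUM_SIZE).put(key).putInt(seqnum).array();
    }

    // Inverse of wrap: drops the trailing seqnum if one was appended.
    byte[] unwrap(byte[] storeKey) {
        if (!retainDuplicates) {
            return storeKey;
        }
        return Arrays.copyOf(storeKey, storeKey.length - SEQNUM_SIZE);
    }

    public static void main(String[] args) {
        SeqnumKeyWrapper dups = new SeqnumKeyWrapper(true);
        byte[] k = {7};
        byte[] a = dups.wrap(k);
        byte[] b = dups.wrap(k);
        System.out.println(Arrays.equals(a, b)); // false: distinct store keys
        System.out.println(new SeqnumKeyWrapper(false).wrap(k).length); // 1: no seqnum appended
    }
}
```

Keeping the counter and the `retainDuplicates` check in one place means the seqnum can only be written when it is actually needed; changing what already lands in existing changelog topics is the separate backwards-compatibility question raised in the comment.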



--
This message was sent by Atlassian Jira
(v8.3.4#803005)