Posted to jira@kafka.apache.org by "Sergio Duran Vegas (Jira)" <ji...@apache.org> on 2021/10/20 05:03:00 UTC

[jira] [Updated] (KAFKA-13386) Foreign Key Join filtering out valid records after a code change / schema evolved

     [ https://issues.apache.org/jira/browse/KAFKA-13386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Duran Vegas updated KAFKA-13386:
---------------------------------------
    Summary: Foreign Key Join filtering out valid records after a code change / schema evolved  (was: Foreign Key Join filtering valid records after a code change / schema evolved)

> Foreign Key Join filtering out valid records after a code change / schema evolved
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-13386
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13386
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.6.2
>            Reporter: Sergio Duran Vegas
>            Priority: Major
>
> The join optimization assumes the serializer is deterministic and invariant across upgrades. I can recall other discussions in which we wanted to rely on the same property, for example when computing whether an update is a duplicate result or not.
>  
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionResolverJoinProcessorSupplier.java]
>  
> {code:java}
> // If this value doesn't match the current value from the original table, it is stale and should be discarded.
> if (java.util.Arrays.equals(messageHash, currentHash)) {
> {code}
>  
> One solution would be for the comparison to use the foreign-key reference itself instead of the hash of the whole message.
>  
> The proposed fix is to allow the user to choose between the two comparison methods (whole-message hash or FK reference). This would stop valid records from being dropped in certain cases, while still letting the user keep the current optimized check, which also drops stale intermediate results.
>  
>  
>  
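To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not the Kafka Streams implementation: the class StaleCheckSketch, the Order type, the two serializers, and the use of SHA-256 are all assumptions made for the example. It only shows that a hash of the serialized value stops matching once the serialized form changes, whereas a comparison on the foreign key is unaffected.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Objects;

// Hypothetical sketch; none of these names come from Kafka Streams.
final class StaleCheckSketch {

    // Stand-in for a left-table value that carries a foreign key.
    static final class Order {
        final String customerId; // the foreign key
        final String note;

        Order(String customerId, String note) {
            this.customerId = customerId;
            this.note = note;
        }
    }

    // Serialized form before the code change / schema evolution.
    static byte[] serializeV1(Order o) {
        return (o.customerId + "|" + o.note).getBytes(StandardCharsets.UTF_8);
    }

    // Serialized form after the change: same logical record, different bytes.
    static byte[] serializeV2(Order o) {
        return (o.note + "|" + o.customerId).getBytes(StandardCharsets.UTF_8);
    }

    // SHA-256 is only a stand-in for whatever hash the real code uses.
    static byte[] hash(byte[] bytes) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(bytes);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Whole-message comparison, in the style of the snippet quoted in the issue.
    static boolean matchesByHash(byte[] messageHash, byte[] currentHash) {
        return Arrays.equals(messageHash, currentHash);
    }

    // Alternative suggested in the issue: compare only the foreign-key reference.
    static boolean matchesByForeignKey(String responseForeignKey, Order current) {
        return current != null && Objects.equals(responseForeignKey, current.customerId);
    }
}
```

In this sketch, a hash computed with serializeV1 before an upgrade no longer equals the hash computed with serializeV2 afterwards, so matchesByHash rejects a record whose content never changed. matchesByForeignKey still accepts it. The trade-off, as far as the issue describes it, is that a foreign-key comparison would no longer drop intermediate results where the value changed but the foreign key did not.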


