Posted to jira@kafka.apache.org by "Domantas Petrauskas (Jira)" <ji...@apache.org> on 2023/05/29 07:48:00 UTC

[jira] [Commented] (KAFKA-12317) Relax non-null key requirement for left/outer KStream joins

    [ https://issues.apache.org/jira/browse/KAFKA-12317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17727033#comment-17727033 ] 

Domantas Petrauskas commented on KAFKA-12317:
---------------------------------------------

I am thinking about working on this issue. I recently ran into it, and it feels like a serious limitation. Since the issue is fairly old and has seen little activity, I wanted to double-check: would a contribution be accepted, and is fixing this still relevant for the project?

> Relax non-null key requirement for left/outer KStream joins
> -----------------------------------------------------------
>
>                 Key: KAFKA-12317
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12317
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Priority: Major
>
> Currently, for stream-stream and stream-table/globalTable joins, KafkaStreams drops all stream records with a `null`-key (`null`-join-key for stream-globalTable), because for a `null`-(join)key the join is undefined: i.e., we don't have an attribute to do the table lookup (we consider the stream record as malformed). Note that we define the semantics of a _left/outer_ join as: keep the stream record if no matching join record was found.
> We could relax the definition of the _left_ stream-table/globalTable and _left/outer_ stream-stream joins though, and not drop `null`-(join)key stream records, and call the ValueJoiner with a `null` "other-side" value instead: if the stream record key (or join-key) is `null`, we could treat it as a "failed lookup" instead of treating the stream record as corrupted.
> If we make this change, users who want to keep the current behavior can add a `filter()` before the join to drop `null`-(join)key records from the stream explicitly.
> Note that this change also requires changing the behavior if we insert a repartition topic before the join: currently, we drop `null`-key records before writing into the repartition topic (as we know they would be dropped later anyway). We need to relax this behavior for a left stream-table and left/outer stream-stream join. Users need to be aware (i.e., we might need to put this into the docs and JavaDocs) that records with a `null`-key would be partitioned randomly.
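
For illustration only, the difference between the current and the proposed left-join semantics described in the issue can be sketched in plain Java. This is a toy model and not Kafka Streams code: `Rec`, `leftJoin`, `JOINER`, and the `dropNullKeys` flag are hypothetical names invented for this sketch, and an in-memory `Map` stands in for the table side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.stream.Collectors;

public class NullKeyLeftJoinSketch {

    // Hypothetical stand-in for a stream record; not a Kafka Streams type.
    public static final class Rec {
        public final String key;
        public final String value;
        public Rec(String key, String value) { this.key = key; this.value = value; }
    }

    // Plays the role of a ValueJoiner: (stream value, table value or null) -> result.
    public static final BiFunction<String, String, String> JOINER = (l, r) -> l + "+" + r;

    // Toy left join. dropNullKeys=true models the current behavior (null-key
    // records are dropped); false models the proposed one (null key is treated
    // as a failed lookup, so the joiner is called with a null table value).
    public static List<String> leftJoin(List<Rec> stream, Map<String, String> table,
                                        boolean dropNullKeys) {
        List<String> out = new ArrayList<>();
        for (Rec r : stream) {
            if (r.key == null) {
                if (!dropNullKeys) {
                    out.add(JOINER.apply(r.value, null));
                }
                continue;
            }
            // A missing key yields null, which is the usual left-join case.
            out.add(JOINER.apply(r.value, table.get(r.key)));
        }
        return out;
    }

    public static List<Rec> sampleStream() {
        return Arrays.asList(new Rec("a", "x"), new Rec(null, "y"), new Rec("b", "z"));
    }

    public static Map<String, String> sampleTable() {
        Map<String, String> t = new HashMap<>();
        t.put("a", "A");
        return t;
    }

    public static void main(String[] args) {
        List<Rec> stream = sampleStream();
        Map<String, String> table = sampleTable();

        System.out.println(leftJoin(stream, table, true));   // [x+A, z+null]
        System.out.println(leftJoin(stream, table, false));  // [x+A, y+null, z+null]

        // Explicit filter before the relaxed join restores the current output.
        List<Rec> filtered = stream.stream()
                .filter(r -> r.key != null)
                .collect(Collectors.toList());
        System.out.println(leftJoin(filtered, table, false)); // [x+A, z+null]
    }
}
```

The last call mirrors the `filter()` suggestion in the description: dropping `null`-key records explicitly before the relaxed join gives the same output as the current behavior.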



--
This message was sent by Atlassian Jira
(v8.20.10#820010)