Posted to jira@kafka.apache.org by "Guozhang Wang (Jira)" <ji...@apache.org> on 2023/02/27 19:26:00 UTC

[jira] [Commented] (KAFKA-14748) Relax non-null FK left-join requirement

    [ https://issues.apache.org/jira/browse/KAFKA-14748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694147#comment-17694147 ] 

Guozhang Wang commented on KAFKA-14748:
---------------------------------------

Originally I thought this could be handled without a KIP, just as a fix to the join semantics. On second thought I realized that may not be the case, primarily because with the relaxed semantics we cannot distinguish the following two cases:

1) Key extractor returns a non-null `K0`, and the lookup for `K0` on the other relation returns a null `V0`, resulting in `<K, join(V, null)>`.

2) Key extractor returns a null `K0`, and hence we directly emit `<K, join(V, null)>`.

Hence, adding a `filter` operator after the `join` operator to drop `<K, join(V, null)>` results cannot, on its own, preserve the old behavior if a developer really wants it: the filter would drop case 1) along with case 2).
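
To make the ambiguity concrete, here is a minimal sketch in plain Java. It does not use the Kafka Streams API: `relaxedLeftJoin` and `FkLeftJoinSketch` are hypothetical names, a `Map` stands in for the right-hand table, and strings stand in for the join result. It only illustrates that, under the relaxed semantics, both cases produce the same output.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class FkLeftJoinSketch {

    // Hypothetical stand-in for a relaxed left FK-join of a single left-hand
    // record: a null foreign key is accepted and joined against "nothing".
    public static String relaxedLeftJoin(String leftValue,
                                         Function<String, String> fkExtractor,
                                         Map<String, String> rightTable) {
        String fk = fkExtractor.apply(leftValue);
        // A null FK skips the lookup; a non-null FK with no entry yields null too.
        String rightValue = (fk == null) ? null : rightTable.get(fk);
        return "join(" + leftValue + ", " + rightValue + ")";
    }

    public static void main(String[] args) {
        Map<String, String> rightTable = new HashMap<>(); // no entry for "k0"

        // Case 1: extractor returns non-null "k0", lookup returns null.
        String case1 = relaxedLeftJoin("v", v -> "k0", rightTable);
        // Case 2: extractor returns null.
        String case2 = relaxedLeftJoin("v", v -> null, rightTable);

        System.out.println(case1);
        System.out.println(case2);
        System.out.println(case1.equals(case2));
    }
}
```

Since the two results are equal, a downstream `filter` has nothing to tell them apart by.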

In fact, the same question applies to the general issue of https://issues.apache.org/jira/browse/KAFKA-12317 as well: should we try to distinguish between the case of extracting a null key for the join vs. the case where a non-null extracted key did not find a matching record on the other relation (or more specifically, the other relation returns a null value for the extracted key)?

My thoughts on the above question are as follows: putting performance benefits aside, for app semantics where the developer knows there are certain keys that will never exist in the other relation (i.e. a lookup would always return a null value), the developer could let the key extractor return one of those keys whenever they want a no-match join result. That means the value of KAFKA-12317/KAFKA-14748 lies in the case where the developer does not know of any key that would never exist in the other relation.
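
The workaround described above can be sketched as follows, again in plain Java rather than the Kafka Streams API. All names (`SentinelKeySketch`, `NO_MATCH`, `strictLeftJoin`, `withSentinel`) are hypothetical, and the sketch assumes the developer can name a key that is guaranteed never to exist in the other relation. `strictLeftJoin` models the current behavior, where a null extracted key drops the record.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class SentinelKeySketch {

    // Assumption: this key is known to never exist in the right-hand table.
    public static final String NO_MATCH = "__no_such_key__";

    // Models the current strict semantics: a null FK is invalid and the
    // left-hand record is dropped (empty result).
    public static Optional<String> strictLeftJoin(String leftValue,
                                                  Function<String, String> fkExtractor,
                                                  Map<String, String> rightTable) {
        String fk = fkExtractor.apply(leftValue);
        if (fk == null) {
            return Optional.empty();
        }
        return Optional.of("join(" + leftValue + ", " + rightTable.get(fk) + ")");
    }

    // Wraps an extractor so that a null FK is replaced by the sentinel key.
    public static Function<String, String> withSentinel(Function<String, String> extractor) {
        return v -> {
            String fk = extractor.apply(v);
            return fk == null ? NO_MATCH : fk;
        };
    }

    public static void main(String[] args) {
        Map<String, String> rightTable = Map.of("c1", "Alice");

        // Null FK under strict semantics: record is dropped.
        System.out.println(strictLeftJoin("v", v -> null, rightTable).isPresent());
        // Same extractor wrapped with the sentinel: a no-match result is emitted.
        System.out.println(strictLeftJoin("v", withSentinel(v -> null), rightTable).get());
    }
}
```

This only works when such a never-existing key is known up front, which is exactly the limitation noted above.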

If we want to change to a behavior that does not distinguish these two cases, I'd suggest we add a config flag to enable it across fk/outer/left joins, and remove the flag (i.e. always enable the behavior) once we have not heard people complain about the behavior change for a while. But this would require a KIP.

> Relax non-null FK left-join requirement
> ---------------------------------------
>
>                 Key: KAFKA-14748
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14748
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Priority: Major
>
> Kafka Streams enforces a strict non-null-key policy in the DSL across all key-dependent operations (like aggregations and joins).
> This also applies to FK-joins, in particular to the ForeignKeyExtractor. If it returns `null`, it's treated as invalid. For left-joins, it might make sense to still accept a `null`, and add the left-hand record with an empty right-hand-side to the result.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)