Posted to jira@kafka.apache.org by "John Roesler (Jira)" <ji...@apache.org> on 2020/04/28 15:24:00 UTC

[jira] [Commented] (KAFKA-9925) Non-key KTable Joining may result in duplicate schema name in confluence schema registry

    [ https://issues.apache.org/jira/browse/KAFKA-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094604#comment-17094604 ] 

John Roesler commented on KAFKA-9925:
-------------------------------------

Ah, right you are. So sorry I overlooked that part of the bug report when I submitted my fix for it.

The issue is that these "pseudo-topics" are created the same way that real repartition topics get created in the DSL layer. For real repartition topics, however, we add them to the InternalTopologyBuilder, which later invokes org.apache.kafka.streams.processor.internals.InternalTopologyBuilder#decorateTopic to add the applicationId prefix. This never happens for the pseudo-topics, since we don't add them to the InternalTopologyBuilder.
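
To make the effect of the missing prefix concrete, here is a minimal, self-contained sketch. It is not the actual Streams code: the class and method names are hypothetical, the "applicationId + '-' + topic" format reflects my understanding of what decorateTopic does, and the subject naming (topic name plus "-key" or "-value") is an assumption based on the subjects quoted in the report.

```java
import java.util.Objects;

// Hypothetical illustration only; these names are not part of Kafka Streams.
final class TopicDecorationSketch {

    // Assumed behaviour of decorateTopic: prefix the topic with the applicationId.
    static String decorateTopic(final String applicationId, final String topic) {
        Objects.requireNonNull(applicationId, "applicationId is only known once the application starts");
        return applicationId + "-" + topic;
    }

    // Assumed subject naming: topic name plus "-key" or "-value".
    static String subject(final String topic, final boolean isKey) {
        return topic + (isKey ? "-key" : "-value");
    }

    public static void main(final String[] args) {
        final String pseudoTopic = "KTABLE-FK-JOIN-SUBSCRIPTION-REGISTRATION-0000000005-topic-pk";

        // Undecorated: two different applications would produce the same subject.
        System.out.println(subject(pseudoTopic, true));

        // Decorated: the subject becomes unique per application ("app-a" and "app-b" are made-up ids).
        System.out.println(subject(decorateTopic("app-a", pseudoTopic), true));
        System.out.println(subject(decorateTopic("app-b", pseudoTopic), true));
    }
}
```

Without the prefix, the first line printed is exactly the subject from the report, and it would be identical for every application that happens to generate the same node index.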

The complication is that we don't know the applicationId until the application is started. Currently, both the DSL builder and the runtime are isolated from this because the DSL builder only has to register the topic with the InternalTopologyBuilder, and then the runtime code only has to deal with pre-configured Serdes, which get the pre-decorated topics injected at startup.
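
The timing constraint can be sketched as follows. This is only an illustration of the problem, not a description of the PR: all names are hypothetical, and deferring the name through a Supplier is just one possible way to avoid needing the applicationId at build time.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical illustration only; these names are not part of Kafka Streams.
final class LazyPseudoTopicSketch {

    // Stands in for the applicationId, which is unknown while the topology is being built.
    static final AtomicReference<String> APPLICATION_ID = new AtomicReference<>();

    // Build time: capture how to compute the decorated name, but do not compute it yet.
    static Supplier<String> decoratedTopic(final String pseudoTopic) {
        return () -> {
            final String applicationId = APPLICATION_ID.get();
            if (applicationId == null) {
                throw new IllegalStateException("applicationId is not set until the application starts");
            }
            return applicationId + "-" + pseudoTopic;
        };
    }

    public static void main(final String[] args) {
        final Supplier<String> topic =
            decoratedTopic("KTABLE-FK-JOIN-SUBSCRIPTION-REGISTRATION-0000000005-topic-pk");

        // Startup: the applicationId becomes known ("my-app" is a made-up id).
        APPLICATION_ID.set("my-app");

        // Runtime: the name is resolved only now, when a serde would first need it.
        System.out.println(topic.get());
    }
}
```

Calling the supplier before the id is set fails, which mirrors why the DSL builder cannot simply bake the prefix into the pseudo-topic name.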

I'll submit a PR shortly to fix it.

> Non-key KTable Joining may result in duplicate schema name in confluence schema registry
> ----------------------------------------------------------------------------------------
>
>                 Key: KAFKA-9925
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9925
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.4.1
>            Reporter: Kin Siu
>            Priority: Major
>
> The second half of the issue Andy Bryant reported in KAFKA-9390 still appears to exist.
> When testing the non-key join method without passing in "Named", I noticed that schema subjects are still registered in the Confluent schema registry without the consumer group ID,
> e.g.
> {noformat}
> "KTABLE-FK-JOIN-SUBSCRIPTION-REGISTRATION-0000000005-topic-pk-key",
> "KTABLE-FK-JOIN-SUBSCRIPTION-REGISTRATION-0000000005-topic-fk-key",
> "KTABLE-FK-JOIN-SUBSCRIPTION-REGISTRATION-0000000005-topic-vh-value",
> "KTABLE-FK-JOIN-SUBSCRIPTION-REGISTRATION-0000000025-topic-pk-key",
> "KTABLE-FK-JOIN-SUBSCRIPTION-REGISTRATION-0000000025-topic-fk-key",
> "KTABLE-FK-JOIN-SUBSCRIPTION-REGISTRATION-0000000025-topic-vh-value"
> {noformat}
> Code in KTableImpl that constructs the names above:
> https://github.com/apache/kafka/blob/2.4.1/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KTableImpl.java#L959
> When multiple topologies use a foreignKey join and register to the same schema registry, the names can clash and schema registration can fail.
> To clean up these schema subjects, we need to know the internal naming of a consumer group's topology, which is not straightforward and is error-prone.


