Posted to jira@kafka.apache.org by "Matthias J. Sax (Jira)" <ji...@apache.org> on 2023/02/24 20:04:00 UTC
[jira] [Resolved] (KAFKA-8177) Allow for separate connect instances to have sink connectors with the same name

     [ https://issues.apache.org/jira/browse/KAFKA-8177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias J. Sax resolved KAFKA-8177.
------------------------------------
    Resolution: Fixed

> Allow for separate connect instances to have sink connectors with the same name
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-8177
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8177
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>            Reporter: Paul Whalen
>            Priority: Minor
>              Labels: connect
>
> If you have multiple Connect instances (each either a single standalone worker or a distributed group of workers) running against the same Kafka cluster, the Connect instances cannot each have a sink connector with the same name and still operate independently. This is because the consumer group ID used internally for reading from the source topic(s) is derived entirely from the connector's name: [https://github.com/apache/kafka/blob/d0e436c471ba4122ddcc0f7a1624546f97c4a517/connect/runtime/src/main/java/org/apache/kafka/connect/util/SinkUtils.java#L24]
> The documentation of Connect implies to me that it supports "multi-tenancy," that is, as long as...
>  * In standalone mode, the {{offset.storage.file.filename}} is not shared between instances
>  * In distributed mode, {{group.id}} and {{config.storage.topic}}, {{offset.storage.topic}}, and {{status.storage.topic}} are not the same between instances
> ... then the Connect instances can operate completely independently without fear of conflict. But the sink connector consumer group naming policy makes this untrue. Obviously this can be achieved by uniquely naming connectors across instances, but in some environments that could be a bit of a nuisance, or a challenging policy to enforce. For instance, imagine a large group of developers or data analysts all running their own standalone Connect to load into a SQL database for their own analysis, or replicating or mirroring to their own local cluster for testing.
> The obvious solution is to allow supplying config that gives a Connect instance some notion of identity, and to use that when creating the sink task consumer group. Distributed mode obviously already has this ({{group.id}}), but it would need to be added for standalone mode. Maybe {{instance.id}}? Given that solution, it seems like this would need a small KIP.
> I could also imagine solving this problem through better documentation ("ensure your connector names are unique!"), but having that subtlety doesn't seem worth it to me. (Optionally) assigning identity to every Connect instance seems strictly clearer, without any downside.
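A minimal Java sketch of the naming collision described in the issue, assuming the sink consumer group ID is simply the connector name with a "connect-" prefix (as in SinkUtils at the linked commit). The class name SinkGroupIdSketch, the method proposedGroupId, and the way it folds an instance identity into the group ID are hypothetical illustrations of the reporter's suggested instance.id; they are not an actual Kafka Connect API.

```java
import java.util.Optional;

/**
 * Hypothetical sketch: how a sink connector's consumer group ID relates to its name.
 * The "connect-" prefix is assumed to mirror SinkUtils; the instance-ID variant is
 * purely illustrative and not part of Kafka Connect.
 */
public final class SinkGroupIdSketch {

    // Prefix prepended to the connector name (assumed to match SinkUtils).
    static final String CONSUMER_GROUP_ID_PREFIX = "connect-";

    private SinkGroupIdSketch() {}

    // Current behaviour: the group ID depends only on the connector name, so two
    // independent Connect instances that each run a sink named "foo" share one group.
    static String currentGroupId(String connectorName) {
        return CONSUMER_GROUP_ID_PREFIX + connectorName;
    }

    // Hypothetical proposal: fold an optional instance identity (e.g. a hypothetical
    // "instance.id" in standalone mode, or the worker's group.id in distributed mode)
    // into the group ID. With no identity configured, fall back to the current scheme.
    static String proposedGroupId(Optional<String> instanceId, String connectorName) {
        return instanceId
                .map(id -> CONSUMER_GROUP_ID_PREFIX + id + "-" + connectorName)
                .orElseGet(() -> currentGroupId(connectorName));
    }

    public static void main(String[] args) {
        // Two analysts each run standalone Connect with a sink called "sql-loader".
        String a = currentGroupId("sql-loader");
        String b = currentGroupId("sql-loader");
        System.out.println(a.equals(b)); // true: both instances join one consumer group

        // With distinct (hypothetical) instance IDs, the groups no longer collide.
        String a2 = proposedGroupId(Optional.of("alice"), "sql-loader");
        String b2 = proposedGroupId(Optional.of("bob"), "sql-loader");
        System.out.println(a2.equals(b2)); // false: each instance gets its own group
    }
}
```

The fallback to the unprefixed scheme when no identity is set is a deliberate choice in this sketch: committed offsets are tracked per consumer group, so silently changing the group ID of an existing sink connector would make it lose its position. Note also that plain "-" concatenation is ambiguous (instance "a-b" with connector "c" collides with instance "a" and connector "b-c"); a real design would need a reserved separator or escaping.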



--
This message was sent by Atlassian Jira
(v8.20.10#820010)