Posted to jira@kafka.apache.org by "Ewen Cheslack-Postava (JIRA)" <ji...@apache.org> on 2018/02/23 05:51:00 UTC

[jira] [Commented] (KAFKA-6433) Connect distributed workers should fail if their config is "incompatible" with leader's

    [ https://issues.apache.org/jira/browse/KAFKA-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373961#comment-16373961 ] 

Ewen Cheslack-Postava commented on KAFKA-6433:
----------------------------------------------

This needs a lot of thought around upgrades, compatibility, and debuggability. There are all sorts of weird issues you can get into with something like this.

I agree that the general goal of checking that important configs are aligned is absolutely the right thing to do. Today, unless I'm forgetting something, we basically only check that the group and config offset match. Lots of other things could potentially mismatch and cause problems.

But things "matching" can be tricky. Topic names are pretty straightforward and we can validate easily. Validating anything like "the same set of connectors" is tricky given both versioning and upgrading a cluster with a *new* connector. Same for converters and transformations. We'd need to define clear rules for what "compatibility" means here and when a node is allowed to run a connector/task. And who is the source of truth? Who defines what's new?
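
To illustrate the "straightforward" part only, here is a minimal sketch of an exact-match check on the group and internal topic names. It is not how Connect does or would do this; the class and method names (WorkerConfigCheck, findMismatches) are hypothetical, and it assumes both configs are available as plain string maps, which sidesteps the real question of how a worker would learn the leader's config in the first place. The four keys are the standard distributed worker settings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class WorkerConfigCheck {
    // Assumption: these are the keys that must match exactly across workers.
    static final List<String> EXACT_MATCH_KEYS = List.of(
        "group.id",
        "config.storage.topic",
        "offset.storage.topic",
        "status.storage.topic");

    /** Returns one message per mismatched key; an empty list means all keys match. */
    static List<String> findMismatches(Map<String, String> leader, Map<String, String> worker) {
        List<String> problems = new ArrayList<>();
        for (String key : EXACT_MATCH_KEYS) {
            String expected = leader.get(key);
            String actual = worker.get(key);
            // Objects.equals treats two missing values as equal and one missing value as a mismatch.
            if (!Objects.equals(expected, actual)) {
                problems.add(key + ": leader has '" + expected + "', worker has '" + actual + "'");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, String> leader = Map.of(
            "group.id", "connect-cluster",
            "config.storage.topic", "connect-configs",
            "offset.storage.topic", "connect-offsets",
            "status.storage.topic", "connect-status");
        Map<String, String> worker = Map.of(
            "group.id", "connect-cluster",
            "config.storage.topic", "connect-configs",
            "offset.storage.topic", "connect-offsets",
            "status.storage.topic", "connect-status-v2");

        // Report every mismatch by key so the operator sees exactly what differs.
        findMismatches(leader, worker).forEach(System.out::println);
    }
}
```

Reporting each mismatched key by name keeps this kind of check debuggable. Connectors, converters, and transformations are deliberately left out, since a plain equality test does not capture versioning or rolling upgrades.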

Personally, I'd argue it's actually clearer to have a log message saying "couldn't start connector X because class not found" from node Y than have to determine why all connectors/tasks are running on node Z because node W wasn't allowed to join worker group N for some mismatch of connectors. It might fail faster, but it tells you exactly what the problem is and leads to a clear resolution.


> Connect distributed workers should fail if their config is "incompatible" with leader's
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6433
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6433
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>    Affects Versions: 1.0.0
>            Reporter: Randall Hauch
>            Priority: Major
>              Labels: needs-kip
>
> Currently, each distributed worker config must have the same `group.id` and must use the same internal topics for configs, offsets, and status. Additionally, each worker must be configured to have the same connectors, SMTs, and converters; confusing error messages will result when some workers are able to deploy connector tasks with SMTs while others fail because they are missing plugins that the other workers have.
> Ideally, a Connect worker would only be allowed to join the cluster if it were "compatible" with the existing cluster, where "compatible" perhaps includes using the same internal topics and having the same set of plugins.


