You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Mickael Maison <mi...@gmail.com> on 2023/02/23 09:52:02 UTC

[DISCUSS] KIP-908: Add description field to connector configuration

Hi,

I created a very small KIP to add a description field to connectors:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-908%3A+Add+description+field+to+connector+configuration

Let me know if you have any feedback.

Thanks,
Mickael

Re: [DISCUSS] KIP-908: Add description field to connector configuration

Posted by Mickael Maison <mi...@gmail.com>.
Thanks Chris for the discussion.

Since nobody else expressed interest in this KIP, I'm going to discard it.

On Wed, Mar 8, 2023 at 8:41 PM Chris Egerton <ch...@aiven.io.invalid> wrote:
>
> Hi Mickael,
>
> I do sympathize with the desire for a "quick fix". I think your point about
> these being problematic to support sums up my hesitation here pretty well,
> both with respect to the potential footgun of unintended rebalances (should
> users try to do more with this field than we expect), and with making other
> similar improvements (i.e., expanded metadata support in Kafka Connect)
> more difficult to design correctly for us and to use effectively for users
> (especially if we were to deprecate/remove the description field at a later
> date).
>
> It's also worth noting that users can already add a "description" field to
> the JSON config for any connector they create; the benefit provided by this
> KIP is that they're automatically prompted to do so if they're using a
> UI/CLI/etc. that builds on top of the various /connector-plugins endpoints
> to find the set of configuration properties that a user can/should populate
> before creating a connector.
>
> I'd welcome thoughts from you and others on whether the tradeoffs here are
> worth it; right now I'm still not sure about this one.
>
> Cheers,
>
> Chris
>
> On Tue, Mar 7, 2023 at 6:42 AM Mickael Maison <mi...@gmail.com>
> wrote:
>
> > H Chris,
> >
> > Thanks for taking a look.
> >
> > 1. Yes updating the description can potentially trigger a rebalance. I
> > don't expect users to frequently update the description so I thought
> > this was acceptable. I've added a note to the KIP to mention it.
> >
> > 2. The tags model you described could be interesting but it looks
> > pretty involved with multiple new endpoints and brand new concepts.
> >
> > With this KIP, I really took the most basic approach and proposed the
> > simplest set of changes that could get in the next release and, I
> > think, immediately bring benefits. However sometimes this is not the
> > best approach as a "quick fix" could end up problematic to support in
> > the future. The drawbacks (new connector config field + causing a
> > rebalance on update) look relatively benign to me so I thought this
> > could be an acceptable proposal.
> >
> > Thanks,
> > Mickael
> >
> >
> >
> > On Thu, Feb 23, 2023 at 2:45 PM Chris Egerton <ch...@aiven.io.invalid>
> > wrote:
> > >
> > > Actually, I misspoke--a rebalance isn't triggered when an existing
> > > connector's config is updated. Assuming the set of workers remains
> > stable,
> > > a rebalance is only necessary when a new connector is created, an
> > existing
> > > one is deleted, or a new set of task configs is generated.
> > >
> > > This weakens the point about unnecessary rebalances when a connector's
> > > description is updated, but doesn't entirely address it. Spurious
> > > rebalances may still be triggered if a new set of task configs is
> > > generated, which for reasons outlined above, is fairly likely.
> > >
> > > On Thu, Feb 23, 2023 at 7:41 AM Chris Egerton <ch...@aiven.io> wrote:
> > >
> > > > Hi Mickael,
> > > >
> > > > Thanks for the KIP!
> > > >
> > > > While it's tempting to add this field to the out-of-the-box connector
> > > > config def, I'm a little hesitant, for two reasons.
> > > >
> > > > 1. Adding this directly to the connector config could have unintended
> > > > consequences on the actual data processing by the connector. Any time a
> > > > connector's config is modified, the Connector object running for it is
> > > > restarted with that new config. In most cases this is a trivial
> > operation
> > > > since we have incremental rebalancing enabled by default, the
> > connector can
> > > > (and probably should!) generate task configs that are functionally
> > > > identical to the ones it last generated, and most (though not all)
> > > > Connector classes are fairly lightweight and leave the real work to
> > their
> > > > Task class. However, due to KAFKA-9228 [1], it's not just common
> > practice
> > > > for connectors to perform transparent passthrough of most of their
> > configs
> > > > when generating task configs, it's actually necessary to work around a
> > bug
> > > > in the runtime. As a result, tweaking the description of a connector
> > would
> > > > be fairly likely to result in a full restart of all of its tasks, in
> > > > addition to triggering two rebalances (which may not be so lightweight
> > if
> > > > users are still running with eager rebalancing... which, sadly, I've
> > heard
> > > > is still happening today).
> > > >
> > > > 2. The motivation section mentions some information that might go into
> > the
> > > > description field, such as the team that owns the connector and
> > emergency
> > > > contact info. It seems like this info might benefit from a little more
> > > > structure if we're trying to design for programmatic access by GUIs and
> > > > CLIs (which I'm assuming is the case, since I can't imagine a human
> > being
> > > > getting much use out of the raw output from the GET
> > > > /connector-plugins/{name}/config and PUT
> > > > /connector-plugins/{name}/config/validate endpoints). This might also
> > make
> > > > it easier to add custom validation logic around what kind of
> > information is
> > > > present via REST extension.
> > > >
> > > >
> > > > With these thoughts in mind, what do you think about adding a new
> > generic
> > > > "tags" object to connectors that can support arbitrary user-provided
> > > > key/value pairs? If using the POST /connectors endpoint, it might look
> > > > something like this:
> > > >
> > > > {
> > > >   "name": "my-connector",
> > > >   "config": {
> > > >     "connector.class": "MirrorSource",
> > > >     "tasks.max": "908"
> > > >   },
> > > >   "tags": {
> > > >     "team": "data-infra",
> > > >     "phone": "12345678901",
> > > >     "email": "dinfra@example.org"
> > > >   }
> > > > }
> > > >
> > > > And, to allow users to modify connector tags after one has been
> > created,
> > > > we might introduce a /connectors/{name}/tags endpoint with
> > PUT/PATCH/DELETE
> > > > verbs that writes tag info for a connector to the config topic, but
> > without
> > > > altering the actual connector config (allowing us to skip a rebalance
> > > > altogether).
> > > >
> > > > One other thing we could consider is allowing cluster administrators to
> > > > require that connectors are created with a certain set of tags, or
> > even add
> > > > per-tag regex validation. They might specify something like
> > > > "connector.required.tags = team, phone, email" or
> > > > "connector.tags.phone.regex = [0-9]{11}" in a worker config file. But
> > this
> > > > is probably overboard for now, especially since it's already possible
> > to
> > > > accomplish via a REST extension.
> > > >
> > > > Finally, I'm also wondering if, in pursuit of expanding Connect's
> > > > out-of-the-box support for dealing with connector metadata, we might
> > want
> > > > to expose created/last-updated times for connector configurations. We
> > > > definitely don't have to do that in this KIP but if you agree that this
> > > > would be useful, we should probably keep it in mind to avoid painting
> > > > ourselves into a corner. This is why I'm thinking of using "tags"
> > instead
> > > > of something a little more generic like "metadata", BTW.
> > > >
> > > > [1] -
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-908%3A+Add+description+field+to+connector+configuration
> > > >
> > > > Cheers,
> > > >
> > > > Chris
> > > >
> > > > On Thu, Feb 23, 2023 at 4:52 AM Mickael Maison <
> > mickael.maison@gmail.com>
> > > > wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> I created a very small KIP to add a description field to connectors:
> > > >>
> > > >>
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-908%3A+Add+description+field+to+connector+configuration
> > > >>
> > > >> Let me know if you have any feedback.
> > > >>
> > > >> Thanks,
> > > >> Mickael
> > > >>
> > > >
> >

Re: [DISCUSS] KIP-908: Add description field to connector configuration

Posted by Chris Egerton <ch...@aiven.io.INVALID>.
Hi Mickael,

I do sympathize with the desire for a "quick fix". I think your point about
these being problematic to support sums up my hesitation here pretty well,
both with respect to the potential footgun of unintended rebalances (should
users try to do more with this field than we expect), and with making other
similar improvements (i.e., expanded metadata support in Kafka Connect)
more difficult to design correctly for us and to use effectively for users
(especially if we were to deprecate/remove the description field at a later
date).

It's also worth noting that users can already add a "description" field to
the JSON config for any connector they create; the benefit provided by this
KIP is that they're automatically prompted to do so if they're using a
UI/CLI/etc. that builds on top of the various /connector-plugins endpoints
to find the set of configuration properties that a user can/should populate
before creating a connector.

I'd welcome thoughts from you and others on whether the tradeoffs here are
worth it; right now I'm still not sure about this one.

Cheers,

Chris

On Tue, Mar 7, 2023 at 6:42 AM Mickael Maison <mi...@gmail.com>
wrote:

> H Chris,
>
> Thanks for taking a look.
>
> 1. Yes updating the description can potentially trigger a rebalance. I
> don't expect users to frequently update the description so I thought
> this was acceptable. I've added a note to the KIP to mention it.
>
> 2. The tags model you described could be interesting but it looks
> pretty involved with multiple new endpoints and brand new concepts.
>
> With this KIP, I really took the most basic approach and proposed the
> simplest set of changes that could get in the next release and, I
> think, immediately bring benefits. However sometimes this is not the
> best approach as a "quick fix" could end up problematic to support in
> the future. The drawbacks (new connector config field + causing a
> rebalance on update) look relatively benign to me so I thought this
> could be an acceptable proposal.
>
> Thanks,
> Mickael
>
>
>
> On Thu, Feb 23, 2023 at 2:45 PM Chris Egerton <ch...@aiven.io.invalid>
> wrote:
> >
> > Actually, I misspoke--a rebalance isn't triggered when an existing
> > connector's config is updated. Assuming the set of workers remains
> stable,
> > a rebalance is only necessary when a new connector is created, an
> existing
> > one is deleted, or a new set of task configs is generated.
> >
> > This weakens the point about unnecessary rebalances when a connector's
> > description is updated, but doesn't entirely address it. Spurious
> > rebalances may still be triggered if a new set of task configs is
> > generated, which for reasons outlined above, is fairly likely.
> >
> > On Thu, Feb 23, 2023 at 7:41 AM Chris Egerton <ch...@aiven.io> wrote:
> >
> > > Hi Mickael,
> > >
> > > Thanks for the KIP!
> > >
> > > While it's tempting to add this field to the out-of-the-box connector
> > > config def, I'm a little hesitant, for two reasons.
> > >
> > > 1. Adding this directly to the connector config could have unintended
> > > consequences on the actual data processing by the connector. Any time a
> > > connector's config is modified, the Connector object running for it is
> > > restarted with that new config. In most cases this is a trivial
> operation
> > > since we have incremental rebalancing enabled by default, the
> connector can
> > > (and probably should!) generate task configs that are functionally
> > > identical to the ones it last generated, and most (though not all)
> > > Connector classes are fairly lightweight and leave the real work to
> their
> > > Task class. However, due to KAFKA-9228 [1], it's not just common
> practice
> > > for connectors to perform transparent passthrough of most of their
> configs
> > > when generating task configs, it's actually necessary to work around a
> bug
> > > in the runtime. As a result, tweaking the description of a connector
> would
> > > be fairly likely to result in a full restart of all of its tasks, in
> > > addition to triggering two rebalances (which may not be so lightweight
> if
> > > users are still running with eager rebalancing... which, sadly, I've
> heard
> > > is still happening today).
> > >
> > > 2. The motivation section mentions some information that might go into
> the
> > > description field, such as the team that owns the connector and
> emergency
> > > contact info. It seems like this info might benefit from a little more
> > > structure if we're trying to design for programmatic access by GUIs and
> > > CLIs (which I'm assuming is the case, since I can't imagine a human
> being
> > > getting much use out of the raw output from the GET
> > > /connector-plugins/{name}/config and PUT
> > > /connector-plugins/{name}/config/validate endpoints). This might also
> make
> > > it easier to add custom validation logic around what kind of
> information is
> > > present via REST extension.
> > >
> > >
> > > With these thoughts in mind, what do you think about adding a new
> generic
> > > "tags" object to connectors that can support arbitrary user-provided
> > > key/value pairs? If using the POST /connectors endpoint, it might look
> > > something like this:
> > >
> > > {
> > >   "name": "my-connector",
> > >   "config": {
> > >     "connector.class": "MirrorSource",
> > >     "tasks.max": "908"
> > >   },
> > >   "tags": {
> > >     "team": "data-infra",
> > >     "phone": "12345678901",
> > >     "email": "dinfra@example.org"
> > >   }
> > > }
> > >
> > > And, to allow users to modify connector tags after one has been
> created,
> > > we might introduce a /connectors/{name}/tags endpoint with
> PUT/PATCH/DELETE
> > > verbs that writes tag info for a connector to the config topic, but
> without
> > > altering the actual connector config (allowing us to skip a rebalance
> > > altogether).
> > >
> > > One other thing we could consider is allowing cluster administrators to
> > > require that connectors are created with a certain set of tags, or
> even add
> > > per-tag regex validation. They might specify something like
> > > "connector.required.tags = team, phone, email" or
> > > "connector.tags.phone.regex = [0-9]{11}" in a worker config file. But
> this
> > > is probably overboard for now, especially since it's already possible
> to
> > > accomplish via a REST extension.
> > >
> > > Finally, I'm also wondering if, in pursuit of expanding Connect's
> > > out-of-the-box support for dealing with connector metadata, we might
> want
> > > to expose created/last-updated times for connector configurations. We
> > > definitely don't have to do that in this KIP but if you agree that this
> > > would be useful, we should probably keep it in mind to avoid painting
> > > ourselves into a corner. This is why I'm thinking of using "tags"
> instead
> > > of something a little more generic like "metadata", BTW.
> > >
> > > [1] -
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-908%3A+Add+description+field+to+connector+configuration
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Thu, Feb 23, 2023 at 4:52 AM Mickael Maison <
> mickael.maison@gmail.com>
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >> I created a very small KIP to add a description field to connectors:
> > >>
> > >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-908%3A+Add+description+field+to+connector+configuration
> > >>
> > >> Let me know if you have any feedback.
> > >>
> > >> Thanks,
> > >> Mickael
> > >>
> > >
>

Re: [DISCUSS] KIP-908: Add description field to connector configuration

Posted by Mickael Maison <mi...@gmail.com>.
H Chris,

Thanks for taking a look.

1. Yes updating the description can potentially trigger a rebalance. I
don't expect users to frequently update the description so I thought
this was acceptable. I've added a note to the KIP to mention it.

2. The tags model you described could be interesting but it looks
pretty involved with multiple new endpoints and brand new concepts.

With this KIP, I really took the most basic approach and proposed the
simplest set of changes that could get in the next release and, I
think, immediately bring benefits. However sometimes this is not the
best approach as a "quick fix" could end up problematic to support in
the future. The drawbacks (new connector config field + causing a
rebalance on update) look relatively benign to me so I thought this
could be an acceptable proposal.

Thanks,
Mickael



On Thu, Feb 23, 2023 at 2:45 PM Chris Egerton <ch...@aiven.io.invalid> wrote:
>
> Actually, I misspoke--a rebalance isn't triggered when an existing
> connector's config is updated. Assuming the set of workers remains stable,
> a rebalance is only necessary when a new connector is created, an existing
> one is deleted, or a new set of task configs is generated.
>
> This weakens the point about unnecessary rebalances when a connector's
> description is updated, but doesn't entirely address it. Spurious
> rebalances may still be triggered if a new set of task configs is
> generated, which for reasons outlined above, is fairly likely.
>
> On Thu, Feb 23, 2023 at 7:41 AM Chris Egerton <ch...@aiven.io> wrote:
>
> > Hi Mickael,
> >
> > Thanks for the KIP!
> >
> > While it's tempting to add this field to the out-of-the-box connector
> > config def, I'm a little hesitant, for two reasons.
> >
> > 1. Adding this directly to the connector config could have unintended
> > consequences on the actual data processing by the connector. Any time a
> > connector's config is modified, the Connector object running for it is
> > restarted with that new config. In most cases this is a trivial operation
> > since we have incremental rebalancing enabled by default, the connector can
> > (and probably should!) generate task configs that are functionally
> > identical to the ones it last generated, and most (though not all)
> > Connector classes are fairly lightweight and leave the real work to their
> > Task class. However, due to KAFKA-9228 [1], it's not just common practice
> > for connectors to perform transparent passthrough of most of their configs
> > when generating task configs, it's actually necessary to work around a bug
> > in the runtime. As a result, tweaking the description of a connector would
> > be fairly likely to result in a full restart of all of its tasks, in
> > addition to triggering two rebalances (which may not be so lightweight if
> > users are still running with eager rebalancing... which, sadly, I've heard
> > is still happening today).
> >
> > 2. The motivation section mentions some information that might go into the
> > description field, such as the team that owns the connector and emergency
> > contact info. It seems like this info might benefit from a little more
> > structure if we're trying to design for programmatic access by GUIs and
> > CLIs (which I'm assuming is the case, since I can't imagine a human being
> > getting much use out of the raw output from the GET
> > /connector-plugins/{name}/config and PUT
> > /connector-plugins/{name}/config/validate endpoints). This might also make
> > it easier to add custom validation logic around what kind of information is
> > present via REST extension.
> >
> >
> > With these thoughts in mind, what do you think about adding a new generic
> > "tags" object to connectors that can support arbitrary user-provided
> > key/value pairs? If using the POST /connectors endpoint, it might look
> > something like this:
> >
> > {
> >   "name": "my-connector",
> >   "config": {
> >     "connector.class": "MirrorSource",
> >     "tasks.max": "908"
> >   },
> >   "tags": {
> >     "team": "data-infra",
> >     "phone": "12345678901",
> >     "email": "dinfra@example.org"
> >   }
> > }
> >
> > And, to allow users to modify connector tags after one has been created,
> > we might introduce a /connectors/{name}/tags endpoint with PUT/PATCH/DELETE
> > verbs that writes tag info for a connector to the config topic, but without
> > altering the actual connector config (allowing us to skip a rebalance
> > altogether).
> >
> > One other thing we could consider is allowing cluster administrators to
> > require that connectors are created with a certain set of tags, or even add
> > per-tag regex validation. They might specify something like
> > "connector.required.tags = team, phone, email" or
> > "connector.tags.phone.regex = [0-9]{11}" in a worker config file. But this
> > is probably overboard for now, especially since it's already possible to
> > accomplish via a REST extension.
> >
> > Finally, I'm also wondering if, in pursuit of expanding Connect's
> > out-of-the-box support for dealing with connector metadata, we might want
> > to expose created/last-updated times for connector configurations. We
> > definitely don't have to do that in this KIP but if you agree that this
> > would be useful, we should probably keep it in mind to avoid painting
> > ourselves into a corner. This is why I'm thinking of using "tags" instead
> > of something a little more generic like "metadata", BTW.
> >
> > [1] -
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-908%3A+Add+description+field+to+connector+configuration
> >
> > Cheers,
> >
> > Chris
> >
> > On Thu, Feb 23, 2023 at 4:52 AM Mickael Maison <mi...@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> I created a very small KIP to add a description field to connectors:
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-908%3A+Add+description+field+to+connector+configuration
> >>
> >> Let me know if you have any feedback.
> >>
> >> Thanks,
> >> Mickael
> >>
> >

Re: [DISCUSS] KIP-908: Add description field to connector configuration

Posted by Chris Egerton <ch...@aiven.io.INVALID>.
Actually, I misspoke--a rebalance isn't triggered when an existing
connector's config is updated. Assuming the set of workers remains stable,
a rebalance is only necessary when a new connector is created, an existing
one is deleted, or a new set of task configs is generated.

This weakens the point about unnecessary rebalances when a connector's
description is updated, but doesn't entirely address it. Spurious
rebalances may still be triggered if a new set of task configs is
generated, which for reasons outlined above, is fairly likely.

On Thu, Feb 23, 2023 at 7:41 AM Chris Egerton <ch...@aiven.io> wrote:

> Hi Mickael,
>
> Thanks for the KIP!
>
> While it's tempting to add this field to the out-of-the-box connector
> config def, I'm a little hesitant, for two reasons.
>
> 1. Adding this directly to the connector config could have unintended
> consequences on the actual data processing by the connector. Any time a
> connector's config is modified, the Connector object running for it is
> restarted with that new config. In most cases this is a trivial operation
> since we have incremental rebalancing enabled by default, the connector can
> (and probably should!) generate task configs that are functionally
> identical to the ones it last generated, and most (though not all)
> Connector classes are fairly lightweight and leave the real work to their
> Task class. However, due to KAFKA-9228 [1], it's not just common practice
> for connectors to perform transparent passthrough of most of their configs
> when generating task configs, it's actually necessary to work around a bug
> in the runtime. As a result, tweaking the description of a connector would
> be fairly likely to result in a full restart of all of its tasks, in
> addition to triggering two rebalances (which may not be so lightweight if
> users are still running with eager rebalancing... which, sadly, I've heard
> is still happening today).
>
> 2. The motivation section mentions some information that might go into the
> description field, such as the team that owns the connector and emergency
> contact info. It seems like this info might benefit from a little more
> structure if we're trying to design for programmatic access by GUIs and
> CLIs (which I'm assuming is the case, since I can't imagine a human being
> getting much use out of the raw output from the GET
> /connector-plugins/{name}/config and PUT
> /connector-plugins/{name}/config/validate endpoints). This might also make
> it easier to add custom validation logic around what kind of information is
> present via REST extension.
>
>
> With these thoughts in mind, what do you think about adding a new generic
> "tags" object to connectors that can support arbitrary user-provided
> key/value pairs? If using the POST /connectors endpoint, it might look
> something like this:
>
> {
>   "name": "my-connector",
>   "config": {
>     "connector.class": "MirrorSource",
>     "tasks.max": "908"
>   },
>   "tags": {
>     "team": "data-infra",
>     "phone": "12345678901",
>     "email": "dinfra@example.org"
>   }
> }
>
> And, to allow users to modify connector tags after one has been created,
> we might introduce a /connectors/{name}/tags endpoint with PUT/PATCH/DELETE
> verbs that writes tag info for a connector to the config topic, but without
> altering the actual connector config (allowing us to skip a rebalance
> altogether).
>
> One other thing we could consider is allowing cluster administrators to
> require that connectors are created with a certain set of tags, or even add
> per-tag regex validation. They might specify something like
> "connector.required.tags = team, phone, email" or
> "connector.tags.phone.regex = [0-9]{11}" in a worker config file. But this
> is probably overboard for now, especially since it's already possible to
> accomplish via a REST extension.
>
> Finally, I'm also wondering if, in pursuit of expanding Connect's
> out-of-the-box support for dealing with connector metadata, we might want
> to expose created/last-updated times for connector configurations. We
> definitely don't have to do that in this KIP but if you agree that this
> would be useful, we should probably keep it in mind to avoid painting
> ourselves into a corner. This is why I'm thinking of using "tags" instead
> of something a little more generic like "metadata", BTW.
>
> [1] -
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-908%3A+Add+description+field+to+connector+configuration
>
> Cheers,
>
> Chris
>
> On Thu, Feb 23, 2023 at 4:52 AM Mickael Maison <mi...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I created a very small KIP to add a description field to connectors:
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-908%3A+Add+description+field+to+connector+configuration
>>
>> Let me know if you have any feedback.
>>
>> Thanks,
>> Mickael
>>
>

Re: [DISCUSS] KIP-908: Add description field to connector configuration

Posted by Chris Egerton <ch...@aiven.io.INVALID>.
Hi Mickael,

Thanks for the KIP!

While it's tempting to add this field to the out-of-the-box connector
config def, I'm a little hesitant, for two reasons.

1. Adding this directly to the connector config could have unintended
consequences on the actual data processing by the connector. Any time a
connector's config is modified, the Connector object running for it is
restarted with that new config. In most cases this is a trivial operation
since we have incremental rebalancing enabled by default, the connector can
(and probably should!) generate task configs that are functionally
identical to the ones it last generated, and most (though not all)
Connector classes are fairly lightweight and leave the real work to their
Task class. However, due to KAFKA-9228 [1], it's not just common practice
for connectors to perform transparent passthrough of most of their configs
when generating task configs, it's actually necessary to work around a bug
in the runtime. As a result, tweaking the description of a connector would
be fairly likely to result in a full restart of all of its tasks, in
addition to triggering two rebalances (which may not be so lightweight if
users are still running with eager rebalancing... which, sadly, I've heard
is still happening today).

2. The motivation section mentions some information that might go into the
description field, such as the team that owns the connector and emergency
contact info. It seems like this info might benefit from a little more
structure if we're trying to design for programmatic access by GUIs and
CLIs (which I'm assuming is the case, since I can't imagine a human being
getting much use out of the raw output from the GET
/connector-plugins/{name}/config and PUT
/connector-plugins/{name}/config/validate endpoints). This might also make
it easier to add custom validation logic around what kind of information is
present via REST extension.


With these thoughts in mind, what do you think about adding a new generic
"tags" object to connectors that can support arbitrary user-provided
key/value pairs? If using the POST /connectors endpoint, it might look
something like this:

{
  "name": "my-connector",
  "config": {
    "connector.class": "MirrorSource",
    "tasks.max": "908"
  },
  "tags": {
    "team": "data-infra",
    "phone": "12345678901",
    "email": "dinfra@example.org"
  }
}

And, to allow users to modify connector tags after one has been created, we
might introduce a /connectors/{name}/tags endpoint with PUT/PATCH/DELETE
verbs that writes tag info for a connector to the config topic, but without
altering the actual connector config (allowing us to skip a rebalance
altogether).

One other thing we could consider is allowing cluster administrators to
require that connectors are created with a certain set of tags, or even add
per-tag regex validation. They might specify something like
"connector.required.tags = team, phone, email" or
"connector.tags.phone.regex = [0-9]{11}" in a worker config file. But this
is probably overboard for now, especially since it's already possible to
accomplish via a REST extension.

Finally, I'm also wondering if, in pursuit of expanding Connect's
out-of-the-box support for dealing with connector metadata, we might want
to expose created/last-updated times for connector configurations. We
definitely don't have to do that in this KIP but if you agree that this
would be useful, we should probably keep it in mind to avoid painting
ourselves into a corner. This is why I'm thinking of using "tags" instead
of something a little more generic like "metadata", BTW.

[1] -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-908%3A+Add+description+field+to+connector+configuration

Cheers,

Chris

On Thu, Feb 23, 2023 at 4:52 AM Mickael Maison <mi...@gmail.com>
wrote:

> Hi,
>
> I created a very small KIP to add a description field to connectors:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-908%3A+Add+description+field+to+connector+configuration
>
> Let me know if you have any feedback.
>
> Thanks,
> Mickael
>