Posted to dev@kafka.apache.org by Gunnar Morling <gu...@hibernate.org> on 2021/01/21 10:48:16 UTC

[Connect] Different validation requirements for connector creation and update

Hi,

In the Debezium community, we ran into an interesting corner case of
connector config validation [1].

The Debezium Postgres connector requires a database resource called a
"replication slot", which identifies the connector to the database and
tracks the progress it has made reading the transaction log. A replication
slot must not be shared between multiple clients (Debezium connectors or
others), so we added a validation to make sure that the slot configured by
the user isn't active, i.e. that no client is already connected to it. This
works as expected when setting up or restarting a connector. When updating
the connector configuration, however, the connector is still running while
the new configuration is validated, so the slot is active and validation
fails.
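
To make the failure mode concrete, here is a minimal sketch of such a check.
It is a simplified model, not Debezium's actual code: the class
SlotValidationSketch, the method validateSlot, and the map standing in for
the database lookup are all hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the check described above;
// this is not Debezium's actual implementation.
final class SlotValidationSketch {

    // activeBySlot stands in for a database lookup: slot name -> whether a
    // client is currently connected. In PostgreSQL this information is
    // exposed by the pg_replication_slots view (columns slot_name, active).
    static List<String> validateSlot(String slotName, Map<String, Boolean> activeBySlot) {
        List<String> errors = new ArrayList<>();
        if (Boolean.TRUE.equals(activeBySlot.get(slotName))) {
            errors.add("Replication slot '" + slotName + "' is already in use");
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, Boolean> slots = new HashMap<>();

        // Creation or restart: nobody is connected to the slot, validation passes.
        slots.put("debezium", false);
        System.out.println(validateSlot("debezium", slots)); // prints []

        // Update: the running connector itself holds the slot, so the same
        // check now reports an error even though nothing is wrong.
        slots.put("debezium", true);
        System.out.println(validateSlot("debezium", slots));
    }
}
```

The check has no way of telling "active because of another client" apart
from "active because of this very connector", which is the information the
validation is missing.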

Is there a way we can distinguish during config validation whether the
connector is (re-)started or whether it's a validation upon
re-configuration (allowing us to skip this particular validation in the
re-configuration case)?

If that's not the case, would there be interest in a KIP for adding such a
capability to the Kafka Connect API?

Thanks for any feedback,

--Gunnar

[1] https://issues.redhat.com/browse/DBZ-2952

Re: [Connect] Different validation requirements for connector creation and update

Posted by Randall Hauch <rh...@gmail.com>.
Thanks for raising this issue, Gunnar.

It is a shortcoming that Connect does not differentiate between starting
for the first time and restarting, nor between validating prior to
connector creation and (re)validating a (potentially modified) connector
configuration while the connector is running. Proposing a KIP certainly
would be fine, though we do need to weigh this against increasing the
complexity of the APIs.

In the meantime, Chris had some good suggestions for how a connector
might be able to deal with the current limitation. At the moment I can't
think of any other obvious workarounds.

Best regards,

Randall


On Thu, Jan 21, 2021 at 9:52 AM Chris Egerton <ch...@confluent.io> wrote:

> Hi Gunnar,
>
> It's not possible to do this in a generalized fashion with the API provided
> by the framework today. Trying to hack your way around things by setting a
> flag or storing the connector name in some shared JVM state wouldn't work
> in a cluster with more than one worker since that state would obviously not
> be available across workers.
>
> With the specific case of the Debezium PostgreSQL connector, I'm wondering
> if you might be able to store the name of the connector in some external
> system (likely either the database itself or a Kafka topic, as I seem to
> recall that Debezium connectors create and consume from topics outside of
> the framework) after successfully claiming the replication slot. Then,
> during config validation, you could skip the replication slot validation if
> that stored name matched the name of the connector being validated. There
> are obviously some edge cases that'd need to be addressed such as sudden
> death of connectors after claiming the replication slot but before storing
> their name; just wanted to share the thought in case it leads somewhere
> useful.
>
> Either way, I think a small, simple KIP for this would be fine, as long as
> we could maintain backwards compatibility for existing connectors and allow
> connectors that use this new API to work on older versions of Connect that
> don't have support for it.
>
> Cheers,
>
> Chris

Re: [Connect] Different validation requirements for connector creation and update

Posted by Chris Egerton <ch...@confluent.io>.
Hi Gunnar,

It's not possible to do this in a generalized fashion with the API provided
by the framework today. Trying to hack your way around things by setting a
flag or storing the connector name in some shared JVM state wouldn't work
in a cluster with more than one worker since that state would obviously not
be available across workers.

With the specific case of the Debezium PostgreSQL connector, I'm wondering
if you might be able to store the name of the connector in some external
system (likely either the database itself or a Kafka topic, as I seem to
recall that Debezium connectors create and consume from topics outside of
the framework) after successfully claiming the replication slot. Then,
during config validation, you could skip the replication slot validation if
that stored name matched the name of the connector being validated. There
are obviously some edge cases that would need to be addressed, such as a
connector dying suddenly after claiming the replication slot but before
storing its name; I just wanted to share the thought in case it leads
somewhere useful.
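
A rough sketch of that idea, under stated assumptions: SlotOwnerStore,
InMemorySlotOwnerStore and slotIsUsable are hypothetical names, the
in-memory map merely stands in for the external system (a database table or
a Kafka topic), and the connector name is assumed to be available to the
validation code. The edge cases mentioned above are not handled.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the "remember who claimed the slot" workaround.
final class SlotOwnershipSketch {

    // Hypothetical abstraction over the external system holding the owner name.
    interface SlotOwnerStore {
        void recordOwner(String slotName, String connectorName);
        Optional<String> ownerOf(String slotName);
    }

    // In-memory stand-in, for illustration only. A real store would have to
    // be visible to every worker in the cluster, which JVM state is not.
    static final class InMemorySlotOwnerStore implements SlotOwnerStore {
        private final Map<String, String> owners = new ConcurrentHashMap<>();

        @Override
        public void recordOwner(String slotName, String connectorName) {
            owners.put(slotName, connectorName);
        }

        @Override
        public Optional<String> ownerOf(String slotName) {
            return Optional.ofNullable(owners.get(slotName));
        }
    }

    // Returns true if the slot check should pass for the given connector.
    static boolean slotIsUsable(String connectorName, String slotName,
                                boolean slotActive, SlotOwnerStore store) {
        if (!slotActive) {
            return true; // nobody is connected: creation or restart case
        }
        // The slot is active: accept only if the recorded owner is the
        // connector being validated (the re-configuration case).
        return store.ownerOf(slotName).map(connectorName::equals).orElse(false);
    }

    public static void main(String[] args) {
        SlotOwnerStore store = new InMemorySlotOwnerStore();
        // The connector records its name after successfully claiming the slot.
        store.recordOwner("debezium", "inventory-connector");

        System.out.println(slotIsUsable("inventory-connector", "debezium", true, store)); // true
        System.out.println(slotIsUsable("other-connector", "debezium", true, store));     // false
    }
}
```

If a connector claims the slot but never manages to record its name, this
sketch falls back to rejecting the update, which errs on the side of not
sharing a slot.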

Either way, I think a small, simple KIP for this would be fine, as long as
we could maintain backwards compatibility for existing connectors and allow
connectors that use this new API to work on older versions of Connect that
don't have support for it.

Cheers,

Chris
