You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Chris Egerton <ch...@aiven.io.INVALID> on 2023/10/02 17:44:23 UTC

Re: [DISCUSS] KIP-980: Allow creating connectors in a stopped state

Hi Yash,

Thanks for the KIP! Frankly this feels like an oversight in 875 and I'm
glad we're closing that loop ASAP.


Here are my thoughts:

1. (Nit): IMO "start.state", "initial.state", or "target.state" might be
better than just "state" for the field name in the connector creation
request.

2. Why implement this in distributed mode with two writes to the config
topic? We could augment the existing format for connector configs in the
config topic [1] with a new field instead.

3. Although standalone mode is mentioned in the KIP, there's no detail on
the Java properties file format that we support for standalone mode (i.e.,
`connect-standalone config/connect-standalone.properties
config/connect-file-source.properties
config/connect-file-sink.properties`). Do we plan on adding support for
that mode as well?

4. I suspect there will be advantages for this feature beyond offsets
migration, but if offsets migration were the only motivation for it,
wouldn't it be simpler to accept an "offsets" field in connector creation
requests? That way, users wouldn't have to start a connector in a different
state and then resume it, they could just create the connector like normal.
I think the current proposal is acceptable, but wanted to float this
alternative in case we anticipate lots of connector migrations and want to
optimize for them a bit more.

5. (NIt): We can link to our own Javadocs [2] instead of javadoc.io


[1] -
https://github.com/apache/kafka/blob/dcd8c7d05f2f22f2d815405e7ab3ad7439669239/connect/runtime/src/main/java/org/apache/kafka/connect/storage/KafkaConfigBackingStore.java#L234-L236

[2] - https://kafka.apache.org/35/javadoc/index.html?overview-summary.html


Cheers,

Chris

On Thu, Sep 21, 2023 at 2:37 AM Yash Mayya <ya...@gmail.com> wrote:

> Hi Ashwin,
>
> Thanks for taking a look and sharing your thoughts!
>
> 1. Yes, the request / response formats of the two APIs were intentionally
> made identical for such use-cases. [1]
>
> 2. I'm assuming you're referring to retaining the offset / config topic
> records for a connector when it is deleted by a user? Firstly, a
> connector's offsets aren't actually currently deleted when the connector is
> deleted - it was listed as potential future work in KIP-875 [2]. Secondly,
> the config topic is the source of truth for the state of a Connect cluster
> and a connector deletion is done by writing a null-valued record
> (tombstone) with the connector key to the config topic. We could
> potentially modify the delete mechanism to use a special new record
> (instead of a tombstone with the connector key) in order to retain the
> latest configuration of a connector while still deleting the actual
> connector - however, this would have its downsides and I don't see too many
> benefits. Furthermore, connector migration between different Kafka clusters
> was just used as a representational use case for creating connectors in a
> stopped state - it isn't the primary focus of this KIP as such.
>
> 3. KIP-875 goes into some more details about this [3], but the TLDR is that
> the STOPPED state will be treated like the PAUSED state on older workers
> that don't recognize the STOPPED state.
>
> Thanks,
> Yash
>
> [1] -
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Request/responseformat
>
> [2] -
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Automaticallydeleteoffsetswithconnectors
>
> [3] -
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-STOPPEDtargetstate
>
> On Wed, Sep 20, 2023 at 7:24 PM Ashwin <ap...@confluent.io.invalid>
> wrote:
>
> > Thanks Yash! This is very useful for migrating connectors from one
> cluster
> > to another.
> >
> > I had the following comments/questions
> >
> > 1. Is the offset read using `GET /offsets` api always guaranteed to be
> in a
> > format accepted by `PATCH /offsets` ?
> > 2. I had to tackle a similar migration situation but the two connect
> > clusters in question were using the same backing Kafka cluster. The
> > challenge in this case is that when I delete the original connector, I
> want
> > to retain offsets and config topics. Do you think we should support
> > deletion of a connector without removal of these topics as part of this
> KIP
> > ?
> > 3. In the case of a downgrade, how will Connect worker handle the
> optional
> > “state” field in config topic ?
> >
> > Thanks,
> > Ashwin
> >
> >
> >
> >
> > On Sun, Sep 17, 2023 at 11:09 PM Yash Mayya <ya...@gmail.com>
> wrote:
> >
> > > Hi all,
> > >
> > > I'd like to begin discussion on a KIP to allow creating connectors in a
> > > stopped state -
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-980%3A+Allow+creating+connectors+in+a+stopped+state
> > >
> > >
> > > Thanks,
> > > Yash
> > >
> >
>

Re: [DISCUSS] KIP-980: Allow creating connectors in a stopped state

Posted by Yash Mayya <ya...@gmail.com>.
Hi Chris,

I've updated the KIP to call out the parsing logic and the user
expectations explicitly. Thanks again for all your feedback on this KIP!
I'll wait for a few more days to see if anyone else has comments before
kicking off a vote thread.

Thanks,
Yash

On Thu, Oct 5, 2023 at 10:49 PM Chris Egerton <ch...@aiven.io.invalid>
wrote:

> Hi Yash,
>
> Yeah, I think just hardcoding with JSON-first, properties-second is fine.
> IMO it's worth calling this out explicitly in the KIP.
>
> Apart from that, no further comments. LGTM, thanks for the KIP!
>
> Cheers,
>
> Chris
>
> On Thu, Oct 5, 2023 at 6:23 AM Yash Mayya <ya...@gmail.com> wrote:
>
> > Hi Chris,
> >
> > Thanks for all your feedback so far!
> >
> > 3. That's a good question. I was thinking we'd do some "intelligent"
> > parsing internally during the implementation (i.e. essentially your last
> > option - attempting to parse first as one format, then the other) which
> is
> > why I didn't include any more details in the KIP itself (where I've only
> > outlined the contract changes). This would allow for the smoothest user
> > experience IMO and all the heavy lifting will be done in the parsing
> logic.
> > All the other options seemed either clunky or brittle from the user
> > experience point of view. In terms of the actual implementation itself,
> > we'd probably want to first try parsing it into the supported JSON
> > structures before trying to parse it into Java properties since the Java
> > properties format is very permissive (i.e. we won't really see any errors
> > on attempting to parse a JSON file into Java properties).
> >
> > Thanks,
> > Yash
> >
> > On Thu, Oct 5, 2023 at 1:39 AM Chris Egerton <ch...@aiven.io.invalid>
> > wrote:
> >
> > > Hi Yash,
> > >
> > > Looking great! Few more thoughts:
> > >
> > >
> > > 1. (Downgraded to nit) I still prefer dot-delimitation but it's not a
> > > blocker; thanks for addressing my concerns about the name of the field
> > and
> > > how it may be perceived by users.
> > >
> > > 2. (Addressed) Thanks for looking into this, and sorry it turned out to
> > be
> > > a bit of a dead end! I'm convinced that the current proposal is good
> > > enough.
> > >
> > > 3. Can you shed a little more light on how we'll determine whether a
> > > connector config should be parsed as JSON or as a properties file? Will
> > > this be based on file extension, a command-line flag (which might apply
> > to
> > > all configs, or individual configs), attempting to parse first as one
> > > format then the other, something else?
> > >
> > > 4. (Addressed) Thanks! Looks great.
> > >
> > > 6. (Addressed) Awesome, great to hear. The point about laggy connector
> > > startup is very convincing; my paranoia is satiated.
> > >
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Wed, Oct 4, 2023 at 5:35 AM Yash Mayya <ya...@gmail.com>
> wrote:
> > >
> > > > Hi Chris,
> > > >
> > > > Thanks for the quick follow up and the continued insightful
> discourse!
> > > >
> > > > 1. Fair point on the need to differentiate it from the actual state
> > > > displayed in the status API, I like the prefix of "initial" to make
> > that
> > > > differentiation (from your suggested alternatives previously).
> > Regarding
> > > > the dots vs underscores as delimiters - the new state field will be a
> > top
> > > > level field in the connector creation request body alongside the
> > "config"
> > > > map (i.e. it won't be a connector configuration itself), so I think
> we
> > > > should be using the underscore delimiter for consistency. For now,
> I've
> > > > updated the KIP to use "initial_state" as the new field's name - let
> me
> > > > know if you disagree, and I'd be happy to reconsider.
> > > >
> > > > 2. Hm, I actually hadn't considered the downgrade implications with
> > your
> > > > proposed single record approach. I agree that it's a bigger downside
> > than
> > > > writing two records to the config topic. I do understand your
> concerns
> > > with
> > > > the potential for config topic inconsistencies which is why I
> proposed
> > > > writing the target state first (since the presence of a target state
> > for
> > > a
> > > > connector with no configuration is a benign condition). Also, even in
> > the
> > > > non-transactional config topic producer case - if there is a failure
> > > > between the two writes, the user will be notified of the error
> > > > synchronously via the API response (ref -
> > > > https://github.com/apache/kafka/pull/12984) and will be able to
> safely
> > > > retry the operation. I don't see how we'd be able to do a single
> record
> > > > write approach along with supporting clean downgrades since we'd
> either
> > > > need to introduce a new record type or add a new field to an existing
> > > > record type - neither of which would be recognized as such by an
> older
> > > > Connect worker.
> > > >
> > > > > Standalone mode has always supported the REST API,
> > > > > and so far FWICTwe've maintained feature parity between
> > > > > the two modes
> > > >
> > > > > add support for JSON files with standalone mode.
> > > >
> > > > 3. Thanks, I wasn't aware about standalone mode always having
> supported
> > > the
> > > > full REST API - I thought I'd seen some references earlier indicating
> > > > otherwise. In that case, I do agree that it makes sense to maintain
> > > parity
> > > > across both methods of connector creation for user experience
> > > consistency.
> > > > I really like the idea of updating the standalone mode CLI to be able
> > to
> > > > parse JSON files (in the same format as the connector creation REST
> API
> > > > endpoint request body) along with Java properties files since I think
> > > that
> > > > offers two big benefits. One is that users will be able to copy and
> use
> > > > examples across both the methods of connector creation (REST API
> > requests
> > > > with JSON request bodies and JSON files passed to the standalone mode
> > > > startup CLI). The second benefit is that any future extensions (such
> as
> > > the
> > > > "offsets" field we've discussed in this thread) would be easily
> applied
> > > > across both the methods consistently instead of introducing new (and
> > > likely
> > > > ugly) CLI flags. I've updated the KIP to include this change in the
> > > > standalone mode CLI.
> > > >
> > > > 4. Makes sense, I've added this under a new "Future Work" section in
> > the
> > > > KIP.
> > > >
> > > > 6. From what I can tell, there shouldn't be any issues with the lack
> of
> > > > task configurations in the config topic and it seems to be a
> supported
> > > > assumption across the Connect code-base that a connector
> configuration
> > > > could exist without any task configurations for the connector (a
> > > situation
> > > > that could currently manifest with slow starting connectors,
> connectors
> > > > that fail during startup, connectors that fail to generate task
> > > > configurations, connectors that are paused right after being created
> > > etc.).
> > > > I did also try out a small prototype before publishing this KIP and
> > > things
> > > > do work as expected when creating a connector in the PAUSED / STOPPED
> > > state
> > > > by simply writing the appropriate target state along with the
> connector
> > > > configuration to the config topic.
> > > >
> > > > Thanks,
> > > > Yash
> > > >
> > > > On Wed, Oct 4, 2023 at 1:44 AM Chris Egerton <chrise@aiven.io.invalid
> >
> > > > wrote:
> > > >
> > > > > Hi Yash,
> > > > >
> > > > > Thanks for the in-depth discussion! Continuations here:
> > > > >
> > > > > 1. Regarding delimiters (dots vs. underscores), we use dots in
> > > connector
> > > > > configs for all runtime-recognized properties, including but not
> > > limited
> > > > to
> > > > > connector.class, tasks.max, key.converter, key.converter.*, and
> > > > > transforms.*.type. Regarding the choice of name--I think it's
> because
> > > the
> > > > > concept here is different from the "state" that we display in the
> > > status
> > > > > API that a different name is warranted. Users can't directly write
> > the
> > > > > connector's actual state, they can still only specify a target
> state.
> > > And
> > > > > there's no guarantee that a connector will ever enter a target
> state
> > > > > (whether it's specified at creation time or later), since a failure
> > can
> > > > > always occur.
> > > > >
> > > > > 2. It still seems simpler to emit one record instead of two,
> > especially
> > > > > since we're not guaranteed that the leader will be using a
> > > transactional
> > > > > producer. I guess I'm just wary of knowingly putting the config
> topic
> > > in
> > > > an
> > > > > inconsistent state (target state present without accompanying
> > connector
> > > > > config), even if it's meant to be only for a brief period. Thinking
> > > about
> > > > > it some more though, a one-record approach comes with drawbacks in
> > the
> > > > > downgrade scenario: older workers wouldn't know how to handle the
> new
> > > > > config format and would just fall back to creating the connector in
> > the
> > > > > running state. I suppose we should favor the two-record approach
> > since
> > > > the
> > > > > downgrade scenario is more likely than the other failure mode, but
> > it'd
> > > > be
> > > > > nice if we could think of a way to satisfy both concerns. Not a
> > > blocker,
> > > > > though.
> > > > >
> > > > > 3. Standalone mode has always supported the REST API, and so far
> > FWICT
> > > > > we've maintained feature parity between the two modes for
> everything
> > > > except
> > > > > exactly-once source connectors, which would have required
> significant
> > > > > additional work since we'd have to add support for storing source
> > > > connector
> > > > > offsets in a Kafka topic instead of on local storage like we
> > currently
> > > > do.
> > > > > I'd really prefer if we could try to maintain feature parity
> wherever
> > > > > possible--one way we could possibly do that with this KIP is to
> also
> > > add
> > > > > support for JSON files with standalone mode.
> > > > >
> > > > > 4. Yeah, no need to block on that idea since there are other use
> > cases
> > > > for
> > > > > creating stopped connectors. We can treat it like the option to
> > delete
> > > > > offsets along with the connector discussed in KIP-875: punt for
> now,
> > > > > possibly implement later pending user feedback and indication of
> > > demand.
> > > > > Might be worth adding to a "Future work" section as an indication
> > that
> > > we
> > > > > haven't ruled it out (in which case it'd make sense as a rejected
> > > > > alternative) but have chosen not to implement yet.
> > > > >
> > > > >
> > > > > And I had one new thought that's pretty implementation-oriented but
> > may
> > > > > influence the design slightly:
> > > > >
> > > > > 6. Right now we write an empty set of task configs to the config
> > topic
> > > > when
> > > > > handling requests to stop a connector in distributed mode. Do we
> need
> > > to
> > > > do
> > > > > the same when creating connectors in the stopped state, or add any
> > > other
> > > > > special logic besides noting the new state in the config topic? Or
> is
> > > it
> > > > > sufficient to write a non-running target state to the config topic
> > and
> > > > then
> > > > > rely on existing logic to simply refuse to generate task configs
> for
> > > the
> > > > > newly-created connector? Is there any chance that the lack of task
> > > > configs
> > > > > (as opposed to an empty list of task configs) in the config topic
> > for a
> > > > > connector that exists will cause issues?
> > > > >
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Chris
> > > > >
> > > > > On Tue, Oct 3, 2023 at 3:29 AM Yash Mayya <ya...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hi Chris,
> > > > > >
> > > > > > Thanks for taking a look at this KIP!
> > > > > >
> > > > > > 1. I chose to go with simply "state" as that exact term is
> already
> > > > > exposed
> > > > > > via some of the existing REST API responses and would be one that
> > > users
> > > > > are
> > > > > > already familiar with (although admittedly something like
> > > > "initial_state"
> > > > > > wouldn't be much of a jump). Since it's a field in the request
> body
> > > for
> > > > > the
> > > > > > connector creation endpoint, wouldn't it be implied that it is
> the
> > > > > > "initial" state just like the "config" field represents the
> > "initial"
> > > > > > configuration? Also, I don't think x.y has been established as
> the
> > > > field
> > > > > > naming convention in the Connect REST API right? From what I can
> > > tell,
> > > > > x_y
> > > > > > is the convention being followed for fields in requests
> > > ("kafka_topic"
> > > > /
> > > > > > "kafka_partition" / "kafka_offset" in the offsets APIs for
> > instance)
> > > > and
> > > > > > responses ("error_count", "kafka_cluster_id",
> "recommended_values"
> > > > etc.).
> > > > > >
> > > > > > 2. The connector configuration record is currently used for both
> > > > > connector
> > > > > > create requests as well as connector config update requests.
> Since
> > > > we're
> > > > > > only allowing configuring the target state for newly created
> > > > connectors,
> > > > > I
> > > > > > feel like it'll be a cleaner separation of concerns to use the
> > > existing
> > > > > > records for connector configurations and connector target states
> > > rather
> > > > > > than bundling the "state" and "state.v2" (or equivalent) fields
> > into
> > > > the
> > > > > > connector configuration record. The additional write should be
> very
> > > > > minimal
> > > > > > overhead and the two writes would be an atomic operation for
> > Connect
> > > > > > clusters that are using a transactional producer for the config
> > topic
> > > > > > anyway. Thoughts?
> > > > > >
> > > > > > 3. I was thinking that we'd support standalone mode via the same
> > > > > connector
> > > > > > creation REST API endpoint changes (addition of the "state"
> field).
> > > If
> > > > > > there is further interest in adding similar support to the
> command
> > > line
> > > > > > method of creating connectors as well, perhaps we could do so in
> a
> > > > small
> > > > > > follow-on KIP? I feel like ever since the standalone mode started
> > > > > > supporting the full Connect REST API, the command line method of
> > > > creating
> > > > > > connectors is more of a "legacy" concept.
> > > > > >
> > > > > > 4. Yeah, connector / offsets migration was just used as a
> > > > representative
> > > > > > example of how this feature could be useful - I didn't intend for
> > it
> > > to
> > > > > be
> > > > > > the sole purpose of this KIP. However, that said, I really like
> the
> > > > idea
> > > > > of
> > > > > > accepting an "offsets" field in the connector creation request
> > since
> > > > it'd
> > > > > > reduce the number of user operations required from 3 (create the
> > > > > connector
> > > > > > in a STOPPED state; alter the offsets; resume the connector) to
> 1.
> > > I'd
> > > > be
> > > > > > happy to either create or review a separate small KIP for that if
> > it
> > > > > sounds
> > > > > > acceptable to you?
> > > > > >
> > > > > > 5. Whoops, thanks, I hadn't even noticed that that's where I had
> > > linked
> > > > > to.
> > > > > > Fixed!
> > > > > >
> > > > > > Thanks,
> > > > > > Yash
> > > > > >
> > > > > > On Mon, Oct 2, 2023 at 11:14 PM Chris Egerton
> > > <chrise@aiven.io.invalid
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Yash,
> > > > > > >
> > > > > > > Thanks for the KIP! Frankly this feels like an oversight in 875
> > and
> > > > I'm
> > > > > > > glad we're closing that loop ASAP.
> > > > > > >
> > > > > > >
> > > > > > > Here are my thoughts:
> > > > > > >
> > > > > > > 1. (Nit): IMO "start.state", "initial.state", or "target.state"
> > > might
> > > > > be
> > > > > > > better than just "state" for the field name in the connector
> > > creation
> > > > > > > request.
> > > > > > >
> > > > > > > 2. Why implement this in distributed mode with two writes to
> the
> > > > config
> > > > > > > topic? We could augment the existing format for connector
> configs
> > > in
> > > > > the
> > > > > > > config topic [1] with a new field instead.
> > > > > > >
> > > > > > > 3. Although standalone mode is mentioned in the KIP, there's no
> > > > detail
> > > > > on
> > > > > > > the Java properties file format that we support for standalone
> > mode
> > > > > > (i.e.,
> > > > > > > `connect-standalone config/connect-standalone.properties
> > > > > > > config/connect-file-source.properties
> > > > > > > config/connect-file-sink.properties`). Do we plan on adding
> > support
> > > > for
> > > > > > > that mode as well?
> > > > > > >
> > > > > > > 4. I suspect there will be advantages for this feature beyond
> > > offsets
> > > > > > > migration, but if offsets migration were the only motivation
> for
> > > it,
> > > > > > > wouldn't it be simpler to accept an "offsets" field in
> connector
> > > > > creation
> > > > > > > requests? That way, users wouldn't have to start a connector
> in a
> > > > > > different
> > > > > > > state and then resume it, they could just create the connector
> > like
> > > > > > normal.
> > > > > > > I think the current proposal is acceptable, but wanted to float
> > > this
> > > > > > > alternative in case we anticipate lots of connector migrations
> > and
> > > > want
> > > > > > to
> > > > > > > optimize for them a bit more.
> > > > > > >
> > > > > > > 5. (NIt): We can link to our own Javadocs [2] instead of
> > > javadoc.io
> > > > > > >
> > > > > > >
> > > > > > > [1] -
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/dcd8c7d05f2f22f2d815405e7ab3ad7439669239/connect/runtime/src/main/java/org/apache/kafka/connect/storage/KafkaConfigBackingStore.java#L234-L236
> > > > > > >
> > > > > > > [2] -
> > > > > >
> > https://kafka.apache.org/35/javadoc/index.html?overview-summary.html
> > > > > > >
> > > > > > >
> > > > > > > Cheers,
> > > > > > >
> > > > > > > Chris
> > > > > > >
> > > > > > > On Thu, Sep 21, 2023 at 2:37 AM Yash Mayya <
> yash.mayya@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Ashwin,
> > > > > > > >
> > > > > > > > Thanks for taking a look and sharing your thoughts!
> > > > > > > >
> > > > > > > > 1. Yes, the request / response formats of the two APIs were
> > > > > > intentionally
> > > > > > > > made identical for such use-cases. [1]
> > > > > > > >
> > > > > > > > 2. I'm assuming you're referring to retaining the offset /
> > config
> > > > > topic
> > > > > > > > records for a connector when it is deleted by a user?
> Firstly,
> > a
> > > > > > > > connector's offsets aren't actually currently deleted when
> the
> > > > > > connector
> > > > > > > is
> > > > > > > > deleted - it was listed as potential future work in KIP-875
> > [2].
> > > > > > > Secondly,
> > > > > > > > the config topic is the source of truth for the state of a
> > > Connect
> > > > > > > cluster
> > > > > > > > and a connector deletion is done by writing a null-valued
> > record
> > > > > > > > (tombstone) with the connector key to the config topic. We
> > could
> > > > > > > > potentially modify the delete mechanism to use a special new
> > > record
> > > > > > > > (instead of a tombstone with the connector key) in order to
> > > retain
> > > > > the
> > > > > > > > latest configuration of a connector while still deleting the
> > > actual
> > > > > > > > connector - however, this would have its downsides and I
> don't
> > > see
> > > > > too
> > > > > > > many
> > > > > > > > benefits. Furthermore, connector migration between different
> > > Kafka
> > > > > > > clusters
> > > > > > > > was just used as a representational use case for creating
> > > > connectors
> > > > > > in a
> > > > > > > > stopped state - it isn't the primary focus of this KIP as
> such.
> > > > > > > >
> > > > > > > > 3. KIP-875 goes into some more details about this [3], but
> the
> > > TLDR
> > > > > is
> > > > > > > that
> > > > > > > > the STOPPED state will be treated like the PAUSED state on
> > older
> > > > > > workers
> > > > > > > > that don't recognize the STOPPED state.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Yash
> > > > > > > >
> > > > > > > > [1] -
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Request/responseformat
> > > > > > > >
> > > > > > > > [2] -
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Automaticallydeleteoffsetswithconnectors
> > > > > > > >
> > > > > > > > [3] -
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-STOPPEDtargetstate
> > > > > > > >
> > > > > > > > On Wed, Sep 20, 2023 at 7:24 PM Ashwin
> > > > <apankaj@confluent.io.invalid
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks Yash! This is very useful for migrating connectors
> > from
> > > > one
> > > > > > > > cluster
> > > > > > > > > to another.
> > > > > > > > >
> > > > > > > > > I had the following comments/questions
> > > > > > > > >
> > > > > > > > > 1. Is the offset read using `GET /offsets` api always
> > > guaranteed
> > > > to
> > > > > > be
> > > > > > > > in a
> > > > > > > > > format accepted by `PATCH /offsets` ?
> > > > > > > > > 2. I had to tackle a similar migration situation but the
> two
> > > > > connect
> > > > > > > > > clusters in question were using the same backing Kafka
> > cluster.
> > > > The
> > > > > > > > > challenge in this case is that when I delete the original
> > > > > connector,
> > > > > > I
> > > > > > > > want
> > > > > > > > > to retain offsets and config topics. Do you think we should
> > > > support
> > > > > > > > > deletion of a connector without removal of these topics as
> > part
> > > > of
> > > > > > this
> > > > > > > > KIP
> > > > > > > > > ?
> > > > > > > > > 3. In the case of a downgrade, how will Connect worker
> handle
> > > the
> > > > > > > > optional
> > > > > > > > > “state” field in config topic ?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Ashwin
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Sun, Sep 17, 2023 at 11:09 PM Yash Mayya <
> > > > yash.mayya@gmail.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > I'd like to begin discussion on a KIP to allow creating
> > > > > connectors
> > > > > > > in a
> > > > > > > > > > stopped state -
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-980%3A+Allow+creating+connectors+in+a+stopped+state
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Yash
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-980: Allow creating connectors in a stopped state

Posted by Chris Egerton <ch...@aiven.io.INVALID>.
Hi Yash,

Yeah, I think just hardcoding with JSON-first, properties-second is fine.
IMO it's worth calling this out explicitly in the KIP.

Apart from that, no further comments. LGTM, thanks for the KIP!

Cheers,

Chris

On Thu, Oct 5, 2023 at 6:23 AM Yash Mayya <ya...@gmail.com> wrote:

> Hi Chris,
>
> Thanks for all your feedback so far!
>
> 3. That's a good question. I was thinking we'd do some "intelligent"
> parsing internally during the implementation (i.e. essentially your last
> option - attempting to parse first as one format, then the other) which is
> why I didn't include any more details in the KIP itself (where I've only
> outlined the contract changes). This would allow for the smoothest user
> experience IMO and all the heavy lifting will be done in the parsing logic.
> All the other options seemed either clunky or brittle from the user
> experience point of view. In terms of the actual implementation itself,
> we'd probably want to first try parsing it into the supported JSON
> structures before trying to parse it into Java properties since the Java
> properties format is very permissive (i.e. we won't really see any errors
> on attempting to parse a JSON file into Java properties).
>
> Thanks,
> Yash
>
> On Thu, Oct 5, 2023 at 1:39 AM Chris Egerton <ch...@aiven.io.invalid>
> wrote:
>
> > Hi Yash,
> >
> > Looking great! Few more thoughts:
> >
> >
> > 1. (Downgraded to nit) I still prefer dot-delimitation but it's not a
> > blocker; thanks for addressing my concerns about the name of the field
> and
> > how it may be perceived by users.
> >
> > 2. (Addressed) Thanks for looking into this, and sorry it turned out to
> be
> > a bit of a dead end! I'm convinced that the current proposal is good
> > enough.
> >
> > 3. Can you shed a little more light on how we'll determine whether a
> > connector config should be parsed as JSON or as a properties file? Will
> > this be based on file extension, a command-line flag (which might apply
> to
> > all configs, or individual configs), attempting to parse first as one
> > format then the other, something else?
> >
> > 4. (Addressed) Thanks! Looks great.
> >
> > 6. (Addressed) Awesome, great to hear. The point about laggy connector
> > startup is very convincing; my paranoia is satiated.
> >
> >
> > Cheers,
> >
> > Chris
> >
> > On Wed, Oct 4, 2023 at 5:35 AM Yash Mayya <ya...@gmail.com> wrote:
> >
> > > Hi Chris,
> > >
> > > Thanks for the quick follow up and the continued insightful discourse!
> > >
> > > 1. Fair point on the need to differentiate it from the actual state
> > > displayed in the status API, I like the prefix of "initial" to make
> that
> > > differentiation (from your suggested alternatives previously).
> Regarding
> > > the dots vs underscores as delimiters - the new state field will be a
> top
> > > level field in the connector creation request body alongside the
> "config"
> > > map (i.e. it won't be a connector configuration itself), so I think we
> > > should be using the underscore delimiter for consistency. For now, I've
> > > updated the KIP to use "initial_state" as the new field's name - let me
> > > know if you disagree, and I'd be happy to reconsider.
> > >
> > > 2. Hm, I actually hadn't considered the downgrade implications with
> your
> > > proposed single record approach. I agree that it's a bigger downside
> than
> > > writing two records to the config topic. I do understand your concerns
> > with
> > > the potential for config topic inconsistencies which is why I proposed
> > > writing the target state first (since the presence of a target state
> for
> > a
> > > connector with no configuration is a benign condition). Also, even in
> the
> > > non-transactional config topic producer case - if there is a failure
> > > between the two writes, the user will be notified of the error
> > > synchronously via the API response (ref -
> > > https://github.com/apache/kafka/pull/12984) and will be able to safely
> > > retry the operation. I don't see how we'd be able to do a single record
> > > write approach along with supporting clean downgrades since we'd either
> > > need to introduce a new record type or add a new field to an existing
> > > record type - neither of which would be recognized as such by an older
> > > Connect worker.
> > >
> > > > Standalone mode has always supported the REST API,
> > > > and so far FWICTwe've maintained feature parity between
> > > > the two modes
> > >
> > > > add support for JSON files with standalone mode.
> > >
> > > 3. Thanks, I wasn't aware about standalone mode always having supported
> > the
> > > full REST API - I thought I'd seen some references earlier indicating
> > > otherwise. In that case, I do agree that it makes sense to maintain
> > parity
> > > across both methods of connector creation for user experience
> > consistency.
> > > I really like the idea of updating the standalone mode CLI to be able
> to
> > > parse JSON files (in the same format as the connector creation REST API
> > > endpoint request body) along with Java properties files since I think
> > that
> > > offers two big benefits. One is that users will be able to copy and use
> > > examples across both the methods of connector creation (REST API
> requests
> > > with JSON request bodies and JSON files passed to the standalone mode
> > > startup CLI). The second benefit is that any future extensions (such as
> > the
> > > "offsets" field we've discussed in this thread) would be easily applied
> > > across both the methods consistently instead of introducing new (and
> > likely
> > > ugly) CLI flags. I've updated the KIP to include this change in the
> > > standalone mode CLI.
> > >
> > > 4. Makes sense, I've added this under a new "Future Work" section in
> the
> > > KIP.
> > >
> > > 6. From what I can tell, there shouldn't be any issues with the lack of
> > > task configurations in the config topic and it seems to be a supported
> > > assumption across the Connect code-base that a connector configuration
> > > could exist without any task configurations for the connector (a
> > situation
> > > that could currently manifest with slow starting connectors, connectors
> > > that fail during startup, connectors that fail to generate task
> > > configurations, connectors that are paused right after being created
> > etc.).
> > > I did also try out a small prototype before publishing this KIP and
> > things
> > > do work as expected when creating a connector in the PAUSED / STOPPED
> > state
> > > by simply writing the appropriate target state along with the connector
> > > configuration to the config topic.
> > >
> > > Thanks,
> > > Yash
> > >
> > > On Wed, Oct 4, 2023 at 1:44 AM Chris Egerton <ch...@aiven.io.invalid>
> > > wrote:
> > >
> > > > Hi Yash,
> > > >
> > > > Thanks for the in-depth discussion! Continuations here:
> > > >
> > > > 1. Regarding delimiters (dots vs. underscores), we use dots in
> > connector
> > > > configs for all runtime-recognized properties, including but not
> > limited
> > > to
> > > > connector.class, tasks.max, key.converter, key.converter.*, and
> > > > transforms.*.type. Regarding the choice of name--I think it's because
> > the
> > > > concept here is different from the "state" that we display in the
> > status
> > > > API that a different name is warranted. Users can't directly write
> the
> > > > connector's actual state, they can still only specify a target state.
> > And
> > > > there's no guarantee that a connector will ever enter a target state
> > > > (whether it's specified at creation time or later), since a failure
> can
> > > > always occur.
> > > >
> > > > 2. It still seems simpler to emit one record instead of two,
> especially
> > > > since we're not guaranteed that the leader will be using a
> > transactional
> > > > producer. I guess I'm just wary of knowingly putting the config topic
> > in
> > > an
> > > > inconsistent state (target state present without accompanying
> connector
> > > > config), even if it's meant to be only for a brief period. Thinking
> > about
> > > > it some more though, a one-record approach comes with drawbacks in
> the
> > > > downgrade scenario: older workers wouldn't know how to handle the new
> > > > config format and would just fall back to creating the connector in
> the
> > > > running state. I suppose we should favor the two-record approach
> since
> > > the
> > > > downgrade scenario is more likely than the other failure mode, but
> it'd
> > > be
> > > > nice if we could think of a way to satisfy both concerns. Not a
> > blocker,
> > > > though.
> > > >
> > > > 3. Standalone mode has always supported the REST API, and so far
> FWICT
> > > > we've maintained feature parity between the two modes for everything
> > > except
> > > > exactly-once source connectors, which would have required significant
> > > > additional work since we'd have to add support for storing source
> > > connector
> > > > offsets in a Kafka topic instead of on local storage like we
> currently
> > > do.
> > > > I'd really prefer if we could try to maintain feature parity wherever
> > > > possible--one way we could possibly do that with this KIP is to also
> > add
> > > > support for JSON files with standalone mode.
> > > >
> > > > 4. Yeah, no need to block on that idea since there are other use
> cases
> > > for
> > > > creating stopped connectors. We can treat it like the option to
> delete
> > > > offsets along with the connector discussed in KIP-875: punt for now,
> > > > possibly implement later pending user feedback and indication of
> > demand.
> > > > Might be worth adding to a "Future work" section as an indication
> that
> > we
> > > > haven't ruled it out (in which case it'd make sense as a rejected
> > > > alternative) but have chosen not to implement yet.
> > > >
> > > >
> > > > And I had one new thought that's pretty implementation-oriented but
> may
> > > > influence the design slightly:
> > > >
> > > > 6. Right now we write an empty set of task configs to the config
> topic
> > > when
> > > > handling requests to stop a connector in distributed mode. Do we need
> > to
> > > do
> > > > the same when creating connectors in the stopped state, or add any
> > other
> > > > special logic besides noting the new state in the config topic? Or is
> > it
> > > > sufficient to write a non-running target state to the config topic
> and
> > > then
> > > > rely on existing logic to simply refuse to generate task configs for
> > the
> > > > newly-created connector? Is there any chance that the lack of task
> > > configs
> > > > (as opposed to an empty list of task configs) in the config topic
> for a
> > > > connector that exists will cause issues?
> > > >
> > > >
> > > > Cheers,
> > > >
> > > > Chris
> > > >
> > > > On Tue, Oct 3, 2023 at 3:29 AM Yash Mayya <ya...@gmail.com>
> > wrote:
> > > >
> > > > > Hi Chris,
> > > > >
> > > > > Thanks for taking a look at this KIP!
> > > > >
> > > > > 1. I chose to go with simply "state" as that exact term is already
> > > > exposed
> > > > > via some of the existing REST API responses and would be one that
> > users
> > > > are
> > > > > already familiar with (although admittedly something like
> > > "initial_state"
> > > > > wouldn't be much of a jump). Since it's a field in the request body
> > for
> > > > the
> > > > > connector creation endpoint, wouldn't it be implied that it is the
> > > > > "initial" state just like the "config" field represents the
> "initial"
> > > > > configuration? Also, I don't think x.y has been established as the
> > > field
> > > > > naming convention in the Connect REST API right? From what I can
> > tell,
> > > > x_y
> > > > > is the convention being followed for fields in requests
> > ("kafka_topic"
> > > /
> > > > > "kafka_partition" / "kafka_offset" in the offsets APIs for
> instance)
> > > and
> > > > > responses ("error_count", "kafka_cluster_id", "recommended_values"
> > > etc.).
> > > > >
> > > > > 2. The connector configuration record is currently used for both
> > > > connector
> > > > > create requests as well as connector config update requests. Since
> > > we're
> > > > > only allowing configuring the target state for newly created
> > > connectors,
> > > > I
> > > > > feel like it'll be a cleaner separation of concerns to use the
> > existing
> > > > > records for connector configurations and connector target states
> > rather
> > > > > than bundling the "state" and "state.v2" (or equivalent) fields
> into
> > > the
> > > > > connector configuration record. The additional write should be very
> > > > minimal
> > > > > overhead and the two writes would be an atomic operation for
> Connect
> > > > > clusters that are using a transactional producer for the config
> topic
> > > > > anyway. Thoughts?
> > > > >
> > > > > 3. I was thinking that we'd support standalone mode via the same
> > > > connector
> > > > > creation REST API endpoint changes (addition of the "state" field).
> > If
> > > > > there is further interest in adding similar support to the command
> > line
> > > > > method of creating connectors as well, perhaps we could do so in a
> > > small
> > > > > follow-on KIP? I feel like ever since the standalone mode started
> > > > > supporting the full Connect REST API, the command line method of
> > > creating
> > > > > connectors is more of a "legacy" concept.
> > > > >
> > > > > 4. Yeah, connector / offsets migration was just used as a
> > > representative
> > > > > example of how this feature could be useful - I didn't intend for
> it
> > to
> > > > be
> > > > > the sole purpose of this KIP. However, that said, I really like the
> > > idea
> > > > of
> > > > > accepting an "offsets" field in the connector creation request
> since
> > > it'd
> > > > > reduce the number of user operations required from 3 (create the
> > > > connector
> > > > > in a STOPPED state; alter the offsets; resume the connector) to 1.
> > I'd
> > > be
> > > > > happy to either create or review a separate small KIP for that if
> it
> > > > sounds
> > > > > acceptable to you?
> > > > >
> > > > > 5. Whoops, thanks, I hadn't even noticed that that's where I had
> > linked
> > > > to.
> > > > > Fixed!
> > > > >
> > > > > Thanks,
> > > > > Yash
> > > > >
> > > > > On Mon, Oct 2, 2023 at 11:14 PM Chris Egerton
> > <chrise@aiven.io.invalid
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Yash,
> > > > > >
> > > > > > Thanks for the KIP! Frankly this feels like an oversight in 875
> and
> > > I'm
> > > > > > glad we're closing that loop ASAP.
> > > > > >
> > > > > >
> > > > > > Here are my thoughts:
> > > > > >
> > > > > > 1. (Nit): IMO "start.state", "initial.state", or "target.state"
> > might
> > > > be
> > > > > > better than just "state" for the field name in the connector
> > creation
> > > > > > request.
> > > > > >
> > > > > > 2. Why implement this in distributed mode with two writes to the
> > > config
> > > > > > topic? We could augment the existing format for connector configs
> > in
> > > > the
> > > > > > config topic [1] with a new field instead.
> > > > > >
> > > > > > 3. Although standalone mode is mentioned in the KIP, there's no
> > > detail
> > > > on
> > > > > > the Java properties file format that we support for standalone
> mode
> > > > > (i.e.,
> > > > > > `connect-standalone config/connect-standalone.properties
> > > > > > config/connect-file-source.properties
> > > > > > config/connect-file-sink.properties`). Do we plan on adding
> support
> > > for
> > > > > > that mode as well?
> > > > > >
> > > > > > 4. I suspect there will be advantages for this feature beyond
> > offsets
> > > > > > migration, but if offsets migration were the only motivation for
> > it,
> > > > > > wouldn't it be simpler to accept an "offsets" field in connector
> > > > creation
> > > > > > requests? That way, users wouldn't have to start a connector in a
> > > > > different
> > > > > > state and then resume it, they could just create the connector
> like
> > > > > normal.
> > > > > > I think the current proposal is acceptable, but wanted to float
> > this
> > > > > > alternative in case we anticipate lots of connector migrations
> and
> > > want
> > > > > to
> > > > > > optimize for them a bit more.
> > > > > >
> > > > > > 5. (NIt): We can link to our own Javadocs [2] instead of
> > javadoc.io
> > > > > >
> > > > > >
> > > > > > [1] -
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/dcd8c7d05f2f22f2d815405e7ab3ad7439669239/connect/runtime/src/main/java/org/apache/kafka/connect/storage/KafkaConfigBackingStore.java#L234-L236
> > > > > >
> > > > > > [2] -
> > > > >
> https://kafka.apache.org/35/javadoc/index.html?overview-summary.html
> > > > > >
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > > Chris
> > > > > >
> > > > > > On Thu, Sep 21, 2023 at 2:37 AM Yash Mayya <yash.mayya@gmail.com
> >
> > > > wrote:
> > > > > >
> > > > > > > Hi Ashwin,
> > > > > > >
> > > > > > > Thanks for taking a look and sharing your thoughts!
> > > > > > >
> > > > > > > 1. Yes, the request / response formats of the two APIs were
> > > > > intentionally
> > > > > > > made identical for such use-cases. [1]
> > > > > > >
> > > > > > > 2. I'm assuming you're referring to retaining the offset /
> config
> > > > topic
> > > > > > > records for a connector when it is deleted by a user? Firstly,
> a
> > > > > > > connector's offsets aren't actually currently deleted when the
> > > > > connector
> > > > > > is
> > > > > > > deleted - it was listed as potential future work in KIP-875
> [2].
> > > > > > Secondly,
> > > > > > > the config topic is the source of truth for the state of a
> > Connect
> > > > > > cluster
> > > > > > > and a connector deletion is done by writing a null-valued
> record
> > > > > > > (tombstone) with the connector key to the config topic. We
> could
> > > > > > > potentially modify the delete mechanism to use a special new
> > record
> > > > > > > (instead of a tombstone with the connector key) in order to
> > retain
> > > > the
> > > > > > > latest configuration of a connector while still deleting the
> > actual
> > > > > > > connector - however, this would have its downsides and I don't
> > see
> > > > too
> > > > > > many
> > > > > > > benefits. Furthermore, connector migration between different
> > Kafka
> > > > > > clusters
> > > > > > > was just used as a representational use case for creating
> > > connectors
> > > > > in a
> > > > > > > stopped state - it isn't the primary focus of this KIP as such.
> > > > > > >
> > > > > > > 3. KIP-875 goes into some more details about this [3], but the
> > TLDR
> > > > is
> > > > > > that
> > > > > > > the STOPPED state will be treated like the PAUSED state on
> older
> > > > > workers
> > > > > > > that don't recognize the STOPPED state.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Yash
> > > > > > >
> > > > > > > [1] -
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Request/responseformat
> > > > > > >
> > > > > > > [2] -
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Automaticallydeleteoffsetswithconnectors
> > > > > > >
> > > > > > > [3] -
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-STOPPEDtargetstate
> > > > > > >
> > > > > > > On Wed, Sep 20, 2023 at 7:24 PM Ashwin
> > > <apankaj@confluent.io.invalid
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks Yash! This is very useful for migrating connectors
> from
> > > one
> > > > > > > cluster
> > > > > > > > to another.
> > > > > > > >
> > > > > > > > I had the following comments/questions
> > > > > > > >
> > > > > > > > 1. Is the offset read using `GET /offsets` api always
> > guaranteed
> > > to
> > > > > be
> > > > > > > in a
> > > > > > > > format accepted by `PATCH /offsets` ?
> > > > > > > > 2. I had to tackle a similar migration situation but the two
> > > > connect
> > > > > > > > clusters in question were using the same backing Kafka
> cluster.
> > > The
> > > > > > > > challenge in this case is that when I delete the original
> > > > connector,
> > > > > I
> > > > > > > want
> > > > > > > > to retain offsets and config topics. Do you think we should
> > > support
> > > > > > > > deletion of a connector without removal of these topics as
> part
> > > of
> > > > > this
> > > > > > > KIP
> > > > > > > > ?
> > > > > > > > 3. In the case of a downgrade, how will Connect worker handle
> > the
> > > > > > > optional
> > > > > > > > “state” field in config topic ?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Ashwin
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Sun, Sep 17, 2023 at 11:09 PM Yash Mayya <
> > > yash.mayya@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I'd like to begin discussion on a KIP to allow creating
> > > > connectors
> > > > > > in a
> > > > > > > > > stopped state -
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-980%3A+Allow+creating+connectors+in+a+stopped+state
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Yash
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-980: Allow creating connectors in a stopped state

Posted by Yash Mayya <ya...@gmail.com>.
Hi Chris,

Thanks for all your feedback so far!

3. That's a good question. I was thinking we'd do some "intelligent"
parsing internally during the implementation (i.e. essentially your last
option - attempting to parse first as one format, then the other) which is
why I didn't include any more details in the KIP itself (where I've only
outlined the contract changes). This would allow for the smoothest user
experience IMO and all the heavy lifting will be done in the parsing logic.
All the other options seemed either clunky or brittle from the user
experience point of view. In terms of the actual implementation itself,
we'd probably want to first try parsing it into the supported JSON
structures before trying to parse it into Java properties since the Java
properties format is very permissive (i.e. we won't really see any errors
on attempting to parse a JSON file into Java properties).

Thanks,
Yash

On Thu, Oct 5, 2023 at 1:39 AM Chris Egerton <ch...@aiven.io.invalid>
wrote:

> Hi Yash,
>
> Looking great! Few more thoughts:
>
>
> 1. (Downgraded to nit) I still prefer dot-delimitation but it's not a
> blocker; thanks for addressing my concerns about the name of the field and
> how it may be perceived by users.
>
> 2. (Addressed) Thanks for looking into this, and sorry it turned out to be
> a bit of a dead end! I'm convinced that the current proposal is good
> enough.
>
> 3. Can you shed a little more light on how we'll determine whether a
> connector config should be parsed as JSON or as a properties file? Will
> this be based on file extension, a command-line flag (which might apply to
> all configs, or individual configs), attempting to parse first as one
> format then the other, something else?
>
> 4. (Addressed) Thanks! Looks great.
>
> 6. (Addressed) Awesome, great to hear. The point about laggy connector
> startup is very convincing; my paranoia is satiated.
>
>
> Cheers,
>
> Chris
>
> On Wed, Oct 4, 2023 at 5:35 AM Yash Mayya <ya...@gmail.com> wrote:
>
> > Hi Chris,
> >
> > Thanks for the quick follow up and the continued insightful discourse!
> >
> > 1. Fair point on the need to differentiate it from the actual state
> > displayed in the status API, I like the prefix of "initial" to make that
> > differentiation (from your suggested alternatives previously). Regarding
> > the dots vs underscores as delimiters - the new state field will be a top
> > level field in the connector creation request body alongside the "config"
> > map (i.e. it won't be a connector configuration itself), so I think we
> > should be using the underscore delimiter for consistency. For now, I've
> > updated the KIP to use "initial_state" as the new field's name - let me
> > know if you disagree, and I'd be happy to reconsider.
> >
> > 2. Hm, I actually hadn't considered the downgrade implications with your
> > proposed single record approach. I agree that it's a bigger downside than
> > writing two records to the config topic. I do understand your concerns
> with
> > the potential for config topic inconsistencies which is why I proposed
> > writing the target state first (since the presence of a target state for
> a
> > connector with no configuration is a benign condition). Also, even in the
> > non-transactional config topic producer case - if there is a failure
> > between the two writes, the user will be notified of the error
> > synchronously via the API response (ref -
> > https://github.com/apache/kafka/pull/12984) and will be able to safely
> > retry the operation. I don't see how we'd be able to do a single record
> > write approach along with supporting clean downgrades since we'd either
> > need to introduce a new record type or add a new field to an existing
> > record type - neither of which would be recognized as such by an older
> > Connect worker.
> >
> > > Standalone mode has always supported the REST API,
> > > and so far FWICTwe've maintained feature parity between
> > > the two modes
> >
> > > add support for JSON files with standalone mode.
> >
> > 3. Thanks, I wasn't aware about standalone mode always having supported
> the
> > full REST API - I thought I'd seen some references earlier indicating
> > otherwise. In that case, I do agree that it makes sense to maintain
> parity
> > across both methods of connector creation for user experience
> consistency.
> > I really like the idea of updating the standalone mode CLI to be able to
> > parse JSON files (in the same format as the connector creation REST API
> > endpoint request body) along with Java properties files since I think
> that
> > offers two big benefits. One is that users will be able to copy and use
> > examples across both the methods of connector creation (REST API requests
> > with JSON request bodies and JSON files passed to the standalone mode
> > startup CLI). The second benefit is that any future extensions (such as
> the
> > "offsets" field we've discussed in this thread) would be easily applied
> > across both the methods consistently instead of introducing new (and
> likely
> > ugly) CLI flags. I've updated the KIP to include this change in the
> > standalone mode CLI.
> >
> > 4. Makes sense, I've added this under a new "Future Work" section in the
> > KIP.
> >
> > 6. From what I can tell, there shouldn't be any issues with the lack of
> > task configurations in the config topic and it seems to be a supported
> > assumption across the Connect code-base that a connector configuration
> > could exist without any task configurations for the connector (a
> situation
> > that could currently manifest with slow starting connectors, connectors
> > that fail during startup, connectors that fail to generate task
> > configurations, connectors that are paused right after being created
> etc.).
> > I did also try out a small prototype before publishing this KIP and
> things
> > do work as expected when creating a connector in the PAUSED / STOPPED
> state
> > by simply writing the appropriate target state along with the connector
> > configuration to the config topic.
> >
> > Thanks,
> > Yash
> >
> > On Wed, Oct 4, 2023 at 1:44 AM Chris Egerton <ch...@aiven.io.invalid>
> > wrote:
> >
> > > Hi Yash,
> > >
> > > Thanks for the in-depth discussion! Continuations here:
> > >
> > > 1. Regarding delimiters (dots vs. underscores), we use dots in
> connector
> > > configs for all runtime-recognized properties, including but not
> limited
> > to
> > > connector.class, tasks.max, key.converter, key.converter.*, and
> > > transforms.*.type. Regarding the choice of name--I think it's because
> the
> > > concept here is different from the "state" that we display in the
> status
> > > API that a different name is warranted. Users can't directly write the
> > > connector's actual state, they can still only specify a target state.
> And
> > > there's no guarantee that a connector will ever enter a target state
> > > (whether it's specified at creation time or later), since a failure can
> > > always occur.
> > >
> > > 2. It still seems simpler to emit one record instead of two, especially
> > > since we're not guaranteed that the leader will be using a
> transactional
> > > producer. I guess I'm just wary of knowingly putting the config topic
> in
> > an
> > > inconsistent state (target state present without accompanying connector
> > > config), even if it's meant to be only for a brief period. Thinking
> about
> > > it some more though, a one-record approach comes with drawbacks in the
> > > downgrade scenario: older workers wouldn't know how to handle the new
> > > config format and would just fall back to creating the connector in the
> > > running state. I suppose we should favor the two-record approach since
> > the
> > > downgrade scenario is more likely than the other failure mode, but it'd
> > be
> > > nice if we could think of a way to satisfy both concerns. Not a
> blocker,
> > > though.
> > >
> > > 3. Standalone mode has always supported the REST API, and so far FWICT
> > > we've maintained feature parity between the two modes for everything
> > except
> > > exactly-once source connectors, which would have required significant
> > > additional work since we'd have to add support for storing source
> > connector
> > > offsets in a Kafka topic instead of on local storage like we currently
> > do.
> > > I'd really prefer if we could try to maintain feature parity wherever
> > > possible--one way we could possibly do that with this KIP is to also
> add
> > > support for JSON files with standalone mode.
> > >
> > > 4. Yeah, no need to block on that idea since there are other use cases
> > for
> > > creating stopped connectors. We can treat it like the option to delete
> > > offsets along with the connector discussed in KIP-875: punt for now,
> > > possibly implement later pending user feedback and indication of
> demand.
> > > Might be worth adding to a "Future work" section as an indication that
> we
> > > haven't ruled it out (in which case it'd make sense as a rejected
> > > alternative) but have chosen not to implement yet.
> > >
> > >
> > > And I had one new thought that's pretty implementation-oriented but may
> > > influence the design slightly:
> > >
> > > 6. Right now we write an empty set of task configs to the config topic
> > when
> > > handling requests to stop a connector in distributed mode. Do we need
> to
> > do
> > > the same when creating connectors in the stopped state, or add any
> other
> > > special logic besides noting the new state in the config topic? Or is
> it
> > > sufficient to write a non-running target state to the config topic and
> > then
> > > rely on existing logic to simply refuse to generate task configs for
> the
> > > newly-created connector? Is there any chance that the lack of task
> > configs
> > > (as opposed to an empty list of task configs) in the config topic for a
> > > connector that exists will cause issues?
> > >
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Tue, Oct 3, 2023 at 3:29 AM Yash Mayya <ya...@gmail.com>
> wrote:
> > >
> > > > Hi Chris,
> > > >
> > > > Thanks for taking a look at this KIP!
> > > >
> > > > 1. I chose to go with simply "state" as that exact term is already
> > > exposed
> > > > via some of the existing REST API responses and would be one that
> users
> > > are
> > > > already familiar with (although admittedly something like
> > "initial_state"
> > > > wouldn't be much of a jump). Since it's a field in the request body
> for
> > > the
> > > > connector creation endpoint, wouldn't it be implied that it is the
> > > > "initial" state just like the "config" field represents the "initial"
> > > > configuration? Also, I don't think x.y has been established as the
> > field
> > > > naming convention in the Connect REST API right? From what I can
> tell,
> > > x_y
> > > > is the convention being followed for fields in requests
> ("kafka_topic"
> > /
> > > > "kafka_partition" / "kafka_offset" in the offsets APIs for instance)
> > and
> > > > responses ("error_count", "kafka_cluster_id", "recommended_values"
> > etc.).
> > > >
> > > > 2. The connector configuration record is currently used for both
> > > connector
> > > > create requests as well as connector config update requests. Since
> > we're
> > > > only allowing configuring the target state for newly created
> > connectors,
> > > I
> > > > feel like it'll be a cleaner separation of concerns to use the
> existing
> > > > records for connector configurations and connector target states
> rather
> > > > than bundling the "state" and "state.v2" (or equivalent) fields into
> > the
> > > > connector configuration record. The additional write should be very
> > > minimal
> > > > overhead and the two writes would be an atomic operation for Connect
> > > > clusters that are using a transactional producer for the config topic
> > > > anyway. Thoughts?
> > > >
> > > > 3. I was thinking that we'd support standalone mode via the same
> > > connector
> > > > creation REST API endpoint changes (addition of the "state" field).
> If
> > > > there is further interest in adding similar support to the command
> line
> > > > method of creating connectors as well, perhaps we could do so in a
> > small
> > > > follow-on KIP? I feel like ever since the standalone mode started
> > > > supporting the full Connect REST API, the command line method of
> > creating
> > > > connectors is more of a "legacy" concept.
> > > >
> > > > 4. Yeah, connector / offsets migration was just used as a
> > representative
> > > > example of how this feature could be useful - I didn't intend for it
> to
> > > be
> > > > the sole purpose of this KIP. However, that said, I really like the
> > idea
> > > of
> > > > accepting an "offsets" field in the connector creation request since
> > it'd
> > > > reduce the number of user operations required from 3 (create the
> > > connector
> > > > in a STOPPED state; alter the offsets; resume the connector) to 1.
> I'd
> > be
> > > > happy to either create or review a separate small KIP for that if it
> > > sounds
> > > > acceptable to you?
> > > >
> > > > 5. Whoops, thanks, I hadn't even noticed that that's where I had
> linked
> > > to.
> > > > Fixed!
> > > >
> > > > Thanks,
> > > > Yash
> > > >
> > > > On Mon, Oct 2, 2023 at 11:14 PM Chris Egerton
> <chrise@aiven.io.invalid
> > >
> > > > wrote:
> > > >
> > > > > Hi Yash,
> > > > >
> > > > > Thanks for the KIP! Frankly this feels like an oversight in 875 and
> > I'm
> > > > > glad we're closing that loop ASAP.
> > > > >
> > > > >
> > > > > Here are my thoughts:
> > > > >
> > > > > 1. (Nit): IMO "start.state", "initial.state", or "target.state"
> might
> > > be
> > > > > better than just "state" for the field name in the connector
> creation
> > > > > request.
> > > > >
> > > > > 2. Why implement this in distributed mode with two writes to the
> > config
> > > > > topic? We could augment the existing format for connector configs
> in
> > > the
> > > > > config topic [1] with a new field instead.
> > > > >
> > > > > 3. Although standalone mode is mentioned in the KIP, there's no
> > detail
> > > on
> > > > > the Java properties file format that we support for standalone mode
> > > > (i.e.,
> > > > > `connect-standalone config/connect-standalone.properties
> > > > > config/connect-file-source.properties
> > > > > config/connect-file-sink.properties`). Do we plan on adding support
> > for
> > > > > that mode as well?
> > > > >
> > > > > 4. I suspect there will be advantages for this feature beyond
> offsets
> > > > > migration, but if offsets migration were the only motivation for
> it,
> > > > > wouldn't it be simpler to accept an "offsets" field in connector
> > > creation
> > > > > requests? That way, users wouldn't have to start a connector in a
> > > > different
> > > > > state and then resume it, they could just create the connector like
> > > > normal.
> > > > > I think the current proposal is acceptable, but wanted to float
> this
> > > > > alternative in case we anticipate lots of connector migrations and
> > want
> > > > to
> > > > > optimize for them a bit more.
> > > > >
> > > > > 5. (NIt): We can link to our own Javadocs [2] instead of
> javadoc.io
> > > > >
> > > > >
> > > > > [1] -
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/dcd8c7d05f2f22f2d815405e7ab3ad7439669239/connect/runtime/src/main/java/org/apache/kafka/connect/storage/KafkaConfigBackingStore.java#L234-L236
> > > > >
> > > > > [2] -
> > > > https://kafka.apache.org/35/javadoc/index.html?overview-summary.html
> > > > >
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Chris
> > > > >
> > > > > On Thu, Sep 21, 2023 at 2:37 AM Yash Mayya <ya...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hi Ashwin,
> > > > > >
> > > > > > Thanks for taking a look and sharing your thoughts!
> > > > > >
> > > > > > 1. Yes, the request / response formats of the two APIs were
> > > > intentionally
> > > > > > made identical for such use-cases. [1]
> > > > > >
> > > > > > 2. I'm assuming you're referring to retaining the offset / config
> > > topic
> > > > > > records for a connector when it is deleted by a user? Firstly, a
> > > > > > connector's offsets aren't actually currently deleted when the
> > > > connector
> > > > > is
> > > > > > deleted - it was listed as potential future work in KIP-875 [2].
> > > > > Secondly,
> > > > > > the config topic is the source of truth for the state of a
> Connect
> > > > > cluster
> > > > > > and a connector deletion is done by writing a null-valued record
> > > > > > (tombstone) with the connector key to the config topic. We could
> > > > > > potentially modify the delete mechanism to use a special new
> record
> > > > > > (instead of a tombstone with the connector key) in order to
> retain
> > > the
> > > > > > latest configuration of a connector while still deleting the
> actual
> > > > > > connector - however, this would have its downsides and I don't
> see
> > > too
> > > > > many
> > > > > > benefits. Furthermore, connector migration between different
> Kafka
> > > > > clusters
> > > > > > was just used as a representational use case for creating
> > connectors
> > > > in a
> > > > > > stopped state - it isn't the primary focus of this KIP as such.
> > > > > >
> > > > > > 3. KIP-875 goes into some more details about this [3], but the
> TLDR
> > > is
> > > > > that
> > > > > > the STOPPED state will be treated like the PAUSED state on older
> > > > workers
> > > > > > that don't recognize the STOPPED state.
> > > > > >
> > > > > > Thanks,
> > > > > > Yash
> > > > > >
> > > > > > [1] -
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Request/responseformat
> > > > > >
> > > > > > [2] -
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Automaticallydeleteoffsetswithconnectors
> > > > > >
> > > > > > [3] -
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-STOPPEDtargetstate
> > > > > >
> > > > > > On Wed, Sep 20, 2023 at 7:24 PM Ashwin
> > <apankaj@confluent.io.invalid
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks Yash! This is very useful for migrating connectors from
> > one
> > > > > > cluster
> > > > > > > to another.
> > > > > > >
> > > > > > > I had the following comments/questions
> > > > > > >
> > > > > > > 1. Is the offset read using `GET /offsets` api always
> guaranteed
> > to
> > > > be
> > > > > > in a
> > > > > > > format accepted by `PATCH /offsets` ?
> > > > > > > 2. I had to tackle a similar migration situation but the two
> > > connect
> > > > > > > clusters in question were using the same backing Kafka cluster.
> > The
> > > > > > > challenge in this case is that when I delete the original
> > > connector,
> > > > I
> > > > > > want
> > > > > > > to retain offsets and config topics. Do you think we should
> > support
> > > > > > > deletion of a connector without removal of these topics as part
> > of
> > > > this
> > > > > > KIP
> > > > > > > ?
> > > > > > > 3. In the case of a downgrade, how will Connect worker handle
> the
> > > > > > optional
> > > > > > > “state” field in config topic ?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Ashwin
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Sun, Sep 17, 2023 at 11:09 PM Yash Mayya <
> > yash.mayya@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I'd like to begin discussion on a KIP to allow creating
> > > connectors
> > > > > in a
> > > > > > > > stopped state -
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-980%3A+Allow+creating+connectors+in+a+stopped+state
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Yash
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-980: Allow creating connectors in a stopped state

Posted by Chris Egerton <ch...@aiven.io.INVALID>.
Hi Yash,

Looking great! Few more thoughts:


1. (Downgraded to nit) I still prefer dot-delimitation but it's not a
blocker; thanks for addressing my concerns about the name of the field and
how it may be perceived by users.

2. (Addressed) Thanks for looking into this, and sorry it turned out to be
a bit of a dead end! I'm convinced that the current proposal is good enough.

3. Can you shed a little more light on how we'll determine whether a
connector config should be parsed as JSON or as a properties file? Will
this be based on file extension, a command-line flag (which might apply to
all configs, or individual configs), attempting to parse first as one
format then the other, something else?

4. (Addressed) Thanks! Looks great.

6. (Addressed) Awesome, great to hear. The point about laggy connector
startup is very convincing; my paranoia is satiated.


Cheers,

Chris

On Wed, Oct 4, 2023 at 5:35 AM Yash Mayya <ya...@gmail.com> wrote:

> Hi Chris,
>
> Thanks for the quick follow up and the continued insightful discourse!
>
> 1. Fair point on the need to differentiate it from the actual state
> displayed in the status API, I like the prefix of "initial" to make that
> differentiation (from your suggested alternatives previously). Regarding
> the dots vs underscores as delimiters - the new state field will be a top
> level field in the connector creation request body alongside the "config"
> map (i.e. it won't be a connector configuration itself), so I think we
> should be using the underscore delimiter for consistency. For now, I've
> updated the KIP to use "initial_state" as the new field's name - let me
> know if you disagree, and I'd be happy to reconsider.
>
> 2. Hm, I actually hadn't considered the downgrade implications with your
> proposed single record approach. I agree that it's a bigger downside than
> writing two records to the config topic. I do understand your concerns with
> the potential for config topic inconsistencies which is why I proposed
> writing the target state first (since the presence of a target state for a
> connector with no configuration is a benign condition). Also, even in the
> non-transactional config topic producer case - if there is a failure
> between the two writes, the user will be notified of the error
> synchronously via the API response (ref -
> https://github.com/apache/kafka/pull/12984) and will be able to safely
> retry the operation. I don't see how we'd be able to do a single record
> write approach along with supporting clean downgrades since we'd either
> need to introduce a new record type or add a new field to an existing
> record type - neither of which would be recognized as such by an older
> Connect worker.
>
> > Standalone mode has always supported the REST API,
> > and so far FWICTwe've maintained feature parity between
> > the two modes
>
> > add support for JSON files with standalone mode.
>
> 3. Thanks, I wasn't aware about standalone mode always having supported the
> full REST API - I thought I'd seen some references earlier indicating
> otherwise. In that case, I do agree that it makes sense to maintain parity
> across both methods of connector creation for user experience consistency.
> I really like the idea of updating the standalone mode CLI to be able to
> parse JSON files (in the same format as the connector creation REST API
> endpoint request body) along with Java properties files since I think that
> offers two big benefits. One is that users will be able to copy and use
> examples across both the methods of connector creation (REST API requests
> with JSON request bodies and JSON files passed to the standalone mode
> startup CLI). The second benefit is that any future extensions (such as the
> "offsets" field we've discussed in this thread) would be easily applied
> across both the methods consistently instead of introducing new (and likely
> ugly) CLI flags. I've updated the KIP to include this change in the
> standalone mode CLI.
>
> 4. Makes sense, I've added this under a new "Future Work" section in the
> KIP.
>
> 6. From what I can tell, there shouldn't be any issues with the lack of
> task configurations in the config topic and it seems to be a supported
> assumption across the Connect code-base that a connector configuration
> could exist without any task configurations for the connector (a situation
> that could currently manifest with slow starting connectors, connectors
> that fail during startup, connectors that fail to generate task
> configurations, connectors that are paused right after being created etc.).
> I did also try out a small prototype before publishing this KIP and things
> do work as expected when creating a connector in the PAUSED / STOPPED state
> by simply writing the appropriate target state along with the connector
> configuration to the config topic.
>
> Thanks,
> Yash
>
> On Wed, Oct 4, 2023 at 1:44 AM Chris Egerton <ch...@aiven.io.invalid>
> wrote:
>
> > Hi Yash,
> >
> > Thanks for the in-depth discussion! Continuations here:
> >
> > 1. Regarding delimiters (dots vs. underscores), we use dots in connector
> > configs for all runtime-recognized properties, including but not limited
> to
> > connector.class, tasks.max, key.converter, key.converter.*, and
> > transforms.*.type. Regarding the choice of name--I think it's because the
> > concept here is different from the "state" that we display in the status
> > API that a different name is warranted. Users can't directly write the
> > connector's actual state, they can still only specify a target state. And
> > there's no guarantee that a connector will ever enter a target state
> > (whether it's specified at creation time or later), since a failure can
> > always occur.
> >
> > 2. It still seems simpler to emit one record instead of two, especially
> > since we're not guaranteed that the leader will be using a transactional
> > producer. I guess I'm just wary of knowingly putting the config topic in
> an
> > inconsistent state (target state present without accompanying connector
> > config), even if it's meant to be only for a brief period. Thinking about
> > it some more though, a one-record approach comes with drawbacks in the
> > downgrade scenario: older workers wouldn't know how to handle the new
> > config format and would just fall back to creating the connector in the
> > running state. I suppose we should favor the two-record approach since
> the
> > downgrade scenario is more likely than the other failure mode, but it'd
> be
> > nice if we could think of a way to satisfy both concerns. Not a blocker,
> > though.
> >
> > 3. Standalone mode has always supported the REST API, and so far FWICT
> > we've maintained feature parity between the two modes for everything
> except
> > exactly-once source connectors, which would have required significant
> > additional work since we'd have to add support for storing source
> connector
> > offsets in a Kafka topic instead of on local storage like we currently
> do.
> > I'd really prefer if we could try to maintain feature parity wherever
> > possible--one way we could possibly do that with this KIP is to also add
> > support for JSON files with standalone mode.
> >
> > 4. Yeah, no need to block on that idea since there are other use cases
> for
> > creating stopped connectors. We can treat it like the option to delete
> > offsets along with the connector discussed in KIP-875: punt for now,
> > possibly implement later pending user feedback and indication of demand.
> > Might be worth adding to a "Future work" section as an indication that we
> > haven't ruled it out (in which case it'd make sense as a rejected
> > alternative) but have chosen not to implement yet.
> >
> >
> > And I had one new thought that's pretty implementation-oriented but may
> > influence the design slightly:
> >
> > 6. Right now we write an empty set of task configs to the config topic
> when
> > handling requests to stop a connector in distributed mode. Do we need to
> do
> > the same when creating connectors in the stopped state, or add any other
> > special logic besides noting the new state in the config topic? Or is it
> > sufficient to write a non-running target state to the config topic and
> then
> > rely on existing logic to simply refuse to generate task configs for the
> > newly-created connector? Is there any chance that the lack of task
> configs
> > (as opposed to an empty list of task configs) in the config topic for a
> > connector that exists will cause issues?
> >
> >
> > Cheers,
> >
> > Chris
> >
> > On Tue, Oct 3, 2023 at 3:29 AM Yash Mayya <ya...@gmail.com> wrote:
> >
> > > Hi Chris,
> > >
> > > Thanks for taking a look at this KIP!
> > >
> > > 1. I chose to go with simply "state" as that exact term is already
> > exposed
> > > via some of the existing REST API responses and would be one that users
> > are
> > > already familiar with (although admittedly something like
> "initial_state"
> > > wouldn't be much of a jump). Since it's a field in the request body for
> > the
> > > connector creation endpoint, wouldn't it be implied that it is the
> > > "initial" state just like the "config" field represents the "initial"
> > > configuration? Also, I don't think x.y has been established as the
> field
> > > naming convention in the Connect REST API right? From what I can tell,
> > x_y
> > > is the convention being followed for fields in requests ("kafka_topic"
> /
> > > "kafka_partition" / "kafka_offset" in the offsets APIs for instance)
> and
> > > responses ("error_count", "kafka_cluster_id", "recommended_values"
> etc.).
> > >
> > > 2. The connector configuration record is currently used for both
> > connector
> > > create requests as well as connector config update requests. Since
> we're
> > > only allowing configuring the target state for newly created
> connectors,
> > I
> > > feel like it'll be a cleaner separation of concerns to use the existing
> > > records for connector configurations and connector target states rather
> > > than bundling the "state" and "state.v2" (or equivalent) fields into
> the
> > > connector configuration record. The additional write should be very
> > minimal
> > > overhead and the two writes would be an atomic operation for Connect
> > > clusters that are using a transactional producer for the config topic
> > > anyway. Thoughts?
> > >
> > > 3. I was thinking that we'd support standalone mode via the same
> > connector
> > > creation REST API endpoint changes (addition of the "state" field). If
> > > there is further interest in adding similar support to the command line
> > > method of creating connectors as well, perhaps we could do so in a
> small
> > > follow-on KIP? I feel like ever since the standalone mode started
> > > supporting the full Connect REST API, the command line method of
> creating
> > > connectors is more of a "legacy" concept.
> > >
> > > 4. Yeah, connector / offsets migration was just used as a
> representative
> > > example of how this feature could be useful - I didn't intend for it to
> > be
> > > the sole purpose of this KIP. However, that said, I really like the
> idea
> > of
> > > accepting an "offsets" field in the connector creation request since
> it'd
> > > reduce the number of user operations required from 3 (create the
> > connector
> > > in a STOPPED state; alter the offsets; resume the connector) to 1. I'd
> be
> > > happy to either create or review a separate small KIP for that if it
> > sounds
> > > acceptable to you?
> > >
> > > 5. Whoops, thanks, I hadn't even noticed that that's where I had linked
> > to.
> > > Fixed!
> > >
> > > Thanks,
> > > Yash
> > >
> > > On Mon, Oct 2, 2023 at 11:14 PM Chris Egerton <chrise@aiven.io.invalid
> >
> > > wrote:
> > >
> > > > Hi Yash,
> > > >
> > > > Thanks for the KIP! Frankly this feels like an oversight in 875 and
> I'm
> > > > glad we're closing that loop ASAP.
> > > >
> > > >
> > > > Here are my thoughts:
> > > >
> > > > 1. (Nit): IMO "start.state", "initial.state", or "target.state" might
> > be
> > > > better than just "state" for the field name in the connector creation
> > > > request.
> > > >
> > > > 2. Why implement this in distributed mode with two writes to the
> config
> > > > topic? We could augment the existing format for connector configs in
> > the
> > > > config topic [1] with a new field instead.
> > > >
> > > > 3. Although standalone mode is mentioned in the KIP, there's no
> detail
> > on
> > > > the Java properties file format that we support for standalone mode
> > > (i.e.,
> > > > `connect-standalone config/connect-standalone.properties
> > > > config/connect-file-source.properties
> > > > config/connect-file-sink.properties`). Do we plan on adding support
> for
> > > > that mode as well?
> > > >
> > > > 4. I suspect there will be advantages for this feature beyond offsets
> > > > migration, but if offsets migration were the only motivation for it,
> > > > wouldn't it be simpler to accept an "offsets" field in connector
> > creation
> > > > requests? That way, users wouldn't have to start a connector in a
> > > different
> > > > state and then resume it, they could just create the connector like
> > > normal.
> > > > I think the current proposal is acceptable, but wanted to float this
> > > > alternative in case we anticipate lots of connector migrations and
> want
> > > to
> > > > optimize for them a bit more.
> > > >
> > > > 5. (NIt): We can link to our own Javadocs [2] instead of javadoc.io
> > > >
> > > >
> > > > [1] -
> > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/dcd8c7d05f2f22f2d815405e7ab3ad7439669239/connect/runtime/src/main/java/org/apache/kafka/connect/storage/KafkaConfigBackingStore.java#L234-L236
> > > >
> > > > [2] -
> > > https://kafka.apache.org/35/javadoc/index.html?overview-summary.html
> > > >
> > > >
> > > > Cheers,
> > > >
> > > > Chris
> > > >
> > > > On Thu, Sep 21, 2023 at 2:37 AM Yash Mayya <ya...@gmail.com>
> > wrote:
> > > >
> > > > > Hi Ashwin,
> > > > >
> > > > > Thanks for taking a look and sharing your thoughts!
> > > > >
> > > > > 1. Yes, the request / response formats of the two APIs were
> > > intentionally
> > > > > made identical for such use-cases. [1]
> > > > >
> > > > > 2. I'm assuming you're referring to retaining the offset / config
> > topic
> > > > > records for a connector when it is deleted by a user? Firstly, a
> > > > > connector's offsets aren't actually currently deleted when the
> > > connector
> > > > is
> > > > > deleted - it was listed as potential future work in KIP-875 [2].
> > > > Secondly,
> > > > > the config topic is the source of truth for the state of a Connect
> > > > cluster
> > > > > and a connector deletion is done by writing a null-valued record
> > > > > (tombstone) with the connector key to the config topic. We could
> > > > > potentially modify the delete mechanism to use a special new record
> > > > > (instead of a tombstone with the connector key) in order to retain
> > the
> > > > > latest configuration of a connector while still deleting the actual
> > > > > connector - however, this would have its downsides and I don't see
> > too
> > > > many
> > > > > benefits. Furthermore, connector migration between different Kafka
> > > > clusters
> > > > > was just used as a representational use case for creating
> connectors
> > > in a
> > > > > stopped state - it isn't the primary focus of this KIP as such.
> > > > >
> > > > > 3. KIP-875 goes into some more details about this [3], but the TLDR
> > is
> > > > that
> > > > > the STOPPED state will be treated like the PAUSED state on older
> > > workers
> > > > > that don't recognize the STOPPED state.
> > > > >
> > > > > Thanks,
> > > > > Yash
> > > > >
> > > > > [1] -
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Request/responseformat
> > > > >
> > > > > [2] -
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Automaticallydeleteoffsetswithconnectors
> > > > >
> > > > > [3] -
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-STOPPEDtargetstate
> > > > >
> > > > > On Wed, Sep 20, 2023 at 7:24 PM Ashwin
> <apankaj@confluent.io.invalid
> > >
> > > > > wrote:
> > > > >
> > > > > > Thanks Yash! This is very useful for migrating connectors from
> one
> > > > > cluster
> > > > > > to another.
> > > > > >
> > > > > > I had the following comments/questions
> > > > > >
> > > > > > 1. Is the offset read using `GET /offsets` api always guaranteed
> to
> > > be
> > > > > in a
> > > > > > format accepted by `PATCH /offsets` ?
> > > > > > 2. I had to tackle a similar migration situation but the two
> > connect
> > > > > > clusters in question were using the same backing Kafka cluster.
> The
> > > > > > challenge in this case is that when I delete the original
> > connector,
> > > I
> > > > > want
> > > > > > to retain offsets and config topics. Do you think we should
> support
> > > > > > deletion of a connector without removal of these topics as part
> of
> > > this
> > > > > KIP
> > > > > > ?
> > > > > > 3. In the case of a downgrade, how will Connect worker handle the
> > > > > optional
> > > > > > “state” field in config topic ?
> > > > > >
> > > > > > Thanks,
> > > > > > Ashwin
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Sun, Sep 17, 2023 at 11:09 PM Yash Mayya <
> yash.mayya@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I'd like to begin discussion on a KIP to allow creating
> > connectors
> > > > in a
> > > > > > > stopped state -
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-980%3A+Allow+creating+connectors+in+a+stopped+state
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Yash
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-980: Allow creating connectors in a stopped state

Posted by Yash Mayya <ya...@gmail.com>.
Hi Chris,

Thanks for the quick follow up and the continued insightful discourse!

1. Fair point on the need to differentiate it from the actual state
displayed in the status API, I like the prefix of "initial" to make that
differentiation (from your suggested alternatives previously). Regarding
the dots vs underscores as delimiters - the new state field will be a top
level field in the connector creation request body alongside the "config"
map (i.e. it won't be a connector configuration itself), so I think we
should be using the underscore delimiter for consistency. For now, I've
updated the KIP to use "initial_state" as the new field's name - let me
know if you disagree, and I'd be happy to reconsider.

2. Hm, I actually hadn't considered the downgrade implications with your
proposed single record approach. I agree that it's a bigger downside than
writing two records to the config topic. I do understand your concerns with
the potential for config topic inconsistencies which is why I proposed
writing the target state first (since the presence of a target state for a
connector with no configuration is a benign condition). Also, even in the
non-transactional config topic producer case - if there is a failure
between the two writes, the user will be notified of the error
synchronously via the API response (ref -
https://github.com/apache/kafka/pull/12984) and will be able to safely
retry the operation. I don't see how we'd be able to do a single record
write approach along with supporting clean downgrades since we'd either
need to introduce a new record type or add a new field to an existing
record type - neither of which would be recognized as such by an older
Connect worker.

> Standalone mode has always supported the REST API,
> and so far FWICTwe've maintained feature parity between
> the two modes

> add support for JSON files with standalone mode.

3. Thanks, I wasn't aware about standalone mode always having supported the
full REST API - I thought I'd seen some references earlier indicating
otherwise. In that case, I do agree that it makes sense to maintain parity
across both methods of connector creation for user experience consistency.
I really like the idea of updating the standalone mode CLI to be able to
parse JSON files (in the same format as the connector creation REST API
endpoint request body) along with Java properties files since I think that
offers two big benefits. One is that users will be able to copy and use
examples across both the methods of connector creation (REST API requests
with JSON request bodies and JSON files passed to the standalone mode
startup CLI). The second benefit is that any future extensions (such as the
"offsets" field we've discussed in this thread) would be easily applied
across both the methods consistently instead of introducing new (and likely
ugly) CLI flags. I've updated the KIP to include this change in the
standalone mode CLI.

4. Makes sense, I've added this under a new "Future Work" section in the
KIP.

6. From what I can tell, there shouldn't be any issues with the lack of
task configurations in the config topic and it seems to be a supported
assumption across the Connect code-base that a connector configuration
could exist without any task configurations for the connector (a situation
that could currently manifest with slow starting connectors, connectors
that fail during startup, connectors that fail to generate task
configurations, connectors that are paused right after being created etc.).
I did also try out a small prototype before publishing this KIP and things
do work as expected when creating a connector in the PAUSED / STOPPED state
by simply writing the appropriate target state along with the connector
configuration to the config topic.

Thanks,
Yash

On Wed, Oct 4, 2023 at 1:44 AM Chris Egerton <ch...@aiven.io.invalid>
wrote:

> Hi Yash,
>
> Thanks for the in-depth discussion! Continuations here:
>
> 1. Regarding delimiters (dots vs. underscores), we use dots in connector
> configs for all runtime-recognized properties, including but not limited to
> connector.class, tasks.max, key.converter, key.converter.*, and
> transforms.*.type. Regarding the choice of name--I think it's because the
> concept here is different from the "state" that we display in the status
> API that a different name is warranted. Users can't directly write the
> connector's actual state, they can still only specify a target state. And
> there's no guarantee that a connector will ever enter a target state
> (whether it's specified at creation time or later), since a failure can
> always occur.
>
> 2. It still seems simpler to emit one record instead of two, especially
> since we're not guaranteed that the leader will be using a transactional
> producer. I guess I'm just wary of knowingly putting the config topic in an
> inconsistent state (target state present without accompanying connector
> config), even if it's meant to be only for a brief period. Thinking about
> it some more though, a one-record approach comes with drawbacks in the
> downgrade scenario: older workers wouldn't know how to handle the new
> config format and would just fall back to creating the connector in the
> running state. I suppose we should favor the two-record approach since the
> downgrade scenario is more likely than the other failure mode, but it'd be
> nice if we could think of a way to satisfy both concerns. Not a blocker,
> though.
>
> 3. Standalone mode has always supported the REST API, and so far FWICT
> we've maintained feature parity between the two modes for everything except
> exactly-once source connectors, which would have required significant
> additional work since we'd have to add support for storing source connector
> offsets in a Kafka topic instead of on local storage like we currently do.
> I'd really prefer if we could try to maintain feature parity wherever
> possible--one way we could possibly do that with this KIP is to also add
> support for JSON files with standalone mode.
>
> 4. Yeah, no need to block on that idea since there are other use cases for
> creating stopped connectors. We can treat it like the option to delete
> offsets along with the connector discussed in KIP-875: punt for now,
> possibly implement later pending user feedback and indication of demand.
> Might be worth adding to a "Future work" section as an indication that we
> haven't ruled it out (in which case it'd make sense as a rejected
> alternative) but have chosen not to implement yet.
>
>
> And I had one new thought that's pretty implementation-oriented but may
> influence the design slightly:
>
> 6. Right now we write an empty set of task configs to the config topic when
> handling requests to stop a connector in distributed mode. Do we need to do
> the same when creating connectors in the stopped state, or add any other
> special logic besides noting the new state in the config topic? Or is it
> sufficient to write a non-running target state to the config topic and then
> rely on existing logic to simply refuse to generate task configs for the
> newly-created connector? Is there any chance that the lack of task configs
> (as opposed to an empty list of task configs) in the config topic for a
> connector that exists will cause issues?
>
>
> Cheers,
>
> Chris
>
> On Tue, Oct 3, 2023 at 3:29 AM Yash Mayya <ya...@gmail.com> wrote:
>
> > Hi Chris,
> >
> > Thanks for taking a look at this KIP!
> >
> > 1. I chose to go with simply "state" as that exact term is already
> exposed
> > via some of the existing REST API responses and would be one that users
> are
> > already familiar with (although admittedly something like "initial_state"
> > wouldn't be much of a jump). Since it's a field in the request body for
> the
> > connector creation endpoint, wouldn't it be implied that it is the
> > "initial" state just like the "config" field represents the "initial"
> > configuration? Also, I don't think x.y has been established as the field
> > naming convention in the Connect REST API right? From what I can tell,
> x_y
> > is the convention being followed for fields in requests ("kafka_topic" /
> > "kafka_partition" / "kafka_offset" in the offsets APIs for instance) and
> > responses ("error_count", "kafka_cluster_id", "recommended_values" etc.).
> >
> > 2. The connector configuration record is currently used for both
> connector
> > create requests as well as connector config update requests. Since we're
> > only allowing configuring the target state for newly created connectors,
> I
> > feel like it'll be a cleaner separation of concerns to use the existing
> > records for connector configurations and connector target states rather
> > than bundling the "state" and "state.v2" (or equivalent) fields into the
> > connector configuration record. The additional write should be very
> minimal
> > overhead and the two writes would be an atomic operation for Connect
> > clusters that are using a transactional producer for the config topic
> > anyway. Thoughts?
> >
> > 3. I was thinking that we'd support standalone mode via the same
> connector
> > creation REST API endpoint changes (addition of the "state" field). If
> > there is further interest in adding similar support to the command line
> > method of creating connectors as well, perhaps we could do so in a small
> > follow-on KIP? I feel like ever since the standalone mode started
> > supporting the full Connect REST API, the command line method of creating
> > connectors is more of a "legacy" concept.
> >
> > 4. Yeah, connector / offsets migration was just used as a representative
> > example of how this feature could be useful - I didn't intend for it to
> be
> > the sole purpose of this KIP. However, that said, I really like the idea
> of
> > accepting an "offsets" field in the connector creation request since it'd
> > reduce the number of user operations required from 3 (create the
> connector
> > in a STOPPED state; alter the offsets; resume the connector) to 1. I'd be
> > happy to either create or review a separate small KIP for that if it
> sounds
> > acceptable to you?
> >
> > 5. Whoops, thanks, I hadn't even noticed that that's where I had linked
> to.
> > Fixed!
> >
> > Thanks,
> > Yash
> >
> > On Mon, Oct 2, 2023 at 11:14 PM Chris Egerton <ch...@aiven.io.invalid>
> > wrote:
> >
> > > Hi Yash,
> > >
> > > Thanks for the KIP! Frankly this feels like an oversight in 875 and I'm
> > > glad we're closing that loop ASAP.
> > >
> > >
> > > Here are my thoughts:
> > >
> > > 1. (Nit): IMO "start.state", "initial.state", or "target.state" might
> be
> > > better than just "state" for the field name in the connector creation
> > > request.
> > >
> > > 2. Why implement this in distributed mode with two writes to the config
> > > topic? We could augment the existing format for connector configs in
> the
> > > config topic [1] with a new field instead.
> > >
> > > 3. Although standalone mode is mentioned in the KIP, there's no detail
> on
> > > the Java properties file format that we support for standalone mode
> > (i.e.,
> > > `connect-standalone config/connect-standalone.properties
> > > config/connect-file-source.properties
> > > config/connect-file-sink.properties`). Do we plan on adding support for
> > > that mode as well?
> > >
> > > 4. I suspect there will be advantages for this feature beyond offsets
> > > migration, but if offsets migration were the only motivation for it,
> > > wouldn't it be simpler to accept an "offsets" field in connector
> creation
> > > requests? That way, users wouldn't have to start a connector in a
> > different
> > > state and then resume it, they could just create the connector like
> > normal.
> > > I think the current proposal is acceptable, but wanted to float this
> > > alternative in case we anticipate lots of connector migrations and want
> > to
> > > optimize for them a bit more.
> > >
> > > 5. (NIt): We can link to our own Javadocs [2] instead of javadoc.io
> > >
> > >
> > > [1] -
> > >
> > >
> >
> https://github.com/apache/kafka/blob/dcd8c7d05f2f22f2d815405e7ab3ad7439669239/connect/runtime/src/main/java/org/apache/kafka/connect/storage/KafkaConfigBackingStore.java#L234-L236
> > >
> > > [2] -
> > https://kafka.apache.org/35/javadoc/index.html?overview-summary.html
> > >
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Thu, Sep 21, 2023 at 2:37 AM Yash Mayya <ya...@gmail.com>
> wrote:
> > >
> > > > Hi Ashwin,
> > > >
> > > > Thanks for taking a look and sharing your thoughts!
> > > >
> > > > 1. Yes, the request / response formats of the two APIs were
> > intentionally
> > > > made identical for such use-cases. [1]
> > > >
> > > > 2. I'm assuming you're referring to retaining the offset / config
> topic
> > > > records for a connector when it is deleted by a user? Firstly, a
> > > > connector's offsets aren't actually currently deleted when the
> > connector
> > > is
> > > > deleted - it was listed as potential future work in KIP-875 [2].
> > > Secondly,
> > > > the config topic is the source of truth for the state of a Connect
> > > cluster
> > > > and a connector deletion is done by writing a null-valued record
> > > > (tombstone) with the connector key to the config topic. We could
> > > > potentially modify the delete mechanism to use a special new record
> > > > (instead of a tombstone with the connector key) in order to retain
> the
> > > > latest configuration of a connector while still deleting the actual
> > > > connector - however, this would have its downsides and I don't see
> too
> > > many
> > > > benefits. Furthermore, connector migration between different Kafka
> > > clusters
> > > > was just used as a representational use case for creating connectors
> > in a
> > > > stopped state - it isn't the primary focus of this KIP as such.
> > > >
> > > > 3. KIP-875 goes into some more details about this [3], but the TLDR
> is
> > > that
> > > > the STOPPED state will be treated like the PAUSED state on older
> > workers
> > > > that don't recognize the STOPPED state.
> > > >
> > > > Thanks,
> > > > Yash
> > > >
> > > > [1] -
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Request/responseformat
> > > >
> > > > [2] -
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Automaticallydeleteoffsetswithconnectors
> > > >
> > > > [3] -
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-STOPPEDtargetstate
> > > >
> > > > On Wed, Sep 20, 2023 at 7:24 PM Ashwin <apankaj@confluent.io.invalid
> >
> > > > wrote:
> > > >
> > > > > Thanks Yash! This is very useful for migrating connectors from one
> > > > cluster
> > > > > to another.
> > > > >
> > > > > I had the following comments/questions
> > > > >
> > > > > 1. Is the offset read using `GET /offsets` api always guaranteed to
> > be
> > > > in a
> > > > > format accepted by `PATCH /offsets` ?
> > > > > 2. I had to tackle a similar migration situation but the two
> connect
> > > > > clusters in question were using the same backing Kafka cluster. The
> > > > > challenge in this case is that when I delete the original
> connector,
> > I
> > > > want
> > > > > to retain offsets and config topics. Do you think we should support
> > > > > deletion of a connector without removal of these topics as part of
> > this
> > > > KIP
> > > > > ?
> > > > > 3. In the case of a downgrade, how will Connect worker handle the
> > > > optional
> > > > > “state” field in config topic ?
> > > > >
> > > > > Thanks,
> > > > > Ashwin
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Sun, Sep 17, 2023 at 11:09 PM Yash Mayya <ya...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I'd like to begin discussion on a KIP to allow creating
> connectors
> > > in a
> > > > > > stopped state -
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-980%3A+Allow+creating+connectors+in+a+stopped+state
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Yash
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-980: Allow creating connectors in a stopped state

Posted by Chris Egerton <ch...@aiven.io.INVALID>.
Hi Yash,

Thanks for the in-depth discussion! Continuations here:

1. Regarding delimiters (dots vs. underscores), we use dots in connector
configs for all runtime-recognized properties, including but not limited to
connector.class, tasks.max, key.converter, key.converter.*, and
transforms.*.type. Regarding the choice of name--I think it's because the
concept here is different from the "state" that we display in the status
API that a different name is warranted. Users can't directly write the
connector's actual state, they can still only specify a target state. And
there's no guarantee that a connector will ever enter a target state
(whether it's specified at creation time or later), since a failure can
always occur.

2. It still seems simpler to emit one record instead of two, especially
since we're not guaranteed that the leader will be using a transactional
producer. I guess I'm just wary of knowingly putting the config topic in an
inconsistent state (target state present without accompanying connector
config), even if it's meant to be only for a brief period. Thinking about
it some more though, a one-record approach comes with drawbacks in the
downgrade scenario: older workers wouldn't know how to handle the new
config format and would just fall back to creating the connector in the
running state. I suppose we should favor the two-record approach since the
downgrade scenario is more likely than the other failure mode, but it'd be
nice if we could think of a way to satisfy both concerns. Not a blocker,
though.

3. Standalone mode has always supported the REST API, and so far FWICT
we've maintained feature parity between the two modes for everything except
exactly-once source connectors, which would have required significant
additional work since we'd have to add support for storing source connector
offsets in a Kafka topic instead of on local storage like we currently do.
I'd really prefer if we could try to maintain feature parity wherever
possible--one way we could possibly do that with this KIP is to also add
support for JSON files with standalone mode.

4. Yeah, no need to block on that idea since there are other use cases for
creating stopped connectors. We can treat it like the option to delete
offsets along with the connector discussed in KIP-875: punt for now,
possibly implement later pending user feedback and indication of demand.
Might be worth adding to a "Future work" section as an indication that we
haven't ruled it out (in which case it'd make sense as a rejected
alternative) but have chosen not to implement yet.


And I had one new thought that's pretty implementation-oriented but may
influence the design slightly:

6. Right now we write an empty set of task configs to the config topic when
handling requests to stop a connector in distributed mode. Do we need to do
the same when creating connectors in the stopped state, or add any other
special logic besides noting the new state in the config topic? Or is it
sufficient to write a non-running target state to the config topic and then
rely on existing logic to simply refuse to generate task configs for the
newly-created connector? Is there any chance that the lack of task configs
(as opposed to an empty list of task configs) in the config topic for a
connector that exists will cause issues?


Cheers,

Chris

On Tue, Oct 3, 2023 at 3:29 AM Yash Mayya <ya...@gmail.com> wrote:

> Hi Chris,
>
> Thanks for taking a look at this KIP!
>
> 1. I chose to go with simply "state" as that exact term is already exposed
> via some of the existing REST API responses and would be one that users are
> already familiar with (although admittedly something like "initial_state"
> wouldn't be much of a jump). Since it's a field in the request body for the
> connector creation endpoint, wouldn't it be implied that it is the
> "initial" state just like the "config" field represents the "initial"
> configuration? Also, I don't think x.y has been established as the field
> naming convention in the Connect REST API right? From what I can tell, x_y
> is the convention being followed for fields in requests ("kafka_topic" /
> "kafka_partition" / "kafka_offset" in the offsets APIs for instance) and
> responses ("error_count", "kafka_cluster_id", "recommended_values" etc.).
>
> 2. The connector configuration record is currently used for both connector
> create requests as well as connector config update requests. Since we're
> only allowing configuring the target state for newly created connectors, I
> feel like it'll be a cleaner separation of concerns to use the existing
> records for connector configurations and connector target states rather
> than bundling the "state" and "state.v2" (or equivalent) fields into the
> connector configuration record. The additional write should be very minimal
> overhead and the two writes would be an atomic operation for Connect
> clusters that are using a transactional producer for the config topic
> anyway. Thoughts?
>
> 3. I was thinking that we'd support standalone mode via the same connector
> creation REST API endpoint changes (addition of the "state" field). If
> there is further interest in adding similar support to the command line
> method of creating connectors as well, perhaps we could do so in a small
> follow-on KIP? I feel like ever since the standalone mode started
> supporting the full Connect REST API, the command line method of creating
> connectors is more of a "legacy" concept.
>
> 4. Yeah, connector / offsets migration was just used as a representative
> example of how this feature could be useful - I didn't intend for it to be
> the sole purpose of this KIP. However, that said, I really like the idea of
> accepting an "offsets" field in the connector creation request since it'd
> reduce the number of user operations required from 3 (create the connector
> in a STOPPED state; alter the offsets; resume the connector) to 1. I'd be
> happy to either create or review a separate small KIP for that if it sounds
> acceptable to you?
>
> 5. Whoops, thanks, I hadn't even noticed that that's where I had linked to.
> Fixed!
>
> Thanks,
> Yash
>
> On Mon, Oct 2, 2023 at 11:14 PM Chris Egerton <ch...@aiven.io.invalid>
> wrote:
>
> > Hi Yash,
> >
> > Thanks for the KIP! Frankly this feels like an oversight in 875 and I'm
> > glad we're closing that loop ASAP.
> >
> >
> > Here are my thoughts:
> >
> > 1. (Nit): IMO "start.state", "initial.state", or "target.state" might be
> > better than just "state" for the field name in the connector creation
> > request.
> >
> > 2. Why implement this in distributed mode with two writes to the config
> > topic? We could augment the existing format for connector configs in the
> > config topic [1] with a new field instead.
> >
> > 3. Although standalone mode is mentioned in the KIP, there's no detail on
> > the Java properties file format that we support for standalone mode
> (i.e.,
> > `connect-standalone config/connect-standalone.properties
> > config/connect-file-source.properties
> > config/connect-file-sink.properties`). Do we plan on adding support for
> > that mode as well?
> >
> > 4. I suspect there will be advantages for this feature beyond offsets
> > migration, but if offsets migration were the only motivation for it,
> > wouldn't it be simpler to accept an "offsets" field in connector creation
> > requests? That way, users wouldn't have to start a connector in a
> different
> > state and then resume it, they could just create the connector like
> normal.
> > I think the current proposal is acceptable, but wanted to float this
> > alternative in case we anticipate lots of connector migrations and want
> to
> > optimize for them a bit more.
> >
> > 5. (NIt): We can link to our own Javadocs [2] instead of javadoc.io
> >
> >
> > [1] -
> >
> >
> https://github.com/apache/kafka/blob/dcd8c7d05f2f22f2d815405e7ab3ad7439669239/connect/runtime/src/main/java/org/apache/kafka/connect/storage/KafkaConfigBackingStore.java#L234-L236
> >
> > [2] -
> https://kafka.apache.org/35/javadoc/index.html?overview-summary.html
> >
> >
> > Cheers,
> >
> > Chris
> >
> > On Thu, Sep 21, 2023 at 2:37 AM Yash Mayya <ya...@gmail.com> wrote:
> >
> > > Hi Ashwin,
> > >
> > > Thanks for taking a look and sharing your thoughts!
> > >
> > > 1. Yes, the request / response formats of the two APIs were
> intentionally
> > > made identical for such use-cases. [1]
> > >
> > > 2. I'm assuming you're referring to retaining the offset / config topic
> > > records for a connector when it is deleted by a user? Firstly, a
> > > connector's offsets aren't actually currently deleted when the
> connector
> > is
> > > deleted - it was listed as potential future work in KIP-875 [2].
> > Secondly,
> > > the config topic is the source of truth for the state of a Connect
> > cluster
> > > and a connector deletion is done by writing a null-valued record
> > > (tombstone) with the connector key to the config topic. We could
> > > potentially modify the delete mechanism to use a special new record
> > > (instead of a tombstone with the connector key) in order to retain the
> > > latest configuration of a connector while still deleting the actual
> > > connector - however, this would have its downsides and I don't see too
> > many
> > > benefits. Furthermore, connector migration between different Kafka
> > clusters
> > > was just used as a representational use case for creating connectors
> in a
> > > stopped state - it isn't the primary focus of this KIP as such.
> > >
> > > 3. KIP-875 goes into some more details about this [3], but the TLDR is
> > that
> > > the STOPPED state will be treated like the PAUSED state on older
> workers
> > > that don't recognize the STOPPED state.
> > >
> > > Thanks,
> > > Yash
> > >
> > > [1] -
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Request/responseformat
> > >
> > > [2] -
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Automaticallydeleteoffsetswithconnectors
> > >
> > > [3] -
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-STOPPEDtargetstate
> > >
> > > On Wed, Sep 20, 2023 at 7:24 PM Ashwin <ap...@confluent.io.invalid>
> > > wrote:
> > >
> > > > Thanks Yash! This is very useful for migrating connectors from one
> > > cluster
> > > > to another.
> > > >
> > > > I had the following comments/questions
> > > >
> > > > 1. Is the offset read using `GET /offsets` api always guaranteed to
> be
> > > in a
> > > > format accepted by `PATCH /offsets` ?
> > > > 2. I had to tackle a similar migration situation but the two connect
> > > > clusters in question were using the same backing Kafka cluster. The
> > > > challenge in this case is that when I delete the original connector,
> I
> > > want
> > > > to retain offsets and config topics. Do you think we should support
> > > > deletion of a connector without removal of these topics as part of
> this
> > > KIP
> > > > ?
> > > > 3. In the case of a downgrade, how will Connect worker handle the
> > > optional
> > > > “state” field in config topic ?
> > > >
> > > > Thanks,
> > > > Ashwin
> > > >
> > > >
> > > >
> > > >
> > > > On Sun, Sep 17, 2023 at 11:09 PM Yash Mayya <ya...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I'd like to begin discussion on a KIP to allow creating connectors
> > in a
> > > > > stopped state -
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-980%3A+Allow+creating+connectors+in+a+stopped+state
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Yash
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-980: Allow creating connectors in a stopped state

Posted by Yash Mayya <ya...@gmail.com>.
Hi Chris,

Thanks for taking a look at this KIP!

1. I chose to go with simply "state" as that exact term is already exposed
via some of the existing REST API responses and would be one that users are
already familiar with (although admittedly something like "initial_state"
wouldn't be much of a jump). Since it's a field in the request body for the
connector creation endpoint, wouldn't it be implied that it is the
"initial" state just like the "config" field represents the "initial"
configuration? Also, I don't think x.y has been established as the field
naming convention in the Connect REST API right? From what I can tell, x_y
is the convention being followed for fields in requests ("kafka_topic" /
"kafka_partition" / "kafka_offset" in the offsets APIs for instance) and
responses ("error_count", "kafka_cluster_id", "recommended_values" etc.).

2. The connector configuration record is currently used for both connector
create requests as well as connector config update requests. Since we're
only allowing configuring the target state for newly created connectors, I
feel like it'll be a cleaner separation of concerns to use the existing
records for connector configurations and connector target states rather
than bundling the "state" and "state.v2" (or equivalent) fields into the
connector configuration record. The additional write should be very minimal
overhead and the two writes would be an atomic operation for Connect
clusters that are using a transactional producer for the config topic
anyway. Thoughts?

3. I was thinking that we'd support standalone mode via the same connector
creation REST API endpoint changes (addition of the "state" field). If
there is further interest in adding similar support to the command line
method of creating connectors as well, perhaps we could do so in a small
follow-on KIP? I feel like ever since the standalone mode started
supporting the full Connect REST API, the command line method of creating
connectors is more of a "legacy" concept.

4. Yeah, connector / offsets migration was just used as a representative
example of how this feature could be useful - I didn't intend for it to be
the sole purpose of this KIP. However, that said, I really like the idea of
accepting an "offsets" field in the connector creation request since it'd
reduce the number of user operations required from 3 (create the connector
in a STOPPED state; alter the offsets; resume the connector) to 1. I'd be
happy to either create or review a separate small KIP for that if it sounds
acceptable to you?

5. Whoops, thanks, I hadn't even noticed that that's where I had linked to.
Fixed!

Thanks,
Yash

On Mon, Oct 2, 2023 at 11:14 PM Chris Egerton <ch...@aiven.io.invalid>
wrote:

> Hi Yash,
>
> Thanks for the KIP! Frankly this feels like an oversight in 875 and I'm
> glad we're closing that loop ASAP.
>
>
> Here are my thoughts:
>
> 1. (Nit): IMO "start.state", "initial.state", or "target.state" might be
> better than just "state" for the field name in the connector creation
> request.
>
> 2. Why implement this in distributed mode with two writes to the config
> topic? We could augment the existing format for connector configs in the
> config topic [1] with a new field instead.
>
> 3. Although standalone mode is mentioned in the KIP, there's no detail on
> the Java properties file format that we support for standalone mode (i.e.,
> `connect-standalone config/connect-standalone.properties
> config/connect-file-source.properties
> config/connect-file-sink.properties`). Do we plan on adding support for
> that mode as well?
>
> 4. I suspect there will be advantages for this feature beyond offsets
> migration, but if offsets migration were the only motivation for it,
> wouldn't it be simpler to accept an "offsets" field in connector creation
> requests? That way, users wouldn't have to start a connector in a different
> state and then resume it, they could just create the connector like normal.
> I think the current proposal is acceptable, but wanted to float this
> alternative in case we anticipate lots of connector migrations and want to
> optimize for them a bit more.
>
> 5. (NIt): We can link to our own Javadocs [2] instead of javadoc.io
>
>
> [1] -
>
> https://github.com/apache/kafka/blob/dcd8c7d05f2f22f2d815405e7ab3ad7439669239/connect/runtime/src/main/java/org/apache/kafka/connect/storage/KafkaConfigBackingStore.java#L234-L236
>
> [2] - https://kafka.apache.org/35/javadoc/index.html?overview-summary.html
>
>
> Cheers,
>
> Chris
>
> On Thu, Sep 21, 2023 at 2:37 AM Yash Mayya <ya...@gmail.com> wrote:
>
> > Hi Ashwin,
> >
> > Thanks for taking a look and sharing your thoughts!
> >
> > 1. Yes, the request / response formats of the two APIs were intentionally
> > made identical for such use-cases. [1]
> >
> > 2. I'm assuming you're referring to retaining the offset / config topic
> > records for a connector when it is deleted by a user? Firstly, a
> > connector's offsets aren't actually currently deleted when the connector
> is
> > deleted - it was listed as potential future work in KIP-875 [2].
> Secondly,
> > the config topic is the source of truth for the state of a Connect
> cluster
> > and a connector deletion is done by writing a null-valued record
> > (tombstone) with the connector key to the config topic. We could
> > potentially modify the delete mechanism to use a special new record
> > (instead of a tombstone with the connector key) in order to retain the
> > latest configuration of a connector while still deleting the actual
> > connector - however, this would have its downsides and I don't see too
> many
> > benefits. Furthermore, connector migration between different Kafka
> clusters
> > was just used as a representational use case for creating connectors in a
> > stopped state - it isn't the primary focus of this KIP as such.
> >
> > 3. KIP-875 goes into some more details about this [3], but the TLDR is
> that
> > the STOPPED state will be treated like the PAUSED state on older workers
> > that don't recognize the STOPPED state.
> >
> > Thanks,
> > Yash
> >
> > [1] -
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Request/responseformat
> >
> > [2] -
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Automaticallydeleteoffsetswithconnectors
> >
> > [3] -
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-STOPPEDtargetstate
> >
> > On Wed, Sep 20, 2023 at 7:24 PM Ashwin <ap...@confluent.io.invalid>
> > wrote:
> >
> > > Thanks Yash! This is very useful for migrating connectors from one
> > cluster
> > > to another.
> > >
> > > I had the following comments/questions
> > >
> > > 1. Is the offset read using `GET /offsets` api always guaranteed to be
> > in a
> > > format accepted by `PATCH /offsets` ?
> > > 2. I had to tackle a similar migration situation but the two connect
> > > clusters in question were using the same backing Kafka cluster. The
> > > challenge in this case is that when I delete the original connector, I
> > want
> > > to retain offsets and config topics. Do you think we should support
> > > deletion of a connector without removal of these topics as part of this
> > KIP
> > > ?
> > > 3. In the case of a downgrade, how will Connect worker handle the
> > optional
> > > “state” field in config topic ?
> > >
> > > Thanks,
> > > Ashwin
> > >
> > >
> > >
> > >
> > > On Sun, Sep 17, 2023 at 11:09 PM Yash Mayya <ya...@gmail.com>
> > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I'd like to begin discussion on a KIP to allow creating connectors
> in a
> > > > stopped state -
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-980%3A+Allow+creating+connectors+in+a+stopped+state
> > > >
> > > >
> > > > Thanks,
> > > > Yash
> > > >
> > >
> >
>