You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Greg Harris <gr...@aiven.io.INVALID> on 2023/01/17 19:45:35 UTC

[DISCUSS] KIP-898: Modernize Connect plugin discovery

Hi all!

I'd like to start a discussion about
https://cwiki.apache.org/confluence/display/KAFKA/KIP-898%3A+Modernize+Connect+plugin+discovery
.

Thanks!
Greg Harris

Re: [DISCUSS] KIP-898: Modernize Connect plugin discovery

Posted by Greg Harris <gr...@aiven.io.INVALID>.
Hey Tom,

Thank you for your comments!

1. I have updated the language in the KIP and added relevant links.

2. The `plugin.path.discovery` name was intended to communicate that it
controlled the interpretation of the `plugin.path` configuration, but I can
see how that can be confusing.
I suppose the `discovery` mechanism does not really have anything to do
with the `path` itself. If in the future we have an alternative way to give
Connect plugins that doesn't use the `plugin.path`, that mechanism may
still want to use the `discovery` parameter to control it's behavior, and
`plugin.path.discovery` would be a poor name in that situation.
I'll update the KIP to `plugin.discovery`, as `policy` is already used to
describe a pluggable function for override policies; I do not expect that
this mechanism will be made pluggable, as it's part of the plugin
infrastructure itself.

3. I agree that it would be undesirable for someone to drop down to
ONLY_SCAN and then forget about it.
We'll obviously do our due-diligence for error-handling for this feature,
but there's always a risk that the HYBRID modes cause worse behavior than
ONLY_SCAN. I think I'd rather risk operators forgetting to go back to
HYBRID_WARN than require them to wait for an upstream patch to this feature.
But to make it marginally less likely that an operator will forget, I'll
add a static warn to ONLY_SCAN mode that advises the operator to change to
HYBRID_WARN. I'll also remove the "silence warnings" comment from the KIP.

4. I have not worked with sealed jars before, so I'm not sure of the
behavior exactly. But based on the documentation I could find, I don't
expect it to be a problem.
Jar sealing appears to pertain to classes and not resources, and the
ServiceLoader manifest files are resources. Because we are not generating
or mutating any classes, there should not be any impact on the sealing
mechanisms.

However, your comment certainly pertains to cryptographically signed JARs.
If we would modify the JARs themselves, we would certainly invalidate any
cryptographic signatures, and may cause a verification failure during
worker startup.
That is why the manifest generation script only modifies JAR files when no
alternative is present (single-jar-plugins), and prefers to generate shim
jars as you described.
As an alternative, we could consider _not_ migrating single-jar-plugins,
but I think that's more confusing than a script which manipulates code
artifacts causing a signature failure :)
For operators which do use signature verification and the migration script,
they can workaround this by enclosing their single-jar plugins in a
containing directory, and then the script will generate a new jar instead
of mutating the source jar.

Thanks!
Greg Harris

On Thu, Feb 16, 2023 at 3:53 AM Tom Bentley <tb...@redhat.com> wrote:

> Hi Greg,
>
> Thanks for the KIP. Overall I think using service loading would be a
> worthwhile improvement.
>
> 1. "Both of these are well-established Java interfaces for which public
> documentation should be readily available." I see no harm in adding a well
> chosen link or two just to save  readers unfamiliar with service loading
> from having to search for documentation.
>
> 2. "plugin.path.discovery" is a bit of an odd name, since it's not
> discovering plugin _paths_, rather it's discovering plugins. So perhaps
> plugin.discovery, or plugin.discovery.policy would be a better name.
>
> 3. I wondered if we really needed "ONLY_SCAN". It seems like the sort of
> thing where someone silences the warnings then forgets about it until one
> day they (or worse, someone else) gets the pleasure of upgrading and
> finding they can't find their plugins. I guess the proposed roll out, over
> multiple major releases, makes this less likely. Non-the-less, logged
> warnings seem like a small price to pay overall, and would make it harder
> for users to ignore the problem. We should be encouraging users to use the
> migration script to silence warnings and adopt SERVICE_LOAD. That way they
> get the benefits sooner than Kafka 5.0.
>
> 4. As described, connect-plugin-path.sh can modify jar files in-place. Have
> you tested/considered whether this will work for sealed jars? I know
> they're not common, but they are part of the Java platform so we can't rule
> out that they're being used. I _think_ service loading should still work
> even if you wrote a new shim jar file which just contained the service
> files needed for loading (alongside the original unmodified jar containing
> the plugins); if that's right then there's no need to actually modify
> existing jars.
>
> Thanks again,
>
> Tom
>
> On Wed, 8 Feb 2023 at 21:45, Chris Egerton <ch...@aiven.io.invalid>
> wrote:
>
> > Hi Greg,
> >
> > Thanks for the updates. This is looking great. A few minor questions:
> >
> > 1. The description for the list command in the plugin path CLI states
> that
> > it'll show "Whether the class is discoverable via scanning". Are there
> any
> > cases where a plugin found by the CLI won't be discoverable via scanning?
> >
> > 2. Why are we treating module info files differently than service loader
> > manifests (where the latter will be removed for not-found plugins on an
> > opt-out basis, but the former are left unmodified no matter what)? Basing
> > this off of this sentence in the KIP: "If a plugin declares an
> > implementation via a module-info.java which is not loadable, the
> > module-info.java will not be modified."; let me know if I'm
> > misinterpreting.
> >
> > Cheers,
> >
> > Chris
> >
> > On Tue, Feb 7, 2023 at 4:32 PM Greg Harris <greg.harris@aiven.io.invalid
> >
> > wrote:
> >
> > > Chris,
> > >
> > > Thanks for the comments!
> > >
> > > 1. I've updated the script to sync-manifests [--keep-not-found] with
> your
> > > described semantics.
> > >
> > > 2. I've pushed all of the deprecation decisions to a follow-up KIP that
> > can
> > > take place after an intervening release.
> > > I hope that this feature has a high enough ROI that the direction of
> the
> > > migration is obvious to users even without a formal deprecation notice.
> > > I do think that without a formal deprecation notice that the migration
> > > scripts may be in use for a long time, particularly for connector
> > > implementations which are not receiving active maintenance.
> > >
> > > 3. Thanks for looking into the ServiceLoader error handling contract.
> > > From my inspection of the ServiceLoader implementation, it seems to be
> > able
> > > to recover from most of the poor packaging scenarios that I'm aware of.
> > > I think with or without the deprecation, operators are free to select
> the
> > > ONLY_SCAN mode to work around any errors we don't discover before
> > release.
> > >
> > > Thanks,
> > > Greg
> > >
> > > On Mon, Feb 6, 2023 at 8:44 AM Chris Egerton <ch...@aiven.io.invalid>
> > > wrote:
> > >
> > > > Hi Greg,
> > > >
> > > > Thanks for the updates. Unless stated below, I agree with your
> > > responses. A
> > > > few more thoughts:
> > > >
> > > > 1. IMO it's not worth it to have separate commands for
> adding/removing
> > > > manifests, mostly because it adds complexity to the tool and might
> make
> > > it
> > > > harder for users to understand. I think a single "update-manifests"
> or
> > > > "sync-manifests" command that both adds manifests for found plugins
> and
> > > > removes manifests for unfound plugins by default, with a
> --keep-unfound
> > > > flag to disable removal on an opt-in basis, would make more sense.
> > > >
> > > > 2. I agree with this point you raise about deprecation: "if we agree
> > that
> > > > the old mechanism has some sunset date that we just don't know yet,
> we
> > > > should still communicate to users that the sunset date is somewhere
> in
> > > the
> > > > future." However, I'm not certain that releasing the features
> proposed
> > in
> > > > this KIP alone gives us enough confidence in them to sunset the
> legacy
> > > > plugin loading logic, which is why I was hoping we could have at
> least
> > > one
> > > > successful release (preferably even two or three) with this feature
> in
> > > > order to identify potential gaps in the design before proceeding.
> This
> > is
> > > > similar to the strategy we took with incremental rebalancing in
> > Connect,
> > > > which introduced the "connect.protocol" property with the proposal
> that
> > > it
> > > > could be marked deprecated in the next major release. That strategy
> > > served
> > > > us well, since there were a few kinks in the logic for that feature
> > that
> > > > needed to be worked out, and many users continued to rely on the
> legacy
> > > > rebalancing logic as a workaround for those issues.
> > > >
> > > > 3. I was wondering about error handling with the service loading
> > > mechanism
> > > > because the Javadocs for the Iterator returned by
> > ServiceLoader::iterator
> > > > [1] state that "If an error is thrown then subsequent invocations of
> > the
> > > > iterator will make a best effort to locate and instantiate the next
> > > > available provider, but in general such recovery cannot be
> guaranteed."
> > > > Upon further reflection, I agree that it's fine for now to leave
> > details
> > > on
> > > > error handling out of the public contract for this new loading mode,
> > but
> > > > only if we also agree to hold off on deprecating the old loading
> mode,
> > > > since there may be cases where worker startup is crippled by an
> > > > improperly-packaged plugin when using the service loader mechanism
> for
> > > its
> > > > plugins.
> > > >
> > > > [1] -
> > > >
> > > >
> > >
> >
> https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/ServiceLoader.html#iterator()
> > > >
> > > > Cheers,
> > > >
> > > > Chris
> > > >
> > > > On Tue, Jan 24, 2023 at 10:26 AM Federico Valeri <
> fedevaleri@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Hi Greg,
> > > > >
> > > > > On Tue, Jan 24, 2023 at 2:48 AM Greg Harris
> > > > > <gr...@aiven.io.invalid> wrote:
> > > > > >
> > > > > > Federico,
> > > > > >
> > > > > > Thanks for taking a look!
> > > > > >
> > > > > > > I was just wondering if we should use a better script name like
> > > > > > > "connect-convert-to-service-provider.sh" or something like this
> > > > > >
> > > > > > I agree that the current name is not ideal, while the current
> name
> > > > > > describes part of what it is doing, it does not describe all of
> > what
> > > it
> > > > > is
> > > > > > doing, or why that is important to the caller.
> > > > > > I looked around to see what style the other Kafka commands use,
> and
> > > it
> > > > > > seems like there's not a standard pattern. Some of the scripts
> are
> > > > > > noun-phrases, and some are verb-phrases.
> > > > > > Some take commands like `create`, `format`, etc, and some take
> > > options
> > > > > like
> > > > > > `--generate`, `--execute`, `--verify`.
> > > > > > I've updated the command to `bin/connect-plugin-path.sh
> > > > > > (list|add-manifests|remove-manifests) --worker-config
> > > <worker-config>`,
> > > > > let
> > > > > > me know if that's better or worse than
> > > > `bin/connect-scan-plugin-path.sh`.
> > > > > >
> > > > > > > maybe add a --dry-run option.
> > > > > >
> > > > > > Thanks for the suggestion, I've updated the KIP to include a
> > > --dry-run
> > > > > > option with some typical semantics.
> > > > >
> > > > > This is much better. Thanks!
> > > > >
> > > > > I think it's better to not deprecate the add-manifests and
> > > > > remove-manifests script sub-commands. When we will remove the
> > > > > deprecated plugin discovery modes, one may still have the need to
> > > > > convert an old connector release, maybe because it's the only
> version
> > > > > compatible with a third-party external system.
> > > > >
> > > > > >
> > > > > > Chris,
> > > > > >
> > > > > > I'm glad you liked the personas! Since this KIP requires others'
> > help
> > > > to
> > > > > > improve the ecosystem, I needed a way to make sure everyone knows
> > > what
> > > > > role
> > > > > > they have to play. I hope I've accomplished that.
> > > > > >
> > > > > > > 1. Will workers' resiliency to packaging issues be affected
> when
> > > the
> > > > > new
> > > > > > > scanning behavior is enabled? For example, will a single
> > > > > poorly-packaged
> > > > > > > plugin prevent other correctly-packaged plugins from being
> > loaded,
> > > or
> > > > > > > worse, cause the worker to fail? And either way, are there any
> > > > details
> > > > > > that
> > > > > > > we can call out to back up any guarantees we want to provide on
> > > that
> > > > > > front?
> > > > > >
> > > > > > The current behavior when encountering packaging issues seems to
> be
> > > to
> > > > > log
> > > > > > errors and continue, and that will certainly continue for the
> > > ONLY_SCAN
> > > > > and
> > > > > > HYBID_WARN modes.
> > > > > > As written, the HYBRID_FAIL mode breaks from this trend and will
> > > throw
> > > > > > exceptions that crash the worker only when the connectors are
> > missing
> > > > > from
> > > > > > serviceloader discovery.
> > > > > > The behavior for SERVICE_LOAD is not currently defined in the
> KIP.
> > I
> > > > > think
> > > > > > we can either keep the log-and-continue behavior whenever
> possible,
> > > or
> > > > > > begin to surface errors in a fail-fast model.
> > > > > > And if we can't choose between those two, perhaps we can add
> > > > > > SERVICE_LOAD_FAIL_FAST, or a
> > > plugin.path.discovery.fail.fast=true/false
> > > > > to
> > > > > > allow users to configure the worker to fail to start up when
> > > > plugin.path
> > > > > > issues are present.
> > > > > >
> > > > > > With regards to incorrectly packaged connectors affecting other
> > > > correctly
> > > > > > packaged ones: I think one of the current behaviors of the
> > > Reflections
> > > > > > implementation is that certain exceptions can prevent discovery
> of
> > > > other
> > > > > > unrelated plugins on the path.
> > > > > > I think that this may be able to be improved in all modes, and
> > > doesn't
> > > > > need
> > > > > > the ServiceLoader mechanism to resolve. I'll explore this some
> more
> > > and
> > > > > > pursue a fix outside of this KIP.
> > > > > > The new mechanism should be as good or better than the
> Reflections
> > > > > > mechanism in this regard. I am not sure if it is necessary to
> > commit
> > > to
> > > > > the
> > > > > > error handling behavior as part of the public API.
> > > > > >
> > > > > > > 2. I think the description of the "--plugin-location" flag may
> > need
> > > > > some
> > > > > > > updating? In the middle of the list of bullet points there's
> "The
> > > > > value of
> > > > > > > this argument will be a single plugin, which can be any of the
> > > > > following:"
> > > > > > > but the following points don't appear to be related.
> > > > > >
> > > > > > Thanks, I think that was a copy-paste error. I have updated the
> > KIP.
> > > > > >
> > > > > > > 3. (Nit) Could we rename the migration script something like
> > > > > > > "connect-migrate-plugin-path" or even just
> "connect-plugin-path"
> > > > with a
> > > > > > > "migrate" subcommand?
> > > > > >
> > > > > > Yes, you and Federico had the same idea :)
> > > > > > Please see my feedback above regarding naming.
> > > > > >
> > > > > > > 4. It's noted that "If a plugin class is removed from the path,
> > the
> > > > > > > corresponding shim manifest should also be removed". Did you
> > > consider
> > > > > > > adding behavior (possibly guarded behind a CLI flag) to find
> and
> > > > remove
> > > > > > > manifests for nonexistent plugins?
> > > > > >
> > > > > > When writing that originally, I thought that the script would
> only
> > > > remove
> > > > > > manifests that it had generated, as filtered by some in-file
> > comment
> > > or
> > > > > by
> > > > > > jar filename, which would be an implementation detail.
> > > > > > If we were to add a CLI flag to also enable/disable the removal
> > > > behavior,
> > > > > > we could eliminate that implementation detail and make the
> removal
> > an
> > > > > > all-or-nothing part of the script.
> > > > > >
> > > > > > I updated the script to accept a `remove-manifests` subcommand
> > which
> > > I
> > > > > > think captures your suggestion. Therefore, a full sync would
> > include
> > > > two
> > > > > > invocations: an `add-manifests` and `remove-manifests`.
> > > > > >
> > > > > > > 5. What's the concrete plan for updating the plugins that ship
> > OOTB
> > > > > with
> > > > > > > Connect? Will they be given a service loader manifest, a module
> > > info
> > > > > file,
> > > > > > > or both?
> > > > > >
> > > > > > At a minimum, we need to add service loader manifests in this KIP
> > to
> > > be
> > > > > > compliant with the new discovery model.
> > > > > > This is because Kafka 3.x supports Java 8, and that version of
> the
> > > > > > ServiceLoader can only read manifest files, and not module-info
> > > files.
> > > > > > Beyond that, I think it becomes a question of scope, and whether
> we
> > > > > should
> > > > > > add module-info files for all of the Connect components.
> > > > > > I was not planning on doing so within this KIP, and would prefer
> to
> > > > leave
> > > > > > that for a follow-up KIP.
> > > > > >
> > > > > > I have clarified this in the KIP.
> > > > > >
> > > > > > > 6. In the HYBRID_WARN mode, what precedence will we give to
> > plugins
> > > > > that
> > > > > > > are found by both the legacy scanning logic and the new service
> > > > > > > loader/module info loading logic?
> > > > > >
> > > > > > In HYBRID_WARN, the control-flow should be the same as SCAN_ONLY,
> > so
> > > > the
> > > > > > same classes are discovered in the same order.
> > > > > > All of the classes that are found via the ServiceLoader mechanism
> > > > should
> > > > > > only be used by the runtime to generate warnings.
> > > > > > I believe that by the nature of the Reflections library, all
> > > instances
> > > > > > which are discoverable by the ServiceLoader should be
> discoverable
> > by
> > > > > > Reflections, and so we don't need to merge the two discovery
> > methods
> > > or
> > > > > > handle precedence.
> > > > > >
> > > > > > > 7. It's a bit strange to immediately deprecate config property
> > > values
> > > > > and
> > > > > > > the migration script in their first release. IMO as long as we
> > have
> > > > the
> > > > > > > migration script in the project, it doesn't make sense to
> > deprecate
> > > > it,
> > > > > > and
> > > > > > > until we've had at least one successful release with the new
> > > loading
> > > > > logic
> > > > > > > and some confidence in it, we should not deprecate the older
> > > logic. I
> > > > > > think
> > > > > > > a more reasonable pace for the update here could be to
> deprecate
> > > the
> > > > > > legacy
> > > > > > > scanning logic in 4.0 (changing the default accordingly) and
> > remove
> > > > it
> > > > > > > entirely in 5.0.
> > > > > >
> > > > > > I understand that it seems a bit silly to introduce something as
> > > > > > deprecated, but I think that's because 'deprecated' is not the
> same
> > > as
> > > > > > 'old', and not mutually exclusive with 'new'.
> > > > > > The way I understand it, marking something as deprecated and the
> > > > process
> > > > > of
> > > > > > deprecation is about communicating the direction in which an API
> is
> > > > > > evolving, and which methods can be relied upon for the
> foreseeable
> > > > > future.
> > > > > > And if we agree that the old mechanism has some sunset date that
> we
> > > > just
> > > > > > don't know yet, we should still communicate to users that the
> > sunset
> > > > date
> > > > > > is somewhere in the future.
> > > > > >
> > > > > > I've updated the KIP to describe separate follow-on votes for
> > > changing
> > > > > the
> > > > > > default, and removing the scanning behavior, as that was my
> > original
> > > > > intent
> > > > > > but I don't think that was made clear.
> > > > > > I also pushed back the deprecation of the migration sub-commands,
> > > since
> > > > > > we'd like to usher people towards the migration script instead of
> > > > relying
> > > > > > on the HYBRID modes.
> > > > > > I kept the decision for deprecating the non-SERVICE_LOAD modes in
> > the
> > > > > > current vote for now, but if introducing deprecated
> configurations
> > is
> > > > too
> > > > > > unpalatable we can move the deprecation to a later vote.
> > > > > >
> > > > > > Thanks,
> > > > > > Greg
> > > > > >
> > > > > > On Mon, Jan 23, 2023 at 12:49 PM Chris Egerton
> > > <chrise@aiven.io.invalid
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Greg,
> > > > > > >
> > > > > > > Thanks for the KIP! This part of Connect has been due for an
> > update
> > > > > for a
> > > > > > > while, it's nice to see it getting some attention. I especially
> > > like
> > > > > how
> > > > > > > the migration plan is broken down into affected personas; makes
> > > > review
> > > > > > > easier and, hopefully, makes the feature easier to understand
> for
> > > > > affected
> > > > > > > users.
> > > > > > >
> > > > > > > Here are my initial thoughts:
> > > > > > >
> > > > > > > 1. Will workers' resiliency to packaging issues be affected
> when
> > > the
> > > > > new
> > > > > > > scanning behavior is enabled? For example, will a single
> > > > > poorly-packaged
> > > > > > > plugin prevent other correctly-packaged plugins from being
> > loaded,
> > > or
> > > > > > > worse, cause the worker to fail? And either way, are there any
> > > > details
> > > > > that
> > > > > > > we can call out to back up any guarantees we want to provide on
> > > that
> > > > > front?
> > > > > > >
> > > > > > > 2. I think the description of the "--plugin-location" flag may
> > need
> > > > > some
> > > > > > > updating? In the middle of the list of bullet points there's
> "The
> > > > > value of
> > > > > > > this argument will be a single plugin, which can be any of the
> > > > > following:"
> > > > > > > but the following points don't appear to be related.
> > > > > > >
> > > > > > > 3. (Nit) Could we rename the migration script something like
> > > > > > > "connect-migrate-plugin-path" or even just
> "connect-plugin-path"
> > > > with a
> > > > > > > "migrate" subcommand?
> > > > > > >
> > > > > > > 4. It's noted that "If a plugin class is removed from the path,
> > the
> > > > > > > corresponding shim manifest should also be removed". Did you
> > > consider
> > > > > > > adding behavior (possibly guarded behind a CLI flag) to find
> and
> > > > remove
> > > > > > > manifests for nonexistent plugins?
> > > > > > >
> > > > > > > 5. What's the concrete plan for updating the plugins that ship
> > OOTB
> > > > > with
> > > > > > > Connect? Will they be given a service loader manifest, a module
> > > info
> > > > > file,
> > > > > > > or both?
> > > > > > >
> > > > > > > 6. In the HYBRID_WARN mode, what precedence will we give to
> > plugins
> > > > > that
> > > > > > > are found by both the legacy scanning logic and the new service
> > > > > > > loader/module info loading logic?
> > > > > > >
> > > > > > > 7. It's a bit strange to immediately deprecate config property
> > > values
> > > > > and
> > > > > > > the migration script in their first release. IMO as long as we
> > have
> > > > the
> > > > > > > migration script in the project, it doesn't make sense to
> > deprecate
> > > > > it, and
> > > > > > > until we've had at least one successful release with the new
> > > loading
> > > > > logic
> > > > > > > and some confidence in it, we should not deprecate the older
> > > logic. I
> > > > > think
> > > > > > > a more reasonable pace for the update here could be to
> deprecate
> > > the
> > > > > legacy
> > > > > > > scanning logic in 4.0 (changing the default accordingly) and
> > remove
> > > > it
> > > > > > > entirely in 5.0.
> > > > > > >
> > > > > > > Cheers,
> > > > > > >
> > > > > > > Chris
> > > > > > >
> > > > > > > On Mon, Jan 23, 2023 at 3:21 AM Federico Valeri <
> > > > fedevaleri@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Greg, this looks like a useful change to me.
> > > > > > > >
> > > > > > > > I was just wondering if we should use a better script name
> like
> > > > > > > > "connect-convert-to-service-provider.sh" or something like
> > this,
> > > > and
> > > > > > > > maybe add a --dry-run option.
> > > > > > > >
> > > > > > > > On Tue, Jan 17, 2023 at 8:45 PM Greg Harris
> > > > > > > > <gr...@aiven.io.invalid> wrote:
> > > > > > > > >
> > > > > > > > > Hi all!
> > > > > > > > >
> > > > > > > > > I'd like to start a discussion about
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-898%3A+Modernize+Connect+plugin+discovery
> > > > > > > > > .
> > > > > > > > >
> > > > > > > > > Thanks!
> > > > > > > > > Greg Harris
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-898: Modernize Connect plugin discovery

Posted by Tom Bentley <tb...@redhat.com>.
Hi Greg,

Thanks for the KIP. Overall I think using service loading would be a
worthwhile improvement.

1. "Both of these are well-established Java interfaces for which public
documentation should be readily available." I see no harm in adding a well
chosen link or two just to save  readers unfamiliar with service loading
from having to search for documentation.

2. "plugin.path.discovery" is a bit of an odd name, since it's not
discovering plugin _paths_, rather it's discovering plugins. So perhaps
plugin.discovery, or plugin.discovery.policy would be a better name.

3. I wondered if we really needed "ONLY_SCAN". It seems like the sort of
thing where someone silences the warnings then forgets about it until one
day they (or worse, someone else) gets the pleasure of upgrading and
finding they can't find their plugins. I guess the proposed roll out, over
multiple major releases, makes this less likely. Non-the-less, logged
warnings seem like a small price to pay overall, and would make it harder
for users to ignore the problem. We should be encouraging users to use the
migration script to silence warnings and adopt SERVICE_LOAD. That way they
get the benefits sooner than Kafka 5.0.

4. As described, connect-plugin-path.sh can modify jar files in-place. Have
you tested/considered whether this will work for sealed jars? I know
they're not common, but they are part of the Java platform so we can't rule
out that they're being used. I _think_ service loading should still work
even if you wrote a new shim jar file which just contained the service
files needed for loading (alongside the original unmodified jar containing
the plugins); if that's right then there's no need to actually modify
existing jars.

Thanks again,

Tom

On Wed, 8 Feb 2023 at 21:45, Chris Egerton <ch...@aiven.io.invalid> wrote:

> Hi Greg,
>
> Thanks for the updates. This is looking great. A few minor questions:
>
> 1. The description for the list command in the plugin path CLI states that
> it'll show "Whether the class is discoverable via scanning". Are there any
> cases where a plugin found by the CLI won't be discoverable via scanning?
>
> 2. Why are we treating module info files differently than service loader
> manifests (where the latter will be removed for not-found plugins on an
> opt-out basis, but the former are left unmodified no matter what)? Basing
> this off of this sentence in the KIP: "If a plugin declares an
> implementation via a module-info.java which is not loadable, the
> module-info.java will not be modified."; let me know if I'm
> misinterpreting.
>
> Cheers,
>
> Chris
>
> On Tue, Feb 7, 2023 at 4:32 PM Greg Harris <gr...@aiven.io.invalid>
> wrote:
>
> > Chris,
> >
> > Thanks for the comments!
> >
> > 1. I've updated the script to sync-manifests [--keep-not-found] with your
> > described semantics.
> >
> > 2. I've pushed all of the deprecation decisions to a follow-up KIP that
> can
> > take place after an intervening release.
> > I hope that this feature has a high enough ROI that the direction of the
> > migration is obvious to users even without a formal deprecation notice.
> > I do think that without a formal deprecation notice that the migration
> > scripts may be in use for a long time, particularly for connector
> > implementations which are not receiving active maintenance.
> >
> > 3. Thanks for looking into the ServiceLoader error handling contract.
> > From my inspection of the ServiceLoader implementation, it seems to be
> able
> > to recover from most of the poor packaging scenarios that I'm aware of.
> > I think with or without the deprecation, operators are free to select the
> > ONLY_SCAN mode to work around any errors we don't discover before
> release.
> >
> > Thanks,
> > Greg
> >
> > On Mon, Feb 6, 2023 at 8:44 AM Chris Egerton <ch...@aiven.io.invalid>
> > wrote:
> >
> > > Hi Greg,
> > >
> > > Thanks for the updates. Unless stated below, I agree with your
> > responses. A
> > > few more thoughts:
> > >
> > > 1. IMO it's not worth it to have separate commands for adding/removing
> > > manifests, mostly because it adds complexity to the tool and might make
> > it
> > > harder for users to understand. I think a single "update-manifests" or
> > > "sync-manifests" command that both adds manifests for found plugins and
> > > removes manifests for unfound plugins by default, with a --keep-unfound
> > > flag to disable removal on an opt-in basis, would make more sense.
> > >
> > > 2. I agree with this point you raise about deprecation: "if we agree
> that
> > > the old mechanism has some sunset date that we just don't know yet, we
> > > should still communicate to users that the sunset date is somewhere in
> > the
> > > future." However, I'm not certain that releasing the features proposed
> in
> > > this KIP alone gives us enough confidence in them to sunset the legacy
> > > plugin loading logic, which is why I was hoping we could have at least
> > one
> > > successful release (preferably even two or three) with this feature in
> > > order to identify potential gaps in the design before proceeding. This
> is
> > > similar to the strategy we took with incremental rebalancing in
> Connect,
> > > which introduced the "connect.protocol" property with the proposal that
> > it
> > > could be marked deprecated in the next major release. That strategy
> > served
> > > us well, since there were a few kinks in the logic for that feature
> that
> > > needed to be worked out, and many users continued to rely on the legacy
> > > rebalancing logic as a workaround for those issues.
> > >
> > > 3. I was wondering about error handling with the service loading
> > mechanism
> > > because the Javadocs for the Iterator returned by
> ServiceLoader::iterator
> > > [1] state that "If an error is thrown then subsequent invocations of
> the
> > > iterator will make a best effort to locate and instantiate the next
> > > available provider, but in general such recovery cannot be guaranteed."
> > > Upon further reflection, I agree that it's fine for now to leave
> details
> > on
> > > error handling out of the public contract for this new loading mode,
> but
> > > only if we also agree to hold off on deprecating the old loading mode,
> > > since there may be cases where worker startup is crippled by an
> > > improperly-packaged plugin when using the service loader mechanism for
> > its
> > > plugins.
> > >
> > > [1] -
> > >
> > >
> >
> https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/ServiceLoader.html#iterator()
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Tue, Jan 24, 2023 at 10:26 AM Federico Valeri <fedevaleri@gmail.com
> >
> > > wrote:
> > >
> > > > Hi Greg,
> > > >
> > > > On Tue, Jan 24, 2023 at 2:48 AM Greg Harris
> > > > <gr...@aiven.io.invalid> wrote:
> > > > >
> > > > > Federico,
> > > > >
> > > > > Thanks for taking a look!
> > > > >
> > > > > > I was just wondering if we should use a better script name like
> > > > > > "connect-convert-to-service-provider.sh" or something like this
> > > > >
> > > > > I agree that the current name is not ideal, while the current name
> > > > > describes part of what it is doing, it does not describe all of
> what
> > it
> > > > is
> > > > > doing, or why that is important to the caller.
> > > > > I looked around to see what style the other Kafka commands use, and
> > it
> > > > > seems like there's not a standard pattern. Some of the scripts are
> > > > > noun-phrases, and some are verb-phrases.
> > > > > Some take commands like `create`, `format`, etc, and some take
> > options
> > > > like
> > > > > `--generate`, `--execute`, `--verify`.
> > > > > I've updated the command to `bin/connect-plugin-path.sh
> > > > > (list|add-manifests|remove-manifests) --worker-config
> > <worker-config>`,
> > > > let
> > > > > me know if that's better or worse than
> > > `bin/connect-scan-plugin-path.sh`.
> > > > >
> > > > > > maybe add a --dry-run option.
> > > > >
> > > > > Thanks for the suggestion, I've updated the KIP to include a
> > --dry-run
> > > > > option with some typical semantics.
> > > >
> > > > This is much better. Thanks!
> > > >
> > > > I think it's better to not deprecate the add-manifests and
> > > > remove-manifests script sub-commands. When we will remove the
> > > > deprecated plugin discovery modes, one may still have the need to
> > > > convert an old connector release, maybe because it's the only version
> > > > compatible with a third-party external system.
> > > >
> > > > >
> > > > > Chris,
> > > > >
> > > > > I'm glad you liked the personas! Since this KIP requires others'
> help
> > > to
> > > > > improve the ecosystem, I needed a way to make sure everyone knows
> > what
> > > > role
> > > > > they have to play. I hope I've accomplished that.
> > > > >
> > > > > > 1. Will workers' resiliency to packaging issues be affected when
> > the
> > > > new
> > > > > > scanning behavior is enabled? For example, will a single
> > > > poorly-packaged
> > > > > > plugin prevent other correctly-packaged plugins from being
> loaded,
> > or
> > > > > > worse, cause the worker to fail? And either way, are there any
> > > details
> > > > > that
> > > > > > we can call out to back up any guarantees we want to provide on
> > that
> > > > > front?
> > > > >
> > > > > The current behavior when encountering packaging issues seems to be
> > to
> > > > log
> > > > > errors and continue, and that will certainly continue for the
> > ONLY_SCAN
> > > > and
> > > > > HYBID_WARN modes.
> > > > > As written, the HYBRID_FAIL mode breaks from this trend and will
> > throw
> > > > > exceptions that crash the worker only when the connectors are
> missing
> > > > from
> > > > > serviceloader discovery.
> > > > > The behavior for SERVICE_LOAD is not currently defined in the KIP.
> I
> > > > think
> > > > > we can either keep the log-and-continue behavior whenever possible,
> > or
> > > > > begin to surface errors in a fail-fast model.
> > > > > And if we can't choose between those two, perhaps we can add
> > > > > SERVICE_LOAD_FAIL_FAST, or a
> > plugin.path.discovery.fail.fast=true/false
> > > > to
> > > > > allow users to configure the worker to fail to start up when
> > > plugin.path
> > > > > issues are present.
> > > > >
> > > > > With regards to incorrectly packaged connectors affecting other
> > > correctly
> > > > > packaged ones: I think one of the current behaviors of the
> > Reflections
> > > > > implementation is that certain exceptions can prevent discovery of
> > > other
> > > > > unrelated plugins on the path.
> > > > > I think that this may be able to be improved in all modes, and
> > doesn't
> > > > need
> > > > > the ServiceLoader mechanism to resolve. I'll explore this some more
> > and
> > > > > pursue a fix outside of this KIP.
> > > > > The new mechanism should be as good or better than the Reflections
> > > > > mechanism in this regard. I am not sure if it is necessary to
> commit
> > to
> > > > the
> > > > > error handling behavior as part of the public API.
> > > > >
> > > > > > 2. I think the description of the "--plugin-location" flag may
> need
> > > > some
> > > > > > updating? In the middle of the list of bullet points there's "The
> > > > value of
> > > > > > this argument will be a single plugin, which can be any of the
> > > > following:"
> > > > > > but the following points don't appear to be related.
> > > > >
> > > > > Thanks, I think that was a copy-paste error. I have updated the
> KIP.
> > > > >
> > > > > > 3. (Nit) Could we rename the migration script something like
> > > > > > "connect-migrate-plugin-path" or even just "connect-plugin-path"
> > > with a
> > > > > > "migrate" subcommand?
> > > > >
> > > > > Yes, you and Federico had the same idea :)
> > > > > Please see my feedback above regarding naming.
> > > > >
> > > > > > 4. It's noted that "If a plugin class is removed from the path,
> the
> > > > > > corresponding shim manifest should also be removed". Did you
> > consider
> > > > > > adding behavior (possibly guarded behind a CLI flag) to find and
> > > remove
> > > > > > manifests for nonexistent plugins?
> > > > >
> > > > > When writing that originally, I thought that the script would only
> > > remove
> > > > > manifests that it had generated, as filtered by some in-file
> comment
> > or
> > > > by
> > > > > jar filename, which would be an implementation detail.
> > > > > If we were to add a CLI flag to also enable/disable the removal
> > > behavior,
> > > > > we could eliminate that implementation detail and make the removal
> an
> > > > > all-or-nothing part of the script.
> > > > >
> > > > > I updated the script to accept a `remove-manifests` subcommand
> which
> > I
> > > > > think captures your suggestion. Therefore, a full sync would
> include
> > > two
> > > > > invocations: an `add-manifests` and `remove-manifests`.
> > > > >
> > > > > > 5. What's the concrete plan for updating the plugins that ship
> OOTB
> > > > with
> > > > > > Connect? Will they be given a service loader manifest, a module
> > info
> > > > file,
> > > > > > or both?
> > > > >
> > > > > At a minimum, we need to add service loader manifests in this KIP
> to
> > be
> > > > > compliant with the new discovery model.
> > > > > This is because Kafka 3.x supports Java 8, and that version of the
> > > > > ServiceLoader can only read manifest files, and not module-info
> > files.
> > > > > Beyond that, I think it becomes a question of scope, and whether we
> > > > should
> > > > > add module-info files for all of the Connect components.
> > > > > I was not planning on doing so within this KIP, and would prefer to
> > > leave
> > > > > that for a follow-up KIP.
> > > > >
> > > > > I have clarified this in the KIP.
> > > > >
> > > > > > 6. In the HYBRID_WARN mode, what precedence will we give to
> plugins
> > > > that
> > > > > > are found by both the legacy scanning logic and the new service
> > > > > > loader/module info loading logic?
> > > > >
> > > > > In HYBRID_WARN, the control-flow should be the same as SCAN_ONLY,
> so
> > > the
> > > > > same classes are discovered in the same order.
> > > > > All of the classes that are found via the ServiceLoader mechanism
> > > should
> > > > > only be used by the runtime to generate warnings.
> > > > > I believe that by the nature of the Reflections library, all
> > instances
> > > > > which are discoverable by the ServiceLoader should be discoverable
> by
> > > > > Reflections, and so we don't need to merge the two discovery
> methods
> > or
> > > > > handle precedence.
> > > > >
> > > > > > 7. It's a bit strange to immediately deprecate config property
> > values
> > > > and
> > > > > > the migration script in their first release. IMO as long as we
> have
> > > the
> > > > > > migration script in the project, it doesn't make sense to
> deprecate
> > > it,
> > > > > and
> > > > > > until we've had at least one successful release with the new
> > loading
> > > > logic
> > > > > > and some confidence in it, we should not deprecate the older
> > logic. I
> > > > > think
> > > > > > a more reasonable pace for the update here could be to deprecate
> > the
> > > > > legacy
> > > > > > scanning logic in 4.0 (changing the default accordingly) and
> remove
> > > it
> > > > > > entirely in 5.0.
> > > > >
> > > > > I understand that it seems a bit silly to introduce something as
> > > > > deprecated, but I think that's because 'deprecated' is not the same
> > as
> > > > > 'old', and not mutually exclusive with 'new'.
> > > > > The way I understand it, marking something as deprecated and the
> > > process
> > > > of
> > > > > deprecation is about communicating the direction in which an API is
> > > > > evolving, and which methods can be relied upon for the foreseeable
> > > > future.
> > > > > And if we agree that the old mechanism has some sunset date that we
> > > just
> > > > > don't know yet, we should still communicate to users that the
> sunset
> > > date
> > > > > is somewhere in the future.
> > > > >
> > > > > I've updated the KIP to describe separate follow-on votes for
> > changing
> > > > the
> > > > > default, and removing the scanning behavior, as that was my
> original
> > > > intent
> > > > > but I don't think that was made clear.
> > > > > I also pushed back the deprecation of the migration sub-commands,
> > since
> > > > > we'd like to usher people towards the migration script instead of
> > > relying
> > > > > on the HYBRID modes.
> > > > > I kept the decision for deprecating the non-SERVICE_LOAD modes in
> the
> > > > > current vote for now, but if introducing deprecated configurations
> is
> > > too
> > > > > unpalatable we can move the deprecation to a later vote.
> > > > >
> > > > > Thanks,
> > > > > Greg
> > > > >
> > > > > On Mon, Jan 23, 2023 at 12:49 PM Chris Egerton
> > <chrise@aiven.io.invalid
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Greg,
> > > > > >
> > > > > > Thanks for the KIP! This part of Connect has been due for an
> update
> > > > for a
> > > > > > while, it's nice to see it getting some attention. I especially
> > like
> > > > how
> > > > > > the migration plan is broken down into affected personas; makes
> > > review
> > > > > > easier and, hopefully, makes the feature easier to understand for
> > > > affected
> > > > > > users.
> > > > > >
> > > > > > Here are my initial thoughts:
> > > > > >
> > > > > > 1. Will workers' resiliency to packaging issues be affected when
> > the
> > > > new
> > > > > > scanning behavior is enabled? For example, will a single
> > > > poorly-packaged
> > > > > > plugin prevent other correctly-packaged plugins from being
> loaded,
> > or
> > > > > > worse, cause the worker to fail? And either way, are there any
> > > details
> > > > that
> > > > > > we can call out to back up any guarantees we want to provide on
> > that
> > > > front?
> > > > > >
> > > > > > 2. I think the description of the "--plugin-location" flag may
> need
> > > > some
> > > > > > updating? In the middle of the list of bullet points there's "The
> > > > value of
> > > > > > this argument will be a single plugin, which can be any of the
> > > > following:"
> > > > > > but the following points don't appear to be related.
> > > > > >
> > > > > > 3. (Nit) Could we rename the migration script something like
> > > > > > "connect-migrate-plugin-path" or even just "connect-plugin-path"
> > > with a
> > > > > > "migrate" subcommand?
> > > > > >
> > > > > > 4. It's noted that "If a plugin class is removed from the path,
> the
> > > > > > corresponding shim manifest should also be removed". Did you
> > consider
> > > > > > adding behavior (possibly guarded behind a CLI flag) to find and
> > > remove
> > > > > > manifests for nonexistent plugins?
> > > > > >
> > > > > > 5. What's the concrete plan for updating the plugins that ship
> OOTB
> > > > with
> > > > > > Connect? Will they be given a service loader manifest, a module
> > info
> > > > file,
> > > > > > or both?
> > > > > >
> > > > > > 6. In the HYBRID_WARN mode, what precedence will we give to
> plugins
> > > > that
> > > > > > are found by both the legacy scanning logic and the new service
> > > > > > loader/module info loading logic?
> > > > > >
> > > > > > 7. It's a bit strange to immediately deprecate config property
> > values
> > > > and
> > > > > > the migration script in their first release. IMO as long as we
> have
> > > the
> > > > > > migration script in the project, it doesn't make sense to
> deprecate
> > > > it, and
> > > > > > until we've had at least one successful release with the new
> > loading
> > > > logic
> > > > > > and some confidence in it, we should not deprecate the older
> > logic. I
> > > > think
> > > > > > a more reasonable pace for the update here could be to deprecate
> > the
> > > > legacy
> > > > > > scanning logic in 4.0 (changing the default accordingly) and
> remove
> > > it
> > > > > > entirely in 5.0.
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > > Chris
> > > > > >
> > > > > > On Mon, Jan 23, 2023 at 3:21 AM Federico Valeri <
> > > fedevaleri@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Greg, this looks like a useful change to me.
> > > > > > >
> > > > > > > I was just wondering if we should use a better script name like
> > > > > > > "connect-convert-to-service-provider.sh" or something like
> this,
> > > and
> > > > > > > maybe add a --dry-run option.
> > > > > > >
> > > > > > > On Tue, Jan 17, 2023 at 8:45 PM Greg Harris
> > > > > > > <gr...@aiven.io.invalid> wrote:
> > > > > > > >
> > > > > > > > Hi all!
> > > > > > > >
> > > > > > > > I'd like to start a discussion about
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-898%3A+Modernize+Connect+plugin+discovery
> > > > > > > > .
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > > Greg Harris
> > > > > > >
> > > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-898: Modernize Connect plugin discovery

Posted by Chris Egerton <ch...@aiven.io.INVALID>.
Hi Greg,

Thanks for the updates. This is looking great. A few minor questions:

1. The description for the list command in the plugin path CLI states that
it'll show "Whether the class is discoverable via scanning". Are there any
cases where a plugin found by the CLI won't be discoverable via scanning?

2. Why are we treating module info files differently than service loader
manifests (where the latter will be removed for not-found plugins on an
opt-out basis, but the former are left unmodified no matter what)? Basing
this off of this sentence in the KIP: "If a plugin declares an
implementation via a module-info.java which is not loadable, the
module-info.java will not be modified."; let me know if I'm misinterpreting.

Cheers,

Chris

On Tue, Feb 7, 2023 at 4:32 PM Greg Harris <gr...@aiven.io.invalid>
wrote:

> Chris,
>
> Thanks for the comments!
>
> 1. I've updated the script to sync-manifests [--keep-not-found] with your
> described semantics.
>
> 2. I've pushed all of the deprecation decisions to a follow-up KIP that can
> take place after an intervening release.
> I hope that this feature has a high enough ROI that the direction of the
> migration is obvious to users even without a formal deprecation notice.
> I do think that without a formal deprecation notice that the migration
> scripts may be in use for a long time, particularly for connector
> implementations which are not receiving active maintenance.
>
> 3. Thanks for looking into the ServiceLoader error handling contract.
> From my inspection of the ServiceLoader implementation, it seems to be able
> to recover from most of the poor packaging scenarios that I'm aware of.
> I think with or without the deprecation, operators are free to select the
> ONLY_SCAN mode to work around any errors we don't discover before release.
>
> Thanks,
> Greg
>
> On Mon, Feb 6, 2023 at 8:44 AM Chris Egerton <ch...@aiven.io.invalid>
> wrote:
>
> > Hi Greg,
> >
> > Thanks for the updates. Unless stated below, I agree with your
> responses. A
> > few more thoughts:
> >
> > 1. IMO it's not worth it to have separate commands for adding/removing
> > manifests, mostly because it adds complexity to the tool and might make
> it
> > harder for users to understand. I think a single "update-manifests" or
> > "sync-manifests" command that both adds manifests for found plugins and
> > removes manifests for unfound plugins by default, with a --keep-unfound
> > flag to disable removal on an opt-in basis, would make more sense.
> >
> > 2. I agree with this point you raise about deprecation: "if we agree that
> > the old mechanism has some sunset date that we just don't know yet, we
> > should still communicate to users that the sunset date is somewhere in
> the
> > future." However, I'm not certain that releasing the features proposed in
> > this KIP alone gives us enough confidence in them to sunset the legacy
> > plugin loading logic, which is why I was hoping we could have at least
> one
> > successful release (preferably even two or three) with this feature in
> > order to identify potential gaps in the design before proceeding. This is
> > similar to the strategy we took with incremental rebalancing in Connect,
> > which introduced the "connect.protocol" property with the proposal that
> it
> > could be marked deprecated in the next major release. That strategy
> served
> > us well, since there were a few kinks in the logic for that feature that
> > needed to be worked out, and many users continued to rely on the legacy
> > rebalancing logic as a workaround for those issues.
> >
> > 3. I was wondering about error handling with the service loading
> mechanism
> > because the Javadocs for the Iterator returned by ServiceLoader::iterator
> > [1] state that "If an error is thrown then subsequent invocations of the
> > iterator will make a best effort to locate and instantiate the next
> > available provider, but in general such recovery cannot be guaranteed."
> > Upon further reflection, I agree that it's fine for now to leave details
> on
> > error handling out of the public contract for this new loading mode, but
> > only if we also agree to hold off on deprecating the old loading mode,
> > since there may be cases where worker startup is crippled by an
> > improperly-packaged plugin when using the service loader mechanism for
> its
> > plugins.
> >
> > [1] -
> >
> >
> https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/ServiceLoader.html#iterator()
> >
> > Cheers,
> >
> > Chris
> >
> > On Tue, Jan 24, 2023 at 10:26 AM Federico Valeri <fe...@gmail.com>
> > wrote:
> >
> > > Hi Greg,
> > >
> > > On Tue, Jan 24, 2023 at 2:48 AM Greg Harris
> > > <gr...@aiven.io.invalid> wrote:
> > > >
> > > > Federico,
> > > >
> > > > Thanks for taking a look!
> > > >
> > > > > I was just wondering if we should use a better script name like
> > > > > "connect-convert-to-service-provider.sh" or something like this
> > > >
> > > > I agree that the current name is not ideal, while the current name
> > > > describes part of what it is doing, it does not describe all of what
> it
> > > is
> > > > doing, or why that is important to the caller.
> > > > I looked around to see what style the other Kafka commands use, and
> it
> > > > seems like there's not a standard pattern. Some of the scripts are
> > > > noun-phrases, and some are verb-phrases.
> > > > Some take commands like `create`, `format`, etc, and some take
> options
> > > like
> > > > `--generate`, `--execute`, `--verify`.
> > > > I've updated the command to `bin/connect-plugin-path.sh
> > > > (list|add-manifests|remove-manifests) --worker-config
> <worker-config>`,
> > > let
> > > > me know if that's better or worse than
> > `bin/connect-scan-plugin-path.sh`.
> > > >
> > > > > maybe add a --dry-run option.
> > > >
> > > > Thanks for the suggestion, I've updated the KIP to include a
> --dry-run
> > > > option with some typical semantics.
> > >
> > > This is much better. Thanks!
> > >
> > > I think it's better to not deprecate the add-manifests and
> > > remove-manifests script sub-commands. When we will remove the
> > > deprecated plugin discovery modes, one may still have the need to
> > > convert an old connector release, maybe because it's the only version
> > > compatible with a third-party external system.
> > >
> > > >
> > > > Chris,
> > > >
> > > > I'm glad you liked the personas! Since this KIP requires others' help
> > to
> > > > improve the ecosystem, I needed a way to make sure everyone knows
> what
> > > role
> > > > they have to play. I hope I've accomplished that.
> > > >
> > > > > 1. Will workers' resiliency to packaging issues be affected when
> the
> > > new
> > > > > scanning behavior is enabled? For example, will a single
> > > poorly-packaged
> > > > > plugin prevent other correctly-packaged plugins from being loaded,
> or
> > > > > worse, cause the worker to fail? And either way, are there any
> > details
> > > > that
> > > > > we can call out to back up any guarantees we want to provide on
> that
> > > > front?
> > > >
> > > > The current behavior when encountering packaging issues seems to be
> to
> > > log
> > > > errors and continue, and that will certainly continue for the
> ONLY_SCAN
> > > and
> > > > HYBID_WARN modes.
> > > > As written, the HYBRID_FAIL mode breaks from this trend and will
> throw
> > > > exceptions that crash the worker only when the connectors are missing
> > > from
> > > > serviceloader discovery.
> > > > The behavior for SERVICE_LOAD is not currently defined in the KIP. I
> > > think
> > > > we can either keep the log-and-continue behavior whenever possible,
> or
> > > > begin to surface errors in a fail-fast model.
> > > > And if we can't choose between those two, perhaps we can add
> > > > SERVICE_LOAD_FAIL_FAST, or a
> plugin.path.discovery.fail.fast=true/false
> > > to
> > > > allow users to configure the worker to fail to start up when
> > plugin.path
> > > > issues are present.
> > > >
> > > > With regards to incorrectly packaged connectors affecting other
> > correctly
> > > > packaged ones: I think one of the current behaviors of the
> Reflections
> > > > implementation is that certain exceptions can prevent discovery of
> > other
> > > > unrelated plugins on the path.
> > > > I think that this may be able to be improved in all modes, and
> doesn't
> > > need
> > > > the ServiceLoader mechanism to resolve. I'll explore this some more
> and
> > > > pursue a fix outside of this KIP.
> > > > The new mechanism should be as good or better than the Reflections
> > > > mechanism in this regard. I am not sure if it is necessary to commit
> to
> > > the
> > > > error handling behavior as part of the public API.
> > > >
> > > > > 2. I think the description of the "--plugin-location" flag may need
> > > some
> > > > > updating? In the middle of the list of bullet points there's "The
> > > value of
> > > > > this argument will be a single plugin, which can be any of the
> > > following:"
> > > > > but the following points don't appear to be related.
> > > >
> > > > Thanks, I think that was a copy-paste error. I have updated the KIP.
> > > >
> > > > > 3. (Nit) Could we rename the migration script something like
> > > > > "connect-migrate-plugin-path" or even just "connect-plugin-path"
> > with a
> > > > > "migrate" subcommand?
> > > >
> > > > Yes, you and Federico had the same idea :)
> > > > Please see my feedback above regarding naming.
> > > >
> > > > > 4. It's noted that "If a plugin class is removed from the path, the
> > > > > corresponding shim manifest should also be removed". Did you
> consider
> > > > > adding behavior (possibly guarded behind a CLI flag) to find and
> > remove
> > > > > manifests for nonexistent plugins?
> > > >
> > > > When writing that originally, I thought that the script would only
> > remove
> > > > manifests that it had generated, as filtered by some in-file comment
> or
> > > by
> > > > jar filename, which would be an implementation detail.
> > > > If we were to add a CLI flag to also enable/disable the removal
> > behavior,
> > > > we could eliminate that implementation detail and make the removal an
> > > > all-or-nothing part of the script.
> > > >
> > > > I updated the script to accept a `remove-manifests` subcommand which
> I
> > > > think captures your suggestion. Therefore, a full sync would include
> > two
> > > > invocations: an `add-manifests` and `remove-manifests`.
> > > >
> > > > > 5. What's the concrete plan for updating the plugins that ship OOTB
> > > with
> > > > > Connect? Will they be given a service loader manifest, a module
> info
> > > file,
> > > > > or both?
> > > >
> > > > At a minimum, we need to add service loader manifests in this KIP to
> be
> > > > compliant with the new discovery model.
> > > > This is because Kafka 3.x supports Java 8, and that version of the
> > > > ServiceLoader can only read manifest files, and not module-info
> files.
> > > > Beyond that, I think it becomes a question of scope, and whether we
> > > should
> > > > add module-info files for all of the Connect components.
> > > > I was not planning on doing so within this KIP, and would prefer to
> > leave
> > > > that for a follow-up KIP.
> > > >
> > > > I have clarified this in the KIP.
> > > >
> > > > > 6. In the HYBRID_WARN mode, what precedence will we give to plugins
> > > that
> > > > > are found by both the legacy scanning logic and the new service
> > > > > loader/module info loading logic?
> > > >
> > > > In HYBRID_WARN, the control-flow should be the same as SCAN_ONLY, so
> > the
> > > > same classes are discovered in the same order.
> > > > All of the classes that are found via the ServiceLoader mechanism
> > should
> > > > only be used by the runtime to generate warnings.
> > > > I believe that by the nature of the Reflections library, all
> instances
> > > > which are discoverable by the ServiceLoader should be discoverable by
> > > > Reflections, and so we don't need to merge the two discovery methods
> or
> > > > handle precedence.
> > > >
> > > > > 7. It's a bit strange to immediately deprecate config property
> values
> > > and
> > > > > the migration script in their first release. IMO as long as we have
> > the
> > > > > migration script in the project, it doesn't make sense to deprecate
> > it,
> > > > and
> > > > > until we've had at least one successful release with the new
> loading
> > > logic
> > > > > and some confidence in it, we should not deprecate the older
> logic. I
> > > > think
> > > > > a more reasonable pace for the update here could be to deprecate
> the
> > > > legacy
> > > > > scanning logic in 4.0 (changing the default accordingly) and remove
> > it
> > > > > entirely in 5.0.
> > > >
> > > > I understand that it seems a bit silly to introduce something as
> > > > deprecated, but I think that's because 'deprecated' is not the same
> as
> > > > 'old', and not mutually exclusive with 'new'.
> > > > The way I understand it, marking something as deprecated and the
> > process
> > > of
> > > > deprecation is about communicating the direction in which an API is
> > > > evolving, and which methods can be relied upon for the foreseeable
> > > future.
> > > > And if we agree that the old mechanism has some sunset date that we
> > just
> > > > don't know yet, we should still communicate to users that the sunset
> > date
> > > > is somewhere in the future.
> > > >
> > > > I've updated the KIP to describe separate follow-on votes for
> changing
> > > the
> > > > default, and removing the scanning behavior, as that was my original
> > > intent
> > > > but I don't think that was made clear.
> > > > I also pushed back the deprecation of the migration sub-commands,
> since
> > > > we'd like to usher people towards the migration script instead of
> > relying
> > > > on the HYBRID modes.
> > > > I kept the decision for deprecating the non-SERVICE_LOAD modes in the
> > > > current vote for now, but if introducing deprecated configurations is
> > too
> > > > unpalatable we can move the deprecation to a later vote.
> > > >
> > > > Thanks,
> > > > Greg
> > > >
> > > > On Mon, Jan 23, 2023 at 12:49 PM Chris Egerton
> <chrise@aiven.io.invalid
> > >
> > > > wrote:
> > > >
> > > > > Hi Greg,
> > > > >
> > > > > Thanks for the KIP! This part of Connect has been due for an update
> > > for a
> > > > > while, it's nice to see it getting some attention. I especially
> like
> > > how
> > > > > the migration plan is broken down into affected personas; makes
> > review
> > > > > easier and, hopefully, makes the feature easier to understand for
> > > affected
> > > > > users.
> > > > >
> > > > > Here are my initial thoughts:
> > > > >
> > > > > 1. Will workers' resiliency to packaging issues be affected when
> the
> > > new
> > > > > scanning behavior is enabled? For example, will a single
> > > poorly-packaged
> > > > > plugin prevent other correctly-packaged plugins from being loaded,
> or
> > > > > worse, cause the worker to fail? And either way, are there any
> > details
> > > that
> > > > > we can call out to back up any guarantees we want to provide on
> that
> > > front?
> > > > >
> > > > > 2. I think the description of the "--plugin-location" flag may need
> > > some
> > > > > updating? In the middle of the list of bullet points there's "The
> > > value of
> > > > > this argument will be a single plugin, which can be any of the
> > > following:"
> > > > > but the following points don't appear to be related.
> > > > >
> > > > > 3. (Nit) Could we rename the migration script something like
> > > > > "connect-migrate-plugin-path" or even just "connect-plugin-path"
> > with a
> > > > > "migrate" subcommand?
> > > > >
> > > > > 4. It's noted that "If a plugin class is removed from the path, the
> > > > > corresponding shim manifest should also be removed". Did you
> consider
> > > > > adding behavior (possibly guarded behind a CLI flag) to find and
> > remove
> > > > > manifests for nonexistent plugins?
> > > > >
> > > > > 5. What's the concrete plan for updating the plugins that ship OOTB
> > > with
> > > > > Connect? Will they be given a service loader manifest, a module
> info
> > > file,
> > > > > or both?
> > > > >
> > > > > 6. In the HYBRID_WARN mode, what precedence will we give to plugins
> > > that
> > > > > are found by both the legacy scanning logic and the new service
> > > > > loader/module info loading logic?
> > > > >
> > > > > 7. It's a bit strange to immediately deprecate config property
> values
> > > and
> > > > > the migration script in their first release. IMO as long as we have
> > the
> > > > > migration script in the project, it doesn't make sense to deprecate
> > > it, and
> > > > > until we've had at least one successful release with the new
> loading
> > > logic
> > > > > and some confidence in it, we should not deprecate the older
> logic. I
> > > think
> > > > > a more reasonable pace for the update here could be to deprecate
> the
> > > legacy
> > > > > scanning logic in 4.0 (changing the default accordingly) and remove
> > it
> > > > > entirely in 5.0.
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Chris
> > > > >
> > > > > On Mon, Jan 23, 2023 at 3:21 AM Federico Valeri <
> > fedevaleri@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Greg, this looks like a useful change to me.
> > > > > >
> > > > > > I was just wondering if we should use a better script name like
> > > > > > "connect-convert-to-service-provider.sh" or something like this,
> > and
> > > > > > maybe add a --dry-run option.
> > > > > >
> > > > > > On Tue, Jan 17, 2023 at 8:45 PM Greg Harris
> > > > > > <gr...@aiven.io.invalid> wrote:
> > > > > > >
> > > > > > > Hi all!
> > > > > > >
> > > > > > > I'd like to start a discussion about
> > > > > > >
> > > > > >
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-898%3A+Modernize+Connect+plugin+discovery
> > > > > > > .
> > > > > > >
> > > > > > > Thanks!
> > > > > > > Greg Harris
> > > > > >
> > > > >
> > >
> >
>

Re: [DISCUSS] KIP-898: Modernize Connect plugin discovery

Posted by Greg Harris <gr...@aiven.io.INVALID>.
Chris,

Thanks for the comments!

1. I've updated the script to sync-manifests [--keep-not-found] with your
described semantics.

2. I've pushed all of the deprecation decisions to a follow-up KIP that can
take place after an intervening release.
I hope that this feature has a high enough ROI that the direction of the
migration is obvious to users even without a formal deprecation notice.
I do think that without a formal deprecation notice that the migration
scripts may be in use for a long time, particularly for connector
implementations which are not receiving active maintenance.

3. Thanks for looking into the ServiceLoader error handling contract.
From my inspection of the ServiceLoader implementation, it seems to be able
to recover from most of the poor packaging scenarios that I'm aware of.
I think with or without the deprecation, operators are free to select the
ONLY_SCAN mode to work around any errors we don't discover before release.

Thanks,
Greg

On Mon, Feb 6, 2023 at 8:44 AM Chris Egerton <ch...@aiven.io.invalid>
wrote:

> Hi Greg,
>
> Thanks for the updates. Unless stated below, I agree with your responses. A
> few more thoughts:
>
> 1. IMO it's not worth it to have separate commands for adding/removing
> manifests, mostly because it adds complexity to the tool and might make it
> harder for users to understand. I think a single "update-manifests" or
> "sync-manifests" command that both adds manifests for found plugins and
> removes manifests for unfound plugins by default, with a --keep-unfound
> flag to disable removal on an opt-in basis, would make more sense.
>
> 2. I agree with this point you raise about deprecation: "if we agree that
> the old mechanism has some sunset date that we just don't know yet, we
> should still communicate to users that the sunset date is somewhere in the
> future." However, I'm not certain that releasing the features proposed in
> this KIP alone gives us enough confidence in them to sunset the legacy
> plugin loading logic, which is why I was hoping we could have at least one
> successful release (preferably even two or three) with this feature in
> order to identify potential gaps in the design before proceeding. This is
> similar to the strategy we took with incremental rebalancing in Connect,
> which introduced the "connect.protocol" property with the proposal that it
> could be marked deprecated in the next major release. That strategy served
> us well, since there were a few kinks in the logic for that feature that
> needed to be worked out, and many users continued to rely on the legacy
> rebalancing logic as a workaround for those issues.
>
> 3. I was wondering about error handling with the service loading mechanism
> because the Javadocs for the Iterator returned by ServiceLoader::iterator
> [1] state that "If an error is thrown then subsequent invocations of the
> iterator will make a best effort to locate and instantiate the next
> available provider, but in general such recovery cannot be guaranteed."
> Upon further reflection, I agree that it's fine for now to leave details on
> error handling out of the public contract for this new loading mode, but
> only if we also agree to hold off on deprecating the old loading mode,
> since there may be cases where worker startup is crippled by an
> improperly-packaged plugin when using the service loader mechanism for its
> plugins.
>
> [1] -
>
> https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/ServiceLoader.html#iterator()
>
> Cheers,
>
> Chris
>
> On Tue, Jan 24, 2023 at 10:26 AM Federico Valeri <fe...@gmail.com>
> wrote:
>
> > Hi Greg,
> >
> > On Tue, Jan 24, 2023 at 2:48 AM Greg Harris
> > <gr...@aiven.io.invalid> wrote:
> > >
> > > Federico,
> > >
> > > Thanks for taking a look!
> > >
> > > > I was just wondering if we should use a better script name like
> > > > "connect-convert-to-service-provider.sh" or something like this
> > >
> > > I agree that the current name is not ideal, while the current name
> > > describes part of what it is doing, it does not describe all of what it
> > is
> > > doing, or why that is important to the caller.
> > > I looked around to see what style the other Kafka commands use, and it
> > > seems like there's not a standard pattern. Some of the scripts are
> > > noun-phrases, and some are verb-phrases.
> > > Some take commands like `create`, `format`, etc, and some take options
> > like
> > > `--generate`, `--execute`, `--verify`.
> > > I've updated the command to `bin/connect-plugin-path.sh
> > > (list|add-manifests|remove-manifests) --worker-config <worker-config>`,
> > let
> > > me know if that's better or worse than
> `bin/connect-scan-plugin-path.sh`.
> > >
> > > > maybe add a --dry-run option.
> > >
> > > Thanks for the suggestion, I've updated the KIP to include a --dry-run
> > > option with some typical semantics.
> >
> > This is much better. Thanks!
> >
> > I think it's better to not deprecate the add-manifests and
> > remove-manifests script sub-commands. When we will remove the
> > deprecated plugin discovery modes, one may still have the need to
> > convert an old connector release, maybe because it's the only version
> > compatible with a third-party external system.
> >
> > >
> > > Chris,
> > >
> > > I'm glad you liked the personas! Since this KIP requires others' help
> to
> > > improve the ecosystem, I needed a way to make sure everyone knows what
> > role
> > > they have to play. I hope I've accomplished that.
> > >
> > > > 1. Will workers' resiliency to packaging issues be affected when the
> > new
> > > > scanning behavior is enabled? For example, will a single
> > poorly-packaged
> > > > plugin prevent other correctly-packaged plugins from being loaded, or
> > > > worse, cause the worker to fail? And either way, are there any
> details
> > > that
> > > > we can call out to back up any guarantees we want to provide on that
> > > front?
> > >
> > > The current behavior when encountering packaging issues seems to be to
> > log
> > > errors and continue, and that will certainly continue for the ONLY_SCAN
> > and
> > > HYBID_WARN modes.
> > > As written, the HYBRID_FAIL mode breaks from this trend and will throw
> > > exceptions that crash the worker only when the connectors are missing
> > from
> > > serviceloader discovery.
> > > The behavior for SERVICE_LOAD is not currently defined in the KIP. I
> > think
> > > we can either keep the log-and-continue behavior whenever possible, or
> > > begin to surface errors in a fail-fast model.
> > > And if we can't choose between those two, perhaps we can add
> > > SERVICE_LOAD_FAIL_FAST, or a plugin.path.discovery.fail.fast=true/false
> > to
> > > allow users to configure the worker to fail to start up when
> plugin.path
> > > issues are present.
> > >
> > > With regards to incorrectly packaged connectors affecting other
> correctly
> > > packaged ones: I think one of the current behaviors of the Reflections
> > > implementation is that certain exceptions can prevent discovery of
> other
> > > unrelated plugins on the path.
> > > I think that this may be able to be improved in all modes, and doesn't
> > need
> > > the ServiceLoader mechanism to resolve. I'll explore this some more and
> > > pursue a fix outside of this KIP.
> > > The new mechanism should be as good or better than the Reflections
> > > mechanism in this regard. I am not sure if it is necessary to commit to
> > the
> > > error handling behavior as part of the public API.
> > >
> > > > 2. I think the description of the "--plugin-location" flag may need
> > some
> > > > updating? In the middle of the list of bullet points there's "The
> > value of
> > > > this argument will be a single plugin, which can be any of the
> > following:"
> > > > but the following points don't appear to be related.
> > >
> > > Thanks, I think that was a copy-paste error. I have updated the KIP.
> > >
> > > > 3. (Nit) Could we rename the migration script something like
> > > > "connect-migrate-plugin-path" or even just "connect-plugin-path"
> with a
> > > > "migrate" subcommand?
> > >
> > > Yes, you and Federico had the same idea :)
> > > Please see my feedback above regarding naming.
> > >
> > > > 4. It's noted that "If a plugin class is removed from the path, the
> > > > corresponding shim manifest should also be removed". Did you consider
> > > > adding behavior (possibly guarded behind a CLI flag) to find and
> remove
> > > > manifests for nonexistent plugins?
> > >
> > > When writing that originally, I thought that the script would only
> remove
> > > manifests that it had generated, as filtered by some in-file comment or
> > by
> > > jar filename, which would be an implementation detail.
> > > If we were to add a CLI flag to also enable/disable the removal
> behavior,
> > > we could eliminate that implementation detail and make the removal an
> > > all-or-nothing part of the script.
> > >
> > > I updated the script to accept a `remove-manifests` subcommand which I
> > > think captures your suggestion. Therefore, a full sync would include
> two
> > > invocations: an `add-manifests` and `remove-manifests`.
> > >
> > > > 5. What's the concrete plan for updating the plugins that ship OOTB
> > with
> > > > Connect? Will they be given a service loader manifest, a module info
> > file,
> > > > or both?
> > >
> > > At a minimum, we need to add service loader manifests in this KIP to be
> > > compliant with the new discovery model.
> > > This is because Kafka 3.x supports Java 8, and that version of the
> > > ServiceLoader can only read manifest files, and not module-info files.
> > > Beyond that, I think it becomes a question of scope, and whether we
> > should
> > > add module-info files for all of the Connect components.
> > > I was not planning on doing so within this KIP, and would prefer to
> leave
> > > that for a follow-up KIP.
> > >
> > > I have clarified this in the KIP.
> > >
> > > > 6. In the HYBRID_WARN mode, what precedence will we give to plugins
> > that
> > > > are found by both the legacy scanning logic and the new service
> > > > loader/module info loading logic?
> > >
> > > In HYBRID_WARN, the control-flow should be the same as SCAN_ONLY, so
> the
> > > same classes are discovered in the same order.
> > > All of the classes that are found via the ServiceLoader mechanism
> should
> > > only be used by the runtime to generate warnings.
> > > I believe that by the nature of the Reflections library, all instances
> > > which are discoverable by the ServiceLoader should be discoverable by
> > > Reflections, and so we don't need to merge the two discovery methods or
> > > handle precedence.
> > >
> > > > 7. It's a bit strange to immediately deprecate config property values
> > and
> > > > the migration script in their first release. IMO as long as we have
> the
> > > > migration script in the project, it doesn't make sense to deprecate
> it,
> > > and
> > > > until we've had at least one successful release with the new loading
> > logic
> > > > and some confidence in it, we should not deprecate the older logic. I
> > > think
> > > > a more reasonable pace for the update here could be to deprecate the
> > > legacy
> > > > scanning logic in 4.0 (changing the default accordingly) and remove
> it
> > > > entirely in 5.0.
> > >
> > > I understand that it seems a bit silly to introduce something as
> > > deprecated, but I think that's because 'deprecated' is not the same as
> > > 'old', and not mutually exclusive with 'new'.
> > > The way I understand it, marking something as deprecated and the
> process
> > of
> > > deprecation is about communicating the direction in which an API is
> > > evolving, and which methods can be relied upon for the foreseeable
> > future.
> > > And if we agree that the old mechanism has some sunset date that we
> just
> > > don't know yet, we should still communicate to users that the sunset
> date
> > > is somewhere in the future.
> > >
> > > I've updated the KIP to describe separate follow-on votes for changing
> > the
> > > default, and removing the scanning behavior, as that was my original
> > intent
> > > but I don't think that was made clear.
> > > I also pushed back the deprecation of the migration sub-commands, since
> > > we'd like to usher people towards the migration script instead of
> relying
> > > on the HYBRID modes.
> > > I kept the decision for deprecating the non-SERVICE_LOAD modes in the
> > > current vote for now, but if introducing deprecated configurations is
> too
> > > unpalatable we can move the deprecation to a later vote.
> > >
> > > Thanks,
> > > Greg
> > >
> > > On Mon, Jan 23, 2023 at 12:49 PM Chris Egerton <chrise@aiven.io.invalid
> >
> > > wrote:
> > >
> > > > Hi Greg,
> > > >
> > > > Thanks for the KIP! This part of Connect has been due for an update
> > for a
> > > > while, it's nice to see it getting some attention. I especially like
> > how
> > > > the migration plan is broken down into affected personas; makes
> review
> > > > easier and, hopefully, makes the feature easier to understand for
> > affected
> > > > users.
> > > >
> > > > Here are my initial thoughts:
> > > >
> > > > 1. Will workers' resiliency to packaging issues be affected when the
> > new
> > > > scanning behavior is enabled? For example, will a single
> > poorly-packaged
> > > > plugin prevent other correctly-packaged plugins from being loaded, or
> > > > worse, cause the worker to fail? And either way, are there any
> details
> > that
> > > > we can call out to back up any guarantees we want to provide on that
> > front?
> > > >
> > > > 2. I think the description of the "--plugin-location" flag may need
> > some
> > > > updating? In the middle of the list of bullet points there's "The
> > value of
> > > > this argument will be a single plugin, which can be any of the
> > following:"
> > > > but the following points don't appear to be related.
> > > >
> > > > 3. (Nit) Could we rename the migration script something like
> > > > "connect-migrate-plugin-path" or even just "connect-plugin-path"
> with a
> > > > "migrate" subcommand?
> > > >
> > > > 4. It's noted that "If a plugin class is removed from the path, the
> > > > corresponding shim manifest should also be removed". Did you consider
> > > > adding behavior (possibly guarded behind a CLI flag) to find and
> remove
> > > > manifests for nonexistent plugins?
> > > >
> > > > 5. What's the concrete plan for updating the plugins that ship OOTB
> > with
> > > > Connect? Will they be given a service loader manifest, a module info
> > file,
> > > > or both?
> > > >
> > > > 6. In the HYBRID_WARN mode, what precedence will we give to plugins
> > that
> > > > are found by both the legacy scanning logic and the new service
> > > > loader/module info loading logic?
> > > >
> > > > 7. It's a bit strange to immediately deprecate config property values
> > and
> > > > the migration script in their first release. IMO as long as we have
> the
> > > > migration script in the project, it doesn't make sense to deprecate
> > it, and
> > > > until we've had at least one successful release with the new loading
> > logic
> > > > and some confidence in it, we should not deprecate the older logic. I
> > think
> > > > a more reasonable pace for the update here could be to deprecate the
> > legacy
> > > > scanning logic in 4.0 (changing the default accordingly) and remove
> it
> > > > entirely in 5.0.
> > > >
> > > > Cheers,
> > > >
> > > > Chris
> > > >
> > > > On Mon, Jan 23, 2023 at 3:21 AM Federico Valeri <
> fedevaleri@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Greg, this looks like a useful change to me.
> > > > >
> > > > > I was just wondering if we should use a better script name like
> > > > > "connect-convert-to-service-provider.sh" or something like this,
> and
> > > > > maybe add a --dry-run option.
> > > > >
> > > > > On Tue, Jan 17, 2023 at 8:45 PM Greg Harris
> > > > > <gr...@aiven.io.invalid> wrote:
> > > > > >
> > > > > > Hi all!
> > > > > >
> > > > > > I'd like to start a discussion about
> > > > > >
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-898%3A+Modernize+Connect+plugin+discovery
> > > > > > .
> > > > > >
> > > > > > Thanks!
> > > > > > Greg Harris
> > > > >
> > > >
> >
>

Re: [DISCUSS] KIP-898: Modernize Connect plugin discovery

Posted by Chris Egerton <ch...@aiven.io.INVALID>.
Hi Greg,

Thanks for the updates. Unless stated below, I agree with your responses. A
few more thoughts:

1. IMO it's not worth it to have separate commands for adding/removing
manifests, mostly because it adds complexity to the tool and might make it
harder for users to understand. I think a single "update-manifests" or
"sync-manifests" command that both adds manifests for found plugins and
removes manifests for unfound plugins by default, with a --keep-unfound
flag to disable removal on an opt-in basis, would make more sense.

2. I agree with this point you raise about deprecation: "if we agree that
the old mechanism has some sunset date that we just don't know yet, we
should still communicate to users that the sunset date is somewhere in the
future." However, I'm not certain that releasing the features proposed in
this KIP alone gives us enough confidence in them to sunset the legacy
plugin loading logic, which is why I was hoping we could have at least one
successful release (preferably even two or three) with this feature in
order to identify potential gaps in the design before proceeding. This is
similar to the strategy we took with incremental rebalancing in Connect,
which introduced the "connect.protocol" property with the proposal that it
could be marked deprecated in the next major release. That strategy served
us well, since there were a few kinks in the logic for that feature that
needed to be worked out, and many users continued to rely on the legacy
rebalancing logic as a workaround for those issues.

3. I was wondering about error handling with the service loading mechanism
because the Javadocs for the Iterator returned by ServiceLoader::iterator
[1] state that "If an error is thrown then subsequent invocations of the
iterator will make a best effort to locate and instantiate the next
available provider, but in general such recovery cannot be guaranteed."
Upon further reflection, I agree that it's fine for now to leave details on
error handling out of the public contract for this new loading mode, but
only if we also agree to hold off on deprecating the old loading mode,
since there may be cases where worker startup is crippled by an
improperly-packaged plugin when using the service loader mechanism for its
plugins.

[1] -
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/ServiceLoader.html#iterator()

Cheers,

Chris

On Tue, Jan 24, 2023 at 10:26 AM Federico Valeri <fe...@gmail.com>
wrote:

> Hi Greg,
>
> On Tue, Jan 24, 2023 at 2:48 AM Greg Harris
> <gr...@aiven.io.invalid> wrote:
> >
> > Federico,
> >
> > Thanks for taking a look!
> >
> > > I was just wondering if we should use a better script name like
> > > "connect-convert-to-service-provider.sh" or something like this
> >
> > I agree that the current name is not ideal, while the current name
> > describes part of what it is doing, it does not describe all of what it
> is
> > doing, or why that is important to the caller.
> > I looked around to see what style the other Kafka commands use, and it
> > seems like there's not a standard pattern. Some of the scripts are
> > noun-phrases, and some are verb-phrases.
> > Some take commands like `create`, `format`, etc, and some take options
> like
> > `--generate`, `--execute`, `--verify`.
> > I've updated the command to `bin/connect-plugin-path.sh
> > (list|add-manifests|remove-manifests) --worker-config <worker-config>`,
> let
> > me know if that's better or worse than `bin/connect-scan-plugin-path.sh`.
> >
> > > maybe add a --dry-run option.
> >
> > Thanks for the suggestion, I've updated the KIP to include a --dry-run
> > option with some typical semantics.
>
> This is much better. Thanks!
>
> I think it's better to not deprecate the add-manifests and
> remove-manifests script sub-commands. When we will remove the
> deprecated plugin discovery modes, one may still have the need to
> convert an old connector release, maybe because it's the only version
> compatible with a third-party external system.
>
> >
> > Chris,
> >
> > I'm glad you liked the personas! Since this KIP requires others' help to
> > improve the ecosystem, I needed a way to make sure everyone knows what
> role
> > they have to play. I hope I've accomplished that.
> >
> > > 1. Will workers' resiliency to packaging issues be affected when the
> new
> > > scanning behavior is enabled? For example, will a single
> poorly-packaged
> > > plugin prevent other correctly-packaged plugins from being loaded, or
> > > worse, cause the worker to fail? And either way, are there any details
> > that
> > > we can call out to back up any guarantees we want to provide on that
> > front?
> >
> > The current behavior when encountering packaging issues seems to be to
> log
> > errors and continue, and that will certainly continue for the ONLY_SCAN
> and
> > HYBID_WARN modes.
> > As written, the HYBRID_FAIL mode breaks from this trend and will throw
> > exceptions that crash the worker only when the connectors are missing
> from
> > serviceloader discovery.
> > The behavior for SERVICE_LOAD is not currently defined in the KIP. I
> think
> > we can either keep the log-and-continue behavior whenever possible, or
> > begin to surface errors in a fail-fast model.
> > And if we can't choose between those two, perhaps we can add
> > SERVICE_LOAD_FAIL_FAST, or a plugin.path.discovery.fail.fast=true/false
> to
> > allow users to configure the worker to fail to start up when plugin.path
> > issues are present.
> >
> > With regards to incorrectly packaged connectors affecting other correctly
> > packaged ones: I think one of the current behaviors of the Reflections
> > implementation is that certain exceptions can prevent discovery of other
> > unrelated plugins on the path.
> > I think that this may be able to be improved in all modes, and doesn't
> need
> > the ServiceLoader mechanism to resolve. I'll explore this some more and
> > pursue a fix outside of this KIP.
> > The new mechanism should be as good or better than the Reflections
> > mechanism in this regard. I am not sure if it is necessary to commit to
> the
> > error handling behavior as part of the public API.
> >
> > > 2. I think the description of the "--plugin-location" flag may need
> some
> > > updating? In the middle of the list of bullet points there's "The
> value of
> > > this argument will be a single plugin, which can be any of the
> following:"
> > > but the following points don't appear to be related.
> >
> > Thanks, I think that was a copy-paste error. I have updated the KIP.
> >
> > > 3. (Nit) Could we rename the migration script something like
> > > "connect-migrate-plugin-path" or even just "connect-plugin-path" with a
> > > "migrate" subcommand?
> >
> > Yes, you and Federico had the same idea :)
> > Please see my feedback above regarding naming.
> >
> > > 4. It's noted that "If a plugin class is removed from the path, the
> > > corresponding shim manifest should also be removed". Did you consider
> > > adding behavior (possibly guarded behind a CLI flag) to find and remove
> > > manifests for nonexistent plugins?
> >
> > When writing that originally, I thought that the script would only remove
> > manifests that it had generated, as filtered by some in-file comment or
> by
> > jar filename, which would be an implementation detail.
> > If we were to add a CLI flag to also enable/disable the removal behavior,
> > we could eliminate that implementation detail and make the removal an
> > all-or-nothing part of the script.
> >
> > I updated the script to accept a `remove-manifests` subcommand which I
> > think captures your suggestion. Therefore, a full sync would include two
> > invocations: an `add-manifests` and `remove-manifests`.
> >
> > > 5. What's the concrete plan for updating the plugins that ship OOTB
> with
> > > Connect? Will they be given a service loader manifest, a module info
> file,
> > > or both?
> >
> > At a minimum, we need to add service loader manifests in this KIP to be
> > compliant with the new discovery model.
> > This is because Kafka 3.x supports Java 8, and that version of the
> > ServiceLoader can only read manifest files, and not module-info files.
> > Beyond that, I think it becomes a question of scope, and whether we
> should
> > add module-info files for all of the Connect components.
> > I was not planning on doing so within this KIP, and would prefer to leave
> > that for a follow-up KIP.
> >
> > I have clarified this in the KIP.
> >
> > > 6. In the HYBRID_WARN mode, what precedence will we give to plugins
> that
> > > are found by both the legacy scanning logic and the new service
> > > loader/module info loading logic?
> >
> > In HYBRID_WARN, the control-flow should be the same as SCAN_ONLY, so the
> > same classes are discovered in the same order.
> > All of the classes that are found via the ServiceLoader mechanism should
> > only be used by the runtime to generate warnings.
> > I believe that by the nature of the Reflections library, all instances
> > which are discoverable by the ServiceLoader should be discoverable by
> > Reflections, and so we don't need to merge the two discovery methods or
> > handle precedence.
> >
> > > 7. It's a bit strange to immediately deprecate config property values
> and
> > > the migration script in their first release. IMO as long as we have the
> > > migration script in the project, it doesn't make sense to deprecate it,
> > and
> > > until we've had at least one successful release with the new loading
> logic
> > > and some confidence in it, we should not deprecate the older logic. I
> > think
> > > a more reasonable pace for the update here could be to deprecate the
> > legacy
> > > scanning logic in 4.0 (changing the default accordingly) and remove it
> > > entirely in 5.0.
> >
> > I understand that it seems a bit silly to introduce something as
> > deprecated, but I think that's because 'deprecated' is not the same as
> > 'old', and not mutually exclusive with 'new'.
> > The way I understand it, marking something as deprecated and the process
> of
> > deprecation is about communicating the direction in which an API is
> > evolving, and which methods can be relied upon for the foreseeable
> future.
> > And if we agree that the old mechanism has some sunset date that we just
> > don't know yet, we should still communicate to users that the sunset date
> > is somewhere in the future.
> >
> > I've updated the KIP to describe separate follow-on votes for changing
> the
> > default, and removing the scanning behavior, as that was my original
> intent
> > but I don't think that was made clear.
> > I also pushed back the deprecation of the migration sub-commands, since
> > we'd like to usher people towards the migration script instead of relying
> > on the HYBRID modes.
> > I kept the decision for deprecating the non-SERVICE_LOAD modes in the
> > current vote for now, but if introducing deprecated configurations is too
> > unpalatable we can move the deprecation to a later vote.
> >
> > Thanks,
> > Greg
> >
> > On Mon, Jan 23, 2023 at 12:49 PM Chris Egerton <ch...@aiven.io.invalid>
> > wrote:
> >
> > > Hi Greg,
> > >
> > > Thanks for the KIP! This part of Connect has been due for an update
> for a
> > > while, it's nice to see it getting some attention. I especially like
> how
> > > the migration plan is broken down into affected personas; makes review
> > > easier and, hopefully, makes the feature easier to understand for
> affected
> > > users.
> > >
> > > Here are my initial thoughts:
> > >
> > > 1. Will workers' resiliency to packaging issues be affected when the
> new
> > > scanning behavior is enabled? For example, will a single
> poorly-packaged
> > > plugin prevent other correctly-packaged plugins from being loaded, or
> > > worse, cause the worker to fail? And either way, are there any details
> that
> > > we can call out to back up any guarantees we want to provide on that
> front?
> > >
> > > 2. I think the description of the "--plugin-location" flag may need
> some
> > > updating? In the middle of the list of bullet points there's "The
> value of
> > > this argument will be a single plugin, which can be any of the
> following:"
> > > but the following points don't appear to be related.
> > >
> > > 3. (Nit) Could we rename the migration script something like
> > > "connect-migrate-plugin-path" or even just "connect-plugin-path" with a
> > > "migrate" subcommand?
> > >
> > > 4. It's noted that "If a plugin class is removed from the path, the
> > > corresponding shim manifest should also be removed". Did you consider
> > > adding behavior (possibly guarded behind a CLI flag) to find and remove
> > > manifests for nonexistent plugins?
> > >
> > > 5. What's the concrete plan for updating the plugins that ship OOTB
> with
> > > Connect? Will they be given a service loader manifest, a module info
> file,
> > > or both?
> > >
> > > 6. In the HYBRID_WARN mode, what precedence will we give to plugins
> that
> > > are found by both the legacy scanning logic and the new service
> > > loader/module info loading logic?
> > >
> > > 7. It's a bit strange to immediately deprecate config property values
> and
> > > the migration script in their first release. IMO as long as we have the
> > > migration script in the project, it doesn't make sense to deprecate
> it, and
> > > until we've had at least one successful release with the new loading
> logic
> > > and some confidence in it, we should not deprecate the older logic. I
> think
> > > a more reasonable pace for the update here could be to deprecate the
> legacy
> > > scanning logic in 4.0 (changing the default accordingly) and remove it
> > > entirely in 5.0.
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Mon, Jan 23, 2023 at 3:21 AM Federico Valeri <fe...@gmail.com>
> > > wrote:
> > >
> > > > Hi Greg, this looks like a useful change to me.
> > > >
> > > > I was just wondering if we should use a better script name like
> > > > "connect-convert-to-service-provider.sh" or something like this, and
> > > > maybe add a --dry-run option.
> > > >
> > > > On Tue, Jan 17, 2023 at 8:45 PM Greg Harris
> > > > <gr...@aiven.io.invalid> wrote:
> > > > >
> > > > > Hi all!
> > > > >
> > > > > I'd like to start a discussion about
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-898%3A+Modernize+Connect+plugin+discovery
> > > > > .
> > > > >
> > > > > Thanks!
> > > > > Greg Harris
> > > >
> > >
>

Re: [DISCUSS] KIP-898: Modernize Connect plugin discovery

Posted by Federico Valeri <fe...@gmail.com>.
Hi Greg,

On Tue, Jan 24, 2023 at 2:48 AM Greg Harris
<gr...@aiven.io.invalid> wrote:
>
> Federico,
>
> Thanks for taking a look!
>
> > I was just wondering if we should use a better script name like
> > "connect-convert-to-service-provider.sh" or something like this
>
> I agree that the current name is not ideal, while the current name
> describes part of what it is doing, it does not describe all of what it is
> doing, or why that is important to the caller.
> I looked around to see what style the other Kafka commands use, and it
> seems like there's not a standard pattern. Some of the scripts are
> noun-phrases, and some are verb-phrases.
> Some take commands like `create`, `format`, etc, and some take options like
> `--generate`, `--execute`, `--verify`.
> I've updated the command to `bin/connect-plugin-path.sh
> (list|add-manifests|remove-manifests) --worker-config <worker-config>`, let
> me know if that's better or worse than `bin/connect-scan-plugin-path.sh`.
>
> > maybe add a --dry-run option.
>
> Thanks for the suggestion, I've updated the KIP to include a --dry-run
> option with some typical semantics.

This is much better. Thanks!

I think it's better to not deprecate the add-manifests and
remove-manifests script sub-commands. When we will remove the
deprecated plugin discovery modes, one may still have the need to
convert an old connector release, maybe because it's the only version
compatible with a third-party external system.

>
> Chris,
>
> I'm glad you liked the personas! Since this KIP requires others' help to
> improve the ecosystem, I needed a way to make sure everyone knows what role
> they have to play. I hope I've accomplished that.
>
> > 1. Will workers' resiliency to packaging issues be affected when the new
> > scanning behavior is enabled? For example, will a single poorly-packaged
> > plugin prevent other correctly-packaged plugins from being loaded, or
> > worse, cause the worker to fail? And either way, are there any details
> that
> > we can call out to back up any guarantees we want to provide on that
> front?
>
> The current behavior when encountering packaging issues seems to be to log
> errors and continue, and that will certainly continue for the ONLY_SCAN and
> HYBID_WARN modes.
> As written, the HYBRID_FAIL mode breaks from this trend and will throw
> exceptions that crash the worker only when the connectors are missing from
> serviceloader discovery.
> The behavior for SERVICE_LOAD is not currently defined in the KIP. I think
> we can either keep the log-and-continue behavior whenever possible, or
> begin to surface errors in a fail-fast model.
> And if we can't choose between those two, perhaps we can add
> SERVICE_LOAD_FAIL_FAST, or a plugin.path.discovery.fail.fast=true/false to
> allow users to configure the worker to fail to start up when plugin.path
> issues are present.
>
> With regards to incorrectly packaged connectors affecting other correctly
> packaged ones: I think one of the current behaviors of the Reflections
> implementation is that certain exceptions can prevent discovery of other
> unrelated plugins on the path.
> I think that this may be able to be improved in all modes, and doesn't need
> the ServiceLoader mechanism to resolve. I'll explore this some more and
> pursue a fix outside of this KIP.
> The new mechanism should be as good or better than the Reflections
> mechanism in this regard. I am not sure if it is necessary to commit to the
> error handling behavior as part of the public API.
>
> > 2. I think the description of the "--plugin-location" flag may need some
> > updating? In the middle of the list of bullet points there's "The value of
> > this argument will be a single plugin, which can be any of the following:"
> > but the following points don't appear to be related.
>
> Thanks, I think that was a copy-paste error. I have updated the KIP.
>
> > 3. (Nit) Could we rename the migration script something like
> > "connect-migrate-plugin-path" or even just "connect-plugin-path" with a
> > "migrate" subcommand?
>
> Yes, you and Federico had the same idea :)
> Please see my feedback above regarding naming.
>
> > 4. It's noted that "If a plugin class is removed from the path, the
> > corresponding shim manifest should also be removed". Did you consider
> > adding behavior (possibly guarded behind a CLI flag) to find and remove
> > manifests for nonexistent plugins?
>
> When writing that originally, I thought that the script would only remove
> manifests that it had generated, as filtered by some in-file comment or by
> jar filename, which would be an implementation detail.
> If we were to add a CLI flag to also enable/disable the removal behavior,
> we could eliminate that implementation detail and make the removal an
> all-or-nothing part of the script.
>
> I updated the script to accept a `remove-manifests` subcommand which I
> think captures your suggestion. Therefore, a full sync would include two
> invocations: an `add-manifests` and `remove-manifests`.
>
> > 5. What's the concrete plan for updating the plugins that ship OOTB with
> > Connect? Will they be given a service loader manifest, a module info file,
> > or both?
>
> At a minimum, we need to add service loader manifests in this KIP to be
> compliant with the new discovery model.
> This is because Kafka 3.x supports Java 8, and that version of the
> ServiceLoader can only read manifest files, and not module-info files.
> Beyond that, I think it becomes a question of scope, and whether we should
> add module-info files for all of the Connect components.
> I was not planning on doing so within this KIP, and would prefer to leave
> that for a follow-up KIP.
>
> I have clarified this in the KIP.
>
> > 6. In the HYBRID_WARN mode, what precedence will we give to plugins that
> > are found by both the legacy scanning logic and the new service
> > loader/module info loading logic?
>
> In HYBRID_WARN, the control-flow should be the same as SCAN_ONLY, so the
> same classes are discovered in the same order.
> All of the classes that are found via the ServiceLoader mechanism should
> only be used by the runtime to generate warnings.
> I believe that by the nature of the Reflections library, all instances
> which are discoverable by the ServiceLoader should be discoverable by
> Reflections, and so we don't need to merge the two discovery methods or
> handle precedence.
>
> > 7. It's a bit strange to immediately deprecate config property values and
> > the migration script in their first release. IMO as long as we have the
> > migration script in the project, it doesn't make sense to deprecate it,
> and
> > until we've had at least one successful release with the new loading logic
> > and some confidence in it, we should not deprecate the older logic. I
> think
> > a more reasonable pace for the update here could be to deprecate the
> legacy
> > scanning logic in 4.0 (changing the default accordingly) and remove it
> > entirely in 5.0.
>
> I understand that it seems a bit silly to introduce something as
> deprecated, but I think that's because 'deprecated' is not the same as
> 'old', and not mutually exclusive with 'new'.
> The way I understand it, marking something as deprecated and the process of
> deprecation is about communicating the direction in which an API is
> evolving, and which methods can be relied upon for the foreseeable future.
> And if we agree that the old mechanism has some sunset date that we just
> don't know yet, we should still communicate to users that the sunset date
> is somewhere in the future.
>
> I've updated the KIP to describe separate follow-on votes for changing the
> default, and removing the scanning behavior, as that was my original intent
> but I don't think that was made clear.
> I also pushed back the deprecation of the migration sub-commands, since
> we'd like to usher people towards the migration script instead of relying
> on the HYBRID modes.
> I kept the decision for deprecating the non-SERVICE_LOAD modes in the
> current vote for now, but if introducing deprecated configurations is too
> unpalatable we can move the deprecation to a later vote.
>
> Thanks,
> Greg
>
> On Mon, Jan 23, 2023 at 12:49 PM Chris Egerton <ch...@aiven.io.invalid>
> wrote:
>
> > Hi Greg,
> >
> > Thanks for the KIP! This part of Connect has been due for an update for a
> > while, it's nice to see it getting some attention. I especially like how
> > the migration plan is broken down into affected personas; makes review
> > easier and, hopefully, makes the feature easier to understand for affected
> > users.
> >
> > Here are my initial thoughts:
> >
> > 1. Will workers' resiliency to packaging issues be affected when the new
> > scanning behavior is enabled? For example, will a single poorly-packaged
> > plugin prevent other correctly-packaged plugins from being loaded, or
> > worse, cause the worker to fail? And either way, are there any details that
> > we can call out to back up any guarantees we want to provide on that front?
> >
> > 2. I think the description of the "--plugin-location" flag may need some
> > updating? In the middle of the list of bullet points there's "The value of
> > this argument will be a single plugin, which can be any of the following:"
> > but the following points don't appear to be related.
> >
> > 3. (Nit) Could we rename the migration script something like
> > "connect-migrate-plugin-path" or even just "connect-plugin-path" with a
> > "migrate" subcommand?
> >
> > 4. It's noted that "If a plugin class is removed from the path, the
> > corresponding shim manifest should also be removed". Did you consider
> > adding behavior (possibly guarded behind a CLI flag) to find and remove
> > manifests for nonexistent plugins?
> >
> > 5. What's the concrete plan for updating the plugins that ship OOTB with
> > Connect? Will they be given a service loader manifest, a module info file,
> > or both?
> >
> > 6. In the HYBRID_WARN mode, what precedence will we give to plugins that
> > are found by both the legacy scanning logic and the new service
> > loader/module info loading logic?
> >
> > 7. It's a bit strange to immediately deprecate config property values and
> > the migration script in their first release. IMO as long as we have the
> > migration script in the project, it doesn't make sense to deprecate it, and
> > until we've had at least one successful release with the new loading logic
> > and some confidence in it, we should not deprecate the older logic. I think
> > a more reasonable pace for the update here could be to deprecate the legacy
> > scanning logic in 4.0 (changing the default accordingly) and remove it
> > entirely in 5.0.
> >
> > Cheers,
> >
> > Chris
> >
> > On Mon, Jan 23, 2023 at 3:21 AM Federico Valeri <fe...@gmail.com>
> > wrote:
> >
> > > Hi Greg, this looks like a useful change to me.
> > >
> > > I was just wondering if we should use a better script name like
> > > "connect-convert-to-service-provider.sh" or something like this, and
> > > maybe add a --dry-run option.
> > >
> > > On Tue, Jan 17, 2023 at 8:45 PM Greg Harris
> > > <gr...@aiven.io.invalid> wrote:
> > > >
> > > > Hi all!
> > > >
> > > > I'd like to start a discussion about
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-898%3A+Modernize+Connect+plugin+discovery
> > > > .
> > > >
> > > > Thanks!
> > > > Greg Harris
> > >
> >

Re: [DISCUSS] KIP-898: Modernize Connect plugin discovery

Posted by Greg Harris <gr...@aiven.io.INVALID>.
Federico,

Thanks for taking a look!

> I was just wondering if we should use a better script name like
> "connect-convert-to-service-provider.sh" or something like this

I agree that the current name is not ideal, while the current name
describes part of what it is doing, it does not describe all of what it is
doing, or why that is important to the caller.
I looked around to see what style the other Kafka commands use, and it
seems like there's not a standard pattern. Some of the scripts are
noun-phrases, and some are verb-phrases.
Some take commands like `create`, `format`, etc, and some take options like
`--generate`, `--execute`, `--verify`.
I've updated the command to `bin/connect-plugin-path.sh
(list|add-manifests|remove-manifests) --worker-config <worker-config>`, let
me know if that's better or worse than `bin/connect-scan-plugin-path.sh`.

> maybe add a --dry-run option.

Thanks for the suggestion, I've updated the KIP to include a --dry-run
option with some typical semantics.

Chris,

I'm glad you liked the personas! Since this KIP requires others' help to
improve the ecosystem, I needed a way to make sure everyone knows what role
they have to play. I hope I've accomplished that.

> 1. Will workers' resiliency to packaging issues be affected when the new
> scanning behavior is enabled? For example, will a single poorly-packaged
> plugin prevent other correctly-packaged plugins from being loaded, or
> worse, cause the worker to fail? And either way, are there any details
that
> we can call out to back up any guarantees we want to provide on that
front?

The current behavior when encountering packaging issues seems to be to log
errors and continue, and that will certainly continue for the ONLY_SCAN and
HYBID_WARN modes.
As written, the HYBRID_FAIL mode breaks from this trend and will throw
exceptions that crash the worker only when the connectors are missing from
serviceloader discovery.
The behavior for SERVICE_LOAD is not currently defined in the KIP. I think
we can either keep the log-and-continue behavior whenever possible, or
begin to surface errors in a fail-fast model.
And if we can't choose between those two, perhaps we can add
SERVICE_LOAD_FAIL_FAST, or a plugin.path.discovery.fail.fast=true/false to
allow users to configure the worker to fail to start up when plugin.path
issues are present.

With regards to incorrectly packaged connectors affecting other correctly
packaged ones: I think one of the current behaviors of the Reflections
implementation is that certain exceptions can prevent discovery of other
unrelated plugins on the path.
I think that this may be able to be improved in all modes, and doesn't need
the ServiceLoader mechanism to resolve. I'll explore this some more and
pursue a fix outside of this KIP.
The new mechanism should be as good or better than the Reflections
mechanism in this regard. I am not sure if it is necessary to commit to the
error handling behavior as part of the public API.

> 2. I think the description of the "--plugin-location" flag may need some
> updating? In the middle of the list of bullet points there's "The value of
> this argument will be a single plugin, which can be any of the following:"
> but the following points don't appear to be related.

Thanks, I think that was a copy-paste error. I have updated the KIP.

> 3. (Nit) Could we rename the migration script something like
> "connect-migrate-plugin-path" or even just "connect-plugin-path" with a
> "migrate" subcommand?

Yes, you and Federico had the same idea :)
Please see my feedback above regarding naming.

> 4. It's noted that "If a plugin class is removed from the path, the
> corresponding shim manifest should also be removed". Did you consider
> adding behavior (possibly guarded behind a CLI flag) to find and remove
> manifests for nonexistent plugins?

When writing that originally, I thought that the script would only remove
manifests that it had generated, as filtered by some in-file comment or by
jar filename, which would be an implementation detail.
If we were to add a CLI flag to also enable/disable the removal behavior,
we could eliminate that implementation detail and make the removal an
all-or-nothing part of the script.

I updated the script to accept a `remove-manifests` subcommand which I
think captures your suggestion. Therefore, a full sync would include two
invocations: an `add-manifests` and `remove-manifests`.

> 5. What's the concrete plan for updating the plugins that ship OOTB with
> Connect? Will they be given a service loader manifest, a module info file,
> or both?

At a minimum, we need to add service loader manifests in this KIP to be
compliant with the new discovery model.
This is because Kafka 3.x supports Java 8, and that version of the
ServiceLoader can only read manifest files, and not module-info files.
Beyond that, I think it becomes a question of scope, and whether we should
add module-info files for all of the Connect components.
I was not planning on doing so within this KIP, and would prefer to leave
that for a follow-up KIP.

I have clarified this in the KIP.

> 6. In the HYBRID_WARN mode, what precedence will we give to plugins that
> are found by both the legacy scanning logic and the new service
> loader/module info loading logic?

In HYBRID_WARN, the control-flow should be the same as SCAN_ONLY, so the
same classes are discovered in the same order.
All of the classes that are found via the ServiceLoader mechanism should
only be used by the runtime to generate warnings.
I believe that by the nature of the Reflections library, all instances
which are discoverable by the ServiceLoader should be discoverable by
Reflections, and so we don't need to merge the two discovery methods or
handle precedence.

> 7. It's a bit strange to immediately deprecate config property values and
> the migration script in their first release. IMO as long as we have the
> migration script in the project, it doesn't make sense to deprecate it,
and
> until we've had at least one successful release with the new loading logic
> and some confidence in it, we should not deprecate the older logic. I
think
> a more reasonable pace for the update here could be to deprecate the
legacy
> scanning logic in 4.0 (changing the default accordingly) and remove it
> entirely in 5.0.

I understand that it seems a bit silly to introduce something as
deprecated, but I think that's because 'deprecated' is not the same as
'old', and not mutually exclusive with 'new'.
The way I understand it, marking something as deprecated and the process of
deprecation is about communicating the direction in which an API is
evolving, and which methods can be relied upon for the foreseeable future.
And if we agree that the old mechanism has some sunset date that we just
don't know yet, we should still communicate to users that the sunset date
is somewhere in the future.

I've updated the KIP to describe separate follow-on votes for changing the
default, and removing the scanning behavior, as that was my original intent
but I don't think that was made clear.
I also pushed back the deprecation of the migration sub-commands, since
we'd like to usher people towards the migration script instead of relying
on the HYBRID modes.
I kept the decision for deprecating the non-SERVICE_LOAD modes in the
current vote for now, but if introducing deprecated configurations is too
unpalatable we can move the deprecation to a later vote.

Thanks,
Greg

On Mon, Jan 23, 2023 at 12:49 PM Chris Egerton <ch...@aiven.io.invalid>
wrote:

> Hi Greg,
>
> Thanks for the KIP! This part of Connect has been due for an update for a
> while, it's nice to see it getting some attention. I especially like how
> the migration plan is broken down into affected personas; makes review
> easier and, hopefully, makes the feature easier to understand for affected
> users.
>
> Here are my initial thoughts:
>
> 1. Will workers' resiliency to packaging issues be affected when the new
> scanning behavior is enabled? For example, will a single poorly-packaged
> plugin prevent other correctly-packaged plugins from being loaded, or
> worse, cause the worker to fail? And either way, are there any details that
> we can call out to back up any guarantees we want to provide on that front?
>
> 2. I think the description of the "--plugin-location" flag may need some
> updating? In the middle of the list of bullet points there's "The value of
> this argument will be a single plugin, which can be any of the following:"
> but the following points don't appear to be related.
>
> 3. (Nit) Could we rename the migration script something like
> "connect-migrate-plugin-path" or even just "connect-plugin-path" with a
> "migrate" subcommand?
>
> 4. It's noted that "If a plugin class is removed from the path, the
> corresponding shim manifest should also be removed". Did you consider
> adding behavior (possibly guarded behind a CLI flag) to find and remove
> manifests for nonexistent plugins?
>
> 5. What's the concrete plan for updating the plugins that ship OOTB with
> Connect? Will they be given a service loader manifest, a module info file,
> or both?
>
> 6. In the HYBRID_WARN mode, what precedence will we give to plugins that
> are found by both the legacy scanning logic and the new service
> loader/module info loading logic?
>
> 7. It's a bit strange to immediately deprecate config property values and
> the migration script in their first release. IMO as long as we have the
> migration script in the project, it doesn't make sense to deprecate it, and
> until we've had at least one successful release with the new loading logic
> and some confidence in it, we should not deprecate the older logic. I think
> a more reasonable pace for the update here could be to deprecate the legacy
> scanning logic in 4.0 (changing the default accordingly) and remove it
> entirely in 5.0.
>
> Cheers,
>
> Chris
>
> On Mon, Jan 23, 2023 at 3:21 AM Federico Valeri <fe...@gmail.com>
> wrote:
>
> > Hi Greg, this looks like a useful change to me.
> >
> > I was just wondering if we should use a better script name like
> > "connect-convert-to-service-provider.sh" or something like this, and
> > maybe add a --dry-run option.
> >
> > On Tue, Jan 17, 2023 at 8:45 PM Greg Harris
> > <gr...@aiven.io.invalid> wrote:
> > >
> > > Hi all!
> > >
> > > I'd like to start a discussion about
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-898%3A+Modernize+Connect+plugin+discovery
> > > .
> > >
> > > Thanks!
> > > Greg Harris
> >
>

Re: [DISCUSS] KIP-898: Modernize Connect plugin discovery

Posted by Chris Egerton <ch...@aiven.io.INVALID>.
Hi Greg,

Thanks for the KIP! This part of Connect has been due for an update for a
while, it's nice to see it getting some attention. I especially like how
the migration plan is broken down into affected personas; makes review
easier and, hopefully, makes the feature easier to understand for affected
users.

Here are my initial thoughts:

1. Will workers' resiliency to packaging issues be affected when the new
scanning behavior is enabled? For example, will a single poorly-packaged
plugin prevent other correctly-packaged plugins from being loaded, or
worse, cause the worker to fail? And either way, are there any details that
we can call out to back up any guarantees we want to provide on that front?

2. I think the description of the "--plugin-location" flag may need some
updating? In the middle of the list of bullet points there's "The value of
this argument will be a single plugin, which can be any of the following:"
but the following points don't appear to be related.

3. (Nit) Could we rename the migration script something like
"connect-migrate-plugin-path" or even just "connect-plugin-path" with a
"migrate" subcommand?

4. It's noted that "If a plugin class is removed from the path, the
corresponding shim manifest should also be removed". Did you consider
adding behavior (possibly guarded behind a CLI flag) to find and remove
manifests for nonexistent plugins?

5. What's the concrete plan for updating the plugins that ship OOTB with
Connect? Will they be given a service loader manifest, a module info file,
or both?

6. In the HYBRID_WARN mode, what precedence will we give to plugins that
are found by both the legacy scanning logic and the new service
loader/module info loading logic?

7. It's a bit strange to immediately deprecate config property values and
the migration script in their first release. IMO as long as we have the
migration script in the project, it doesn't make sense to deprecate it, and
until we've had at least one successful release with the new loading logic
and some confidence in it, we should not deprecate the older logic. I think
a more reasonable pace for the update here could be to deprecate the legacy
scanning logic in 4.0 (changing the default accordingly) and remove it
entirely in 5.0.

Cheers,

Chris

On Mon, Jan 23, 2023 at 3:21 AM Federico Valeri <fe...@gmail.com>
wrote:

> Hi Greg, this looks like a useful change to me.
>
> I was just wondering if we should use a better script name like
> "connect-convert-to-service-provider.sh" or something like this, and
> maybe add a --dry-run option.
>
> On Tue, Jan 17, 2023 at 8:45 PM Greg Harris
> <gr...@aiven.io.invalid> wrote:
> >
> > Hi all!
> >
> > I'd like to start a discussion about
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-898%3A+Modernize+Connect+plugin+discovery
> > .
> >
> > Thanks!
> > Greg Harris
>

Re: [DISCUSS] KIP-898: Modernize Connect plugin discovery

Posted by Federico Valeri <fe...@gmail.com>.
Hi Greg, this looks like a useful change to me.

I was just wondering if we should use a better script name like
"connect-convert-to-service-provider.sh" or something like this, and
maybe add a --dry-run option.

On Tue, Jan 17, 2023 at 8:45 PM Greg Harris
<gr...@aiven.io.invalid> wrote:
>
> Hi all!
>
> I'd like to start a discussion about
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-898%3A+Modernize+Connect+plugin+discovery
> .
>
> Thanks!
> Greg Harris