You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@kafka.apache.org by Gwen Shapira <gw...@confluent.io> on 2017/05/01 13:43:26 UTC

Re: [DISCUSS] KIP-146: Classloading Isolation in Connect

Hi Konstantine,

Thank you so much for driving this! The connector classpath mess is driving
me nuts (or worse, driving me to use Docker).

I like the proposal for micro-benchmarks to test the context switching
overhead.

I have a difficult time figuring out the module.isolation.enabled.
Especially with a default to false. I can't think of a reason that anyone
will not want classpath isolation. "No! I want my connectors to mess up
each other's dependencies" said no one ever.

So it looks like this is mostly for upgrade purpose? Because the initial
upgrade will not have the module.path set and therefore classpath isolation
will simply not work by default?

In that case, why don't we simply use the existence of non-empty
module.path as an indicator of whether isolation should work or not? seem
simpler and intuitive to me.

Thanks!

Gwen

On Sat, Apr 29, 2017 at 9:16 AM, Konstantine Karantasis <
konstantine@confluent.io> wrote:

> * Because of KIP number collision, please disregard my previous KIP
> announcement and use this thread for discussion instead *
>
>
> Hi everyone,
>
> we aim to address dependency conflicts in Kafka Connect soon by applying
> class loading isolation.
>
> Feel free to take a look at KIP-146 here:
>
> *https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 146+-+Classloading+Isolation+in+Connect
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 146+-+Classloading+Isolation+in+Connect>*
>
> which describes minimal required changes to public interfaces and the
> general implementation approach.
>
> This is a much wanted feature for Kafka Connect. Your feedback is highly
> appreciated.
>
> -Konstantine
>

-- 
*Gwen Shapira*
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
<http://www.confluent.io/blog>

Re: [DISCUSS] KIP-146: Classloading Isolation in Connect

Posted by Konstantine Karantasis <ko...@confluent.io>.

Thanks again Stephane. Your comments were pretty accurate.

A "future work" section was added to account for the suggested enhancements
that will follow the initial release of classloading isolation.
Specifically regarding versioning, you are right, the corresponding KIP
that will describe versioning will include the required extensions to the
configuration as well.

-Konstantine


On Wed, May 3, 2017 at 5:49 PM, Stephane Maarek <
stephane@simplemachines.com.au> wrote:

> Glad you find the feedback useful !
> Definitely all the ideas should be split in reasonable length KIPs. I just
> want to make sure the ideas are not lost. I won’t create the subsequent
> KIPs because I’m not good enough to implement the changes, but happy to
> keep on providing feedback alongside the way.
>
> Regarding the versioning comments: yes there’s a version of Connector, but
> how is that referenced in the config? I believe no config exposes a
> “version” field, which would tie a configuration to a connector version?
> Regarding shipping connect with a few connectors, that’s fine, but once a
> capability to pull from maven is here, I’d rather have a vanilla
> lightweight connect. Anyway, discussions for later.
>
>
>
> On 4/5/17, 4:17 am, "Konstantine Karantasis" <ko...@confluent.io>
> wrote:
>
>     Thank you Stephane,
>
>     your comments bring interesting and useful subjects to the discussion.
> I'm
>     adding my replies below Ewen's comments.
>
>
>     On Tue, May 2, 2017 at 10:15 PM, Ewen Cheslack-Postava <
> ewen@confluent.io>
>     wrote:
>
>     > On Tue, May 2, 2017 at 10:01 PM, Stephane Maarek <
>     > stephane@simplemachines.com.au> wrote:
>     >
>     > Excellent feedback, Stephane!
>     >
>     > Thanks for the work, it’s definitely needed!
>     > > I’d like to suggest to take it one step further.
>     > >
>     > > To me, I’d like to see Kafka Connect the same way we have Docker
> and
>     > > Docker repositories.
>     > >
>     > > Here’s how I would envision the flow:
>     > > - Kafka Connect workers are just workers. They come with no jars
>     > whatsoever
>     > > - The REST API allow you to add a config to the connect cluster
>     > > - The workers, seeing the config, pull the jars from the available
>     > > (maven?) repositories (public or private)
>     > >
>     >
>     > I think supporting this mode is really valuable. It seems *really*
>     > attractive if you have some easily accessible, scalable, centralized
>     > storage for connectors (i.e. some central distributed FS).
>     >
>     > But having to jump through these hoops (which presumably include
> some extra
>     > URLs to point to the connectors, some definition of what that URL
> points to
>     > e.g. zip of jars? uberjar? something with a more complex spec?) just
> to do
>     > a quickstart seems like quite a bit of overhead. I think we should
> consider
>     > both a) incremental progress that opens up new options but doesn't
> prevent
>     > us from the ideal end state and b) all local testing/dev/prod use
> cases
>     > (which is also why I still like having the plain old CLASSPATH option
>     > available).
>     >
>     > I think the proposal leaves open the scope for this -- it doesn't
> specify
>     > connector-specific overrides, but that's obviously something that
> could be
>     > added easily.
>     >
>     >
>     Regarding 1) the workers carrying no modules, I believe we are pretty
> close
>     to this situation today, at least in terms of connectors. Still, I feel
>     that whether we bundle a few basic modules with the framework is
> orthogonal
>     to class loading isolation. I agree with you that Connectors,
> Converters
>     and Transformations should be external modules that are loaded by the
>     framework. But having a few basic ones shipped with Connect simplifies
>     on-boarding and quickstarts significantly.
>
>     Regarding 2) and 3) these are very good to have, and definitely belong
> to
>     the near term vision for Kafka Connect. Many interesting things to do
> here,
>     from programmatically fetching connectors and their dependencies from
> maven
>     repos (using something like Aether maybe), to deciding to support
> certain
>     types of module bundling such as zip, uberjars etc. However dealing
> with
>     the issue of extended discoverability at this point seems to broaden
>     significantly the scope of this KIP. I think it's more practical (and
>     probably faster) to proceed in phases. I estimate that after this KIP,
>     subsequent KIPs will be intuitive and quite transparent.
>
>
>
>     > > - Classpath isolation comes into play so that the pulled jar
> doesn’t
>     > > interact with other connectors
>     > > - Additionally, I believe the config should have a “tag” or
> “version”
>     > > (like docker really), so that
>     > > o you can run different versions of the same connector on your
> connect
>     > > cluster
>     > > o configurations are strongly linked to a connector (right now if I
>     > update
>     > > my connector jars, I may break my configuration)
>     > >
>     >
>     > We have versions on Connectors. We don't have them on
> transformations or
>     > converters, which was definitely an oversight -- we should figure
> out how
>     > to get them in there.
>     >
>     > I think none of the proposals here rule out taking advantage of
> versioning
>     > in the future -- would it be helpful to add a "Future Work" section
> that
>     > gives an idea of how we'd extend this in the future to handle
> versions as
>     > well?
>     >
>     >
>     Dealing with versioning was left out of the scope of this KIP because
>     Transformations and Converters currently are not versioned. I like the
> idea
>     of adding a Future Work section. I'll add one to highlight the
> intention to
>     support versioning.
>
>
>     >
>     > > I know this is a bit out of scope, but if major changes are coming
> to
>     > > connect then these are my two cents.
>     > >
>     >
>     > :) I think we can address most of these incrementally, but keep the
> scope
>     > here a bit smaller and more manageable (and hopefully get it into
>     > 0.11.0.0!). I just don't want perfect to be the enemy of good --
> getting
>     > the first step in as long as it doesn't cause problems down the line
> seems
>     > like a good step.
>     >
>     >
>     > >
>     > > Finally, maybe extend that construct to Transformers. The ability
> to
>     > > externalise transformers as jars would democratize their usage IMO
>     > >
>     >
>     > This should definitely happen! It's important for pretty much any
> pluggable
>     > component, though I think Transformations will, on average, have the
> fewest
>     > extra dependencies. Connectors have the most, followed by
> Converters, then
>     > Transformations. But I think Konstantine has mentioned these in the
> KIP --
>     > if it could be clearer in the proposed changes, perhaps you could
> suggest
>     > where to add details?
>     >
>     > -Ewen
>     >
>
>
>     Thanks again for the comments, let me know if this plan sounds good.
>
>     -Konstantine
>
>
>
>     >
>     >
>     > >
>     > > On 3/5/17, 4:24 am, "Ewen Cheslack-Postava" <ew...@confluent.io>
> wrote:
>     > >
>     > >     Thanks for the KIP.
>     > >
>     > >     A few responses inline, followed by additional comments.
>     > >
>     > >     On Mon, May 1, 2017 at 9:50 PM, Konstantine Karantasis <
>     > >     konstantine@confluent.io> wrote:
>     > >
>     > >     > Gwen, Randall thank you for your very insightful
> observations. I'm
>     > > glad you
>     > >     > find this first draft to be an adequate platform for
> discussion.
>     > >     >
>     > >     > I'll attempt replying to your comments in order.
>     > >     >
>     > >     > Gwen, I also debated exactly the same two options: a)
> interpreting
>     > > absence
>     > >     > of module path as a user's intention to turn off isolation
> and b)
>     > >     > explicitly using an additional boolean property. A few
> reasons why
>     > I
>     > > went
>     > >     > with b) in this first draft are:
>     > >     > 1) As Randall mentions, to leave the option of using a
> default
>     > value
>     > > open.
>     > >     > If not immediately in the first version of isolation, maybe
> in the
>     > > future.
>     > >     > 2) I didn't like the implicit character of the choice of
>     > > interpreting an
>     > >     > empty string as a clear intention to turn isolation off by
> the
>     > user.
>     > > Half
>     > >     > the time could be just that users forget to set a location,
>     > although
>     > > they'd
>     > >     > like to use class loading isolation.
>     > >     > 3) There's a slim possibility that in rare occasions a user
> might
>     > > want to
>     > >     > avoid even the slightest increase in memory consumption due
> to
>     > class
>     > >     > loading duplication. I admit this should be very rare, but
> given
>     > the
>     > > other
>     > >     > concerns and that we would really like to keep the isolation
>     > > implementation
>     > >     > simple, the option to turn off this feature by using only one
>     > > additional
>     > >     > config property might not seem too excessive. At least at
> the start
>     > > of this
>     > >     > discussion.
>     > >     > 4) Debugging during development might be simpler in some
> cases.
>     > >     > 5) Finally, as you mention, this could allow for smoother
> upgrades.
>     > >     >
>     > >
>     > >     I'm not sure any of these keep you from removing the extra
> config. Is
>     > > there
>     > >     any reason you couldn't have clean support for relying on the
>     > CLASSPATH
>     > >     while still supporting the classloaders? Then getting people
> onto the
>     > > new
>     > >     classloaders does require documentation for how to install
>     > connectors,
>     > > but
>     > >     that's pretty minimal. And we don't break existing
> installations
>     > where
>     > >     people are just adding to the CLASSPATH. It seems like this:
>     > >
>     > >     1. Allows you to set a default. Isolation is always enabled,
> but we
>     > > won't
>     > >     include any paths/directories we already use. Setting a
> default just
>     > >     requires specifying a new location where we'd hold these
> directories.
>     > >     2. It doesn't require the implicit choice -- you actually
> never turn
>     > > off
>     > >     isolation, but still support the regular CLASSPATH with an
> empty list
>     > > of
>     > >     isolated loaders
>     > >     3. The user can still use CLASSPATH if they want to minimize
>     > > classloader
>     > >     overhead
>     > >     4. Debugging can still use CLASSPATH
>     > >     5. Upgrades just work.
>     > >
>     > >
>     > >     >
>     > >     > Randall, regarding your comments:
>     > >     > 1) To keep its focus narrow, this KIP, as well as the first
>     > > implementation
>     > >     > of isolation in Connect, assume filesystem based discovery.
> With
>     > > careful
>     > >     > implementation, transitioning to discovery schemes that
> support
>     > > broader
>     > >     > URIs I believe should be easy in the future.
>     > >     >
>     > >
>     > >     Maybe just mention a couple of quick examples in the KIP. When
>     > > described
>     > >     inline it might be more obvious that it will extend cleanly.
>     > >
>     > >
>     > >     > 2) The example you give makes a good point. However I'm
> inclined to
>     > > say
>     > >     > that such cases should be addressed more as exceptions
> rather than
>     > > as being
>     > >     > the common case. Therefore, I wouldn't see all dependencies
>     > imported
>     > > by the
>     > >     > framework as required to be filtered out, because in that
> case we
>     > > lose the
>     > >     > advantage of isolation between the framework and the
> connectors
>     > (and
>     > > we are
>     > >     > left only with isolation between connectors).
>     > >
>     > >     3) I tried to abstract implementation details in this the KIP,
> but
>     > you
>     > > are
>     > >     > right. Even though filtering here is mainly used semantically
>     > rather
>     > > than
>     > >     > literally, it gives an implementation hint that we could
> avoid.
>     > >     >
>     > >
>     > >     I think we're missing another option -- don't do filtering and
>     > require
>     > > that
>     > >     those dependencies are correctly filtered out of the modules.
> If we
>     > > want to
>     > >     be nicer about this, we could also detect maybe 2 or 3 classes
> while
>     > >     scanning for Connectors/Converters/Transformations that
> indicate the
>     > >     classloader has jars that it shouldn't and warn about it. I
> can't
>     > > think of
>     > >     that many that would be an issue -- basically connect-api,
>     > > connect-runtime
>     > >     if they really mess it up, and maybe slf4j.
>     > >
>     > >
>     > >     > 4) In the same spirit as in 3) I believe we should reserve
> enough
>     > >     > flexibility to the implementation to discover and load
> classes,
>     > when
>     > > they
>     > >     > appear in multiple locations under the general module
> location.
>     > >     >
>     > >     >
>     > >     And a couple of addition comments:
>     > >
>     > >     - module.path should be module.paths if it accepts multiple
> paths
>     > >     - I think the description of the module paths is a bit
> confusing
>     > > because I
>     > >     think for simpler configuration we actually want to specify the
>     > > *parent*
>     > >     directory of module paths. I definitely prefer this since it's
>     > simpler
>     > >     although I am not certain how it will mix with future
> extensions to
>     > > other
>     > >     URL handlers (where a zip file or http URL wouldn't do the same
>     > > discovery
>     > >     within subdirectories)
>     > >
>     > >     -Ewen
>     > >
>     > >
>     > >     > Thanks again! Let me know what you think.
>     > >     > Konstantine
>     > >     >
>     > >     >
>     > >     > On Mon, May 1, 2017 at 10:12 AM, Randall Hauch <
> rhauch@gmail.com>
>     > > wrote:
>     > >     >
>     > >     > > Very nice work, Konstantine. Conflicting dependencies of
>     > > connectors is
>     > >     > > indeed a big issue that makes it hard to manage installed
>     > > connectors.
>     > >     > >
>     > >     > > I do like Gwen's idea about removing the
>     > 'module.isolation.enabled'
>     > >     > > property. However, I would have anticipated always using
>     > classpath
>     > >     > > isolation for *only* those components registered under the
> module
>     > > path
>     > >     > and
>     > >     > > not really for anything else already on the normal
> classpath. So,
>     > > people
>     > >     > > could continue to place custom connector JARs onto the
> classpath,
>     > > though
>     > >     > > this would become deprecated in favor of installing custom
>     > > connector
>     > >     > JARs /
>     > >     > > modules via the module path. This keeps configuration
> simple,
>     > gives
>     > >     > people
>     > >     > > time to migrate, but let's people that need classpath
> isolation
>     > > get it to
>     > >     > > install a variety of connectors each with their
> dependencies that
>     > >     > > potentially conflict with other components.
>     > >     > >
>     > >     > > The challenge is whether there should be a default for
>     > > 'module.path'.
>     > >     > > Ideally there would be so that users know where they can
> install
>     > > their
>     > >     > > connectors. However, I suspect that this might be
> difficult to do
>     > > unless
>     > >     > it
>     > >     > > can make use of system properties such as "${kafka.home}"
> so that
>     > >     > relative
>     > >     > > directories can be specified.
>     > >     > >
>     > >     > > A few other questions/comments:
>     > >     > >
>     > >     > > 1) Does the KIP have to specify how are components /
> modules
>     > > installed,
>     > >     > > discovered, or recognized by Kafka Connect? Or perhaps the
> KIP
>     > > needs to
>     > >     > > just specify the semantics of the file system module path
> (e.g.,
>     > > the
>     > >     > > directories below those specified in the module path are
> to be
>     > > unique and
>     > >     > > identify an installed component).
>     > >     > >
>     > >     > > 2) Will the module classloader filtering also have to
> exclude
>     > Kafka
>     > >     > Connect
>     > >     > > dependencies? The only one that I can think of is the
> SLF4J API,
>     > > which
>     > >     > > can't be loaded from the module's classloader if the
> connector is
>     > > to send
>     > >     > > its log messages to the same logging system.
>     > >     > >
>     > >     > > 3) Rather than specify filtering, would be it a bit more
> flexible
>     > > to
>     > >     > simply
>     > >     > > say that the implementation will need to ensure that Java,
> Kafka
>     > > Connect,
>     > >     > > and other third party APIs (e.g., SLF4J API) will not be
> loaded
>     > > from the
>     > >     > > module classloaders? It'd be better to avoid specifying
> how it
>     > > will be
>     > >     > > done, just in case the implementation needs to evolve or
> use a
>     > > different
>     > >     > > technique (e.g., load the Java and public Kafka Connect
> APIs via
>     > > one
>     > >     > > classloader that is reused and that always appears before
> the
>     > > module
>     > >     > > classloader, while Kafka Connect implementation JARs
> appear after
>     > > the
>     > >     > > component's classloader.
>     > >     > >
>     > >     > > 4) Perhaps to address #2 and #3 above, perhaps the KIP
> could
>     > > explicitly
>     > >     > > specify the classloader order for a deployed connector. For
>     > > example,
>     > >     > > 'java', 'kafka-connect-apis', 'connector-module',
> 'smt-module-1',
>     > > ...,
>     > >     > > 'kafka-connect-impls', where 'connector-module' is the
>     > classloader
>     > > for
>     > >     > the
>     > >     > > (first) module where the connector is found,
> 'smt-module-1' is
>     > the
>     > >     > > classloader for the (first) module where the first SMT
> class is
>     > > found (if
>     > >     > > specified and found in a separate module), 'smt-module-2'
> is the
>     > >     > > classloader .... Might also need to say that the KIP does
> not
>     > > specify how
>     > >     > > the implementation will pick the module if a specified
> class if
>     > > found in
>     > >     > > more than one module.
>     > >     > >
>     > >     > > Thoughts?
>     > >     > >
>     > >     > > Randall
>     > >     > >
>     > >     > > On Mon, May 1, 2017 at 6:43 AM, Gwen Shapira <
> gwen@confluent.io>
>     > > wrote:
>     > >     > >
>     > >     > > > Hi Konstantine,
>     > >     > > >
>     > >     > > > Thank you so much for driving this! The connector
> classpath
>     > mess
>     > > is
>     > >     > > driving
>     > >     > > > me nuts (or worse, driving me to use Docker).
>     > >     > > >
>     > >     > > > I like the proposal for micro-benchmarks to test the
> context
>     > > switching
>     > >     > > > overhead.
>     > >     > > >
>     > >     > > > I have a difficult time figuring out the
>     > > module.isolation.enabled.
>     > >     > > > Especially with a default to false. I can't think of a
> reason
>     > > that
>     > >     > anyone
>     > >     > > > will not want classpath isolation. "No! I want my
> connectors to
>     > > mess up
>     > >     > > > each other's dependencies" said no one ever.
>     > >     > > >
>     > >     > > > So it looks like this is mostly for upgrade purpose?
> Because
>     > the
>     > >     > initial
>     > >     > > > upgrade will not have the module.path set and therefore
>     > classpath
>     > >     > > isolation
>     > >     > > > will simply not work by default?
>     > >     > > >
>     > >     > > > In that case, why don't we simply use the existence of
>     > non-empty
>     > >     > > > module.path as an indicator of whether isolation should
> work or
>     > > not?
>     > >     > seem
>     > >     > > > simpler and intuitive to me.
>     > >     > > >
>     > >     > > > Thanks!
>     > >     > > >
>     > >     > > > Gwen
>     > >     > > >
>     > >     > > >
>     > >     > > >
>     > >     > > >
>     > >     > > >
>     > >     > > > On Sat, Apr 29, 2017 at 9:16 AM, Konstantine Karantasis <
>     > >     > > > konstantine@confluent.io> wrote:
>     > >     > > >
>     > >     > > > > * Because of KIP number collision, please disregard my
>     > > previous KIP
>     > >     > > > > announcement and use this thread for discussion
> instead *
>     > >     > > > >
>     > >     > > > >
>     > >     > > > > Hi everyone,
>     > >     > > > >
>     > >     > > > > we aim to address dependency conflicts in Kafka
> Connect soon
>     > by
>     > >     > > applying
>     > >     > > > > class loading isolation.
>     > >     > > > >
>     > >     > > > > Feel free to take a look at KIP-146 here:
>     > >     > > > >
>     > >     > > > > *https://cwiki.apache.org/
> confluence/display/KAFKA/KIP-
>     > >     > > > > 146+-+Classloading+Isolation+in+Connect
>     > >     > > > > <https://cwiki.apache.org/
> confluence/display/KAFKA/KIP-
>     > >     > > > > 146+-+Classloading+Isolation+in+Connect>*
>     > >     > > > >
>     > >     > > > > which describes minimal required changes to public
> interfaces
>     > > and the
>     > >     > > > > general implementation approach.
>     > >     > > > >
>     > >     > > > > This is a much wanted feature for Kafka Connect. Your
>     > feedback
>     > > is
>     > >     > > highly
>     > >     > > > > appreciated.
>     > >     > > > >
>     > >     > > > > -Konstantine
>     > >     > > > >
>     > >     > > >
>     > >     > > >
>     > >     > > >
>     > >     > > > --
>     > >     > > > *Gwen Shapira*
>     > >     > > > Product Manager | Confluent
>     > >     > > > 650.450.2760 | @gwenshap
>     > >     > > > Follow us: Twitter <https://twitter.com/ConfluentInc> |
> blog
>     > >     > > > <http://www.confluent.io/blog>
>     > >     > > >
>     > >     > >
>     > >     >
>     > >
>     > >
>     > >
>     > >
>     >
>
>
>
>

Re: [DISCUSS] KIP-146: Classloading Isolation in Connect

Posted by Stephane Maarek <st...@simplemachines.com.au>.

Glad you find the feedback useful !
Definitely all the ideas should be split in reasonable length KIPs. I just want to make sure the ideas are not lost. I won’t create the subsequent KIPs because I’m not good enough to implement the changes, but happy to keep on providing feedback alongside the way. 

Regarding the versioning comments: yes there’s a version of Connector, but how is that referenced in the config? I believe no config exposes a “version” field, which would tie a configuration to a connector version?
Regarding shipping connect with a few connectors, that’s fine, but once a capability to pull from maven is here, I’d rather have a vanilla lightweight connect. Anyway, discussions for later. 
 
 

On 4/5/17, 4:17 am, "Konstantine Karantasis" <ko...@confluent.io> wrote:

    Thank you Stephane,
    
    your comments bring interesting and useful subjects to the discussion. I'm
    adding my replies below Ewen's comments.
    
    
    On Tue, May 2, 2017 at 10:15 PM, Ewen Cheslack-Postava <ew...@confluent.io>
    wrote:
    
    > On Tue, May 2, 2017 at 10:01 PM, Stephane Maarek <
    > stephane@simplemachines.com.au> wrote:
    >
    > Excellent feedback, Stephane!
    >
    > Thanks for the work, it’s definitely needed!
    > > I’d like to suggest to take it one step further.
    > >
    > > To me, I’d like to see Kafka Connect the same way we have Docker and
    > > Docker repositories.
    > >
    > > Here’s how I would envision the flow:
    > > - Kafka Connect workers are just workers. They come with no jars
    > whatsoever
    > > - The REST API allow you to add a config to the connect cluster
    > > - The workers, seeing the config, pull the jars from the available
    > > (maven?) repositories (public or private)
    > >
    >
    > I think supporting this mode is really valuable. It seems *really*
    > attractive if you have some easily accessible, scalable, centralized
    > storage for connectors (i.e. some central distributed FS).
    >
    > But having to jump through these hoops (which presumably include some extra
    > URLs to point to the connectors, some definition of what that URL points to
    > e.g. zip of jars? uberjar? something with a more complex spec?) just to do
    > a quickstart seems like quite a bit of overhead. I think we should consider
    > both a) incremental progress that opens up new options but doesn't prevent
    > us from the ideal end state and b) all local testing/dev/prod use cases
    > (which is also why I still like having the plain old CLASSPATH option
    > available).
    >
    > I think the proposal leaves open the scope for this -- it doesn't specify
    > connector-specific overrides, but that's obviously something that could be
    > added easily.
    >
    >
    Regarding 1) the workers carrying no modules, I believe we are pretty close
    to this situation today, at least in terms of connectors. Still, I feel
    that whether we bundle a few basic modules with the framework is orthogonal
    to class loading isolation. I agree with you that Connectors, Converters
    and Transformations should be external modules that are loaded by the
    framework. But having a few basic ones shipped with Connect simplifies
    on-boarding and quickstarts significantly.
    
    Regarding 2) and 3) these are very good to have, and definitely belong to
    the near term vision for Kafka Connect. Many interesting things to do here,
    from programmatically fetching connectors and their dependencies from maven
    repos (using something like Aether maybe), to deciding to support certain
    types of module bundling such as zip, uberjars etc. However dealing with
    the issue of extended discoverability at this point seems to broaden
    significantly the scope of this KIP. I think it's more practical (and
    probably faster) to proceed in phases. I estimate that after this KIP,
    subsequent KIPs will be intuitive and quite transparent.
    
    
    
    > > - Classpath isolation comes into play so that the pulled jar doesn’t
    > > interact with other connectors
    > > - Additionally, I believe the config should have a “tag” or “version”
    > > (like docker really), so that
    > > o you can run different versions of the same connector on your connect
    > > cluster
    > > o configurations are strongly linked to a connector (right now if I
    > update
    > > my connector jars, I may break my configuration)
    > >
    >
    > We have versions on Connectors. We don't have them on transformations or
    > converters, which was definitely an oversight -- we should figure out how
    > to get them in there.
    >
    > I think none of the proposals here rule out taking advantage of versioning
    > in the future -- would it be helpful to add a "Future Work" section that
    > gives an idea of how we'd extend this in the future to handle versions as
    > well?
    >
    >
    Dealing with versioning was left out of the scope of this KIP because
    Transformations and Converters currently are not versioned. I like the idea
    of adding a Future Work section. I'll add one to highlight the intention to
    support versioning.
    
    
    >
    > > I know this is a bit out of scope, but if major changes are coming to
    > > connect then these are my two cents.
    > >
    >
    > :) I think we can address most of these incrementally, but keep the scope
    > here a bit smaller and more manageable (and hopefully get it into
    > 0.11.0.0!). I just don't want perfect to be the enemy of good -- getting
    > the first step in as long as it doesn't cause problems down the line seems
    > like a good step.
    >
    >
    > >
    > > Finally, maybe extend that construct to Transformers. The ability to
    > > externalise transformers as jars would democratize their usage IMO
    > >
    >
    > This should definitely happen! It's important for pretty much any pluggable
    > component, though I think Transformations will, on average, have the fewest
    > extra dependencies. Connectors have the most, followed by Converters, then
    > Transformations. But I think Konstantine has mentioned these in the KIP --
    > if it could be clearer in the proposed changes, perhaps you could suggest
    > where to add details?
    >
    > -Ewen
    >
    
    
    Thanks again for the comments, let me know if this plan sounds good.
    
    -Konstantine
    
    
    
    >
    >
    > >
    > > On 3/5/17, 4:24 am, "Ewen Cheslack-Postava" <ew...@confluent.io> wrote:
    > >
    > >     Thanks for the KIP.
    > >
    > >     A few responses inline, followed by additional comments.
    > >
    > >     On Mon, May 1, 2017 at 9:50 PM, Konstantine Karantasis <
    > >     konstantine@confluent.io> wrote:
    > >
    > >     > Gwen, Randall thank you for your very insightful observations. I'm
    > > glad you
    > >     > find this first draft to be an adequate platform for discussion.
    > >     >
    > >     > I'll attempt replying to your comments in order.
    > >     >
    > >     > Gwen, I also debated exactly the same two options: a) interpreting
    > > absence
    > >     > of module path as a user's intention to turn off isolation and b)
    > >     > explicitly using an additional boolean property. A few reasons why
    > I
    > > went
    > >     > with b) in this first draft are:
    > >     > 1) As Randall mentions, to leave the option of using a default
    > value
    > > open.
    > >     > If not immediately in the first version of isolation, maybe in the
    > > future.
    > >     > 2) I didn't like the implicit character of the choice of
    > > interpreting an
    > >     > empty string as a clear intention to turn isolation off by the
    > user.
    > > Half
    > >     > the time could be just that users forget to set a location,
    > although
    > > they'd
    > >     > like to use class loading isolation.
    > >     > 3) There's a slim possibility that in rare occasions a user might
    > > want to
    > >     > avoid even the slightest increase in memory consumption due to
    > class
    > >     > loading duplication. I admit this should be very rare, but given
    > the
    > > other
    > >     > concerns and that we would really like to keep the isolation
    > > implementation
    > >     > simple, the option to turn off this feature by using only one
    > > additional
    > >     > config property might not seem too excessive. At least at the start
    > > of this
    > >     > discussion.
    > >     > 4) Debugging during development might be simpler in some cases.
    > >     > 5) Finally, as you mention, this could allow for smoother upgrades.
    > >     >
    > >
    > >     I'm not sure any of these keep you from removing the extra config. Is
    > > there
    > >     any reason you couldn't have clean support for relying on the
    > CLASSPATH
    > >     while still supporting the classloaders? Then getting people onto the
    > > new
    > >     classloaders does require documentation for how to install
    > connectors,
    > > but
    > >     that's pretty minimal. And we don't break existing installations
    > where
    > >     people are just adding to the CLASSPATH. It seems like this:
    > >
    > >     1. Allows you to set a default. Isolation is always enabled, but we
    > > won't
    > >     include any paths/directories we already use. Setting a default just
    > >     requires specifying a new location where we'd hold these directories.
    > >     2. It doesn't require the implicit choice -- you actually never turn
    > > off
    > >     isolation, but still support the regular CLASSPATH with an empty list
    > > of
    > >     isolated loaders
    > >     3. The user can still use CLASSPATH if they want to minimize
    > > classloader
    > >     overhead
    > >     4. Debugging can still use CLASSPATH
    > >     5. Upgrades just work.
    > >
    > >
    > >     >
    > >     > Randall, regarding your comments:
    > >     > 1) To keep its focus narrow, this KIP, as well as the first
    > > implementation
    > >     > of isolation in Connect, assume filesystem based discovery. With
    > > careful
    > >     > implementation, transitioning to discovery schemes that support
    > > broader
    > >     > URIs I believe should be easy in the future.
    > >     >
    > >
    > >     Maybe just mention a couple of quick examples in the KIP. When
    > > described
    > >     inline it might be more obvious that it will extend cleanly.
    > >
    > >
    > >     > 2) The example you give makes a good point. However I'm inclined to
    > > say
    > >     > that such cases should be addressed more as exceptions rather than
    > > as being
    > >     > the common case. Therefore, I wouldn't see all dependencies
    > imported
    > > by the
    > >     > framework as required to be filtered out, because in that case we
    > > lose the
    > >     > advantage of isolation between the framework and the connectors
    > (and
    > > we are
    > >     > left only with isolation between connectors).
    > >
    > >     3) I tried to abstract implementation details in this the KIP, but
    > you
    > > are
    > >     > right. Even though filtering here is mainly used semantically
    > rather
    > > than
    > >     > literally, it gives an implementation hint that we could avoid.
    > >     >
    > >
    > >     I think we're missing another option -- don't do filtering and
    > require
    > > that
    > >     those dependencies are correctly filtered out of the modules. If we
    > > want to
    > >     be nicer about this, we could also detect maybe 2 or 3 classes while
    > >     scanning for Connectors/Converters/Transformations that indicate the
    > >     classloader has jars that it shouldn't and warn about it. I can't
    > > think of
    > >     that many that would be an issue -- basically connect-api,
    > > connect-runtime
    > >     if they really mess it up, and maybe slf4j.
    > >
    > >
    > >     > 4) In the same spirit as in 3) I believe we should reserve enough
    > >     > flexibility to the implementation to discover and load classes,
    > when
    > > they
    > >     > appear in multiple locations under the general module location.
    > >     >
    > >     >
    > >     And a couple of addition comments:
    > >
    > >     - module.path should be module.paths if it accepts multiple paths
    > >     - I think the description of the module paths is a bit confusing
    > > because I
    > >     think for simpler configuration we actually want to specify the
    > > *parent*
    > >     directory of module paths. I definitely prefer this since it's
    > simpler
    > >     although I am not certain how it will mix with future extensions to
    > > other
    > >     URL handlers (where a zip file or http URL wouldn't do the same
    > > discovery
    > >     within subdirectories)
    > >
    > >     -Ewen
    > >
    > >
    > >     > Thanks again! Let me know what you think.
    > >     > Konstantine
    > >     >
    > >     >
    > >     > On Mon, May 1, 2017 at 10:12 AM, Randall Hauch <rh...@gmail.com>
    > > wrote:
    > >     >
    > >     > > Very nice work, Konstantine. Conflicting dependencies of
    > > connectors is
    > >     > > indeed a big issue that makes it hard to manage installed
    > > connectors.
    > >     > >
    > >     > > I do like Gwen's idea about removing the
    > 'module.isolation.enabled'
    > >     > > property. However, I would have anticipated always using
    > classpath
    > >     > > isolation for *only* those components registered under the module
    > > path
    > >     > and
    > >     > > not really for anything else already on the normal classpath. So,
    > > people
    > >     > > could continue to place custom connector JARs onto the classpath,
    > > though
    > >     > > this would become deprecated in favor of installing custom
    > > connector
    > >     > JARs /
    > >     > > modules via the module path. This keeps configuration simple,
    > gives
    > >     > people
    > >     > > time to migrate, but let's people that need classpath isolation
    > > get it to
    > >     > > install a variety of connectors each with their dependencies that
    > >     > > potentially conflict with other components.
    > >     > >
    > >     > > The challenge is whether there should be a default for
    > > 'module.path'.
    > >     > > Ideally there would be so that users know where they can install
    > > their
    > >     > > connectors. However, I suspect that this might be difficult to do
    > > unless
    > >     > it
    > >     > > can make use of system properties such as "${kafka.home}" so that
    > >     > relative
    > >     > > directories can be specified.
    > >     > >
    > >     > > A few other questions/comments:
    > >     > >
    > >     > > 1) Does the KIP have to specify how are components / modules
    > > installed,
    > >     > > discovered, or recognized by Kafka Connect? Or perhaps the KIP
    > > needs to
    > >     > > just specify the semantics of the file system module path (e.g.,
    > > the
    > >     > > directories below those specified in the module path are to be
    > > unique and
    > >     > > identify an installed component).
    > >     > >
    > >     > > 2) Will the module classloader filtering also have to exclude
    > Kafka
    > >     > Connect
    > >     > > dependencies? The only one that I can think of is the SLF4J API,
    > > which
    > >     > > can't be loaded from the module's classloader if the connector is
    > > to send
    > >     > > its log messages to the same logging system.
    > >     > >
    > >     > > 3) Rather than specify filtering, would be it a bit more flexible
    > > to
    > >     > simply
    > >     > > say that the implementation will need to ensure that Java, Kafka
    > > Connect,
    > >     > > and other third party APIs (e.g., SLF4J API) will not be loaded
    > > from the
    > >     > > module classloaders? It'd be better to avoid specifying how it
    > > will be
    > >     > > done, just in case the implementation needs to evolve or use a
    > > different
    > >     > > technique (e.g., load the Java and public Kafka Connect APIs via
    > > one
    > >     > > classloader that is reused and that always appears before the
    > > module
    > >     > > classloader, while Kafka Connect implementation JARs appear after
    > > the
    > >     > > component's classloader.
    > >     > >
    > >     > > 4) Perhaps to address #2 and #3 above, perhaps the KIP could
    > > explicitly
    > >     > > specify the classloader order for a deployed connector. For
    > > example,
    > >     > > 'java', 'kafka-connect-apis', 'connector-module', 'smt-module-1',
    > > ...,
    > >     > > 'kafka-connect-impls', where 'connector-module' is the
    > classloader
    > > for
    > >     > the
    > >     > > (first) module where the connector is found, 'smt-module-1' is
    > the
    > >     > > classloader for the (first) module where the first SMT class is
    > > found (if
    > >     > > specified and found in a separate module), 'smt-module-2' is the
    > >     > > classloader .... Might also need to say that the KIP does not
    > > specify how
    > >     > > the implementation will pick the module if a specified class if
    > > found in
    > >     > > more than one module.
    > >     > >
    > >     > > Thoughts?
    > >     > >
    > >     > > Randall
    > >     > >
    > >     > > On Mon, May 1, 2017 at 6:43 AM, Gwen Shapira <gw...@confluent.io>
    > > wrote:
    > >     > >
    > >     > > > Hi Konstantine,
    > >     > > >
    > >     > > > Thank you so much for driving this! The connector classpath
    > mess
    > > is
    > >     > > driving
    > >     > > > me nuts (or worse, driving me to use Docker).
    > >     > > >
    > >     > > > I like the proposal for micro-benchmarks to test the context
    > > switching
    > >     > > > overhead.
    > >     > > >
    > >     > > > I have a difficult time figuring out the
    > > module.isolation.enabled.
    > >     > > > Especially with a default to false. I can't think of a reason
    > > that
    > >     > anyone
    > >     > > > will not want classpath isolation. "No! I want my connectors to
    > > mess up
    > >     > > > each other's dependencies" said no one ever.
    > >     > > >
    > >     > > > So it looks like this is mostly for upgrade purpose? Because
    > the
    > >     > initial
    > >     > > > upgrade will not have the module.path set and therefore
    > classpath
    > >     > > isolation
    > >     > > > will simply not work by default?
    > >     > > >
    > >     > > > In that case, why don't we simply use the existence of
    > non-empty
    > >     > > > module.path as an indicator of whether isolation should work or
    > > not?
    > >     > seem
    > >     > > > simpler and intuitive to me.
    > >     > > >
    > >     > > > Thanks!
    > >     > > >
    > >     > > > Gwen
    > >     > > >
    > >     > > >
    > >     > > >
    > >     > > >
    > >     > > >
    > >     > > > On Sat, Apr 29, 2017 at 9:16 AM, Konstantine Karantasis <
    > >     > > > konstantine@confluent.io> wrote:
    > >     > > >
    > >     > > > > * Because of KIP number collision, please disregard my
    > > previous KIP
    > >     > > > > announcement and use this thread for discussion instead *
    > >     > > > >
    > >     > > > >
    > >     > > > > Hi everyone,
    > >     > > > >
    > >     > > > > we aim to address dependency conflicts in Kafka Connect soon
    > by
    > >     > > applying
    > >     > > > > class loading isolation.
    > >     > > > >
    > >     > > > > Feel free to take a look at KIP-146 here:
    > >     > > > >
    > >     > > > > *https://cwiki.apache.org/confluence/display/KAFKA/KIP-
    > >     > > > > 146+-+Classloading+Isolation+in+Connect
    > >     > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
    > >     > > > > 146+-+Classloading+Isolation+in+Connect>*
    > >     > > > >
    > >     > > > > which describes minimal required changes to public interfaces
    > > and the
    > >     > > > > general implementation approach.
    > >     > > > >
    > >     > > > > This is a much wanted feature for Kafka Connect. Your
    > feedback
    > > is
    > >     > > highly
    > >     > > > > appreciated.
    > >     > > > >
    > >     > > > > -Konstantine
    > >     > > > >
    > >     > > >
    > >     > > >
    > >     > > >
    > >     > > > --
    > >     > > > *Gwen Shapira*
    > >     > > > Product Manager | Confluent
    > >     > > > 650.450.2760 | @gwenshap
    > >     > > > Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
    > >     > > > <http://www.confluent.io/blog>
    > >     > > >
    > >     > >
    > >     >
    > >
    > >
    > >
    > >
    >

Re: [DISCUSS] KIP-146: Classloading Isolation in Connect

Posted by Konstantine Karantasis <ko...@confluent.io>.

Thank you Stephane,

your comments bring interesting and useful subjects to the discussion. I'm
adding my replies below Ewen's comments.


On Tue, May 2, 2017 at 10:15 PM, Ewen Cheslack-Postava <ew...@confluent.io>
wrote:

> On Tue, May 2, 2017 at 10:01 PM, Stephane Maarek <
> stephane@simplemachines.com.au> wrote:
>
> Excellent feedback, Stephane!
>
> Thanks for the work, it’s definitely needed!
> > I’d like to suggest to take it one step further.
> >
> > To me, I’d like to see Kafka Connect the same way we have Docker and
> > Docker repositories.
> >
> > Here’s how I would envision the flow:
> > - Kafka Connect workers are just workers. They come with no jars
> whatsoever
> > - The REST API allow you to add a config to the connect cluster
> > - The workers, seeing the config, pull the jars from the available
> > (maven?) repositories (public or private)
> >
>
> I think supporting this mode is really valuable. It seems *really*
> attractive if you have some easily accessible, scalable, centralized
> storage for connectors (i.e. some central distributed FS).
>
> But having to jump through these hoops (which presumably include some extra
> URLs to point to the connectors, some definition of what that URL points to
> e.g. zip of jars? uberjar? something with a more complex spec?) just to do
> a quickstart seems like quite a bit of overhead. I think we should consider
> both a) incremental progress that opens up new options but doesn't prevent
> us from the ideal end state and b) all local testing/dev/prod use cases
> (which is also why I still like having the plain old CLASSPATH option
> available).
>
> I think the proposal leaves open the scope for this -- it doesn't specify
> connector-specific overrides, but that's obviously something that could be
> added easily.
>
>
Regarding 1) the workers carrying no modules, I believe we are pretty close
to this situation today, at least in terms of connectors. Still, I feel
that whether we bundle a few basic modules with the framework is orthogonal
to class loading isolation. I agree with you that Connectors, Converters
and Transformations should be external modules that are loaded by the
framework. But having a few basic ones shipped with Connect simplifies
on-boarding and quickstarts significantly.

Regarding 2) and 3) these are very good to have, and definitely belong to
the near term vision for Kafka Connect. Many interesting things to do here,
from programmatically fetching connectors and their dependencies from maven
repos (using something like Aether maybe), to deciding to support certain
types of module bundling such as zip, uberjars etc. However dealing with
the issue of extended discoverability at this point seems to broaden
significantly the scope of this KIP. I think it's more practical (and
probably faster) to proceed in phases. I estimate that after this KIP,
subsequent KIPs will be intuitive and quite transparent.



> > - Classpath isolation comes into play so that the pulled jar doesn’t
> > interact with other connectors
> > - Additionally, I believe the config should have a “tag” or “version”
> > (like docker really), so that
> > o you can run different versions of the same connector on your connect
> > cluster
> > o configurations are strongly linked to a connector (right now if I
> update
> > my connector jars, I may break my configuration)
> >
>
> We have versions on Connectors. We don't have them on transformations or
> converters, which was definitely an oversight -- we should figure out how
> to get them in there.
>
> I think none of the proposals here rule out taking advantage of versioning
> in the future -- would it be helpful to add a "Future Work" section that
> gives an idea of how we'd extend this in the future to handle versions as
> well?
>
>
Dealing with versioning was left out of the scope of this KIP because
Transformations and Converters currently are not versioned. I like the idea
of adding a Future Work section. I'll add one to highlight the intention to
support versioning.


>
> > I know this is a bit out of scope, but if major changes are coming to
> > connect then these are my two cents.
> >
>
> :) I think we can address most of these incrementally, but keep the scope
> here a bit smaller and more manageable (and hopefully get it into
> 0.11.0.0!). I just don't want perfect to be the enemy of good -- getting
> the first step in as long as it doesn't cause problems down the line seems
> like a good step.
>
>
> >
> > Finally, maybe extend that construct to Transformers. The ability to
> > externalise transformers as jars would democratize their usage IMO
> >
>
> This should definitely happen! It's important for pretty much any pluggable
> component, though I think Transformations will, on average, have the fewest
> extra dependencies. Connectors have the most, followed by Converters, then
> Transformations. But I think Konstantine has mentioned these in the KIP --
> if it could be clearer in the proposed changes, perhaps you could suggest
> where to add details?
>
> -Ewen
>


Thanks again for the comments, let me know if this plan sounds good.

-Konstantine



>
>
> >
> > On 3/5/17, 4:24 am, "Ewen Cheslack-Postava" <ew...@confluent.io> wrote:
> >
> >     Thanks for the KIP.
> >
> >     A few responses inline, followed by additional comments.
> >
> >     On Mon, May 1, 2017 at 9:50 PM, Konstantine Karantasis <
> >     konstantine@confluent.io> wrote:
> >
> >     > Gwen, Randall thank you for your very insightful observations. I'm
> > glad you
> >     > find this first draft to be an adequate platform for discussion.
> >     >
> >     > I'll attempt replying to your comments in order.
> >     >
> >     > Gwen, I also debated exactly the same two options: a) interpreting
> > absence
> >     > of module path as a user's intention to turn off isolation and b)
> >     > explicitly using an additional boolean property. A few reasons why
> I
> > went
> >     > with b) in this first draft are:
> >     > 1) As Randall mentions, to leave the option of using a default
> value
> > open.
> >     > If not immediately in the first version of isolation, maybe in the
> > future.
> >     > 2) I didn't like the implicit character of the choice of
> > interpreting an
> >     > empty string as a clear intention to turn isolation off by the
> user.
> > Half
> >     > the time could be just that users forget to set a location,
> although
> > they'd
> >     > like to use class loading isolation.
> >     > 3) There's a slim possibility that in rare occasions a user might
> > want to
> >     > avoid even the slightest increase in memory consumption due to
> class
> >     > loading duplication. I admit this should be very rare, but given
> the
> > other
> >     > concerns and that we would really like to keep the isolation
> > implementation
> >     > simple, the option to turn off this feature by using only one
> > additional
> >     > config property might not seem too excessive. At least at the start
> > of this
> >     > discussion.
> >     > 4) Debugging during development might be simpler in some cases.
> >     > 5) Finally, as you mention, this could allow for smoother upgrades.
> >     >
> >
> >     I'm not sure any of these keep you from removing the extra config. Is
> > there
> >     any reason you couldn't have clean support for relying on the
> CLASSPATH
> >     while still supporting the classloaders? Then getting people onto the
> > new
> >     classloaders does require documentation for how to install
> connectors,
> > but
> >     that's pretty minimal. And we don't break existing installations
> where
> >     people are just adding to the CLASSPATH. It seems like this:
> >
> >     1. Allows you to set a default. Isolation is always enabled, but we
> > won't
> >     include any paths/directories we already use. Setting a default just
> >     requires specifying a new location where we'd hold these directories.
> >     2. It doesn't require the implicit choice -- you actually never turn
> > off
> >     isolation, but still support the regular CLASSPATH with an empty list
> > of
> >     isolated loaders
> >     3. The user can still use CLASSPATH if they want to minimize
> > classloader
> >     overhead
> >     4. Debugging can still use CLASSPATH
> >     5. Upgrades just work.
> >
> >
> >     >
> >     > Randall, regarding your comments:
> >     > 1) To keep its focus narrow, this KIP, as well as the first
> > implementation
> >     > of isolation in Connect, assume filesystem based discovery. With
> > careful
> >     > implementation, transitioning to discovery schemes that support
> > broader
> >     > URIs I believe should be easy in the future.
> >     >
> >
> >     Maybe just mention a couple of quick examples in the KIP. When
> > described
> >     inline it might be more obvious that it will extend cleanly.
> >
> >
> >     > 2) The example you give makes a good point. However I'm inclined to
> > say
> >     > that such cases should be addressed more as exceptions rather than
> > as being
> >     > the common case. Therefore, I wouldn't see all dependencies
> imported
> > by the
> >     > framework as required to be filtered out, because in that case we
> > lose the
> >     > advantage of isolation between the framework and the connectors
> (and
> > we are
> >     > left only with isolation between connectors).
> >
> >     3) I tried to abstract implementation details in this the KIP, but
> you
> > are
> >     > right. Even though filtering here is mainly used semantically
> rather
> > than
> >     > literally, it gives an implementation hint that we could avoid.
> >     >
> >
> >     I think we're missing another option -- don't do filtering and
> require
> > that
> >     those dependencies are correctly filtered out of the modules. If we
> > want to
> >     be nicer about this, we could also detect maybe 2 or 3 classes while
> >     scanning for Connectors/Converters/Transformations that indicate the
> >     classloader has jars that it shouldn't and warn about it. I can't
> > think of
> >     that many that would be an issue -- basically connect-api,
> > connect-runtime
> >     if they really mess it up, and maybe slf4j.
> >
> >
> >     > 4) In the same spirit as in 3) I believe we should reserve enough
> >     > flexibility to the implementation to discover and load classes,
> when
> > they
> >     > appear in multiple locations under the general module location.
> >     >
> >     >
> >     And a couple of addition comments:
> >
> >     - module.path should be module.paths if it accepts multiple paths
> >     - I think the description of the module paths is a bit confusing
> > because I
> >     think for simpler configuration we actually want to specify the
> > *parent*
> >     directory of module paths. I definitely prefer this since it's
> simpler
> >     although I am not certain how it will mix with future extensions to
> > other
> >     URL handlers (where a zip file or http URL wouldn't do the same
> > discovery
> >     within subdirectories)
> >
> >     -Ewen
> >
> >
> >     > Thanks again! Let me know what you think.
> >     > Konstantine
> >     >
> >     >
> >     > On Mon, May 1, 2017 at 10:12 AM, Randall Hauch <rh...@gmail.com>
> > wrote:
> >     >
> >     > > Very nice work, Konstantine. Conflicting dependencies of
> > connectors is
> >     > > indeed a big issue that makes it hard to manage installed
> > connectors.
> >     > >
> >     > > I do like Gwen's idea about removing the
> 'module.isolation.enabled'
> >     > > property. However, I would have anticipated always using
> classpath
> >     > > isolation for *only* those components registered under the module
> > path
> >     > and
> >     > > not really for anything else already on the normal classpath. So,
> > people
> >     > > could continue to place custom connector JARs onto the classpath,
> > though
> >     > > this would become deprecated in favor of installing custom
> > connector
> >     > JARs /
> >     > > modules via the module path. This keeps configuration simple,
> gives
> >     > people
> >     > > time to migrate, but let's people that need classpath isolation
> > get it to
> >     > > install a variety of connectors each with their dependencies that
> >     > > potentially conflict with other components.
> >     > >
> >     > > The challenge is whether there should be a default for
> > 'module.path'.
> >     > > Ideally there would be so that users know where they can install
> > their
> >     > > connectors. However, I suspect that this might be difficult to do
> > unless
> >     > it
> >     > > can make use of system properties such as "${kafka.home}" so that
> >     > relative
> >     > > directories can be specified.
> >     > >
> >     > > A few other questions/comments:
> >     > >
> >     > > 1) Does the KIP have to specify how are components / modules
> > installed,
> >     > > discovered, or recognized by Kafka Connect? Or perhaps the KIP
> > needs to
> >     > > just specify the semantics of the file system module path (e.g.,
> > the
> >     > > directories below those specified in the module path are to be
> > unique and
> >     > > identify an installed component).
> >     > >
> >     > > 2) Will the module classloader filtering also have to exclude
> Kafka
> >     > Connect
> >     > > dependencies? The only one that I can think of is the SLF4J API,
> > which
> >     > > can't be loaded from the module's classloader if the connector is
> > to send
> >     > > its log messages to the same logging system.
> >     > >
> >     > > 3) Rather than specify filtering, would be it a bit more flexible
> > to
> >     > simply
> >     > > say that the implementation will need to ensure that Java, Kafka
> > Connect,
> >     > > and other third party APIs (e.g., SLF4J API) will not be loaded
> > from the
> >     > > module classloaders? It'd be better to avoid specifying how it
> > will be
> >     > > done, just in case the implementation needs to evolve or use a
> > different
> >     > > technique (e.g., load the Java and public Kafka Connect APIs via
> > one
> >     > > classloader that is reused and that always appears before the
> > module
> >     > > classloader, while Kafka Connect implementation JARs appear after
> > the
> >     > > component's classloader.
> >     > >
> >     > > 4) Perhaps to address #2 and #3 above, perhaps the KIP could
> > explicitly
> >     > > specify the classloader order for a deployed connector. For
> > example,
> >     > > 'java', 'kafka-connect-apis', 'connector-module', 'smt-module-1',
> > ...,
> >     > > 'kafka-connect-impls', where 'connector-module' is the
> classloader
> > for
> >     > the
> >     > > (first) module where the connector is found, 'smt-module-1' is
> the
> >     > > classloader for the (first) module where the first SMT class is
> > found (if
> >     > > specified and found in a separate module), 'smt-module-2' is the
> >     > > classloader .... Might also need to say that the KIP does not
> > specify how
> >     > > the implementation will pick the module if a specified class if
> > found in
> >     > > more than one module.
> >     > >
> >     > > Thoughts?
> >     > >
> >     > > Randall
> >     > >
> >     > > On Mon, May 1, 2017 at 6:43 AM, Gwen Shapira <gw...@confluent.io>
> > wrote:
> >     > >
> >     > > > Hi Konstantine,
> >     > > >
> >     > > > Thank you so much for driving this! The connector classpath
> mess
> > is
> >     > > driving
> >     > > > me nuts (or worse, driving me to use Docker).
> >     > > >
> >     > > > I like the proposal for micro-benchmarks to test the context
> > switching
> >     > > > overhead.
> >     > > >
> >     > > > I have a difficult time figuring out the
> > module.isolation.enabled.
> >     > > > Especially with a default to false. I can't think of a reason
> > that
> >     > anyone
> >     > > > will not want classpath isolation. "No! I want my connectors to
> > mess up
> >     > > > each other's dependencies" said no one ever.
> >     > > >
> >     > > > So it looks like this is mostly for upgrade purpose? Because
> the
> >     > initial
> >     > > > upgrade will not have the module.path set and therefore
> classpath
> >     > > isolation
> >     > > > will simply not work by default?
> >     > > >
> >     > > > In that case, why don't we simply use the existence of
> non-empty
> >     > > > module.path as an indicator of whether isolation should work or
> > not?
> >     > seem
> >     > > > simpler and intuitive to me.
> >     > > >
> >     > > > Thanks!
> >     > > >
> >     > > > Gwen
> >     > > >
> >     > > >
> >     > > >
> >     > > >
> >     > > >
> >     > > > On Sat, Apr 29, 2017 at 9:16 AM, Konstantine Karantasis <
> >     > > > konstantine@confluent.io> wrote:
> >     > > >
> >     > > > > * Because of KIP number collision, please disregard my
> > previous KIP
> >     > > > > announcement and use this thread for discussion instead *
> >     > > > >
> >     > > > >
> >     > > > > Hi everyone,
> >     > > > >
> >     > > > > we aim to address dependency conflicts in Kafka Connect soon
> by
> >     > > applying
> >     > > > > class loading isolation.
> >     > > > >
> >     > > > > Feel free to take a look at KIP-146 here:
> >     > > > >
> >     > > > > *https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >     > > > > 146+-+Classloading+Isolation+in+Connect
> >     > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >     > > > > 146+-+Classloading+Isolation+in+Connect>*
> >     > > > >
> >     > > > > which describes minimal required changes to public interfaces
> > and the
> >     > > > > general implementation approach.
> >     > > > >
> >     > > > > This is a much wanted feature for Kafka Connect. Your
> feedback
> > is
> >     > > highly
> >     > > > > appreciated.
> >     > > > >
> >     > > > > -Konstantine
> >     > > > >
> >     > > >
> >     > > >
> >     > > >
> >     > > > --
> >     > > > *Gwen Shapira*
> >     > > > Product Manager | Confluent
> >     > > > 650.450.2760 | @gwenshap
> >     > > > Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
> >     > > > <http://www.confluent.io/blog>
> >     > > >
> >     > >
> >     >
> >
> >
> >
> >
>

Re: [DISCUSS] KIP-146: Classloading Isolation in Connect

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.

On Tue, May 2, 2017 at 10:01 PM, Stephane Maarek <
stephane@simplemachines.com.au> wrote:

Excellent feedback, Stephane!

Thanks for the work, it’s definitely needed!
> I’d like to suggest to take it one step further.
>
> To me, I’d like to see Kafka Connect the same way we have Docker and
> Docker repositories.
>
> Here’s how I would envision the flow:
> - Kafka Connect workers are just workers. They come with no jars whatsoever
> - The REST API allow you to add a config to the connect cluster
> - The workers, seeing the config, pull the jars from the available
> (maven?) repositories (public or private)
>

I think supporting this mode is really valuable. It seems *really*
attractive if you have some easily accessible, scalable, centralized
storage for connectors (i.e. some central distributed FS).

But having to jump through these hoops (which presumably include some extra
URLs to point to the connectors, some definition of what that URL points to
e.g. zip of jars? uberjar? something with a more complex spec?) just to do
a quickstart seems like quite a bit of overhead. I think we should consider
both a) incremental progress that opens up new options but doesn't prevent
us from the ideal end state and b) all local testing/dev/prod use cases
(which is also why I still like having the plain old CLASSPATH option
available).

I think the proposal leaves open the scope for this -- it doesn't specify
connector-specific overrides, but that's obviously something that could be
added easily.


> - Classpath isolation comes into play so that the pulled jar doesn’t
> interact with other connectors
> - Additionally, I believe the config should have a “tag” or “version”
> (like docker really), so that
> o you can run different versions of the same connector on your connect
> cluster
> o configurations are strongly linked to a connector (right now if I update
> my connector jars, I may break my configuration)
>

We have versions on Connectors. We don't have them on transformations or
converters, which was definitely an oversight -- we should figure out how
to get them in there.

I think none of the proposals here rule out taking advantage of versioning
in the future -- would it be helpful to add a "Future Work" section that
gives an idea of how we'd extend this in the future to handle versions as
well?


> I know this is a bit out of scope, but if major changes are coming to
> connect then these are my two cents.
>

:) I think we can address most of these incrementally, but keep the scope
here a bit smaller and more manageable (and hopefully get it into
0.11.0.0!). I just don't want perfect to be the enemy of good -- getting
the first step in as long as it doesn't cause problems down the line seems
like a good step.


>
> Finally, maybe extend that construct to Transformers. The ability to
> externalise transformers as jars would democratize their usage IMO
>

This should definitely happen! It's important for pretty much any pluggable
component, though I think Transformations will, on average, have the fewest
extra dependencies. Connectors have the most, followed by Converters, then
Transformations. But I think Konstantine has mentioned these in the KIP --
if it could be clearer in the proposed changes, perhaps you could suggest
where to add details?

-Ewen


>
>
> On 3/5/17, 4:24 am, "Ewen Cheslack-Postava" <ew...@confluent.io> wrote:
>
>     Thanks for the KIP.
>
>     A few responses inline, followed by additional comments.
>
>     On Mon, May 1, 2017 at 9:50 PM, Konstantine Karantasis <
>     konstantine@confluent.io> wrote:
>
>     > Gwen, Randall thank you for your very insightful observations. I'm
> glad you
>     > find this first draft to be an adequate platform for discussion.
>     >
>     > I'll attempt replying to your comments in order.
>     >
>     > Gwen, I also debated exactly the same two options: a) interpreting
> absence
>     > of module path as a user's intention to turn off isolation and b)
>     > explicitly using an additional boolean property. A few reasons why I
> went
>     > with b) in this first draft are:
>     > 1) As Randall mentions, to leave the option of using a default value
> open.
>     > If not immediately in the first version of isolation, maybe in the
> future.
>     > 2) I didn't like the implicit character of the choice of
> interpreting an
>     > empty string as a clear intention to turn isolation off by the user.
> Half
>     > the time could be just that users forget to set a location, although
> they'd
>     > like to use class loading isolation.
>     > 3) There's a slim possibility that in rare occasions a user might
> want to
>     > avoid even the slightest increase in memory consumption due to class
>     > loading duplication. I admit this should be very rare, but given the
> other
>     > concerns and that we would really like to keep the isolation
> implementation
>     > simple, the option to turn off this feature by using only one
> additional
>     > config property might not seem too excessive. At least at the start
> of this
>     > discussion.
>     > 4) Debugging during development might be simpler in some cases.
>     > 5) Finally, as you mention, this could allow for smoother upgrades.
>     >
>
>     I'm not sure any of these keep you from removing the extra config. Is
> there
>     any reason you couldn't have clean support for relying on the CLASSPATH
>     while still supporting the classloaders? Then getting people onto the
> new
>     classloaders does require documentation for how to install connectors,
> but
>     that's pretty minimal. And we don't break existing installations where
>     people are just adding to the CLASSPATH. It seems like this:
>
>     1. Allows you to set a default. Isolation is always enabled, but we
> won't
>     include any paths/directories we already use. Setting a default just
>     requires specifying a new location where we'd hold these directories.
>     2. It doesn't require the implicit choice -- you actually never turn
> off
>     isolation, but still support the regular CLASSPATH with an empty list
> of
>     isolated loaders
>     3. The user can still use CLASSPATH if they want to minimize
> classloader
>     overhead
>     4. Debugging can still use CLASSPATH
>     5. Upgrades just work.
>
>
>     >
>     > Randall, regarding your comments:
>     > 1) To keep its focus narrow, this KIP, as well as the first
> implementation
>     > of isolation in Connect, assume filesystem based discovery. With
> careful
>     > implementation, transitioning to discovery schemes that support
> broader
>     > URIs I believe should be easy in the future.
>     >
>
>     Maybe just mention a couple of quick examples in the KIP. When
> described
>     inline it might be more obvious that it will extend cleanly.
>
>
>     > 2) The example you give makes a good point. However I'm inclined to
> say
>     > that such cases should be addressed more as exceptions rather than
> as being
>     > the common case. Therefore, I wouldn't see all dependencies imported
> by the
>     > framework as required to be filtered out, because in that case we
> lose the
>     > advantage of isolation between the framework and the connectors (and
> we are
>     > left only with isolation between connectors).
>
>     3) I tried to abstract implementation details in this the KIP, but you
> are
>     > right. Even though filtering here is mainly used semantically rather
> than
>     > literally, it gives an implementation hint that we could avoid.
>     >
>
>     I think we're missing another option -- don't do filtering and require
> that
>     those dependencies are correctly filtered out of the modules. If we
> want to
>     be nicer about this, we could also detect maybe 2 or 3 classes while
>     scanning for Connectors/Converters/Transformations that indicate the
>     classloader has jars that it shouldn't and warn about it. I can't
> think of
>     that many that would be an issue -- basically connect-api,
> connect-runtime
>     if they really mess it up, and maybe slf4j.
>
>
>     > 4) In the same spirit as in 3) I believe we should reserve enough
>     > flexibility to the implementation to discover and load classes, when
> they
>     > appear in multiple locations under the general module location.
>     >
>     >
>     And a couple of addition comments:
>
>     - module.path should be module.paths if it accepts multiple paths
>     - I think the description of the module paths is a bit confusing
> because I
>     think for simpler configuration we actually want to specify the
> *parent*
>     directory of module paths. I definitely prefer this since it's simpler
>     although I am not certain how it will mix with future extensions to
> other
>     URL handlers (where a zip file or http URL wouldn't do the same
> discovery
>     within subdirectories)
>
>     -Ewen
>
>
>     > Thanks again! Let me know what you think.
>     > Konstantine
>     >
>     >
>     > On Mon, May 1, 2017 at 10:12 AM, Randall Hauch <rh...@gmail.com>
> wrote:
>     >
>     > > Very nice work, Konstantine. Conflicting dependencies of
> connectors is
>     > > indeed a big issue that makes it hard to manage installed
> connectors.
>     > >
>     > > I do like Gwen's idea about removing the 'module.isolation.enabled'
>     > > property. However, I would have anticipated always using classpath
>     > > isolation for *only* those components registered under the module
> path
>     > and
>     > > not really for anything else already on the normal classpath. So,
> people
>     > > could continue to place custom connector JARs onto the classpath,
> though
>     > > this would become deprecated in favor of installing custom
> connector
>     > JARs /
>     > > modules via the module path. This keeps configuration simple, gives
>     > people
>     > > time to migrate, but let's people that need classpath isolation
> get it to
>     > > install a variety of connectors each with their dependencies that
>     > > potentially conflict with other components.
>     > >
>     > > The challenge is whether there should be a default for
> 'module.path'.
>     > > Ideally there would be so that users know where they can install
> their
>     > > connectors. However, I suspect that this might be difficult to do
> unless
>     > it
>     > > can make use of system properties such as "${kafka.home}" so that
>     > relative
>     > > directories can be specified.
>     > >
>     > > A few other questions/comments:
>     > >
>     > > 1) Does the KIP have to specify how are components / modules
> installed,
>     > > discovered, or recognized by Kafka Connect? Or perhaps the KIP
> needs to
>     > > just specify the semantics of the file system module path (e.g.,
> the
>     > > directories below those specified in the module path are to be
> unique and
>     > > identify an installed component).
>     > >
>     > > 2) Will the module classloader filtering also have to exclude Kafka
>     > Connect
>     > > dependencies? The only one that I can think of is the SLF4J API,
> which
>     > > can't be loaded from the module's classloader if the connector is
> to send
>     > > its log messages to the same logging system.
>     > >
>     > > 3) Rather than specify filtering, would be it a bit more flexible
> to
>     > simply
>     > > say that the implementation will need to ensure that Java, Kafka
> Connect,
>     > > and other third party APIs (e.g., SLF4J API) will not be loaded
> from the
>     > > module classloaders? It'd be better to avoid specifying how it
> will be
>     > > done, just in case the implementation needs to evolve or use a
> different
>     > > technique (e.g., load the Java and public Kafka Connect APIs via
> one
>     > > classloader that is reused and that always appears before the
> module
>     > > classloader, while Kafka Connect implementation JARs appear after
> the
>     > > component's classloader.
>     > >
>     > > 4) Perhaps to address #2 and #3 above, perhaps the KIP could
> explicitly
>     > > specify the classloader order for a deployed connector. For
> example,
>     > > 'java', 'kafka-connect-apis', 'connector-module', 'smt-module-1',
> ...,
>     > > 'kafka-connect-impls', where 'connector-module' is the classloader
> for
>     > the
>     > > (first) module where the connector is found, 'smt-module-1' is the
>     > > classloader for the (first) module where the first SMT class is
> found (if
>     > > specified and found in a separate module), 'smt-module-2' is the
>     > > classloader .... Might also need to say that the KIP does not
> specify how
>     > > the implementation will pick the module if a specified class if
> found in
>     > > more than one module.
>     > >
>     > > Thoughts?
>     > >
>     > > Randall
>     > >
>     > > On Mon, May 1, 2017 at 6:43 AM, Gwen Shapira <gw...@confluent.io>
> wrote:
>     > >
>     > > > Hi Konstantine,
>     > > >
>     > > > Thank you so much for driving this! The connector classpath mess
> is
>     > > driving
>     > > > me nuts (or worse, driving me to use Docker).
>     > > >
>     > > > I like the proposal for micro-benchmarks to test the context
> switching
>     > > > overhead.
>     > > >
>     > > > I have a difficult time figuring out the
> module.isolation.enabled.
>     > > > Especially with a default to false. I can't think of a reason
> that
>     > anyone
>     > > > will not want classpath isolation. "No! I want my connectors to
> mess up
>     > > > each other's dependencies" said no one ever.
>     > > >
>     > > > So it looks like this is mostly for upgrade purpose? Because the
>     > initial
>     > > > upgrade will not have the module.path set and therefore classpath
>     > > isolation
>     > > > will simply not work by default?
>     > > >
>     > > > In that case, why don't we simply use the existence of non-empty
>     > > > module.path as an indicator of whether isolation should work or
> not?
>     > seem
>     > > > simpler and intuitive to me.
>     > > >
>     > > > Thanks!
>     > > >
>     > > > Gwen
>     > > >
>     > > >
>     > > >
>     > > >
>     > > >
>     > > > On Sat, Apr 29, 2017 at 9:16 AM, Konstantine Karantasis <
>     > > > konstantine@confluent.io> wrote:
>     > > >
>     > > > > * Because of KIP number collision, please disregard my
> previous KIP
>     > > > > announcement and use this thread for discussion instead *
>     > > > >
>     > > > >
>     > > > > Hi everyone,
>     > > > >
>     > > > > we aim to address dependency conflicts in Kafka Connect soon by
>     > > applying
>     > > > > class loading isolation.
>     > > > >
>     > > > > Feel free to take a look at KIP-146 here:
>     > > > >
>     > > > > *https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>     > > > > 146+-+Classloading+Isolation+in+Connect
>     > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>     > > > > 146+-+Classloading+Isolation+in+Connect>*
>     > > > >
>     > > > > which describes minimal required changes to public interfaces
> and the
>     > > > > general implementation approach.
>     > > > >
>     > > > > This is a much wanted feature for Kafka Connect. Your feedback
> is
>     > > highly
>     > > > > appreciated.
>     > > > >
>     > > > > -Konstantine
>     > > > >
>     > > >
>     > > >
>     > > >
>     > > > --
>     > > > *Gwen Shapira*
>     > > > Product Manager | Confluent
>     > > > 650.450.2760 | @gwenshap
>     > > > Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
>     > > > <http://www.confluent.io/blog>
>     > > >
>     > >
>     >
>
>
>
>

Re: [DISCUSS] KIP-146: Classloading Isolation in Connect

Posted by Stephane Maarek <st...@simplemachines.com.au>.

Thanks for the work, it’s definitely needed!
I’d like to suggest to take it one step further.

To me, I’d like to see Kafka Connect the same way we have Docker and Docker repositories.

Here’s how I would envision the flow:
- Kafka Connect workers are just workers. They come with no jars whatsoever
- The REST API allow you to add a config to the connect cluster 
- The workers, seeing the config, pull the jars from the available (maven?) repositories (public or private)
- Classpath isolation comes into play so that the pulled jar doesn’t interact with other connectors
- Additionally, I believe the config should have a “tag” or “version” (like docker really), so that 
o you can run different versions of the same connector on your connect cluster
o configurations are strongly linked to a connector (right now if I update my connector jars, I may break my configuration)

I know this is a bit out of scope, but if major changes are coming to connect then these are my two cents.

Finally, maybe extend that construct to Transformers. The ability to externalise transformers as jars would democratize their usage IMO


On 3/5/17, 4:24 am, "Ewen Cheslack-Postava" <ew...@confluent.io> wrote:

    Thanks for the KIP.
    
    A few responses inline, followed by additional comments.
    
    On Mon, May 1, 2017 at 9:50 PM, Konstantine Karantasis <
    konstantine@confluent.io> wrote:
    
    > Gwen, Randall thank you for your very insightful observations. I'm glad you
    > find this first draft to be an adequate platform for discussion.
    >
    > I'll attempt replying to your comments in order.
    >
    > Gwen, I also debated exactly the same two options: a) interpreting absence
    > of module path as a user's intention to turn off isolation and b)
    > explicitly using an additional boolean property. A few reasons why I went
    > with b) in this first draft are:
    > 1) As Randall mentions, to leave the option of using a default value open.
    > If not immediately in the first version of isolation, maybe in the future.
    > 2) I didn't like the implicit character of the choice of interpreting an
    > empty string as a clear intention to turn isolation off by the user. Half
    > the time could be just that users forget to set a location, although they'd
    > like to use class loading isolation.
    > 3) There's a slim possibility that in rare occasions a user might want to
    > avoid even the slightest increase in memory consumption due to class
    > loading duplication. I admit this should be very rare, but given the other
    > concerns and that we would really like to keep the isolation implementation
    > simple, the option to turn off this feature by using only one additional
    > config property might not seem too excessive. At least at the start of this
    > discussion.
    > 4) Debugging during development might be simpler in some cases.
    > 5) Finally, as you mention, this could allow for smoother upgrades.
    >
    
    I'm not sure any of these keep you from removing the extra config. Is there
    any reason you couldn't have clean support for relying on the CLASSPATH
    while still supporting the classloaders? Then getting people onto the new
    classloaders does require documentation for how to install connectors, but
    that's pretty minimal. And we don't break existing installations where
    people are just adding to the CLASSPATH. It seems like this:
    
    1. Allows you to set a default. Isolation is always enabled, but we won't
    include any paths/directories we already use. Setting a default just
    requires specifying a new location where we'd hold these directories.
    2. It doesn't require the implicit choice -- you actually never turn off
    isolation, but still support the regular CLASSPATH with an empty list of
    isolated loaders
    3. The user can still use CLASSPATH if they want to minimize classloader
    overhead
    4. Debugging can still use CLASSPATH
    5. Upgrades just work.
    
    
    >
    > Randall, regarding your comments:
    > 1) To keep its focus narrow, this KIP, as well as the first implementation
    > of isolation in Connect, assume filesystem based discovery. With careful
    > implementation, transitioning to discovery schemes that support broader
    > URIs I believe should be easy in the future.
    >
    
    Maybe just mention a couple of quick examples in the KIP. When described
    inline it might be more obvious that it will extend cleanly.
    
    
    > 2) The example you give makes a good point. However I'm inclined to say
    > that such cases should be addressed more as exceptions rather than as being
    > the common case. Therefore, I wouldn't see all dependencies imported by the
    > framework as required to be filtered out, because in that case we lose the
    > advantage of isolation between the framework and the connectors (and we are
    > left only with isolation between connectors).
    
    3) I tried to abstract implementation details in this the KIP, but you are
    > right. Even though filtering here is mainly used semantically rather than
    > literally, it gives an implementation hint that we could avoid.
    >
    
    I think we're missing another option -- don't do filtering and require that
    those dependencies are correctly filtered out of the modules. If we want to
    be nicer about this, we could also detect maybe 2 or 3 classes while
    scanning for Connectors/Converters/Transformations that indicate the
    classloader has jars that it shouldn't and warn about it. I can't think of
    that many that would be an issue -- basically connect-api, connect-runtime
    if they really mess it up, and maybe slf4j.
    
    
    > 4) In the same spirit as in 3) I believe we should reserve enough
    > flexibility to the implementation to discover and load classes, when they
    > appear in multiple locations under the general module location.
    >
    >
    And a couple of addition comments:
    
    - module.path should be module.paths if it accepts multiple paths
    - I think the description of the module paths is a bit confusing because I
    think for simpler configuration we actually want to specify the *parent*
    directory of module paths. I definitely prefer this since it's simpler
    although I am not certain how it will mix with future extensions to other
    URL handlers (where a zip file or http URL wouldn't do the same discovery
    within subdirectories)
    
    -Ewen
    
    
    > Thanks again! Let me know what you think.
    > Konstantine
    >
    >
    > On Mon, May 1, 2017 at 10:12 AM, Randall Hauch <rh...@gmail.com> wrote:
    >
    > > Very nice work, Konstantine. Conflicting dependencies of connectors is
    > > indeed a big issue that makes it hard to manage installed connectors.
    > >
    > > I do like Gwen's idea about removing the 'module.isolation.enabled'
    > > property. However, I would have anticipated always using classpath
    > > isolation for *only* those components registered under the module path
    > and
    > > not really for anything else already on the normal classpath. So, people
    > > could continue to place custom connector JARs onto the classpath, though
    > > this would become deprecated in favor of installing custom connector
    > JARs /
    > > modules via the module path. This keeps configuration simple, gives
    > people
    > > time to migrate, but let's people that need classpath isolation get it to
    > > install a variety of connectors each with their dependencies that
    > > potentially conflict with other components.
    > >
    > > The challenge is whether there should be a default for 'module.path'.
    > > Ideally there would be so that users know where they can install their
    > > connectors. However, I suspect that this might be difficult to do unless
    > it
    > > can make use of system properties such as "${kafka.home}" so that
    > relative
    > > directories can be specified.
    > >
    > > A few other questions/comments:
    > >
    > > 1) Does the KIP have to specify how are components / modules installed,
    > > discovered, or recognized by Kafka Connect? Or perhaps the KIP needs to
    > > just specify the semantics of the file system module path (e.g., the
    > > directories below those specified in the module path are to be unique and
    > > identify an installed component).
    > >
    > > 2) Will the module classloader filtering also have to exclude Kafka
    > Connect
    > > dependencies? The only one that I can think of is the SLF4J API, which
    > > can't be loaded from the module's classloader if the connector is to send
    > > its log messages to the same logging system.
    > >
    > > 3) Rather than specify filtering, would be it a bit more flexible to
    > simply
    > > say that the implementation will need to ensure that Java, Kafka Connect,
    > > and other third party APIs (e.g., SLF4J API) will not be loaded from the
    > > module classloaders? It'd be better to avoid specifying how it will be
    > > done, just in case the implementation needs to evolve or use a different
    > > technique (e.g., load the Java and public Kafka Connect APIs via one
    > > classloader that is reused and that always appears before the module
    > > classloader, while Kafka Connect implementation JARs appear after the
    > > component's classloader.
    > >
    > > 4) Perhaps to address #2 and #3 above, perhaps the KIP could explicitly
    > > specify the classloader order for a deployed connector. For example,
    > > 'java', 'kafka-connect-apis', 'connector-module', 'smt-module-1', ...,
    > > 'kafka-connect-impls', where 'connector-module' is the classloader for
    > the
    > > (first) module where the connector is found, 'smt-module-1' is the
    > > classloader for the (first) module where the first SMT class is found (if
    > > specified and found in a separate module), 'smt-module-2' is the
    > > classloader .... Might also need to say that the KIP does not specify how
    > > the implementation will pick the module if a specified class if found in
    > > more than one module.
    > >
    > > Thoughts?
    > >
    > > Randall
    > >
    > > On Mon, May 1, 2017 at 6:43 AM, Gwen Shapira <gw...@confluent.io> wrote:
    > >
    > > > Hi Konstantine,
    > > >
    > > > Thank you so much for driving this! The connector classpath mess is
    > > driving
    > > > me nuts (or worse, driving me to use Docker).
    > > >
    > > > I like the proposal for micro-benchmarks to test the context switching
    > > > overhead.
    > > >
    > > > I have a difficult time figuring out the module.isolation.enabled.
    > > > Especially with a default to false. I can't think of a reason that
    > anyone
    > > > will not want classpath isolation. "No! I want my connectors to mess up
    > > > each other's dependencies" said no one ever.
    > > >
    > > > So it looks like this is mostly for upgrade purpose? Because the
    > initial
    > > > upgrade will not have the module.path set and therefore classpath
    > > isolation
    > > > will simply not work by default?
    > > >
    > > > In that case, why don't we simply use the existence of non-empty
    > > > module.path as an indicator of whether isolation should work or not?
    > seem
    > > > simpler and intuitive to me.
    > > >
    > > > Thanks!
    > > >
    > > > Gwen
    > > >
    > > >
    > > >
    > > >
    > > >
    > > > On Sat, Apr 29, 2017 at 9:16 AM, Konstantine Karantasis <
    > > > konstantine@confluent.io> wrote:
    > > >
    > > > > * Because of KIP number collision, please disregard my previous KIP
    > > > > announcement and use this thread for discussion instead *
    > > > >
    > > > >
    > > > > Hi everyone,
    > > > >
    > > > > we aim to address dependency conflicts in Kafka Connect soon by
    > > applying
    > > > > class loading isolation.
    > > > >
    > > > > Feel free to take a look at KIP-146 here:
    > > > >
    > > > > *https://cwiki.apache.org/confluence/display/KAFKA/KIP-
    > > > > 146+-+Classloading+Isolation+in+Connect
    > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
    > > > > 146+-+Classloading+Isolation+in+Connect>*
    > > > >
    > > > > which describes minimal required changes to public interfaces and the
    > > > > general implementation approach.
    > > > >
    > > > > This is a much wanted feature for Kafka Connect. Your feedback is
    > > highly
    > > > > appreciated.
    > > > >
    > > > > -Konstantine
    > > > >
    > > >
    > > >
    > > >
    > > > --
    > > > *Gwen Shapira*
    > > > Product Manager | Confluent
    > > > 650.450.2760 | @gwenshap
    > > > Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
    > > > <http://www.confluent.io/blog>
    > > >
    > >
    >

Re: [DISCUSS] KIP-146: Classloading Isolation in Connect

Posted by Gwen Shapira <gw...@confluent.io>.

On Wed, May 3, 2017 at 11:17 AM, Konstantine Karantasis <
konstantine@confluent.io> wrote:

> Thanks Ewen. I'm replying inline as well.
>
> On Tue, May 2, 2017 at 11:24 AM, Ewen Cheslack-Postava <ew...@confluent.io>
> wrote:
>
> > Thanks for the KIP.
> >
> > A few responses inline, followed by additional comments.
> >
> > On Mon, May 1, 2017 at 9:50 PM, Konstantine Karantasis <
> > konstantine@confluent.io> wrote:
> >
> > > Gwen, Randall thank you for your very insightful observations. I'm glad
> > you
> > > find this first draft to be an adequate platform for discussion.
> > >
> > > I'll attempt replying to your comments in order.
> > >
> > > Gwen, I also debated exactly the same two options: a) interpreting
> > absence
> > > of module path as a user's intention to turn off isolation and b)
> > > explicitly using an additional boolean property. A few reasons why I
> went
> > > with b) in this first draft are:
> > > 1) As Randall mentions, to leave the option of using a default value
> > open.
> > > If not immediately in the first version of isolation, maybe in the
> > future.
> > > 2) I didn't like the implicit character of the choice of interpreting
> an
> > > empty string as a clear intention to turn isolation off by the user.
> Half
> > > the time could be just that users forget to set a location, although
> > they'd
> > > like to use class loading isolation.
> > > 3) There's a slim possibility that in rare occasions a user might want
> to
> > > avoid even the slightest increase in memory consumption due to class
> > > loading duplication. I admit this should be very rare, but given the
> > other
> > > concerns and that we would really like to keep the isolation
> > implementation
> > > simple, the option to turn off this feature by using only one
> additional
> > > config property might not seem too excessive. At least at the start of
> > this
> > > discussion.
> > > 4) Debugging during development might be simpler in some cases.
> > > 5) Finally, as you mention, this could allow for smoother upgrades.
> > >
> >
> > I'm not sure any of these keep you from removing the extra config. Is
> there
> > any reason you couldn't have clean support for relying on the CLASSPATH
> > while still supporting the classloaders? Then getting people onto the new
> > classloaders does require documentation for how to install connectors,
> but
> > that's pretty minimal. And we don't break existing installations where
> > people are just adding to the CLASSPATH. It seems like this:
> >
> > 1. Allows you to set a default. Isolation is always enabled, but we won't
> > include any paths/directories we already use. Setting a default just
> > requires specifying a new location where we'd hold these directories.
> > 2. It doesn't require the implicit choice -- you actually never turn off
> > isolation, but still support the regular CLASSPATH with an empty list of
> > isolated loaders
> > 3. The user can still use CLASSPATH if they want to minimize classloader
> > overhead
> > 4. Debugging can still use CLASSPATH
> > 5. Upgrades just work.
> >
>
> Falling back to CLASSPATH for non-isolated mode makes sense. The extra
> config property was suggested proactively, as well as for clarity and
> handling of defaults. But it's much better if we can do without it. Will be
> removed.
>
>
Thank you (and I'm sure Confluent's support team also thanks you - one
configuration should be harder to accidentally screw up than two...)



>
> >
> >
> > > Randall, regarding your comments:
> > > 1) To keep its focus narrow, this KIP, as well as the first
> > implementation
> > > of isolation in Connect, assume filesystem based discovery. With
> careful
> > > implementation, transitioning to discovery schemes that support broader
> > > URIs I believe should be easy in the future.
> > >
> >
> > Maybe just mention a couple of quick examples in the KIP. When described
> > inline it might be more obvious that it will extend cleanly.
> >
> >
> There's an example for a filesystem-based structure. I will enhance it.
>
>
>
> > > 2) The example you give makes a good point. However I'm inclined to say
> > > that such cases should be addressed more as exceptions rather than as
> > being
> > > the common case. Therefore, I wouldn't see all dependencies imported by
> > the
> > > framework as required to be filtered out, because in that case we lose
> > the
> > > advantage of isolation between the framework and the connectors (and we
> > are
> > > left only with isolation between connectors).
> >
> > 3) I tried to abstract implementation details in this the KIP, but you
> are
> > > right. Even though filtering here is mainly used semantically rather
> than
> > > literally, it gives an implementation hint that we could avoid.
> > >
> >
> > I think we're missing another option -- don't do filtering and require
> that
> > those dependencies are correctly filtered out of the modules. If we want
> to
> > be nicer about this, we could also detect maybe 2 or 3 classes while
> > scanning for Connectors/Converters/Transformations that indicate the
> > classloader has jars that it shouldn't and warn about it. I can't think
> of
> > that many that would be an issue -- basically connect-api,
> connect-runtime
> > if they really mess it up, and maybe slf4j.
> >
> >
> This sounds a bit more strict, which is not necessarily a bad thing. Still,
> it seems to also keep the subject under the category of implementation
> decisions.
>
> > 4) In the same spirit as in 3) I believe we should reserve enough
> > > flexibility to the implementation to discover and load classes, when
> they
> > > appear in multiple locations under the general module location.
> > >
> > >
> > And a couple of addition comments:
> >
> > - module.path should be module.paths if it accepts multiple paths
> >
>
> The configuration property was named in a way that resembles classpath. I
> felt this might offer some intuition, even if the word "path" is too
> filesystem oriented. Users are used to give a list of paths in classpath,
> so I didn't find singular too confusing. Additionally, it will work when
> there's only one "path", or I should say URI, given. But I don't feel very
> strong about the name of this property. (Alternatives that I tried such as
> module.uris, module.locations didn't seem satisfying)
>

> > - I think the description of the module paths is a bit confusing because
> I
> > think for simpler configuration we actually want to specify the *parent*
> > directory of module paths. I definitely prefer this since it's simpler
> > although I am not certain how it will mix with future extensions to other
> > URL handlers (where a zip file or http URL wouldn't do the same discovery
> > within subdirectories)
> >
>
> My impression is that the 2nd paragraph below the introduction of
> module.path is where the "parent" convention is attempted to be explained.
> I will attempt to improve this text, but I feel that, again, the filesystem
> convention will follow the generic definition of the property (a list of
> locations/paths).
>

I was also confused while reading the description. It clicked at some
point, but it wasn't immediately clear that we load from sub-directories
too.



>
> -Konstantine
>
>
> >
> > -Ewen
> >
> >
> > > Thanks again! Let me know what you think.
> > > Konstantine
> > >
> > >
> > > On Mon, May 1, 2017 at 10:12 AM, Randall Hauch <rh...@gmail.com>
> wrote:
> > >
> > > > Very nice work, Konstantine. Conflicting dependencies of connectors
> is
> > > > indeed a big issue that makes it hard to manage installed connectors.
> > > >
> > > > I do like Gwen's idea about removing the 'module.isolation.enabled'
> > > > property. However, I would have anticipated always using classpath
> > > > isolation for *only* those components registered under the module
> path
> > > and
> > > > not really for anything else already on the normal classpath. So,
> > people
> > > > could continue to place custom connector JARs onto the classpath,
> > though
> > > > this would become deprecated in favor of installing custom connector
> > > JARs /
> > > > modules via the module path. This keeps configuration simple, gives
> > > people
> > > > time to migrate, but let's people that need classpath isolation get
> it
> > to
> > > > install a variety of connectors each with their dependencies that
> > > > potentially conflict with other components.
> > > >
> > > > The challenge is whether there should be a default for 'module.path'.
> > > > Ideally there would be so that users know where they can install
> their
> > > > connectors. However, I suspect that this might be difficult to do
> > unless
> > > it
> > > > can make use of system properties such as "${kafka.home}" so that
> > > relative
> > > > directories can be specified.
> > > >
> > > > A few other questions/comments:
> > > >
> > > > 1) Does the KIP have to specify how are components / modules
> installed,
> > > > discovered, or recognized by Kafka Connect? Or perhaps the KIP needs
> to
> > > > just specify the semantics of the file system module path (e.g., the
> > > > directories below those specified in the module path are to be unique
> > and
> > > > identify an installed component).
> > > >
> > > > 2) Will the module classloader filtering also have to exclude Kafka
> > > Connect
> > > > dependencies? The only one that I can think of is the SLF4J API,
> which
> > > > can't be loaded from the module's classloader if the connector is to
> > send
> > > > its log messages to the same logging system.
> > > >
> > > > 3) Rather than specify filtering, would be it a bit more flexible to
> > > simply
> > > > say that the implementation will need to ensure that Java, Kafka
> > Connect,
> > > > and other third party APIs (e.g., SLF4J API) will not be loaded from
> > the
> > > > module classloaders? It'd be better to avoid specifying how it will
> be
> > > > done, just in case the implementation needs to evolve or use a
> > different
> > > > technique (e.g., load the Java and public Kafka Connect APIs via one
> > > > classloader that is reused and that always appears before the module
> > > > classloader, while Kafka Connect implementation JARs appear after the
> > > > component's classloader.
> > > >
> > > > 4) Perhaps to address #2 and #3 above, perhaps the KIP could
> explicitly
> > > > specify the classloader order for a deployed connector. For example,
> > > > 'java', 'kafka-connect-apis', 'connector-module', 'smt-module-1',
> ...,
> > > > 'kafka-connect-impls', where 'connector-module' is the classloader
> for
> > > the
> > > > (first) module where the connector is found, 'smt-module-1' is the
> > > > classloader for the (first) module where the first SMT class is found
> > (if
> > > > specified and found in a separate module), 'smt-module-2' is the
> > > > classloader .... Might also need to say that the KIP does not specify
> > how
> > > > the implementation will pick the module if a specified class if found
> > in
> > > > more than one module.
> > > >
> > > > Thoughts?
> > > >
> > > > Randall
> > > >
> > > > On Mon, May 1, 2017 at 6:43 AM, Gwen Shapira <gw...@confluent.io>
> > wrote:
> > > >
> > > > > Hi Konstantine,
> > > > >
> > > > > Thank you so much for driving this! The connector classpath mess is
> > > > driving
> > > > > me nuts (or worse, driving me to use Docker).
> > > > >
> > > > > I like the proposal for micro-benchmarks to test the context
> > switching
> > > > > overhead.
> > > > >
> > > > > I have a difficult time figuring out the module.isolation.enabled.
> > > > > Especially with a default to false. I can't think of a reason that
> > > anyone
> > > > > will not want classpath isolation. "No! I want my connectors to
> mess
> > up
> > > > > each other's dependencies" said no one ever.
> > > > >
> > > > > So it looks like this is mostly for upgrade purpose? Because the
> > > initial
> > > > > upgrade will not have the module.path set and therefore classpath
> > > > isolation
> > > > > will simply not work by default?
> > > > >
> > > > > In that case, why don't we simply use the existence of non-empty
> > > > > module.path as an indicator of whether isolation should work or
> not?
> > > seem
> > > > > simpler and intuitive to me.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > Gwen
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Sat, Apr 29, 2017 at 9:16 AM, Konstantine Karantasis <
> > > > > konstantine@confluent.io> wrote:
> > > > >
> > > > > > * Because of KIP number collision, please disregard my previous
> KIP
> > > > > > announcement and use this thread for discussion instead *
> > > > > >
> > > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > we aim to address dependency conflicts in Kafka Connect soon by
> > > > applying
> > > > > > class loading isolation.
> > > > > >
> > > > > > Feel free to take a look at KIP-146 here:
> > > > > >
> > > > > > *https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > 146+-+Classloading+Isolation+in+Connect
> > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > 146+-+Classloading+Isolation+in+Connect>*
> > > > > >
> > > > > > which describes minimal required changes to public interfaces and
> > the
> > > > > > general implementation approach.
> > > > > >
> > > > > > This is a much wanted feature for Kafka Connect. Your feedback is
> > > > highly
> > > > > > appreciated.
> > > > > >
> > > > > > -Konstantine
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > *Gwen Shapira*
> > > > > Product Manager | Confluent
> > > > > 650.450.2760 | @gwenshap
> > > > > Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
> > > > > <http://www.confluent.io/blog>
> > > > >
> > > >
> > >
> >
>



-- 
*Gwen Shapira*
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
<http://www.confluent.io/blog>

Re: [DISCUSS] KIP-146: Classloading Isolation in Connect

Posted by Konstantine Karantasis <ko...@confluent.io>.

Thanks Ewen. I'm replying inline as well.

On Tue, May 2, 2017 at 11:24 AM, Ewen Cheslack-Postava <ew...@confluent.io>
wrote:

> Thanks for the KIP.
>
> A few responses inline, followed by additional comments.
>
> On Mon, May 1, 2017 at 9:50 PM, Konstantine Karantasis <
> konstantine@confluent.io> wrote:
>
> > Gwen, Randall thank you for your very insightful observations. I'm glad
> you
> > find this first draft to be an adequate platform for discussion.
> >
> > I'll attempt replying to your comments in order.
> >
> > Gwen, I also debated exactly the same two options: a) interpreting
> absence
> > of module path as a user's intention to turn off isolation and b)
> > explicitly using an additional boolean property. A few reasons why I went
> > with b) in this first draft are:
> > 1) As Randall mentions, to leave the option of using a default value
> open.
> > If not immediately in the first version of isolation, maybe in the
> future.
> > 2) I didn't like the implicit character of the choice of interpreting an
> > empty string as a clear intention to turn isolation off by the user. Half
> > the time could be just that users forget to set a location, although
> they'd
> > like to use class loading isolation.
> > 3) There's a slim possibility that in rare occasions a user might want to
> > avoid even the slightest increase in memory consumption due to class
> > loading duplication. I admit this should be very rare, but given the
> other
> > concerns and that we would really like to keep the isolation
> implementation
> > simple, the option to turn off this feature by using only one additional
> > config property might not seem too excessive. At least at the start of
> this
> > discussion.
> > 4) Debugging during development might be simpler in some cases.
> > 5) Finally, as you mention, this could allow for smoother upgrades.
> >
>
> I'm not sure any of these keep you from removing the extra config. Is there
> any reason you couldn't have clean support for relying on the CLASSPATH
> while still supporting the classloaders? Then getting people onto the new
> classloaders does require documentation for how to install connectors, but
> that's pretty minimal. And we don't break existing installations where
> people are just adding to the CLASSPATH. It seems like this:
>
> 1. Allows you to set a default. Isolation is always enabled, but we won't
> include any paths/directories we already use. Setting a default just
> requires specifying a new location where we'd hold these directories.
> 2. It doesn't require the implicit choice -- you actually never turn off
> isolation, but still support the regular CLASSPATH with an empty list of
> isolated loaders
> 3. The user can still use CLASSPATH if they want to minimize classloader
> overhead
> 4. Debugging can still use CLASSPATH
> 5. Upgrades just work.
>

Falling back to CLASSPATH for non-isolated mode makes sense. The extra
config property was suggested proactively, as well as for clarity and
handling of defaults. But it's much better if we can do without it. Will be
removed.


>
>
> > Randall, regarding your comments:
> > 1) To keep its focus narrow, this KIP, as well as the first
> implementation
> > of isolation in Connect, assume filesystem based discovery. With careful
> > implementation, transitioning to discovery schemes that support broader
> > URIs I believe should be easy in the future.
> >
>
> Maybe just mention a couple of quick examples in the KIP. When described
> inline it might be more obvious that it will extend cleanly.
>
>
There's an example for a filesystem-based structure. I will enhance it.



> > 2) The example you give makes a good point. However I'm inclined to say
> > that such cases should be addressed more as exceptions rather than as
> being
> > the common case. Therefore, I wouldn't see all dependencies imported by
> the
> > framework as required to be filtered out, because in that case we lose
> the
> > advantage of isolation between the framework and the connectors (and we
> are
> > left only with isolation between connectors).
>
> 3) I tried to abstract implementation details in this the KIP, but you are
> > right. Even though filtering here is mainly used semantically rather than
> > literally, it gives an implementation hint that we could avoid.
> >
>
> I think we're missing another option -- don't do filtering and require that
> those dependencies are correctly filtered out of the modules. If we want to
> be nicer about this, we could also detect maybe 2 or 3 classes while
> scanning for Connectors/Converters/Transformations that indicate the
> classloader has jars that it shouldn't and warn about it. I can't think of
> that many that would be an issue -- basically connect-api, connect-runtime
> if they really mess it up, and maybe slf4j.
>
>
This sounds a bit more strict, which is not necessarily a bad thing. Still,
it seems to also keep the subject under the category of implementation
decisions.

> 4) In the same spirit as in 3) I believe we should reserve enough
> > flexibility to the implementation to discover and load classes, when they
> > appear in multiple locations under the general module location.
> >
> >
> And a couple of addition comments:
>
> - module.path should be module.paths if it accepts multiple paths
>

The configuration property was named in a way that resembles classpath. I
felt this might offer some intuition, even if the word "path" is too
filesystem oriented. Users are used to give a list of paths in classpath,
so I didn't find singular too confusing. Additionally, it will work when
there's only one "path", or I should say URI, given. But I don't feel very
strong about the name of this property. (Alternatives that I tried such as
module.uris, module.locations didn't seem satisfying)



> - I think the description of the module paths is a bit confusing because I
> think for simpler configuration we actually want to specify the *parent*
> directory of module paths. I definitely prefer this since it's simpler
> although I am not certain how it will mix with future extensions to other
> URL handlers (where a zip file or http URL wouldn't do the same discovery
> within subdirectories)
>

My impression is that the 2nd paragraph below the introduction of
module.path is where the "parent" convention is attempted to be explained.
I will attempt to improve this text, but I feel that, again, the filesystem
convention will follow the generic definition of the property (a list of
locations/paths).

-Konstantine


>
> -Ewen
>
>
> > Thanks again! Let me know what you think.
> > Konstantine
> >
> >
> > On Mon, May 1, 2017 at 10:12 AM, Randall Hauch <rh...@gmail.com> wrote:
> >
> > > Very nice work, Konstantine. Conflicting dependencies of connectors is
> > > indeed a big issue that makes it hard to manage installed connectors.
> > >
> > > I do like Gwen's idea about removing the 'module.isolation.enabled'
> > > property. However, I would have anticipated always using classpath
> > > isolation for *only* those components registered under the module path
> > and
> > > not really for anything else already on the normal classpath. So,
> people
> > > could continue to place custom connector JARs onto the classpath,
> though
> > > this would become deprecated in favor of installing custom connector
> > JARs /
> > > modules via the module path. This keeps configuration simple, gives
> > people
> > > time to migrate, but let's people that need classpath isolation get it
> to
> > > install a variety of connectors each with their dependencies that
> > > potentially conflict with other components.
> > >
> > > The challenge is whether there should be a default for 'module.path'.
> > > Ideally there would be so that users know where they can install their
> > > connectors. However, I suspect that this might be difficult to do
> unless
> > it
> > > can make use of system properties such as "${kafka.home}" so that
> > relative
> > > directories can be specified.
> > >
> > > A few other questions/comments:
> > >
> > > 1) Does the KIP have to specify how are components / modules installed,
> > > discovered, or recognized by Kafka Connect? Or perhaps the KIP needs to
> > > just specify the semantics of the file system module path (e.g., the
> > > directories below those specified in the module path are to be unique
> and
> > > identify an installed component).
> > >
> > > 2) Will the module classloader filtering also have to exclude Kafka
> > Connect
> > > dependencies? The only one that I can think of is the SLF4J API, which
> > > can't be loaded from the module's classloader if the connector is to
> send
> > > its log messages to the same logging system.
> > >
> > > 3) Rather than specify filtering, would be it a bit more flexible to
> > simply
> > > say that the implementation will need to ensure that Java, Kafka
> Connect,
> > > and other third party APIs (e.g., SLF4J API) will not be loaded from
> the
> > > module classloaders? It'd be better to avoid specifying how it will be
> > > done, just in case the implementation needs to evolve or use a
> different
> > > technique (e.g., load the Java and public Kafka Connect APIs via one
> > > classloader that is reused and that always appears before the module
> > > classloader, while Kafka Connect implementation JARs appear after the
> > > component's classloader.
> > >
> > > 4) Perhaps to address #2 and #3 above, perhaps the KIP could explicitly
> > > specify the classloader order for a deployed connector. For example,
> > > 'java', 'kafka-connect-apis', 'connector-module', 'smt-module-1', ...,
> > > 'kafka-connect-impls', where 'connector-module' is the classloader for
> > the
> > > (first) module where the connector is found, 'smt-module-1' is the
> > > classloader for the (first) module where the first SMT class is found
> (if
> > > specified and found in a separate module), 'smt-module-2' is the
> > > classloader .... Might also need to say that the KIP does not specify
> how
> > > the implementation will pick the module if a specified class if found
> in
> > > more than one module.
> > >
> > > Thoughts?
> > >
> > > Randall
> > >
> > > On Mon, May 1, 2017 at 6:43 AM, Gwen Shapira <gw...@confluent.io>
> wrote:
> > >
> > > > Hi Konstantine,
> > > >
> > > > Thank you so much for driving this! The connector classpath mess is
> > > driving
> > > > me nuts (or worse, driving me to use Docker).
> > > >
> > > > I like the proposal for micro-benchmarks to test the context
> switching
> > > > overhead.
> > > >
> > > > I have a difficult time figuring out the module.isolation.enabled.
> > > > Especially with a default to false. I can't think of a reason that
> > anyone
> > > > will not want classpath isolation. "No! I want my connectors to mess
> up
> > > > each other's dependencies" said no one ever.
> > > >
> > > > So it looks like this is mostly for upgrade purpose? Because the
> > initial
> > > > upgrade will not have the module.path set and therefore classpath
> > > isolation
> > > > will simply not work by default?
> > > >
> > > > In that case, why don't we simply use the existence of non-empty
> > > > module.path as an indicator of whether isolation should work or not?
> > seem
> > > > simpler and intuitive to me.
> > > >
> > > > Thanks!
> > > >
> > > > Gwen
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Sat, Apr 29, 2017 at 9:16 AM, Konstantine Karantasis <
> > > > konstantine@confluent.io> wrote:
> > > >
> > > > > * Because of KIP number collision, please disregard my previous KIP
> > > > > announcement and use this thread for discussion instead *
> > > > >
> > > > >
> > > > > Hi everyone,
> > > > >
> > > > > we aim to address dependency conflicts in Kafka Connect soon by
> > > applying
> > > > > class loading isolation.
> > > > >
> > > > > Feel free to take a look at KIP-146 here:
> > > > >
> > > > > *https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > 146+-+Classloading+Isolation+in+Connect
> > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > 146+-+Classloading+Isolation+in+Connect>*
> > > > >
> > > > > which describes minimal required changes to public interfaces and
> the
> > > > > general implementation approach.
> > > > >
> > > > > This is a much wanted feature for Kafka Connect. Your feedback is
> > > highly
> > > > > appreciated.
> > > > >
> > > > > -Konstantine
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > *Gwen Shapira*
> > > > Product Manager | Confluent
> > > > 650.450.2760 | @gwenshap
> > > > Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
> > > > <http://www.confluent.io/blog>
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-146: Classloading Isolation in Connect

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.

Thanks for the KIP.

A few responses inline, followed by additional comments.

On Mon, May 1, 2017 at 9:50 PM, Konstantine Karantasis <
konstantine@confluent.io> wrote:

> Gwen, Randall thank you for your very insightful observations. I'm glad you
> find this first draft to be an adequate platform for discussion.
>
> I'll attempt replying to your comments in order.
>
> Gwen, I also debated exactly the same two options: a) interpreting absence
> of module path as a user's intention to turn off isolation and b)
> explicitly using an additional boolean property. A few reasons why I went
> with b) in this first draft are:
> 1) As Randall mentions, to leave the option of using a default value open.
> If not immediately in the first version of isolation, maybe in the future.
> 2) I didn't like the implicit character of the choice of interpreting an
> empty string as a clear intention to turn isolation off by the user. Half
> the time could be just that users forget to set a location, although they'd
> like to use class loading isolation.
> 3) There's a slim possibility that in rare occasions a user might want to
> avoid even the slightest increase in memory consumption due to class
> loading duplication. I admit this should be very rare, but given the other
> concerns and that we would really like to keep the isolation implementation
> simple, the option to turn off this feature by using only one additional
> config property might not seem too excessive. At least at the start of this
> discussion.
> 4) Debugging during development might be simpler in some cases.
> 5) Finally, as you mention, this could allow for smoother upgrades.
>

I'm not sure any of these keep you from removing the extra config. Is there
any reason you couldn't have clean support for relying on the CLASSPATH
while still supporting the classloaders? Then getting people onto the new
classloaders does require documentation for how to install connectors, but
that's pretty minimal. And we don't break existing installations where
people are just adding to the CLASSPATH. It seems like this:

1. Allows you to set a default. Isolation is always enabled, but we won't
include any paths/directories we already use. Setting a default just
requires specifying a new location where we'd hold these directories.
2. It doesn't require the implicit choice -- you actually never turn off
isolation, but still support the regular CLASSPATH with an empty list of
isolated loaders
3. The user can still use CLASSPATH if they want to minimize classloader
overhead
4. Debugging can still use CLASSPATH
5. Upgrades just work.


>
> Randall, regarding your comments:
> 1) To keep its focus narrow, this KIP, as well as the first implementation
> of isolation in Connect, assume filesystem based discovery. With careful
> implementation, transitioning to discovery schemes that support broader
> URIs I believe should be easy in the future.
>

Maybe just mention a couple of quick examples in the KIP. When described
inline it might be more obvious that it will extend cleanly.


> 2) The example you give makes a good point. However I'm inclined to say
> that such cases should be addressed more as exceptions rather than as being
> the common case. Therefore, I wouldn't see all dependencies imported by the
> framework as required to be filtered out, because in that case we lose the
> advantage of isolation between the framework and the connectors (and we are
> left only with isolation between connectors).

3) I tried to abstract implementation details in this the KIP, but you are
> right. Even though filtering here is mainly used semantically rather than
> literally, it gives an implementation hint that we could avoid.
>

I think we're missing another option -- don't do filtering and require that
those dependencies are correctly filtered out of the modules. If we want to
be nicer about this, we could also detect maybe 2 or 3 classes while
scanning for Connectors/Converters/Transformations that indicate the
classloader has jars that it shouldn't and warn about it. I can't think of
that many that would be an issue -- basically connect-api, connect-runtime
if they really mess it up, and maybe slf4j.


> 4) In the same spirit as in 3) I believe we should reserve enough
> flexibility to the implementation to discover and load classes, when they
> appear in multiple locations under the general module location.
>
>
And a couple of addition comments:

- module.path should be module.paths if it accepts multiple paths
- I think the description of the module paths is a bit confusing because I
think for simpler configuration we actually want to specify the *parent*
directory of module paths. I definitely prefer this since it's simpler
although I am not certain how it will mix with future extensions to other
URL handlers (where a zip file or http URL wouldn't do the same discovery
within subdirectories)

-Ewen


> Thanks again! Let me know what you think.
> Konstantine
>
>
> On Mon, May 1, 2017 at 10:12 AM, Randall Hauch <rh...@gmail.com> wrote:
>
> > Very nice work, Konstantine. Conflicting dependencies of connectors is
> > indeed a big issue that makes it hard to manage installed connectors.
> >
> > I do like Gwen's idea about removing the 'module.isolation.enabled'
> > property. However, I would have anticipated always using classpath
> > isolation for *only* those components registered under the module path
> and
> > not really for anything else already on the normal classpath. So, people
> > could continue to place custom connector JARs onto the classpath, though
> > this would become deprecated in favor of installing custom connector
> JARs /
> > modules via the module path. This keeps configuration simple, gives
> people
> > time to migrate, but let's people that need classpath isolation get it to
> > install a variety of connectors each with their dependencies that
> > potentially conflict with other components.
> >
> > The challenge is whether there should be a default for 'module.path'.
> > Ideally there would be so that users know where they can install their
> > connectors. However, I suspect that this might be difficult to do unless
> it
> > can make use of system properties such as "${kafka.home}" so that
> relative
> > directories can be specified.
> >
> > A few other questions/comments:
> >
> > 1) Does the KIP have to specify how are components / modules installed,
> > discovered, or recognized by Kafka Connect? Or perhaps the KIP needs to
> > just specify the semantics of the file system module path (e.g., the
> > directories below those specified in the module path are to be unique and
> > identify an installed component).
> >
> > 2) Will the module classloader filtering also have to exclude Kafka
> Connect
> > dependencies? The only one that I can think of is the SLF4J API, which
> > can't be loaded from the module's classloader if the connector is to send
> > its log messages to the same logging system.
> >
> > 3) Rather than specify filtering, would be it a bit more flexible to
> simply
> > say that the implementation will need to ensure that Java, Kafka Connect,
> > and other third party APIs (e.g., SLF4J API) will not be loaded from the
> > module classloaders? It'd be better to avoid specifying how it will be
> > done, just in case the implementation needs to evolve or use a different
> > technique (e.g., load the Java and public Kafka Connect APIs via one
> > classloader that is reused and that always appears before the module
> > classloader, while Kafka Connect implementation JARs appear after the
> > component's classloader.
> >
> > 4) Perhaps to address #2 and #3 above, perhaps the KIP could explicitly
> > specify the classloader order for a deployed connector. For example,
> > 'java', 'kafka-connect-apis', 'connector-module', 'smt-module-1', ...,
> > 'kafka-connect-impls', where 'connector-module' is the classloader for
> the
> > (first) module where the connector is found, 'smt-module-1' is the
> > classloader for the (first) module where the first SMT class is found (if
> > specified and found in a separate module), 'smt-module-2' is the
> > classloader .... Might also need to say that the KIP does not specify how
> > the implementation will pick the module if a specified class if found in
> > more than one module.
> >
> > Thoughts?
> >
> > Randall
> >
> > On Mon, May 1, 2017 at 6:43 AM, Gwen Shapira <gw...@confluent.io> wrote:
> >
> > > Hi Konstantine,
> > >
> > > Thank you so much for driving this! The connector classpath mess is
> > driving
> > > me nuts (or worse, driving me to use Docker).
> > >
> > > I like the proposal for micro-benchmarks to test the context switching
> > > overhead.
> > >
> > > I have a difficult time figuring out the module.isolation.enabled.
> > > Especially with a default to false. I can't think of a reason that
> anyone
> > > will not want classpath isolation. "No! I want my connectors to mess up
> > > each other's dependencies" said no one ever.
> > >
> > > So it looks like this is mostly for upgrade purpose? Because the
> initial
> > > upgrade will not have the module.path set and therefore classpath
> > isolation
> > > will simply not work by default?
> > >
> > > In that case, why don't we simply use the existence of non-empty
> > > module.path as an indicator of whether isolation should work or not?
> seem
> > > simpler and intuitive to me.
> > >
> > > Thanks!
> > >
> > > Gwen
> > >
> > >
> > >
> > >
> > >
> > > On Sat, Apr 29, 2017 at 9:16 AM, Konstantine Karantasis <
> > > konstantine@confluent.io> wrote:
> > >
> > > > * Because of KIP number collision, please disregard my previous KIP
> > > > announcement and use this thread for discussion instead *
> > > >
> > > >
> > > > Hi everyone,
> > > >
> > > > we aim to address dependency conflicts in Kafka Connect soon by
> > applying
> > > > class loading isolation.
> > > >
> > > > Feel free to take a look at KIP-146 here:
> > > >
> > > > *https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 146+-+Classloading+Isolation+in+Connect
> > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 146+-+Classloading+Isolation+in+Connect>*
> > > >
> > > > which describes minimal required changes to public interfaces and the
> > > > general implementation approach.
> > > >
> > > > This is a much wanted feature for Kafka Connect. Your feedback is
> > highly
> > > > appreciated.
> > > >
> > > > -Konstantine
> > > >
> > >
> > >
> > >
> > > --
> > > *Gwen Shapira*
> > > Product Manager | Confluent
> > > 650.450.2760 | @gwenshap
> > > Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
> > > <http://www.confluent.io/blog>
> > >
> >
>

Re: [DISCUSS] KIP-146: Classloading Isolation in Connect

Posted by Konstantine Karantasis <ko...@confluent.io>.

Gwen, Randall thank you for your very insightful observations. I'm glad you
find this first draft to be an adequate platform for discussion.

I'll attempt replying to your comments in order.

Gwen, I also debated exactly the same two options: a) interpreting absence
of module path as a user's intention to turn off isolation and b)
explicitly using an additional boolean property. A few reasons why I went
with b) in this first draft are:
1) As Randall mentions, to leave the option of using a default value open.
If not immediately in the first version of isolation, maybe in the future.
2) I didn't like the implicit character of the choice of interpreting an
empty string as a clear intention to turn isolation off by the user. Half
the time could be just that users forget to set a location, although they'd
like to use class loading isolation.
3) There's a slim possibility that in rare occasions a user might want to
avoid even the slightest increase in memory consumption due to class
loading duplication. I admit this should be very rare, but given the other
concerns and that we would really like to keep the isolation implementation
simple, the option to turn off this feature by using only one additional
config property might not seem too excessive. At least at the start of this
discussion.
4) Debugging during development might be simpler in some cases.
5) Finally, as you mention, this could allow for smoother upgrades.

Randall, regarding your comments:
1) To keep its focus narrow, this KIP, as well as the first implementation
of isolation in Connect, assume filesystem based discovery. With careful
implementation, transitioning to discovery schemes that support broader
URIs I believe should be easy in the future.
2) The example you give makes a good point. However I'm inclined to say
that such cases should be addressed more as exceptions rather than as being
the common case. Therefore, I wouldn't see all dependencies imported by the
framework as required to be filtered out, because in that case we lose the
advantage of isolation between the framework and the connectors (and we are
left only with isolation between connectors).
3) I tried to abstract implementation details in this the KIP, but you are
right. Even though filtering here is mainly used semantically rather than
literally, it gives an implementation hint that we could avoid.
4) In the same spirit as in 3) I believe we should reserve enough
flexibility to the implementation to discover and load classes, when they
appear in multiple locations under the general module location.

Thanks again! Let me know what you think.
Konstantine

On Mon, May 1, 2017 at 10:12 AM, Randall Hauch <rh...@gmail.com> wrote:

> Very nice work, Konstantine. Conflicting dependencies of connectors is
> indeed a big issue that makes it hard to manage installed connectors.
>
> I do like Gwen's idea about removing the 'module.isolation.enabled'
> property. However, I would have anticipated always using classpath
> isolation for *only* those components registered under the module path and
> not really for anything else already on the normal classpath. So, people
> could continue to place custom connector JARs onto the classpath, though
> this would become deprecated in favor of installing custom connector JARs /
> modules via the module path. This keeps configuration simple, gives people
> time to migrate, but let's people that need classpath isolation get it to
> install a variety of connectors each with their dependencies that
> potentially conflict with other components.
>
> The challenge is whether there should be a default for 'module.path'.
> Ideally there would be so that users know where they can install their
> connectors. However, I suspect that this might be difficult to do unless it
> can make use of system properties such as "${kafka.home}" so that relative
> directories can be specified.
>
> A few other questions/comments:
>
> 1) Does the KIP have to specify how are components / modules installed,
> discovered, or recognized by Kafka Connect? Or perhaps the KIP needs to
> just specify the semantics of the file system module path (e.g., the
> directories below those specified in the module path are to be unique and
> identify an installed component).
>
> 2) Will the module classloader filtering also have to exclude Kafka Connect
> dependencies? The only one that I can think of is the SLF4J API, which
> can't be loaded from the module's classloader if the connector is to send
> its log messages to the same logging system.
>
> 3) Rather than specify filtering, would be it a bit more flexible to simply
> say that the implementation will need to ensure that Java, Kafka Connect,
> and other third party APIs (e.g., SLF4J API) will not be loaded from the
> module classloaders? It'd be better to avoid specifying how it will be
> done, just in case the implementation needs to evolve or use a different
> technique (e.g., load the Java and public Kafka Connect APIs via one
> classloader that is reused and that always appears before the module
> classloader, while Kafka Connect implementation JARs appear after the
> component's classloader.
>
> 4) Perhaps to address #2 and #3 above, perhaps the KIP could explicitly
> specify the classloader order for a deployed connector. For example,
> 'java', 'kafka-connect-apis', 'connector-module', 'smt-module-1', ...,
> 'kafka-connect-impls', where 'connector-module' is the classloader for the
> (first) module where the connector is found, 'smt-module-1' is the
> classloader for the (first) module where the first SMT class is found (if
> specified and found in a separate module), 'smt-module-2' is the
> classloader .... Might also need to say that the KIP does not specify how
> the implementation will pick the module if a specified class if found in
> more than one module.
>
> Thoughts?
>
> Randall
>
> On Mon, May 1, 2017 at 6:43 AM, Gwen Shapira <gw...@confluent.io> wrote:
>
> > Hi Konstantine,
> >
> > Thank you so much for driving this! The connector classpath mess is
> driving
> > me nuts (or worse, driving me to use Docker).
> >
> > I like the proposal for micro-benchmarks to test the context switching
> > overhead.
> >
> > I have a difficult time figuring out the module.isolation.enabled.
> > Especially with a default to false. I can't think of a reason that anyone
> > will not want classpath isolation. "No! I want my connectors to mess up
> > each other's dependencies" said no one ever.
> >
> > So it looks like this is mostly for upgrade purpose? Because the initial
> > upgrade will not have the module.path set and therefore classpath
> isolation
> > will simply not work by default?
> >
> > In that case, why don't we simply use the existence of non-empty
> > module.path as an indicator of whether isolation should work or not? seem
> > simpler and intuitive to me.
> >
> > Thanks!
> >
> > Gwen
> >
> >
> >
> >
> >
> > On Sat, Apr 29, 2017 at 9:16 AM, Konstantine Karantasis <
> > konstantine@confluent.io> wrote:
> >
> > > * Because of KIP number collision, please disregard my previous KIP
> > > announcement and use this thread for discussion instead *
> > >
> > >
> > > Hi everyone,
> > >
> > > we aim to address dependency conflicts in Kafka Connect soon by
> applying
> > > class loading isolation.
> > >
> > > Feel free to take a look at KIP-146 here:
> > >
> > > *https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 146+-+Classloading+Isolation+in+Connect
> > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 146+-+Classloading+Isolation+in+Connect>*
> > >
> > > which describes minimal required changes to public interfaces and the
> > > general implementation approach.
> > >
> > > This is a much wanted feature for Kafka Connect. Your feedback is
> highly
> > > appreciated.
> > >
> > > -Konstantine
> > >
> >
> >
> >
> > --
> > *Gwen Shapira*
> > Product Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
> > <http://www.confluent.io/blog>
> >
>

Re: [DISCUSS] KIP-146: Classloading Isolation in Connect

Posted by Randall Hauch <rh...@gmail.com>.

Very nice work, Konstantine. Conflicting dependencies of connectors is
indeed a big issue that makes it hard to manage installed connectors.

I do like Gwen's idea about removing the 'module.isolation.enabled'
property. However, I would have anticipated always using classpath
isolation for *only* those components registered under the module path and
not really for anything else already on the normal classpath. So, people
could continue to place custom connector JARs onto the classpath, though
this would become deprecated in favor of installing custom connector JARs /
modules via the module path. This keeps configuration simple, gives people
time to migrate, but let's people that need classpath isolation get it to
install a variety of connectors each with their dependencies that
potentially conflict with other components.

The challenge is whether there should be a default for 'module.path'.
Ideally there would be so that users know where they can install their
connectors. However, I suspect that this might be difficult to do unless it
can make use of system properties such as "${kafka.home}" so that relative
directories can be specified.

A few other questions/comments:

1) Does the KIP have to specify how are components / modules installed,
discovered, or recognized by Kafka Connect? Or perhaps the KIP needs to
just specify the semantics of the file system module path (e.g., the
directories below those specified in the module path are to be unique and
identify an installed component).

2) Will the module classloader filtering also have to exclude Kafka Connect
dependencies? The only one that I can think of is the SLF4J API, which
can't be loaded from the module's classloader if the connector is to send
its log messages to the same logging system.

3) Rather than specify filtering, would be it a bit more flexible to simply
say that the implementation will need to ensure that Java, Kafka Connect,
and other third party APIs (e.g., SLF4J API) will not be loaded from the
module classloaders? It'd be better to avoid specifying how it will be
done, just in case the implementation needs to evolve or use a different
technique (e.g., load the Java and public Kafka Connect APIs via one
classloader that is reused and that always appears before the module
classloader, while Kafka Connect implementation JARs appear after the
component's classloader.

4) Perhaps to address #2 and #3 above, perhaps the KIP could explicitly
specify the classloader order for a deployed connector. For example,
'java', 'kafka-connect-apis', 'connector-module', 'smt-module-1', ...,
'kafka-connect-impls', where 'connector-module' is the classloader for the
(first) module where the connector is found, 'smt-module-1' is the
classloader for the (first) module where the first SMT class is found (if
specified and found in a separate module), 'smt-module-2' is the
classloader .... Might also need to say that the KIP does not specify how
the implementation will pick the module if a specified class if found in
more than one module.

Thoughts?

Randall

On Mon, May 1, 2017 at 6:43 AM, Gwen Shapira <gw...@confluent.io> wrote:

> Hi Konstantine,
>
> Thank you so much for driving this! The connector classpath mess is driving
> me nuts (or worse, driving me to use Docker).
>
> I like the proposal for micro-benchmarks to test the context switching
> overhead.
>
> I have a difficult time figuring out the module.isolation.enabled.
> Especially with a default to false. I can't think of a reason that anyone
> will not want classpath isolation. "No! I want my connectors to mess up
> each other's dependencies" said no one ever.
>
> So it looks like this is mostly for upgrade purpose? Because the initial
> upgrade will not have the module.path set and therefore classpath isolation
> will simply not work by default?
>
> In that case, why don't we simply use the existence of non-empty
> module.path as an indicator of whether isolation should work or not? seem
> simpler and intuitive to me.
>
> Thanks!
>
> Gwen
>
>
>
>
>
> On Sat, Apr 29, 2017 at 9:16 AM, Konstantine Karantasis <
> konstantine@confluent.io> wrote:
>
> > * Because of KIP number collision, please disregard my previous KIP
> > announcement and use this thread for discussion instead *
> >
> >
> > Hi everyone,
> >
> > we aim to address dependency conflicts in Kafka Connect soon by applying
> > class loading isolation.
> >
> > Feel free to take a look at KIP-146 here:
> >
> > *https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 146+-+Classloading+Isolation+in+Connect
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 146+-+Classloading+Isolation+in+Connect>*
> >
> > which describes minimal required changes to public interfaces and the
> > general implementation approach.
> >
> > This is a much wanted feature for Kafka Connect. Your feedback is highly
> > appreciated.
> >
> > -Konstantine
> >
>
>
>
> --
> *Gwen Shapira*
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
> <http://www.confluent.io/blog>
>