Posted to dev@pulsar.apache.org by Andrey Yegorov <an...@datastax.com> on 2021/05/25 18:20:22 UTC

Connectors package registry

Hello,

As Pulsar becomes increasingly popular, we will have to deal with a
larger user base looking to deploy Pulsar in a wider array of use cases,
interfacing with a more diverse set of other components.  To help with
this, we should, as a project, create a plan to help community members
publish and discover connectors beyond what the Pulsar PMC wants to
maintain.

Current plans include splitting connectors into separate repos (PIP 62) or
moving under the umbrella of the projects they integrate with (as per
conversations during the community meetings). This will definitely help
with the build times but may negatively affect the discoverability of the
connectors and ease of installation.

I think Pulsar can benefit from a simple package registry that (1) hosts a
list of free-to-use (Apache or other approved license)
connectors/references to the binaries, and (2) provides a CLI (e.g. via
pulsar-admin) to simplify discovery, download, and installation of
connectors for new users.

What do you think? Would you find something like this useful?

The implementation can be as simple as another GitHub repo with a
predefined structure like

    {connector name}/{major version}/metadata

where metadata contains the URL of the NAR, its checksum, the range of
compatible Pulsar versions, contacts, license, a short description, etc.

Plus a CLI that can search/list compatible connectors and
download/install/update a connector.
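
To make the layout above a little more concrete, here is a minimal sketch in
Python of what one metadata entry and two checks a CLI might run could look
like. Everything in it is an assumption for illustration only: the field names
(nar_url, sha256, pulsar_min, pulsar_max), the inclusive version range, and
the choice of SHA-256 are hypothetical, not an agreed format.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectorMetadata:
    """Hypothetical contents of {connector name}/{major version}/metadata."""
    name: str
    version: str
    nar_url: str      # link to the NAR; the registry itself hosts no binaries
    sha256: str       # hex digest of the NAR (assumed checksum algorithm)
    pulsar_min: str   # lowest compatible Pulsar version, inclusive
    pulsar_max: str   # highest compatible Pulsar version, inclusive
    license: str
    description: str


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.7.1' into (2, 7, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def is_compatible(meta: ConnectorMetadata, pulsar_version: str) -> bool:
    """True if pulsar_version falls inside the declared range.

    Assumes all versions are written with the same number of parts.
    """
    version = parse_version(pulsar_version)
    return parse_version(meta.pulsar_min) <= version <= parse_version(meta.pulsar_max)


def checksum_matches(meta: ConnectorMetadata, nar_bytes: bytes) -> bool:
    """Compare the downloaded bytes against the digest from the metadata."""
    return hashlib.sha256(nar_bytes).hexdigest() == meta.sha256
```

A CLI would presumably filter the list with is_compatible before showing it,
and refuse to install a download for which checksum_matches returns False.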

As prior art examples, one can refer to:

   1. brew (package manager for macOS)
      1. Formulas/registry: https://github.com/Homebrew/homebrew-core
      2. brew itself: https://github.com/Homebrew/brew
   2. Apache Solr
      1. https://solr.apache.org/guide/8_8/package-manager.html

There is also the Helm chart repository from our cousins at the CNCF over
at the Artifact Hub <https://artifacthub.io/>.

I believe such a registry managed by the PMC will reduce the risk of
fragmentation of the ecosystem, improve discoverability, and allow simple
detection of not-up-to-date connectors for the new releases of Pulsar.

-- 
Andrey Yegorov

Re: Connectors package registry

Posted by Jonathan Ellis <jb...@gmail.com>.
On Wed, May 26, 2021 at 10:45 AM Andrey Yegorov <an...@datastax.com>
wrote:

> My notes from the community meeting. Jerry, Matteo, and I talked about this
> idea:
>

Thanks for the proposal and the notes, Andrey!  I think a connector
registry will be very useful as Pulsar adoption grows.

* major problems are:
>   -- the process of adding new stuff (approval, review, quality
> control/approval stuff)
>

This should be pretty lightweight, right?  Brew has an approval process
where the maintainers review your formula, but since Pulsar connectors are
much more standardized, that won't be necessary for us.  The Pulsar
connector repository's only role needs to be discovery; QA is the
responsibility of the connector maintainer.


>   -- what to do with stuff that is not under apache umbrella (as in: code
> ownership, not license). ASF possibly does not allow that
>

I think the PMC can set whatever policy it likes here -- it is only linking
resources, not hosting binaries.  Personally, I think it's better to be as
comprehensive as possible.


>   -- conflict resolution with commercial entities pushing their competing
> connectors
>

I recommend we simply decline to get involved in naming conflicts.  (This
is the formal policy of PyPI <https://www.python.org/dev/peps/pep-0541/>,
for instance, which also has a liberal "anyone can upload a package"
stance.)

Re: Connectors package registry

Posted by Andrey Yegorov <an...@datastax.com>.
My notes from the community meeting. Jerry, Matteo, and I talked about this
idea:

* major problems are:
  -- the process of adding new stuff (approval, review, quality
control/approval stuff)
  -- what to do with stuff that is not under apache umbrella (as in: code
ownership, not license). ASF possibly does not allow that
  -- conflict resolution with commercial entities pushing their competing
connectors

* overall it is not a strict no; more like we need to think about potential
issues and hear from other folks
* some options that sounded OK: a repo with Apache stuff, plus an
option to add other repos at one's own risk (like "brew tap")
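
A rough sketch of how the "brew tap" option could behave, in Python. This is
only an illustration under assumptions: the Registry class, the index name
"apache", and the rule that official entries are listed first are all
hypothetical and were not discussed in the meeting.

```python
from dataclasses import dataclass, field

OFFICIAL = "apache"  # hypothetical name for the PMC-maintained index


@dataclass
class Registry:
    # index name -> {connector name -> metadata URL}
    indexes: dict[str, dict[str, str]] = field(default_factory=dict)

    def tap(self, name: str, entries: dict[str, str]) -> None:
        """Add a third-party index; the user opts in explicitly, at own risk."""
        if name == OFFICIAL:
            raise ValueError("the official index cannot be replaced by a tap")
        self.indexes[name] = dict(entries)

    def find(self, connector: str) -> list[tuple[str, str]]:
        """Return (index name, metadata URL) pairs, official index first."""
        order = sorted(self.indexes, key=lambda n: (n != OFFICIAL, n))
        return [
            (n, self.indexes[n][connector])
            for n in order
            if connector in self.indexes[n]
        ]
```

Returning the index name with every hit keeps it visible to the user which
entries come from the Apache repo and which come from a repo they added
themselves.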


On Tue, May 25, 2021 at 1:58 PM Andrey Yegorov <an...@datastax.com>
wrote:

> I do agree that hosting of the binaries is not an issue.
> Discovering the binaries is. Proposed repo should not host the binaries
> but the urls/checksums etc. and serve as a simple data storage for the CLI
> to use.
>
> Having multiple parties publishing their connectors is fine but it makes
> it hard for the new user to see the full extent of the ecosystem given
> little motivation for all parties to proactively notify the pulsar
> community outside of their own client base.
>
> I.e. Kafka ecosystem does not list Snowflake connector:
> https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem (Confluent
> Hub does, but let's look at FOSS)
> Snowflake's docs do:
> https://docs.snowflake.com/en/user-guide/kafka-connector-install.html
>
> Having a registry at the time when Snowflake and others start building
> Pulsar connectors will result in them notifying the community by adding
> their connectors to the registry; the motivation at that time is obvious:
> ease of installation for their customers.
>
> --
> Andrey Yegorov
>


-- 
Andrey Yegorov

Re: Connectors package registry

Posted by Andrey Yegorov <an...@datastax.com>.
I do agree that hosting of the binaries is not an issue.
Discovering the binaries is. The proposed repo should host not the binaries
but the URLs, checksums, etc., and serve as simple data storage for the CLI
to use.

Having multiple parties publishing their connectors is fine but it makes it
hard for the new user to see the full extent of the ecosystem given little
motivation for all parties to proactively notify the pulsar community
outside of their own client base.

E.g., the Kafka ecosystem page does not list the Snowflake connector:
https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem (Confluent Hub
does, but let's look at FOSS)
Snowflake's docs do:
https://docs.snowflake.com/en/user-guide/kafka-connector-install.html

Having a registry at the time when Snowflake and others start building
Pulsar connectors will result in them notifying the community by adding
their connectors to the registry; the motivation at that time is obvious:
ease of installation for their customers.


On Tue, May 25, 2021 at 11:59 AM Jerry Peng <je...@gmail.com>
wrote:

> Hello Andrey,
>
> Thank you for bringing this up! This is definitely an important issue!
>
>  All of the connector binaries are already hosted on Maven central thus I
> don't think hosting the binaries is an issue.  Perhaps the key problem here
> is about discovery.
>
>  My thoughts:
>
> 1. We should document clearly on the Apache Pulsar website all the
> connectors that we offer.
>
> https://pulsar.apache.org/docs/en/io-connectors/
>
> Seems like we already do that? If not, we should make sure to keep this
> list up to date.  Maybe the list is not visible enough to new users.  If
> so, we should figure out how to advertise the connectors we already have in
> a better fashion.
>
> 2.  I do like the idea of having a tool that can perhaps search and install
> connectors automatically for you.  Perhaps this is a feature we can add to
> the existing pulsar-admin CLI tool.  This feature can search maven for
> connector binaries and download / install them if instructed by the user.
>
> Best,
>
> Jerry


-- 
Andrey Yegorov

Re: Connectors package registry

Posted by Jonathan Ellis <jb...@gmail.com>.
Do we have consensus on metadata repo + generated docs page to start with?


On Sat, May 29, 2021 at 8:44 AM Jonathan Ellis <jb...@gmail.com> wrote:

> On Fri, May 28, 2021 at 2:55 PM Sijie Guo <gu...@gmail.com> wrote:
>
>> An alternative is to just add a table to the connectors documentation page
>> like how we add third-party clients.
>> http://pulsar.apache.org/docs/en/client-libraries/#third-party-clients
>>
>> It only refers to the Github repos for those connectors but doesn't point
>> to any downloadable binaries.
>> It provides the ability for people to discover the connectors. People can
>> then go to those repos to find the connectors to use. But the PMC doesn't
>> need to be pulled into the risk for managing versions and releases for
>> external connectors.
>>
>
> I'm really confused as to how I've implied that the PMC should manage
> versions and releases, because that was the opposite of my intention!
>
> But yes, that kind of table is exactly what I have in mind.  I'm just
> saying that as an implementation detail, I like the idea to have a separate
> repo that contains connector metadata of the sort that Andrey proposes,
> that we then generate the docs table from.  It's a pretty high barrier to
> entry if we have people editing the docs by hand (and building it to make
> sure it works) to submit a PR for a new connector.
>
>
>

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced

Re: Connectors package registry

Posted by Jonathan Ellis <jb...@gmail.com>.
On Fri, May 28, 2021 at 2:55 PM Sijie Guo <gu...@gmail.com> wrote:

> An alternative is to just add a table to the connectors documentation page
> like how we add third-party clients.
> http://pulsar.apache.org/docs/en/client-libraries/#third-party-clients
>
> It only refers to the Github repos for those connectors but doesn't point
> to any downloadable binaries.
> It provides the ability for people to discover the connectors. People can
> then go to those repos to find the connectors to use. But the PMC doesn't
> need to be pulled into the risk for managing versions and releases for
> external connectors.
>

I'm really confused as to how I've implied that the PMC should manage
versions and releases, because that was the opposite of my intention!

But yes, that kind of table is exactly what I have in mind.  I'm just
saying that as an implementation detail, I like the idea to have a separate
repo that contains connector metadata of the sort that Andrey proposes,
that we then generate the docs table from.  It's a pretty high barrier to
entry if we have people editing the docs by hand (and building it to make
sure it works) to submit a PR for a new connector.
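
As a sketch of that generation step, assuming the docs page is Markdown and
each connector's metadata has been loaded into a dict, something like the
following Python would do. The keys (name, description, license, repo) and
the column layout are hypothetical, not a proposed schema.

```python
def render_table(connectors: list[dict[str, str]]) -> str:
    """Render connector metadata as a Markdown table, sorted by name."""
    header = (
        "| Connector | Description | License | Repository |\n"
        "|---|---|---|---|"
    )
    rows = [
        f"| {c['name']} | {c['description']} | {c['license']} | {c['repo']} |"
        for c in sorted(connectors, key=lambda c: c["name"].lower())
    ]
    return "\n".join([header, *rows])
```

With something like this in the site build, a contributor would only add a
metadata file in their PR and never touch the docs page by hand.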

Re: Connectors package registry

Posted by Sijie Guo <gu...@gmail.com>.
On Thu, May 27, 2021 at 7:54 PM Jonathan Ellis <jb...@gmail.com> wrote:

> It sounds like you're envisioning an "Apple Store" model where every
> submission is rigorously tested and vetted.  That is certainly an option,
> but since the PMC gets to define what the rules are, it's also an option to
> say, "this index is provided as a community service with no guarantees of
> quality."
>

I am expressing my concern about having such a service hosted by the PMC.
Hence I lean towards the PMC managing only the pre-built connectors
that are released as part of a Pulsar release.


>
> Both models have succeeded with users -- the Python package index PyPI is
> an example of the "anything goes" kind.
>
> What are the alternatives if the PMC declines to provide such an index?
> (Let's call it an index, since "repository" seems to imply hosting the kind
> of code or binary releases that I'm *not* suggesting.)  It seems to me that
> the most likely outcomes are
>
> 1. Today's state of affairs where essentially nobody knows about connectors
> that are not directly maintained and released by the PMC
> 2. A Confluent Hub model where a vendor hosts a connector index to fill the
> vacuum, or worse, multiple competing vendors do this.
>

An alternative is to just add a table to the connectors documentation page
like how we add third-party clients.
http://pulsar.apache.org/docs/en/client-libraries/#third-party-clients

It only refers to the Github repos for those connectors but doesn't point
to any downloadable binaries.
It provides the ability for people to discover the connectors. People can
then go to those repos to find the connectors to use. But the PMC doesn't
need to be pulled into the risk for managing versions and releases for
external connectors.


>
> I don't think either of these is good for Apache Pulsar or its users.


> P.S. I get that it's awkward for me to say, "the PMC should do this" as a
> non-PMC member, but I'm happy to volunteer to help with any housekeeping
> needed for this proposal that the PMC would like to delegate, from creating
> a git repo to adding a script to the site builder to turn that into a page
> to approving PRs that people submit.
>

Not sure about the delegation here. I think the proposal is still
under discussion.

I don't know which Git repo you are referring to here. If it is a new
Git repo that would be created under the Pulsar PMC, that needs to be
discussed and approved by the PMC.

If you are referring to the scripts to improve the documentation for the
pre-built connectors, I think those can be added to the main Pulsar repo.





Re: Connectors package registry

Posted by Jonathan Ellis <jb...@gmail.com>.
It sounds like you're envisioning an "Apple Store" model where every
submission is rigorously tested and vetted.  That is certainly an option,
but since the PMC gets to define what the rules are, it's also an option to
say, "this index is provided as a community service with no guarantees of
quality."

Both models have succeeded with users -- the Python package index PyPI is
an example of the "anything goes" kind.

What are the alternatives if the PMC declines to provide such an index?
(Let's call it an index, since "repository" seems to imply hosting the kind
of code or binary releases that I'm *not* suggesting.)  It seems to me that
the most likely outcomes are

1. Today's state of affairs where essentially nobody knows about connectors
that are not directly maintained and released by the PMC
2. A Confluent Hub model where a vendor hosts a connector index to fill the
vacuum, or worse, multiple competing vendors do this.

I don't think either of these is good for Apache Pulsar or its users.

P.S. I get that it's awkward for me to say, "the PMC should do this" as a
non-PMC member, but I'm happy to volunteer to help with any housekeeping
needed for this proposal that the PMC would like to delegate, from creating
a git repo to adding a script to the site builder to turn that into a page
to approving PRs that people submit.


On Thu, May 27, 2021 at 3:46 PM Sijie Guo <gu...@gmail.com> wrote:

> On Thu, May 27, 2021 at 1:17 PM Jonathan Ellis <jb...@gmail.com> wrote:
>
> > On Thu, May 27, 2021 at 2:38 PM Sijie Guo <gu...@gmail.com> wrote:
> >
> > > Agreed that the main problem is about discovering the existing
> pre-built
> > > Pulsar connectors. I don't think the PMC should involve hosting and
> > > managing external connectors because it will put the PMC in the
> situation
> > > in handling licensing issues that I think we should avoid.
> > >
> >
> > I totally agree that the PMC shouldn't get tied up in licensing
> discussions
> > for third party connectors, but I don't see how that's an issue if we're
> > talking about a repository with descriptions and URLs -- no code, no
> > binaries.  Am I missing something?
> >
>
> PMC has been treated as an authority of the project and trusted by most of
> the people.
>
> If it is managed by the PMC, PMC is responsible for verifying the links,
> whether the links point to any malformed binaries and their software
> licenses.
> For example, if a Pulsar user goes to the Pulsar website and downloads a
> malformed binary, who is going to be responsible for that?
> The PMC, the ASF, or the owner of the binary? IMO, it is too risky to
> manage.
>
> As the PMC, we can make recommendations but I would avoid getting into the
> trouble of managing external binaries even via links.
>
>
> >
> >
> > > All the ASF accepted connectors are already in the main Pulsar repo.
> Even
> > > they are moved to the pulsar-connectors repo. They are managed and
> > released
> > > as part of the Pulsar release.
> > >
> >
> > Right, there's no problem here.  Where there is an opportunity to improve
> > is discoverability for third party connectors that the PMC does *not*
> want
> > to bring in and maintain officially.
> >
>
> See comments above.
>


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced

Re: Connectors package registry

Posted by Sijie Guo <gu...@gmail.com>.
On Thu, May 27, 2021 at 1:17 PM Jonathan Ellis <jb...@gmail.com> wrote:

> On Thu, May 27, 2021 at 2:38 PM Sijie Guo <gu...@gmail.com> wrote:
>
> > Agreed that the main problem is about discovering the existing pre-built
> > Pulsar connectors. I don't think the PMC should involve hosting and
> > managing external connectors because it will put the PMC in the situation
> > in handling licensing issues that I think we should avoid.
> >
>
> I totally agree that the PMC shouldn't get tied up in licensing discussions
> for third party connectors, but I don't see how that's an issue if we're
> talking about a repository with descriptions and URLs -- no code, no
> binaries.  Am I missing something?
>

The PMC is treated as an authority on the project and is trusted by most
people.

If the index is managed by the PMC, the PMC is responsible for verifying the
links: whether they point to any malformed binaries, and what their software
licenses are.
For example, if a Pulsar user goes to the Pulsar website and downloads a
malformed binary, who is going to be responsible for that?
The PMC, the ASF, or the owner of the binary? IMO, it is too risky to
manage.

As the PMC, we can make recommendations, but I would avoid getting into the
trouble of managing external binaries, even via links.


>
>
> > All the ASF accepted connectors are already in the main Pulsar repo. Even
> > they are moved to the pulsar-connectors repo. They are managed and
> released
> > as part of the Pulsar release.
> >
>
> Right, there's no problem here.  Where there is an opportunity to improve
> is discoverability for third party connectors that the PMC does *not* want
> to bring in and maintain officially.
>

See comments above.

Re: Connectors package registry

Posted by Jonathan Ellis <jb...@gmail.com>.
On Thu, May 27, 2021 at 2:38 PM Sijie Guo <gu...@gmail.com> wrote:

> Agreed that the main problem is about discovering the existing pre-built
> Pulsar connectors. I don't think the PMC should involve hosting and
> managing external connectors because it will put the PMC in the situation
> in handling licensing issues that I think we should avoid.
>

I totally agree that the PMC shouldn't get tied up in licensing discussions
for third party connectors, but I don't see how that's an issue if we're
talking about a repository with descriptions and URLs -- no code, no
binaries.  Am I missing something?


> All the ASF accepted connectors are already in the main Pulsar repo. Even
> they are moved to the pulsar-connectors repo. They are managed and released
> as part of the Pulsar release.
>

Right, there's no problem here.  Where there is an opportunity to improve
is discoverability for third party connectors that the PMC does *not* want
to bring in and maintain officially.

Re: Connectors package registry

Posted by Sijie Guo <gu...@gmail.com>.
Agreed that the main problem is about discovering the existing pre-built
Pulsar connectors. I don't think the PMC should be involved in hosting and
managing external connectors, because that would put the PMC in the position
of handling licensing issues, which I think we should avoid.

All the ASF-accepted connectors are already in the main Pulsar repo. Even
if they are moved to the pulsar-connectors repo, they will still be managed
and released as part of the Pulsar release.

We have been improving how we generate documentation for Pulsar APIs and
tools. What we can do is to extend existing tooling to automatically
generate all the documentation for pre-built connectors. This tool can also
generate a JSON file (or a list of JSON files) that contains all the
metadata information for all the pre-built connectors. These JSON files can
be put in a folder within the pulsar website and hosted as part of the
Pulsar website. Then we can just add a command in the `pulsar-admin` tool
to retrieve the JSON files and cache them locally. So pulsar-admin can
search and download the connector accordingly.
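
A minimal sketch of the retrieve-and-cache step, written in Python for
brevity (a real implementation would live in the Java `pulsar-admin` tool).
The function names, the one-day cache lifetime, and the JSON shape (a list of
objects with "name" and an optional "description") are assumptions for
illustration; the fetch callable stands in for an HTTP GET against wherever
the JSON files end up on the Pulsar website.

```python
import json
import time
from pathlib import Path
from typing import Callable


def load_index(
    cache_file: Path,
    fetch: Callable[[], str],
    max_age_seconds: float = 86400.0,
) -> list[dict]:
    """Return the connector index, refreshing the local cache when stale."""
    fresh = (
        cache_file.exists()
        and (time.time() - cache_file.stat().st_mtime) < max_age_seconds
    )
    if not fresh:
        # Cache is missing or too old: download again and overwrite it.
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        cache_file.write_text(fetch(), encoding="utf-8")
    return json.loads(cache_file.read_text(encoding="utf-8"))


def search(index: list[dict], term: str) -> list[dict]:
    """Case-insensitive substring match on name and description."""
    needle = term.lower()
    return [
        c for c in index
        if needle in c["name"].lower()
        or needle in c.get("description", "").lower()
    ]
```

Passing fetch in as a parameter keeps the caching logic testable without
network access and leaves the actual URL undecided.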

This approach ties very well into the current Pulsar release process
without introducing additional maintenance overhead.

Thanks,
Sijie

On Tue, May 25, 2021 at 11:59 AM Jerry Peng <je...@gmail.com>
wrote:

> Hello Andrey,
>
> Thank you for bringing this up! This is definitely an important issue!
>
>  All of the connector binaries are already hosted on Maven central thus I
> don't think hosting the binaries is an issue.  Perhaps the key problem here
> is about discovery.
>
>  My thoughts:
>
> 1. We should document clearly on the Apache Pulsar website all the
> connectors that we offer.
>
> https://pulsar.apache.org/docs/en/io-connectors/
>
> Seems like we already do that? If not, we should make sure to keep this
> list up to date.  Maybe the list is not visible enough to new users.  If
> so, we should figure out how to advertise the connectors we already have in
> a better fashion.
>
> 2.  I do like the idea of having a tool that can perhaps search and install
> connectors automatically for you.  Perhaps this is a feature we can add to
> the existing pulsar-admin CLI tool.  This feature can search maven for
> connector binaries and download / install them if instructed by the user.
>
> Best,
>
> Jerry
>
> On Tue, May 25, 2021 at 11:20 AM Andrey Yegorov <
> andrey.yegorov@datastax.com>
> wrote:
>
> > Hello, As Pulsar becomes increasingly popular, we will have to deal with
> a
> > larger userbase looking to deploy Pulsar in a wider array of use cases,
> > interfacing with a more diverse set of other components.  To help with
> > this, we should create a plan as a project to help community members
> > publish and discover connectors beyond what the Pulsar PMC wants to
> > maintain.
> >
> > Current plans include splitting connectors into separate repos (PIP 62)
> or
> > moving under the umbrella of the projects they integrate with (as per
> > conversations during the community meetings). This will definitely help
> > with the build times but may negatively affect the discoverability of the
> > connectors and ease of installation.
> >
> > I think Pulsar can benefit from a simple package registry that (1) hosts
> a
> > list of free to use (apache or other approved license)
> > connectors/references to the binaries, and (2) provides a CLI (e.g. via
> > pulsar admin) to simplify discovery, download, and installation of the
> > connectors for the new users.
> >
> > What do you think? Would you find something like this useful?
> >
> > The implementation can be as simple as another GitHub repo with a
> > predefined structure like
> >
> >     {connector name}/{major version}/metadata
> >
> > where metadata contains url to the nar, checksum, range of compatible
> > pulsar versions, contacts, license, short description, etc.
> >
> > Plus the CLI that can search/list compatible connectors,
> > download/install/update the connector.
> >
> > As prior art examples, one can refer to:
> >
> >    1. brew (package manager for MacOS)
> >       1. Formulas/registry: https://github.com/Homebrew/homebrew-core
> >       2. brew itself: https://github.com/Homebrew/brew
> >    2. Apache Solr
> >       1. https://solr.apache.org/guide/8_8/package-manager.html
> >
> > There is also the Helm chart repository from our cousins at the CNCF over
> > at the Artifact Hub <https://artifacthub.io/>.
> >
> > I believe such a registry managed by the PMC will reduce the risk of
> > fragmentation of the ecosystem, improve discoverability, and allow simple
> > detection of not-up-to-date connectors for the new releases of Pulsar.
> >
> > --
> > Andrey Yegorov
> >
>

Re: Connectors package registry

Posted by Jerry Peng <je...@gmail.com>.
Hello Andrey,

Thank you for bringing this up! This is definitely an important issue!

All of the connector binaries are already hosted on Maven Central, so I
don't think hosting the binaries is an issue.  Perhaps the key problem here
is discovery.

My thoughts:

1. We should document clearly on the Apache Pulsar website all the
connectors that we offer.

https://pulsar.apache.org/docs/en/io-connectors/

It seems we already do that? Either way, we should make sure to keep this
list up to date.  Maybe the list is not visible enough to new users.  If
so, we should figure out how to advertise the connectors we already have
more effectively.

2.  I do like the idea of having a tool that can search for and install
connectors automatically for you.  Perhaps this is a feature we can add to
the existing pulsar-admin CLI tool.  It could search Maven Central for
connector binaries and download / install them when instructed by the user.
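
As a rough illustration of the search step only, here is a minimal Python
sketch.  It is not an existing pulsar-admin feature.  It assumes that the
public Maven Central search endpoint (search.maven.org/solrsearch/select)
is used, that connectors are published under the org.apache.pulsar group,
and that their artifact ids start with "pulsar-io-".  The function names
and the sample response are hypothetical and written by hand.

```python
from urllib.parse import urlencode

# Assumption: Maven Central's public search endpoint, not a Pulsar API.
SEARCH_ENDPOINT = "https://search.maven.org/solrsearch/select"


def build_search_url(group_id="org.apache.pulsar", rows=200):
    """Build a query URL that lists artifacts published under group_id."""
    params = {"q": f"g:{group_id}", "rows": rows, "wt": "json"}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def extract_connectors(response, prefix="pulsar-io-"):
    """Return sorted (artifactId, latestVersion) pairs matching prefix.

    Assumption: connector artifacts can be recognised by their id prefix.
    """
    docs = response.get("response", {}).get("docs", [])
    return sorted(
        (doc["a"], doc.get("latestVersion", "unknown"))
        for doc in docs
        if doc.get("a", "").startswith(prefix)
    )


# Hand-written sample shaped like a search response; the versions are
# placeholders, not real release numbers.
sample = {
    "response": {
        "numFound": 3,
        "docs": [
            {"g": "org.apache.pulsar", "a": "pulsar-io-kafka",
             "latestVersion": "0.0.0-example"},
            {"g": "org.apache.pulsar", "a": "pulsar-client",
             "latestVersion": "0.0.0-example"},
            {"g": "org.apache.pulsar", "a": "pulsar-io-cassandra",
             "latestVersion": "0.0.0-example"},
        ],
    }
}

if __name__ == "__main__":
    print(build_search_url())
    for artifact, version in extract_connectors(sample):
        print(artifact, version)
```

A real tool would fetch the URL, feed the decoded JSON to
extract_connectors, and then download the selected artifact.  Filtering on
the client side keeps the query simple, at the cost of fetching every
artifact in the group.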

Best,

Jerry

On Tue, May 25, 2021 at 11:20 AM Andrey Yegorov <an...@datastax.com>
wrote:

> Hello, As Pulsar becomes increasingly popular, we will have to deal with a
> larger userbase looking to deploy Pulsar in a wider array of use cases,
> interfacing with a more diverse set of other components.  To help with
> this, we should create a plan as a project to help community members
> publish and discover connectors beyond what the Pulsar PMC wants to
> maintain.
>
> Current plans include splitting connectors into separate repos (PIP 62) or
> moving under the umbrella of the projects they integrate with (as per
> conversations during the community meetings). This will definitely help
> with the build times but may negatively affect the discoverability of the
> connectors and ease of installation.
>
> I think Pulsar can benefit from a simple package registry that (1) hosts a
> list of free to use (apache or other approved license)
> connectors/references to the binaries, and (2) provides a CLI (e.g. via
> pulsar admin) to simplify discovery, download, and installation of the
> connectors for the new users.
>
> What do you think? Would you find something like this useful?
>
> The implementation can be as simple as another GitHub repo with a
> predefined structure like
>
>     {connector name}/{major version}/metadata
>
> where metadata contains url to the nar, checksum, range of compatible
> pulsar versions, contacts, license, short description, etc.
>
> Plus the CLI that can search/list compatible connectors,
> download/install/update the connector.
>
> As prior art examples, one can refer to:
>
>    1. brew (package manager for MacOS)
>       1. Formulas/registry: https://github.com/Homebrew/homebrew-core
>       2. brew itself: https://github.com/Homebrew/brew
>    2. Apache Solr
>       1. https://solr.apache.org/guide/8_8/package-manager.html
>
> There is also the Helm chart repository from our cousins at the CNCF over
> at the Artifact Hub <https://artifacthub.io/>.
>
> I believe such a registry managed by the PMC will reduce the risk of
> fragmentation of the ecosystem, improve discoverability, and allow simple
> detection of not-up-to-date connectors for the new releases of Pulsar.
>
> --
> Andrey Yegorov
>