Posted to dev@cassandra.apache.org by Adam Holmberg <ad...@datastax.com> on 2020/04/22 17:00:04 UTC

DataStax Driver Donation to Apache Cassandra Project

The developers who maintain the DataStax drivers would like to start a
conversation about donating these drivers to the Apache Cassandra project.
Since we're actively working on the C* 4.0 support and integration in the
drivers right now, we don't plan on executing on this until after C* 4.0
releases in order to avoid delaying the release. In the meantime we wanted
to open the discussion so that we can all determine what we think best
suits the project going forward.

There are a number of details we would like to discuss as a project
community. Naming a few to get the discussion going:

- Is there interest from the project community to take ownership of the
(currently) DataStax drivers?
- Which drivers should be taken into project stewardship?
-- The project currently bundles Java and Python; there are five others:
C#, Node.js, C++, PHP and Ruby
- Which major branch of the Java driver should be chosen for development?
-- Server currently uses Java driver 3.x but the latest is 4.x
- Who will be the committers that maintain these drivers? Should we
nominate new committers (contributors on the current drivers code-bases) so
they can keep maintaining them with minimal disruption to the project as a
whole?
- What should the new artifacts be named in package indices (coordinates
and artifact names)?
- How will we run CI for these contributions?
- Do we do in-tree? Sub-projects?

There will surely be even more to figure out as we go. We look forward to
discussing this with everyone.

Kind regards,
The DS Drivers Team

Re: DataStax Driver Donation to Apache Cassandra Project

Posted by Patrick McFadin <pm...@gmail.com>.
It would probably be a good idea to get some outside guidance on what other
projects have seen because, as Nate said, this isn't the first time.

https://felix.apache.org/documentation/subprojects.html
https://cocoon.apache.org/subprojects/
Commons has components: http://commons.apache.org/components.html
Hadoop, as mentioned, has modules.

Patrick

On Wed, Apr 22, 2020 at 1:25 PM Nate McCall <zz...@gmail.com> wrote:

> On Thu, Apr 23, 2020 at 5:37 AM Benedict Elliott Smith <
> benedict@apache.org>
> wrote:
>
> > I welcome the donation, and hope we are able to accept all of the
> > drivers.  This is really great news IMO.
> >
> >  I do however wonder if the project may be accumulating too many
> > sub-projects?  I wonder if it's time to think about splitting, and
> > perhaps incubating a project for the drivers?
> >
>
> This is a legit concern and good question, but I think this is more a
> natural evolution of growing a project. There is precedent for this in
> Spark, Beam, Hadoop and others who have a number of different repositories
> under the general project umbrella.
>
> What I would like to avoid is a situation like with Apache Curator and
> Apache Zookeeper. The former being a zookeeper client donation from Netflix
> that came in as a top level project. From the peanut gallery, it seems like
> that has been less than ideal a couple of times in the past coordinating
> releases, trademarks and such with separate project management.
>

Re: DataStax Driver Donation to Apache Cassandra Project

Posted by Joshua McKenzie <jm...@apache.org>.
CEP seems reasonable enough. I'll talk to Alex and Olivier.

On Thu, Jun 25, 2020 at 4:51 PM Mick Semb Wever <mc...@apache.org> wrote:

> > > What are next steps here? Anyone knowledgeable on thread?
>
>
> Can we take it to a CEP now?
>
> Even if the decision is to take one driver as a guinea pig and learn as we
> go, there's some questions that need to be thrashed out in advance
> (somewhere better than this thread), e.g. the incubator ip clearance steps,
> and other questions to tackle along the way: versions and supported
> branches, committers and maintainers, artifact and package names, CI, jira,
> ML, coordination to server, etc; so a CEP can be the document to land it
> all.
>

Re: DataStax Driver Donation to Apache Cassandra Project

Posted by Mick Semb Wever <mc...@apache.org>.
> > What are next steps here? Anyone knowledgeable on thread?


Can we take it to a CEP now?

Even if the decision is to take one driver as a guinea pig and learn as we
go, there's some questions that need to be thrashed out in advance
(somewhere better than this thread), e.g. the incubator ip clearance steps,
and other questions to tackle along the way: versions and supported
branches, committers and maintainers, artifact and package names, CI, jira,
ML, coordination to server, etc; so a CEP can be the document to land it
all.

Re: DataStax Driver Donation to Apache Cassandra Project

Posted by Dinesh Joshi <dj...@apache.org>.
I don't see any reason not to bring in other drivers to the project. We can start with the Java driver. I think Nate might be aware of the specifics of the process.

Dinesh

> On Jun 25, 2020, at 11:36 AM, Joshua McKenzie <jm...@apache.org> wrote:
> 
> My understanding is that there's comparable traffic on python, java, and
> node drivers in terms of usage out in the Cassandra ecosystem. Shall we get
> started w/the java process and incubation on the donation (CLA's, vetting
> contributions, etc) with plans to follow up with python and then node?
> 
> What are next steps here? Anyone knowledgeable on thread?
> 
> On Tue, Apr 28, 2020 at 4:38 PM Nate McCall <zz...@gmail.com> wrote:
> 
>>> 
>>> 
>>> Lastly, and to Stephen's previous email, it might be more manageable to
>>> accept one driver first and figure out all the details/issues/questions that
>>> are bound to arise before accepting more. It's worth discussing at least.
>>> 
>> 
>> This approach makes complete sense to me: let's sort out how to accept the
>> Java Driver (I guess? most widely used and reference impl) and then we can
>> iterate from there.
>> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: DataStax Driver Donation to Apache Cassandra Project

Posted by Joshua McKenzie <jm...@apache.org>.
My understanding is that there's comparable traffic on python, java, and
node drivers in terms of usage out in the Cassandra ecosystem. Shall we get
started w/the java process and incubation on the donation (CLA's, vetting
contributions, etc) with plans to follow up with python and then node?

What are next steps here? Anyone knowledgeable on thread?

On Tue, Apr 28, 2020 at 4:38 PM Nate McCall <zz...@gmail.com> wrote:

> >
> >
> > Lastly, and to Stephen's previous email, it might be more manageable to
> > accept one driver first and figure out all the details/issues/questions that
> > are bound to arise before accepting more. It's worth discussing at least.
> >
>
> This approach makes complete sense to me: let's sort out how to accept the
> Java Driver (I guess? most widely used and reference impl) and then we can
> iterate from there.
>

Re: DataStax Driver Donation to Apache Cassandra Project

Posted by Nate McCall <zz...@gmail.com>.
>
>
> Lastly, and to Stephen's previous email, it might be more manageable to
> accept one driver first and figure out all the details/issues/questions that
> are bound to arise before accepting more. It's worth discussing at least.
>

This approach makes complete sense to me: let's sort out how to accept the
Java Driver (I guess? most widely used and reference impl) and then we can
iterate from there.

Re: DataStax Driver Donation to Apache Cassandra Project

Posted by Jon Haddad <jo...@jonhaddad.com>.
I agree keeping the source separate is a good idea to start.  If we find
some benefit later in merging the two trees, it's easy enough to do so;
it's more of a pain to split things apart.

The build system used plays a big part as well - ant is definitely not
doing us any favors here.


On Tue, Apr 28, 2020 at 9:00 AM Sylvain Lebresne <le...@gmail.com> wrote:

> On Tue, Apr 28, 2020 at 5:10 PM Joshua McKenzie <jm...@apache.org>
> wrote:
>
> > >
> > >  If we're talking day to day
> > > maintenance, so the bulk of the work really, then I feel rather
> > > confident saying that you are wrong,
> >
> > You're confidently responding to something I wasn't trying to say. :) I
> > may not have communicated clearly. I was attempting to enumerate:
> >
> >    1. New feature development will likely require coordination between
> >    server and drivers (i.e. driver changes are required to support new
> >    features in server)
> >    2. Future roadmap for the core server and drivers will likely overlap
> >    (see #1)
> >    3. CEP's for 1 and 2, assuming one accepts the assertion that features
> >    require driver changes, mean CEP's will have components of both
> >    4. Independent architectural or API changes in the drivers will likely
> >    impact the server, and thus also require coordination. Especially with
> >    drivers being nested and used extensively in cqlsh, tests, etc.
> >
> > I was not speaking to the day to day maintenance of the projects but
> > rather the larger feature-level, roadmap, architectural planning of
> > them. I would not expect day to day maintenance to intersect with the
> > governance of the projects on a regular basis.
> >
>
> More often than not, disagreement is a communication issue. I apologize
> for not doing a better job at understanding your points (and apologize
> for my less than optimal phrasing on that sentence).
>
> But hopefully the rest of my email had also clarified that I do not
> disagree with the general roadmapping and project planning being common
> (for drivers and server). In fact, I'm in favor of that. That's why I'm
> suggesting a single "Apache" project, rather than incubating separate
> ones.
>
> But I also want us to ensure that whatever change of organization we make
> while accepting the drivers does not make the day-to-day maintenance
> harder. And again, at that level, I think server and driver are more
> different than they are the same, so some separation at "some" level is
> likely necessary. I'll refer to my previous email for where I'd personally
> keep things separated and where I wouldn't.
>
> --
> Sylvain
>

Re: DataStax Driver Donation to Apache Cassandra Project

Posted by Sylvain Lebresne <le...@gmail.com>.
On Tue, Apr 28, 2020 at 5:10 PM Joshua McKenzie <jm...@apache.org>
wrote:

> >
> >  If we're talking day to day
> > maintenance, so the bulk of the work really, then I feel rather confident
> > saying that you are wrong,
>
> You're confidently responding to something I wasn't trying to say. :) I may
> not have communicated clearly. I was attempting to enumerate:
>
>    1. New feature development will likely require coordination between
>    server and drivers (i.e. driver changes are required to support new
>    features in server)
>    2. Future roadmap for the core server and drivers will likely overlap
>    (see #1)
>    3. CEP's for 1 and 2, assuming one accepts the assertion that features
>    require driver changes, mean CEP's will have components of both
>    4. Independent architectural or API changes in the drivers will likely
>    impact the server, and thus also require coordination. Especially with
>    drivers being nested and used extensively in cqlsh, tests, etc.
>
> I was not speaking to the day to day maintenance of the projects but rather
> the larger feature-level, roadmap, architectural planning of them. I would
> not expect day to day maintenance to intersect with the governance of the
> projects on a regular basis.
>

More often than not, disagreement is a communication issue. I apologize
for not doing a better job at understanding your points (and apologize
for my less than optimal phrasing on that sentence).

But hopefully the rest of my email had also clarified that I do not
disagree with the general roadmapping and project planning being common
(for drivers and server). In fact, I'm in favor of that. That's why I'm
suggesting a single "Apache" project, rather than incubating separate
ones.

But I also want us to ensure that whatever change of organization we make
while accepting the drivers does not make the day-to-day maintenance
harder. And again, at that level, I think server and driver are more
different than they are the same, so some separation at "some" level is
likely necessary. I'll refer to my previous email for where I'd personally
keep things separated and where I wouldn't.

--
Sylvain

Re: DataStax Driver Donation to Apache Cassandra Project

Posted by Joshua McKenzie <jm...@apache.org>.
>
>  If we're talking day to day
> maintenance, so the bulk of the work really, then I feel rather confident
> saying that you are wrong,

You're confidently responding to something I wasn't trying to say. :) I may
not have communicated clearly. I was attempting to enumerate:

   1. New feature development will likely require coordination between
   server and drivers (i.e. driver changes are required to support new
   features in server)
   2. Future roadmap for the core server and drivers will likely overlap
   (see #1)
   3. CEP's for 1 and 2, assuming one accepts the assertion that features
   require driver changes, mean CEP's will have components of both
   4. Independent architectural or API changes in the drivers will likely
   impact the server, and thus also require coordination. Especially with
   drivers being nested and used extensively in cqlsh, tests, etc.

I was not speaking to the day to day maintenance of the projects but rather
the larger feature-level, roadmap, architectural planning of them. I would
not expect day to day maintenance to intersect with the governance of the
projects on a regular basis.

On Tue, Apr 28, 2020 at 7:56 AM Sylvain Lebresne <le...@gmail.com> wrote:

> I want to clarify that my plea here is just that we acknowledge that once
> we adopt drivers (especially if all of them), the "project" becomes quite
> big.
>
> All sane big projects have a minimum of organization, so let's make sure we
> have enough organization that we don't make our future lives harder
> than they need to be. And there is a clear and natural separation between
> the server and (each) driver, so that's an obvious point of
> organization/separation.
>
> Again, at a "high" level, I'm in favor of the Cassandra project being both
> server and drivers (not saying it's not debatable). So a single _user_ ML
> makes sense, as well as a single web site, documentation and CEP process (I
> do see CEP as being somewhat high-ish level).
>
> My concern is more for the day-to-day maintenance work. Here, I think there
> are gonna be 3 types of people:
> 1. some will _primarily_ focus on (a) driver development.
> 2. some will _primarily_ focus on server development.
> 3. some may be interested in both, but won't be able to focus too much on
>    either (because again, the sum is too big, and in a way too unrelated).
>
> And I actually expect 1 and 2 to preponderantly drive the day-to-day
> maintenance. So I'd like to keep things easy for those populations (but
> obviously, with the goal of not hindering collaboration and consistency
> overall).
>
> Concretely, my initial thoughts (though I haven't thought some of them
> through a lot) are:
> - as said above, user list, web site, documentation and CEP would be
>   global.
> - new specific JIRA projects for drivers, and JIRA notifications going to
>   separate 'commits' mailing lists. To me, that one point is a no-brainer,
>   I don't see why we wouldn't do that, and I'll fight for that one.
> - dev mailing lists: I'm conflicted. I see a few "dev" discussions gaining
>   from being common, but I think most won't be (common). My gut reaction
>   was to suggest separate lists but I'm warming up to the idea of
>   experimenting with one and splitting later if it's unmanageable.
> - source repository: I think I don't have a super strong opinion so far.
>   I'm not a fan of abusing mono-repo, and I think it would be overall
>   cleaner to have separate repos with separate histories. But I reckon
>   there are pros to mono-repo as well so this might boil down to a
>   personal preference.
> - committers and PMC members pool: I believe that if we keep the
>   organization of a single project in the Apache sense (which again, is
>   debatable but I'm in favor at this point), then that implies a single
>   pool of committers/PMC members. Which is fine by me, outside of the fact
>   that it imo makes it even more urgent to have the PMC conclude some
>   ongoing and never concluded discussions (around more objective criteria
>   for committers/PMC members nominations).
> - other: there is actually a bunch of other things we'll need to discuss in
>   that scenario. For instance, DataStax drivers currently have their
>   independent release cycles and versioning. Especially if we go the
>   mono-repo route, then it would make sense to move towards releasing
>   everything together as Stephen mentions TinkerPop is doing, but that
>   in turn may require a non-trivial amount of build-tools setup.
>
> Lastly, and to Stephen's previous email, it might be more manageable to
> accept one driver first and figure out all the details/issues/questions that
> are bound to arise before accepting more. It's worth discussing at least.
>
> > In the Venn diagram of overlap vs. non between the two projects, I see
> > there being more overlap than not.
>
> I'll address this, because it's an important point. If we're talking day
> to day maintenance, so the bulk of the work really, then I feel rather
> confident saying that you are wrong, that the vast majority of the work is
> mostly unrelated.
>
> Which is important, because that's really why I said that no-one can
> effectively focus on both sides. You can only focus on one and only dabble
> in the other(s), because the overlap is not that big.
>
> --
> Sylvain
>
>
> On Mon, Apr 27, 2020 at 11:34 PM Nate McCall <zz...@gmail.com> wrote:
>
> > Thanks, Stephen, this is really helpful!
> >
> > On Tue, Apr 28, 2020 at 6:24 AM Stephen Mallette <sp...@gmail.com>
> > wrote:
> >
> > > >
> > > > To step out of the weeds a bit - other than the Zookeeper / Curator
> > > > example, does anyone know of any other apache projects that have
> either
> > > > subprojects or complementary sideprojects they're interdependent upon
> > in
> > > > their ecosystems?
> > >
> > >
> > > Every Apache project is different, so it's quite possible that the
> > > experience I have in this area doesn't apply much here, but I'll offer
> > some
> > > words on the matter in the event that some of it is helpful.
> > >
> > > For many years even prior to joining Apache, TinkerPop was quite
> against
> > > bringing in driver-style sub-projects. Our main concern was one that I
> > > think was voiced here in this thread in some fashion, where core
> > developers
> > > would have to be knowledgeable of the incoming body of work and
> maintain
> > > that going forward. For core contributors who were primarily Java
> > > developers it was difficult to think that we'd suddenly be responsible
> > for
> > > reviews/VOTEs on Python code, for example.  It was with a bit of
> > > trepidation that we eventually decided it a good idea and opened the
> > > project to them. For our purposes we brought all such projects directly
> > > into our core repository as the thinking was that we wanted to keep all
> > > aspects of the project unified (testing, release, etc) to ensure that
> > for a
> > > particular release tag you could be sure that everything worked
> together.
> > > We initially started with just Python and developed that as our model
> for
> > > how new drivers would arrive (there were already other disparate
> projects
> > > out there in other languages).
> > >
> > > We wanted a model that ensured a reasonably high bar for acceptance and
> > > created a rough set of minimum criteria we wanted to have for adding a
> > new
> > > driver to our release lines. The core of that criteria was a common
> > > language agnostic test suite that needed to pass for us to consider it
> > > "ready" in any sense and the project needed to build, test and release
> > > using Maven (which is our build tool for the project). The former
> ensured
> > > that we had a reasonable level of common tested functionality among
> > drivers
> > > and the latter ensured an easy and consistent way to manage
> build/release
> > > practices (which fed nicely into our Docker infrastructure for both
> full
> > > builds and for giving non-JVM developers a nice way to develop drivers
> > > against the latest code without having to be Java experts). Once we
> > > established this approach with Python, we successfully brought in .NET
> > and
> > > Javascript.
> > >
> > > I think there were a number of nice upsides to deciding to bring in
> > drivers
> > > in the first place and then in the model for acceptance that we chose:
> > >
> > > + We saw a greater diversity of folks contributing in general as the
> > > ecosystem opened up beyond just the JVM.
> > > + We saw that the general community coalesced around the "official"
> > > drivers, contributing as one to them, rather than going off and
> creating
> > > one-off projects. I'm not really aware of any third-party drivers right
> > now
> > > for the languages we support, but if you look at something like Go,
> there
> > > are three or more choices. I suppose Go would be our next target for
> > > official inclusion.
> > > + Release day was pretty simple despite the complexity of the
> environment
> > > with that mixed ecosystem because of our unified build model using
> Maven
> > > and there wasn't a lot of disparate tooling exposed to the release
> > manager
> > > directly.
> > > + I can't say that we really saw problems with core project developers
> > > (who mostly knew Java) having to review Python/C#/JavaScript. For the
> > > most part, the contribution quality was high and we managed and became
> > > more knowledgeable as we went.
> > > + As we released drivers and core together, we no longer had situations
> > > where some third-party driver lagged behind some feature in core - if
> you
> > > wanted to use the latest core functionality you just used the latest
> > > release of core and driver and you could be assured they worked
> together
> > > and we felt confident saying so.
> > >
> > > Doing it over again, I think I would still consider going single repo
> for
> > > this situation but I think I might not place the requirement that the
> > > projects build with Maven. I think Maven has turned-off some
> contributors
> > > from those language ecosystems who don't know the JVM. They would have
> > been
> > > much more comfortable just working more directly with the tool systems
> > that
> > > they were familiar with. Of course, to get rid of local Maven builds
> > > completely we would have to build "latest" Docker images so that folks
> > > didn't need to do that themselves like they do now (also with Maven).
> > >
> > > Aside from TinkerPop experiences I will offer that, while I'm not
> > > completely sure, I think that for a contribution like this one where
> the
> > > bulk of the code has been developed outside of the ASF, the DS drivers
> > > would need to go through an IP Clearance process:
> > >
> > > https://incubator.apache.org/ip-clearance/
> > >
> > >
> > >
> > > On Mon, Apr 27, 2020 at 12:57 PM Joshua McKenzie <jmckenzie@apache.org
> >
> > > wrote:
> > >
> > > > To step out of the weeds a bit - other than the Zookeeper / Curator
> > > > example, does anyone know of any other apache projects that have
> either
> > > > subprojects or complementary sideprojects they're interdependent upon
> > in
> > > > their ecosystems? I'd like to reach out to some other pmc's for
> advice
> > > and
> > > > feedback on this topic since there's no sense in reinventing the
> wheel
> > if
> > > > other projects have wisdom to share on this.
> > > >
> > > > On Mon, Apr 27, 2020 at 12:42 PM Joshua McKenzie <
> jmckenzie@apache.org
> > >
> > > > wrote:
> > > >
> > > > > re: ML noise, how hard would it be to filter out JIRA updates
> > > w/component
> > > > > "Drivers"? Or from JIRA queries?
> > > > >
> > > > > For governance, I see it cutting both ways. If we have two separate
> > > > > projects and ML's for drivers and C*, how do we keep a coherent
> view
> > of
> > > > new
> > > > > features and roadmap stuff? Do we have CEP's for both projects and
> > tie
> > > > them
> > > > > together? Do we drive changes in the driver feature ecosystem via
> > CEP's
> > > > in
> > > > > C*?
> > > > >
> > > > > In the Venn diagram of overlap vs. non between the two projects, I
> > see
> > > > > there being more overlap than not.
> > > > >
> > > > > On Mon, Apr 27, 2020 at 12:34 PM Dinesh Joshi <dj...@apache.org>
> > > wrote:
> > > > >
> > > > >>
> > > > >>
> > > > >> > On Apr 27, 2020, at 2:50 AM, Sylvain Lebresne <
> lebresne@gmail.com
> > >
> > > > >> wrote:
> > > > >> >
> > > > > >> > Fwiw, I agree with the concerns raised by Benedict, and think we
> > > > > >> > should carefully think about how this is handled. Which isn't a
> > > > > >> > rejection of the donation in any way.
> > > > >> >
> > > > >> > Drivers are not small projects, and the majority of their day to
> > day
> > > > >> > maintenance is unrelated to the server (and the reverse is
> true).
> > > > >> >
> > > > >> > From the user point of view, I think it would be fabulous that
> > > > Cassandra
> > > > >> > appears like one project with a server and some official
> drivers,
> > > with
> > > > >> one
> > > > >> > coherent website and documentation for all. I'm all for striving
> > for
> > > > >> that.
> > > > >>
> > > > >> +1
> > > > >>
> > > > > >> > Behind the scenes however, I feel things should be set up so that
> > > > > >> > some amount of separation remains between server and whichever
> > > > > >> > drivers are donated and accepted, or I'm fairly sure things would
> > > > > >> > get messy very quickly[1]. In my
> > > > >>
> > > > >> Can you say more about what "getting messy very quickly" means
> here?
> > > > >>
> > > > >> > mind that means *at a minimum*:
> > > > >> > - separate JIRA projects.
> > > > >> > - dedicated _dev_ (and commits) mailing lists.
> > > > >>
> > > > > >> If we're thinking through how this would be set up, initially we had
> > > > > >> the same Jira project for sidecar but now there is a separate one to
> > > > > >> track sidecar specific jiras. At the moment we do not have a separate
> > > > > >> mailing list. I think Cassandra dev mailing list's volume is low
> > > > > >> enough to keep using the same ML. There is an added value that it
> > > > > >> gives visibility and developers don't need to go between multiple
> > > > > >> mailing lists.
> > > > >>
> > > > >> > But it's also worth thinking whether a single pool of
> > committers/PMC
> > > > >> > members is
> > > > >> > desirable.
> > > > >> >
> > > > > >> > Tbc, I'm not sure what is the best way to achieve this within the
> > > > > >> > constraints of the Apache foundation, and maybe I'm just stating
> > > > > >> > the obvious here.
> > > > >> >
> > > > >> >
> > > > >> > [1] fwiw, I say this as someone that at some points in time was
> > > > >> > simultaneously
> > > > >> > somewhat actively involved in both Cassandra and the DataStax
> Java
> > > > >> driver.
> > > > >> >
> > > > >> > --
> > > > >> > Sylvain
> > > > >> >
> > > > >> >
> > > > >> > On Fri, Apr 24, 2020 at 12:54 AM Benedict Elliott Smith <
> > > > >> benedict@apache.org>
> > > > >> > wrote:
> > > > >> >
> > > > >> >> Do you have some examples of issues?
> > > > >> >>
> > > > >> >> So, to explain my thinking: I believe there is value in most
> > > > >> contributors
> > > > >> >> being able to know and understand a majority of what the
> project
> > > > >> >> undertakes.  Many people track a wide variety of activity on
> the
> > > > >> project,
> > > > >> >> and whether they express an opinion they probably form one and
> > will
> > > > >> involve
> > > > >> >> themselves if they consider it important to do so.  I worry
> that
> > > > >> importing
> > > > >> >> several distinct and only loosely related projects to the same
> > > > >> governance
> > > > >> >> and communication structures has a strong potential to
> undermine
> > > that
> > > > >> >> capability, as people begin to assume that activity and
> > > > >> decision-making is
> > > > >> >> unrelated to them - and if that happens I think something
> > important
> > > > is
> > > > >> lost.
> > > > >> >>
> > > > >> >> The sidecar challenges this already but seems hopefully
> > manageable:
> > > > it
> > > > >> is
> > > > >> >> a logical extension of Cassandra, existing primarily to plug
> gaps
> > > in
> > > > >> >> Cassandra's own functionality, and features may migrate to
> > > Cassandra
> > > > >> over
> > > > >> >> time.  It is likely to have releases closely tied to Cassandra
> > > > itself.
> > > > >> >> Other subprojects are so far exclusively for consumption by the
> > > > >> Cassandra
> > > > >> >> project itself, and are all naturally coupled.
> > > > >> >>
> > > > >> >> Drivers however are inherently arms-length endeavours: we
> > publish a
> > > > >> >> protocol specification, and driver maintainers implement it.
> > They
> > > > are
> > > > >> >> otherwise fairly independent, and while a dialogue is helpful
> it
> > > does
> > > > >> not
> > > > >> >> need to be controlled by a single entity.  Many drivers will
> > > continue
> > > > >> to be
> > > > >> >> controlled by others, as they have been until now.  We're of
> > course
> > > > >> able to
> > > > >> >> ensure there's a strong overlap of governance, which I think
> > would
> > > be
> > > > >> very
> > > > >> >> helpful, and something Curator and Zookeeper seem not to have
> > > > managed.
> > > > >> >>
> > > > >> >> Looking at the Curator website, it also seems to pitch itself
> as
> > a
> > > > >> >> relatively opinionated product, and much more than a driver.  I
> > > hope
> > > > >> the
> > > > >> >> recipe for conflict in our case is much more limited given the
> > > > >> functional
> > > > >> >> scope of a driver - and anyway better avoided with more
> > integrated,
> > > > but
> > > > >> >> still distinct governance.
> > > > >> >>
> > > > >> >> That's not to say I don't see some value in the project
> > controlling
> > > > the
> > > > >> >> driver directly, I just worry about the above.
> > > > >> >>
> > > > >> >>
> > > > >> >>
> > > > >> >> On 22/04/2020, 21:25, "Nate McCall" <zz...@gmail.com>
> wrote:
> > > > >> >>
> > > > >> >>    On Thu, Apr 23, 2020 at 5:37 AM Benedict Elliott Smith <
> > > > >> >> benedict@apache.org>
> > > > >> >>    wrote:
> > > > >> >>
> > > > >> >>> I welcome the donation, and hope we are able to accept all of
> > the
> > > > >> >>> drivers.  This is really great news IMO.
> > > > >> >>>
> > > > >> >>> I do however wonder if the project may be accumulating too
> many
> > > > >> >>> sub-projects?  I wonder if it's time to think about splitting,
> > and
> > > > >> >> perhaps
> > > > >> >>> incubating a project for the drivers?
> > > > >> >>>
> > > > >> >>
> > > > >> >>    This is a legit concern and good question, but I think this
> is
> > > > more
> > > > >> a
> > > > >> >>    natural evolution of growing a project. There is precedent
> for
> > > > this
> > > > >> in
> > > > >> >>    Spark, Beam, Hadoop and others who have a number of
> different
> > > > >> >> repositories
> > > > >> >>    under the general project umbrella.
> > > > >> >>
> > > > >> >>    What I would like to avoid is a situation like with Apache
> > > Curator
> > > > >> and
> > > > >> >>    Apache Zookeeper. The former being a zookeeper client
> donation
> > > > from
> > > > >> >> Netflix
> > > > >> >>    that came in as a top level project. From the peanut
> gallery,
> > it
> > > > >> seems
> > > > >> >> like
> > > > >> >>    that has been less than ideal a couple of times in the past
> > > > >> >> coordinating
> > > > >> >>    releases, trademarks and such with separate project
> > management.
> > > > >> >>

Re: DataStax Driver Donation to Apache Cassandra Project

Posted by Sylvain Lebresne <le...@gmail.com>.
I want to clarify that my plea here is just that we acknowledge that once we
adopt drivers (especially if all of them), the "project" becomes quite big.

All sane big projects have a minimum of organization, so let's make sure we
have enough organization that we don't make our future lives harder
than they need to be. And there is a clear and natural separation between the
server and (each) driver, so that's an obvious point of
organization/separation.

Again, at a "high" level, I'm in favor of the Cassandra project being both
server and drivers (not saying it's not debatable). So a single _user_ ML
makes sense, as well as a single web site, documentation and CEP process (I do
see CEP as being somewhat high-ish level).

My concern is more for the day-to-day maintenance work. Here, I think there
are gonna be 3 types of people:
1. some will _primarily_ focus on (a) driver development.
2. some will _primarily_ focus on server development.
3. some may be interested in both, but won't be able to focus too much on
   either (because again, the sum is too big, and in a way too unrelated).

And I actually expect 1 and 2 to predominantly drive the day-to-day
maintenance. So I'd like to keep things easy for those populations (but
obviously, with the goal of not hindering collaboration and consistency
overall).

Concretely, my initial thoughts (I haven't thought some of these through a
lot) are:
- as said above, user list, web site, documentation and CEP would be global.
- new specific JIRA projects for drivers, and JIRA notifications going to
  separate 'commits' mailing lists. To me, that one point is a no-brainer,
  I don't see why we wouldn't do that, and I'll fight for that one.
- dev mailing lists: I'm conflicted. I see a few "dev" discussions gaining
  from being common, but I think most won't be (common). My gut reaction was
  to suggest separate lists but I'm warming up to the idea of experimenting
  with one and splitting later if it's unmanageable.
- source repository: I think I don't have a super strong opinion so far. I'm
  not a fan of abusing mono-repo, and I think it would be overall cleaner to
  have separate repos with separate histories. But I reckon there are pros to
  mono-repo as well so this might boil down to a personal preference.
- committers and PMC members pool: I believe that if we keep the organization
  of a single project in the Apache sense (which again, is debatable but I'm
  in favor at this point), then that implies a single pool of committers/PMC
  members. Which is fine by me, outside of the fact that it imo makes it
  even more urgent to have the PMC conclude some ongoing and never
  concluded discussions (around more objective criteria for committers/PMC
  members nominations).
- other: there are actually a bunch of other things we'll need to discuss in
  that scenario. For instance, DataStax drivers currently have their
  independent release cycles and versioning. Especially if we go the
  mono-repo route, then it would make sense to move towards releasing
  everything together as Stephen mentions TinkerPop is doing, but that
  in turn may require a non-trivial amount of build-tools setup.

Lastly, and to Stephen's previous email, it might be more manageable to
accept one driver first and figure out all the details/issues/questions that
are bound to arise before accepting more. It's worth discussing at least.

> In the Venn diagram of overlap vs. non between the two projects, I see
> there being more overlap than not.

I'll address this, because it's an important point. If we're talking day-to-day
maintenance, so the bulk of the work really, then I feel rather confident
saying that you are wrong, that the vast majority of the work is
unrelated.

Which is important, because that's really why I said that no-one can
effectively focus on both sides. You can only focus on one and only dabble in
the other(s), because the overlap is not that big.

--
Sylvain



Re: DataStax Driver Donation to Apache Cassandra Project

Posted by Nate McCall <zz...@gmail.com>.
Thanks, Stephen, this is really helpful!

On Tue, Apr 28, 2020 at 6:24 AM Stephen Mallette <sp...@gmail.com>
wrote:

> >
> > To step out of the weeds a bit - other than the Zookeeper / Curator
> > example, does anyone know of any other apache projects that have either
> > subprojects or complementary sideprojects they're interdependent upon in
> > their ecosystems?
>
>
> Every Apache project is different, so it's quite possible that the
> experience I have in this area doesn't apply much here, but I'll offer some
> words on the matter in the event that some of it is helpful.
>
> For many years even prior to joining Apache, TinkerPop was quite against
> bringing in driver-style sub-projects. Our main concern was one that I
> think was voiced here in this thread in some fashion, where core developers
> would have to be knowledgeable of the incoming body of work and maintain
> that going forward. For core contributors who were primarily Java
> developers it was difficult to think that we'd suddenly be responsible for
> reviews/VOTEs on Python code, for example.  It was with a bit of
> trepidation that we eventually decided it a good idea and opened the
> project to them. For our purposes we brought all such projects directly
> into our core repository as the thinking was that we wanted to keep all
> aspects of the project unified (testing, release, etc) to ensure that for a
> particular release tag you could be sure that everything worked together.
> We initially started with just Python and developed that as our model for
> how new drivers would arrive (there were already other disparate projects
> out there in other languages).
>
> We wanted a model that ensured a reasonably high bar for acceptance and
> created a rough set of minimum criteria we wanted to have for adding a new
> driver to our release lines. The core of that criteria was a common
> language agnostic test suite that needed to pass for us to consider it
> "ready" in any sense and the project needed to build, test and release
> using Maven (which is our build tool for the project). The former ensured
> that we had a reasonable level of common tested functionality among drivers
> and the latter ensured an easy and consistent way to manage build/release
> practices (which fed nicely into our Docker infrastructure for both full
> builds and for giving non-JVM developers a nice way to develop drivers
> against the latest code without having to be Java experts). Once we
> established this approach with Python, we successfully brought in .NET and
> Javascript.
>
> I think there were a number of nice upsides to deciding to bring in drivers
> in the first place and then in the model for acceptance that we chose:
>
> + We saw a greater diversity of folks contributing in general as the
> ecosystem opened up beyond just the JVM.
> + We saw that the general community coalesced around the "official"
> drivers, contributing as one to them, rather than going off and creating
> one-off projects. I'm not really aware of any third-party drivers right now
> for the languages we support, but if you look at something like Go, there
> are three or more choices. I suppose Go would be our next target for
> official inclusion.
> + Release day was pretty simple despite the complexity of the environment
> with that mixed ecosystem because of our unified build model using Maven
> and there wasn't a lot of disparate tooling exposed to the release manager
> directly.
> + I can't say that we really saw problems with core project developers (who
> mostly knew Java) having to review python/c#/javascript. For the most part,
> the contribution quality was high and we managed and became more
> knowledgeable as we went.
> + As we released drivers and core together, we no longer had situations
> where some third-party driver lagged behind some feature in core - if you
> wanted to use the latest core functionality you just used the latest
> release of core and driver and you could be assured they worked together
> and we felt confident saying so.
>
> Doing it over again, I think I would still consider going single repo for
> this situation but I think I might not place the requirement that the
> projects build with Maven. I think Maven has turned off some contributors
> from those language ecosystems who don't know the JVM. They would have been
> much more comfortable just working more directly with the tool systems that
> they were familiar with. Of course, to get rid of local maven builds
> completely we would have to build a "latest" Docker image so that folks
> didn't need to do that themselves like they do now (also with Maven).
>
> Aside from TinkerPop experiences I will offer that, while I'm not
> completely sure, I think that for a contribution like this one where the
> bulk of the code has been developed outside of the ASF, the DS drivers
> would need to go through an IP Clearance process:
>
> https://incubator.apache.org/ip-clearance/
>
>
>
> On Mon, Apr 27, 2020 at 12:57 PM Joshua McKenzie <jm...@apache.org>
> wrote:
>
> > To step out of the weeds a bit - other than the Zookeeper / Curator
> > example, does anyone know of any other apache projects that have either
> > subprojects or complementary sideprojects they're interdependent upon in
> > their ecosystems? I'd like to reach out to some other pmc's for advice
> and
> > feedback on this topic since there's no sense in reinventing the wheel if
> > other projects have wisdom to share on this.
> >
> > On Mon, Apr 27, 2020 at 12:42 PM Joshua McKenzie <jm...@apache.org>
> > wrote:
> >
> > > re: ML noise, how hard would it be to filter out JIRA updates
> w/component
> > > "Drivers"? Or from JIRA queries?
> > >
> > > For governance, I see it cutting both ways. If we have two separate
> > > projects and ML's for drivers and C*, how do we keep a coherent view of
> > new
> > > features and roadmap stuff? Do we have CEP's for both projects and tie
> > them
> > > together? Do we drive changes in the driver feature ecosystem via CEP's
> > in
> > > C*?
> > >
> > > In the Venn diagram of overlap vs. non between the two projects, I see
> > > there being more overlap than not.
> > >
> > > On Mon, Apr 27, 2020 at 12:34 PM Dinesh Joshi <dj...@apache.org>
> wrote:
> > >
> > >>
> > >>
> > >> > On Apr 27, 2020, at 2:50 AM, Sylvain Lebresne <le...@gmail.com>
> > >> wrote:
> > >> >
> > >> > Fwiw, I agree with the concerns raised by Benedict, and think we should
> > >> > carefully think about how this is handled. Which isn't a rejection of
> > >> > the donation in any way.
> > >> >
> > >> > Drivers are not small projects, and the majority of their day to day
> > >> > maintenance is unrelated to the server (and the reverse is true).
> > >> >
> > >> > From the user point of view, I think it would be fabulous that
> > Cassandra
> > >> > appears like one project with a server and some official drivers,
> with
> > >> one
> > >> > coherent website and documentation for all. I'm all for striving for
> > >> that.
> > >>
> > >> +1
> > >>
> > >> > Behind the scenes however, I feel things should be set up so that some
> > >> > amount of separation remains between server and whichever drivers are
> > >> > donated and accepted, or I'm fairly sure things would get messy very
> > >> > quickly[1]. In my
> > >>
> > >> Can you say more about what "getting messy very quickly" means here?
> > >>
> > >> > mind that means *at a minimum*:
> > >> > - separate JIRA projects.
> > >> > - dedicated _dev_ (and commits) mailing lists.
> > >>
> > >> If we're thinking through how this would be setup, initially we had
> the
> > >> same Jira project for sidecar but now there is a separate one to track
> > >> sidecar specific jiras. At the moment we do not have a separate
> mailing
> > >> list. I think Cassandra dev mailing list's volume is low enough to
> keep
> > >> using the same ML. There is an added value that it gives visibility
> and
> > >> developers don't need to go between multiple mailing lists.
> > >>
> > >> > But it's also worth thinking whether a single pool of committers/PMC
> > >> > members is
> > >> > desirable.
> > >> >
> > >> > Tbc, I'm not sure what is the best way to achieve this within the
> > >> > constraints of the Apache foundation, and maybe I'm just stating the
> > >> > obvious here.
> > >> >
> > >> >
> > >> > [1] fwiw, I say this as someone that at some points in time was
> > >> > simultaneously
> > >> > somewhat actively involved in both Cassandra and the DataStax Java
> > >> driver.
> > >> >
> > >> > --
> > >> > Sylvain
> > >> >
> > >> >
> > >> > On Fri, Apr 24, 2020 at 12:54 AM Benedict Elliott Smith <
> > >> benedict@apache.org>
> > >> > wrote:
> > >> >
> > >> >> Do you have some examples of issues?
> > >> >>
> > >> >> So, to explain my thinking: I believe there is value in most
> > >> contributors
> > >> >> being able to know and understand a majority of what the project
> > >> >> undertakes.  Many people track a wide variety of activity on the
> > >> project,
> > >> >> and whether they express an opinion they probably form one and will
> > >> involve
> > >> >> themselves if they consider it important to do so.  I worry that
> > >> importing
> > >> >> several distinct and only loosely related projects to the same
> > >> governance
> > >> >> and communication structures has a strong potential to undermine
> that
> > >> >> capability, as people begin to assume that activity and
> > >> decision-making is
> > >> >> unrelated to them - and if that happens I think something important
> > is
> > >> lost.
> > >> >>
> > >> >> The sidecar challenges this already but seems hopefully manageable:
> > it
> > >> is
> > >> >> a logical extension of Cassandra, existing primarily to plug gaps
> in
> > >> >> Cassandra's own functionality, and features may migrate to
> Cassandra
> > >> over
> > >> >> time.  It is likely to have releases closely tied to Cassandra
> > itself.
> > >> >> Other subprojects are so far exclusively for consumption by the
> > >> Cassandra
> > >> >> project itself, and are all naturally coupled.
> > >> >>
> > >> >> Drivers however are inherently arms-length endeavours: we publish a
> > >> >> protocol specification, and driver maintainers implement it.  They
> > are
> > >> >> otherwise fairly independent, and while a dialogue is helpful it
> does
> > >> not
> > >> >> need to be controlled by a single entity.  Many drivers will
> continue
> > >> to be
> > >> >> controlled by others, as they have been until now.  We're of course
> > >> able to
> > >> >> ensure there's a strong overlap of governance, which I think would
> be
> > >> very
> > >> >> helpful, and something Curator and Zookeeper seem not to have
> > managed.
> > >> >>
> > >> >> Looking at the Curator website, it also seems to pitch itself as a
> > >> >> relatively opinionated product, and much more than a driver.  I
> hope
> > >> the
> > >> >> recipe for conflict in our case is much more limited given the
> > >> functional
> > >> >> scope of a driver - and anyway better avoided with more integrated,
> > but
> > >> >> still distinct governance.
> > >> >>
> > >> >> That's not to say I don't see some value in the project controlling
> > the
> > >> >> driver directly, I just worry about the above.
> > >> >>
> > >> >>
> > >> >>
> > >> >> On 22/04/2020, 21:25, "Nate McCall" <zz...@gmail.com> wrote:
> > >> >>
> > >> >>    On Thu, Apr 23, 2020 at 5:37 AM Benedict Elliott Smith <
> > >> >> benedict@apache.org>
> > >> >>    wrote:
> > >> >>
> > >> >>> I welcome the donation, and hope we are able to accept all of the
> > >> >>> drivers.  This is really great news IMO.
> > >> >>>
> > >> >>> I do however wonder if the project may be accumulating too many
> > >> >>> sub-projects?  I wonder if it's time to think about splitting, and
> > >> >> perhaps
> > >> >>> incubating a project for the drivers?
> > >> >>>
> > >> >>
> > >> >>    This is a legit concern and good question, but I think this is
> > more
> > >> a
> > >> >>    natural evolution of growing a project. There is precedent for
> > this
> > >> in
> > >> >>    Spark, Beam, Hadoop and others who have a number of different
> > >> >> repositories
> > >> >>    under the general project umbrella.
> > >> >>
> > >> >>    What I would like to avoid is a situation like with Apache
> Curator
> > >> and
> > >> >>    Apache Zookeeper. The former being a zookeeper client donation
> > from
> > >> >> Netflix
> > >> >>    that came in as a top level project. From the peanut gallery, it
> > >> seems
> > >> >> like
> > >> >>    that has been less than ideal a couple of times in the past
> > >> >> coordinating
> > >> >>    releases, trademarks and such with separate project management.
> > >> >>

Re: DataStax Driver Donation to Apache Cassandra Project

Posted by Stephen Mallette <sp...@gmail.com>.
>
> To step out of the weeds a bit - other than the Zookeeper / Curator
> example, does anyone know of any other apache projects that have either
> subprojects or complementary sideprojects they're interdependent upon in
> their ecosystems?


Every Apache project is different, so it's quite possible that the
experience I have in this area doesn't apply much here, but I'll offer some
words on the matter in the event that some of it is helpful.

For many years even prior to joining Apache, TinkerPop was quite against
bringing in driver-style sub-projects. Our main concern was one that I
think was voiced here in this thread in some fashion, where core developers
would have to be knowledgeable of the incoming body of work and maintain
that going forward. For core contributors who were primarily Java
developers it was difficult to think that we'd suddenly be responsible for
reviews/VOTEs on Python code, for example.  It was with a bit of
trepidation that we eventually decided it a good idea and opened the
project to them. For our purposes we brought all such projects directly
into our core repository as the thinking was that we wanted to keep all
aspects of the project unified (testing, release, etc) to ensure that for a
particular release tag you could be sure that everything worked together.
We initially started with just Python and developed that as our model for
how new drivers would arrive (there were already other disparate projects
out there in other languages).

We wanted a model that ensured a reasonably high bar for acceptance and
created a rough set of minimum criteria we wanted to have for adding a new
driver to our release lines. The core of those criteria was a common,
language-agnostic test suite that needed to pass for us to consider a driver
"ready" in any sense, and the driver needed to build, test and release
using Maven (which is our build tool for the project). The former ensured
that we had a reasonable level of common tested functionality among drivers
and the latter ensured an easy and consistent way to manage build/release
practices (which fed nicely into our Docker infrastructure for both full
builds and for giving non-JVM developers a nice way to develop drivers
against the latest code without having to be Java experts). Once we
established this approach with Python, we successfully brought in .NET and
JavaScript.

I think there were a number of nice upsides to deciding to bring in drivers
in the first place and then in the model for acceptance that we chose:

+ We saw a greater diversity of folks contributing in general as the
ecosystem opened up beyond just the JVM.
+ We saw that the general community coalesced around the "official"
drivers, contributing as one to them, rather than going off and creating
one-off projects. I'm not really aware of any third-party drivers right now
for the languages we support, but if you look at something like Go, there
are three or more choices. I suppose Go would be our next target for
official inclusion.
+ Release day was pretty simple despite the complexity of the environment
with that mixed ecosystem because of our unified build model using Maven
and there wasn't a lot of disparate tooling exposed to the release manager
directly.
+ I can't say that we really saw problems with core project developers (who
mostly knew Java) having to review Python/C#/JavaScript. For the most part,
the contribution quality was high and we managed and became more
knowledgeable as we went.
+ As we released drivers and core together, we no longer had situations
where some third-party driver lagged behind some feature in core - if you
wanted to use the latest core functionality you just used the latest
release of core and driver and you could be assured they worked together
and we felt confident saying so.

Doing it over again, I think I would still consider going single repo for
this situation but I think I might not place the requirement that the
projects build with Maven. I think Maven has turned off some contributors
from those language ecosystems who don't know the JVM. They would have been
much more comfortable just working more directly with the tool systems that
they were familiar with. Of course, to get rid of local Maven builds
completely we would have to build "latest" Docker images so that folks
didn't need to do that themselves like they do now (also with Maven).

Aside from TinkerPop experiences I will offer that, while I'm not
completely sure, I think that for a contribution like this one where the
bulk of the code has been developed outside of the ASF, the DS drivers
would need to go through an IP Clearance process:

https://incubator.apache.org/ip-clearance/



On Mon, Apr 27, 2020 at 12:57 PM Joshua McKenzie <jm...@apache.org>
wrote:

> To step out of the weeds a bit - other than the Zookeeper / Curator
> example, does anyone know of any other apache projects that have either
> subprojects or complementary sideprojects they're interdependent upon in
> their ecosystems? I'd like to reach out to some other pmc's for advice and
> feedback on this topic since there's no sense in reinventing the wheel if
> other projects have wisdom to share on this.
>
> On Mon, Apr 27, 2020 at 12:42 PM Joshua McKenzie <jm...@apache.org>
> wrote:
>
> > re: ML noise, how hard would it be to filter out JIRA updates w/component
> > "Drivers"? Or from JIRA queries?
> >
> > For governance, I see it cutting both ways. If we have two separate
> > projects and ML's for drivers and C*, how do we keep a coherent view of
> new
> > features and roadmap stuff? Do we have CEP's for both projects and tie
> them
> > together? Do we drive changes in the driver feature ecosystem via CEP's
> in
> > C*?
> >
> > In the Venn diagram of overlap vs. non between the two projects, I see
> > there being more overlap than not.
> >
> > On Mon, Apr 27, 2020 at 12:34 PM Dinesh Joshi <dj...@apache.org> wrote:
> >
> >>
> >>
> >> > On Apr 27, 2020, at 2:50 AM, Sylvain Lebresne <le...@gmail.com>
> >> wrote:
> >> >
> >> > Fwiw, I agree with the concerns raised by Benedict, and think we
> should
> >> > carefully think about how this is handled. Which isn't a rejection
> >> of
> >> > the donation in any way.
> >> >
> >> > Drivers are not small projects, and the majority of their day to day
> >> > maintenance is unrelated to the server (and the reverse is true).
> >> >
> >> > From the user point of view, I think it would be fabulous that
> Cassandra
> >> > appears like one project with a server and some official drivers, with
> >> one
> >> > coherent website and documentation for all. I'm all for striving for
> >> that.
> >>
> >> +1
> >>
> >> > Behind the scenes however, I feel things should be set up so that some
> >> amount
> >> > of
> >> > separation remains between server and whichever drivers are donated
> and
> >> > accepted, or I'm fairly sure things would get messy very quickly[1].
> >> In my
> >>
> >> Can you say more about what "getting messy very quickly" means here?
> >>
> >> > mind that means *at a minimum*:
> >> > - separate JIRA projects.
> >> > - dedicated _dev_ (and commits) mailing lists.
> >>
> >> If we're thinking through how this would be setup, initially we had the
> >> same Jira project for sidecar but now there is a separate one to track
> >> sidecar specific jiras. At the moment we do not have a separate mailing
> >> list. I think Cassandra dev mailing list's volume is low enough to keep
> >> using the same ML. There is an added value that it gives visibility and
> >> developers don't need to go between multiple mailing lists.
> >>
> >> > But it's also worth thinking whether a single pool of committers/PMC
> >> > members is
> >> > desirable.
> >> >
> >> > Tbc, I'm not sure what is the best way to achieve this within the
> >> > constraint of
> >> > the Apache Foundation, and maybe I'm just stating the obvious here.
> >> >
> >> >
> >> > [1] fwiw, I say this as someone that at some points in time was
> >> > simultaneously
> >> > somewhat actively involved in both Cassandra and the DataStax Java
> >> driver.
> >> >
> >> > --
> >> > Sylvain
> >> >
> >> >
> >> > On Fri, Apr 24, 2020 at 12:54 AM Benedict Elliott Smith <
> >> benedict@apache.org>
> >> > wrote:
> >> >
> >> >> Do you have some examples of issues?
> >> >>
> >> >> So, to explain my thinking: I believe there is value in most
> >> contributors
> >> >> being able to know and understand a majority of what the project
> >> >> undertakes.  Many people track a wide variety of activity on the
> >> project,
> >> >> and whether they express an opinion they probably form one and will
> >> involve
> >> >> themselves if they consider it important to do so.  I worry that
> >> importing
> >> >> several distinct and only loosely related projects to the same
> >> governance
> >> >> and communication structures has a strong potential to undermine that
> >> >> capability, as people begin to assume that activity and
> >> decision-making is
> >> >> unrelated to them - and if that happens I think something important
> is
> >> lost.
> >> >>
> >> >> The sidecar challenges this already but seems hopefully manageable:
> it
> >> is
> >> >> a logical extension of Cassandra, existing primarily to plug gaps in
> >> >> Cassandra's own functionality, and features may migrate to Cassandra
> >> over
> >> >> time.  It is likely to have releases closely tied to Cassandra
> itself.
> >> >> Other subprojects are so far exclusively for consumption by the
> >> Cassandra
> >> >> project itself, and are all naturally coupled.
> >> >>
> >> >> Drivers however are inherently arms-length endeavours: we publish a
> >> >> protocol specification, and driver maintainers implement it.  They
> are
> >> >> otherwise fairly independent, and while a dialogue is helpful it does
> >> not
> >> >> need to be controlled by a single entity.  Many drivers will continue
> >> to be
> >> >> controlled by others, as they have been until now.  We're of course
> >> able to
> >> >> ensure there's a strong overlap of governance, which I think would be
> >> very
> >> >> helpful, and something Curator and Zookeeper seem not to have
> managed.
> >> >>
> >> >> Looking at the Curator website, it also seems to pitch itself as a
> >> >> relatively opinionated product, and much more than a driver.  I hope
> >> the
> >> >> recipe for conflict in our case is much more limited given the
> >> functional
> >> >> scope of a driver - and anyway better avoided with more integrated,
> but
> >> >> still distinct governance.
> >> >>
> >> >> That's not to say I don't see some value in the project controlling
> the
> >> >> driver directly, I just worry about the above.
> >> >>
> >> >>
> >> >>
> >> >> On 22/04/2020, 21:25, "Nate McCall" <zz...@gmail.com> wrote:
> >> >>
> >> >>    On Thu, Apr 23, 2020 at 5:37 AM Benedict Elliott Smith <
> >> >> benedict@apache.org>
> >> >>    wrote:
> >> >>
> >> >>> I welcome the donation, and hope we are able to accept all of the
> >> >>> drivers.  This is really great news IMO.
> >> >>>
> >> >>> I do however wonder if the project may be accumulating too many
> >> >>> sub-projects?  I wonder if it's time to think about splitting, and
> >> >> perhaps
> >> >>> incubating a project for the drivers?
> >> >>>
> >> >>
> >> >>    This is a legit concern and good question, but I think this is
> more
> >> a
> >> >>    natural evolution of growing a project. There is precedent for
> this
> >> in
> >> >>    Spark, Beam, Hadoop and others who have a number of different
> >> >> repositories
> >> >>    under the general project umbrella.
> >> >>
> >> >>    What I would like to avoid is a situation like with Apache Curator
> >> and
> >> >>    Apache Zookeeper. The former being a zookeeper client donation
> from
> >> >> Netflix
> >> >>    that came in as a top level project. From the peanut gallery, it
> >> seems
> >> >> like
> >> >>    that has been less than ideal a couple of times in the past
> >> >> coordinating
> >> >>    releases, trademarks and such with separate project management.
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >>
> >>
> >>
> >>
>

Re: DataStax Driver Donation to Apache Cassandra Project

Posted by Joshua McKenzie <jm...@apache.org>.
To step out of the weeds a bit - other than the Zookeeper / Curator
example, does anyone know of any other Apache projects that have either
subprojects or complementary side projects they're interdependent upon in
their ecosystems? I'd like to reach out to some other PMCs for advice and
feedback on this topic since there's no sense in reinventing the wheel if
other projects have wisdom to share on this.

On Mon, Apr 27, 2020 at 12:42 PM Joshua McKenzie <jm...@apache.org>
wrote:

> re: ML noise, how hard would it be to filter out JIRA updates w/component
> "Drivers"? Or from JIRA queries?
>
> For governance, I see it cutting both ways. If we have two separate
> projects and ML's for drivers and C*, how do we keep a coherent view of new
> features and roadmap stuff? Do we have CEP's for both projects and tie them
> together? Do we drive changes in the driver feature ecosystem via CEP's in
> C*?
>
> In the Venn diagram of overlap vs. non between the two projects, I see
> there being more overlap than not.
>
> On Mon, Apr 27, 2020 at 12:34 PM Dinesh Joshi <dj...@apache.org> wrote:
>
>>
>>
>> > On Apr 27, 2020, at 2:50 AM, Sylvain Lebresne <le...@gmail.com>
>> wrote:
>> >
>> > Fwiw, I agree with the concerns raised by Benedict, and think we should
>> > carefully think about how this is handled. Which isn't a rejection
>> of
>> > the donation in any way.
>> >
>> > Drivers are not small projects, and the majority of their day to day
>> > maintenance is unrelated to the server (and the reverse is true).
>> >
>> > From the user point of view, I think it would be fabulous that Cassandra
>> > appears like one project with a server and some official drivers, with
>> one
>> > coherent website and documentation for all. I'm all for striving for
>> that.
>>
>> +1
>>
>> > Behind the scenes however, I feel things should be set up so that some
>> amount
>> > of
>> > separation remains between server and whichever drivers are donated and
>> > accepted, or I'm fairly sure things would get messy very quickly[1].
>> In my
>>
>> Can you say more about what "getting messy very quickly" means here?
>>
>> > mind that means *at a minimum*:
>> > - separate JIRA projects.
>> > - dedicated _dev_ (and commits) mailing lists.
>>
>> If we're thinking through how this would be setup, initially we had the
>> same Jira project for sidecar but now there is a separate one to track
>> sidecar specific jiras. At the moment we do not have a separate mailing
>> list. I think Cassandra dev mailing list's volume is low enough to keep
>> using the same ML. There is an added value that it gives visibility and
>> developers don't need to go between multiple mailing lists.
>>
>> > But it's also worth thinking whether a single pool of committers/PMC
>> > members is
>> > desirable.
>> >
>> > Tbc, I'm not sure what is the best way to achieve this within the
>> > constraint of
>> > the Apache Foundation, and maybe I'm just stating the obvious here.
>> >
>> >
>> > [1] fwiw, I say this as someone that at some points in time was
>> > simultaneously
>> > somewhat actively involved in both Cassandra and the DataStax Java
>> driver.
>> >
>> > --
>> > Sylvain
>> >
>> >
>> > On Fri, Apr 24, 2020 at 12:54 AM Benedict Elliott Smith <
>> benedict@apache.org>
>> > wrote:
>> >
>> >> Do you have some examples of issues?
>> >>
>> >> So, to explain my thinking: I believe there is value in most
>> contributors
>> >> being able to know and understand a majority of what the project
>> >> undertakes.  Many people track a wide variety of activity on the
>> project,
>> >> and whether they express an opinion they probably form one and will
>> involve
>> >> themselves if they consider it important to do so.  I worry that
>> importing
>> >> several distinct and only loosely related projects to the same
>> governance
>> >> and communication structures has a strong potential to undermine that
>> >> capability, as people begin to assume that activity and
>> decision-making is
>> >> unrelated to them - and if that happens I think something important is
>> lost.
>> >>
>> >> The sidecar challenges this already but seems hopefully manageable: it
>> is
>> >> a logical extension of Cassandra, existing primarily to plug gaps in
>> >> Cassandra's own functionality, and features may migrate to Cassandra
>> over
>> >> time.  It is likely to have releases closely tied to Cassandra itself.
>> >> Other subprojects are so far exclusively for consumption by the
>> Cassandra
>> >> project itself, and are all naturally coupled.
>> >>
>> >> Drivers however are inherently arms-length endeavours: we publish a
>> >> protocol specification, and driver maintainers implement it.  They are
>> >> otherwise fairly independent, and while a dialogue is helpful it does
>> not
>> >> need to be controlled by a single entity.  Many drivers will continue
>> to be
>> >> controlled by others, as they have been until now.  We're of course
>> able to
>> >> ensure there's a strong overlap of governance, which I think would be
>> very
>> >> helpful, and something Curator and Zookeeper seem not to have managed.
>> >>
>> >> Looking at the Curator website, it also seems to pitch itself as a
>> >> relatively opinionated product, and much more than a driver.  I hope
>> the
>> >> recipe for conflict in our case is much more limited given the
>> functional
>> >> scope of a driver - and anyway better avoided with more integrated, but
>> >> still distinct governance.
>> >>
>> >> That's not to say I don't see some value in the project controlling the
>> >> driver directly, I just worry about the above.
>> >>
>> >>
>> >>
>> >> On 22/04/2020, 21:25, "Nate McCall" <zz...@gmail.com> wrote:
>> >>
>> >>    On Thu, Apr 23, 2020 at 5:37 AM Benedict Elliott Smith <
>> >> benedict@apache.org>
>> >>    wrote:
>> >>
>> >>> I welcome the donation, and hope we are able to accept all of the
>> >>> drivers.  This is really great news IMO.
>> >>>
>> >>> I do however wonder if the project may be accumulating too many
>> >>> sub-projects?  I wonder if it's time to think about splitting, and
>> >> perhaps
>> >>> incubating a project for the drivers?
>> >>>
>> >>
>> >>    This is a legit concern and good question, but I think this is more
>> a
>> >>    natural evolution of growing a project. There is precedent for this
>> in
>> >>    Spark, Beam, Hadoop and others who have a number of different
>> >> repositories
>> >>    under the general project umbrella.
>> >>
>> >>    What I would like to avoid is a situation like with Apache Curator
>> and
>> >>    Apache Zookeeper. The former being a zookeeper client donation from
>> >> Netflix
>> >>    that came in as a top level project. From the peanut gallery, it
>> seems
>> >> like
>> >>    that has been less than ideal a couple of times in the past
>> >> coordinating
>> >>    releases, trademarks and such with separate project management.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>>
>>
>>
>>

Re: DataStax Driver Donation to Apache Cassandra Project

Posted by Joshua McKenzie <jm...@apache.org>.
re: ML noise, how hard would it be to filter out JIRA updates w/component
"Drivers"? Or from JIRA queries?

For governance, I see it cutting both ways. If we have two separate
projects and ML's for drivers and C*, how do we keep a coherent view of new
features and roadmap stuff? Do we have CEP's for both projects and tie them
together? Do we drive changes in the driver feature ecosystem via CEP's in
C*?

In the Venn diagram of overlap vs. non between the two projects, I see
there being more overlap than not.

On Mon, Apr 27, 2020 at 12:34 PM Dinesh Joshi <dj...@apache.org> wrote:

>
>
> > On Apr 27, 2020, at 2:50 AM, Sylvain Lebresne <le...@gmail.com>
> wrote:
> >
> > Fwiw, I agree with the concerns raised by Benedict, and think we should
> > carefully think about how this is handled. Which isn't a rejection of
> > the donation in any way.
> >
> > Drivers are not small projects, and the majority of their day to day
> > maintenance is unrelated to the server (and the reverse is true).
> >
> > From the user point of view, I think it would be fabulous that Cassandra
> > appears like one project with a server and some official drivers, with
> one
> > coherent website and documentation for all. I'm all for striving for
> that.
>
> +1
>
> > Behind the scenes however, I feel things should be set up so that some
> amount
> > of
> > separation remains between server and whichever drivers are donated and
> > accepted, or I'm fairly sure things would get messy very quickly[1]. In
> my
>
> Can you say more about what "getting messy very quickly" means here?
>
> > mind that means *at a minimum*:
> > - separate JIRA projects.
> > - dedicated _dev_ (and commits) mailing lists.
>
> If we're thinking through how this would be setup, initially we had the
> same Jira project for sidecar but now there is a separate one to track
> sidecar specific jiras. At the moment we do not have a separate mailing
> list. I think Cassandra dev mailing list's volume is low enough to keep
> using the same ML. There is an added value that it gives visibility and
> developers don't need to go between multiple mailing lists.
>
> > But it's also worth thinking whether a single pool of committers/PMC
> > members is
> > desirable.
> >
> > Tbc, I'm not sure what is the best way to achieve this within the
> > constraint of
> > the Apache Foundation, and maybe I'm just stating the obvious here.
> >
> >
> > [1] fwiw, I say this as someone that at some points in time was
> > simultaneously
> > somewhat actively involved in both Cassandra and the DataStax Java
> driver.
> >
> > --
> > Sylvain
> >
> >
> > On Fri, Apr 24, 2020 at 12:54 AM Benedict Elliott Smith <
> benedict@apache.org>
> > wrote:
> >
> >> Do you have some examples of issues?
> >>
> >> So, to explain my thinking: I believe there is value in most
> contributors
> >> being able to know and understand a majority of what the project
> >> undertakes.  Many people track a wide variety of activity on the
> project,
> >> and whether they express an opinion they probably form one and will
> involve
> >> themselves if they consider it important to do so.  I worry that
> importing
> >> several distinct and only loosely related projects to the same
> governance
> >> and communication structures has a strong potential to undermine that
> >> capability, as people begin to assume that activity and decision-making
> is
> >> unrelated to them - and if that happens I think something important is
> lost.
> >>
> >> The sidecar challenges this already but seems hopefully manageable: it
> is
> >> a logical extension of Cassandra, existing primarily to plug gaps in
> >> Cassandra's own functionality, and features may migrate to Cassandra
> over
> >> time.  It is likely to have releases closely tied to Cassandra itself.
> >> Other subprojects are so far exclusively for consumption by the
> Cassandra
> >> project itself, and are all naturally coupled.
> >>
> >> Drivers however are inherently arms-length endeavours: we publish a
> >> protocol specification, and driver maintainers implement it.  They are
> >> otherwise fairly independent, and while a dialogue is helpful it does
> not
> >> need to be controlled by a single entity.  Many drivers will continue
> to be
> >> controlled by others, as they have been until now.  We're of course
> able to
> >> ensure there's a strong overlap of governance, which I think would be
> very
> >> helpful, and something Curator and Zookeeper seem not to have managed.
> >>
> >> Looking at the Curator website, it also seems to pitch itself as a
> >> relatively opinionated product, and much more than a driver.  I hope the
> >> recipe for conflict in our case is much more limited given the
> functional
> >> scope of a driver - and anyway better avoided with more integrated, but
> >> still distinct governance.
> >>
> >> That's not to say I don't see some value in the project controlling the
> >> driver directly, I just worry about the above.
> >>
> >>
> >>
> >> On 22/04/2020, 21:25, "Nate McCall" <zz...@gmail.com> wrote:
> >>
> >>    On Thu, Apr 23, 2020 at 5:37 AM Benedict Elliott Smith <
> >> benedict@apache.org>
> >>    wrote:
> >>
> >>> I welcome the donation, and hope we are able to accept all of the
> >>> drivers.  This is really great news IMO.
> >>>
> >>> I do however wonder if the project may be accumulating too many
> >>> sub-projects?  I wonder if it's time to think about splitting, and
> >> perhaps
> >>> incubating a project for the drivers?
> >>>
> >>
> >>    This is a legit concern and good question, but I think this is more a
> >>    natural evolution of growing a project. There is precedent for this
> in
> >>    Spark, Beam, Hadoop and others who have a number of different
> >> repositories
> >>    under the general project umbrella.
> >>
> >>    What I would like to avoid is a situation like with Apache Curator
> and
> >>    Apache Zookeeper. The former being a zookeeper client donation from
> >> Netflix
> >>    that came in as a top level project. From the peanut gallery, it
> seems
> >> like
> >>    that has been less than ideal a couple of times in the past
> >> coordinating
> >>    releases, trademarks and such with separate project management.
> >>
> >>
> >>
> >>
> >>
> >>
>
>
>
>

Re: DataStax Driver Donation to Apache Cassandra Project

Posted by Dinesh Joshi <dj...@apache.org>.

> On Apr 27, 2020, at 2:50 AM, Sylvain Lebresne <le...@gmail.com> wrote:
> 
> Fwiw, I agree with the concerns raised by Benedict, and think we should
> carefully think about how this is handled. Which isn't a rejection of
> the donation in any way.
> 
> Drivers are not small projects, and the majority of their day to day
> maintenance is unrelated to the server (and the reverse is true).
> 
> From the user point of view, I think it would be fabulous that Cassandra
> appears like one project with a server and some official drivers, with one
> coherent website and documentation for all. I'm all for striving for that.

+1

> Behind the scenes however, I feel things should be set up so that some amount
> of
> separation remains between server and whichever drivers are donated and
> accepted, or I'm fairly sure things would get messy very quickly[1]. In my

Can you say more about what "getting messy very quickly" means here?

> mind that means *at a minimum*:
> - separate JIRA projects.
> - dedicated _dev_ (and commits) mailing lists.

If we're thinking through how this would be set up, initially we had the same Jira project for sidecar but now there is a separate one to track sidecar-specific JIRAs. At the moment we do not have a separate mailing list. I think the Cassandra dev mailing list's volume is low enough to keep using the same ML. There is an added value in that it gives visibility and developers don't need to go between multiple mailing lists.

> But it's also worth thinking whether a single pool of committers/PMC
> members is
> desirable.
> 
> Tbc, I'm not sure what is the best way to achieve this within the
> constraint of
> the Apache Foundation, and maybe I'm just stating the obvious here.
> 
> 
> [1] fwiw, I say this as someone that at some points in time was
> simultaneously
> somewhat actively involved in both Cassandra and the DataStax Java driver.
> 
> --
> Sylvain
> 
> 
> On Fri, Apr 24, 2020 at 12:54 AM Benedict Elliott Smith <be...@apache.org>
> wrote:
> 
>> Do you have some examples of issues?
>> 
>> So, to explain my thinking: I believe there is value in most contributors
>> being able to know and understand a majority of what the project
>> undertakes.  Many people track a wide variety of activity on the project,
>> and whether they express an opinion they probably form one and will involve
>> themselves if they consider it important to do so.  I worry that importing
>> several distinct and only loosely related projects to the same governance
>> and communication structures has a strong potential to undermine that
>> capability, as people begin to assume that activity and decision-making is
>> unrelated to them - and if that happens I think something important is lost.
>> 
>> The sidecar challenges this already but seems hopefully manageable: it is
>> a logical extension of Cassandra, existing primarily to plug gaps in
>> Cassandra's own functionality, and features may migrate to Cassandra over
>> time.  It is likely to have releases closely tied to Cassandra itself.
>> Other subprojects are so far exclusively for consumption by the Cassandra
>> project itself, and are all naturally coupled.
>> 
>> Drivers however are inherently arms-length endeavours: we publish a
>> protocol specification, and driver maintainers implement it.  They are
>> otherwise fairly independent, and while a dialogue is helpful it does not
>> need to be controlled by a single entity.  Many drivers will continue to be
>> controlled by others, as they have been until now.  We're of course able to
>> ensure there's a strong overlap of governance, which I think would be very
>> helpful, and something Curator and Zookeeper seem not to have managed.
>> 
>> Looking at the Curator website, it also seems to pitch itself as a
>> relatively opinionated product, and much more than a driver.  I hope the
>> recipe for conflict in our case is much more limited given the functional
>> scope of a driver - and anyway better avoided with more integrated, but
>> still distinct governance.
>> 
>> That's not to say I don't see some value in the project controlling the
>> driver directly, I just worry about the above.
>> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: DataStax Driver Donation to Apache Cassandra Project

Posted by Jon Haddad <jo...@jonhaddad.com>.
Separate JIRA is enough, separate dev list... maybe.  I don't see
much purpose in trying to organize into a hierarchy, what problem are you
actually solving here?  It sounds like you don't trust folks who work on
the driver to not commit random code to Cassandra, is that the case?  If
that's not a concern, I don't know what we gain by a hierarchy other than
complexity.

Not every committer has to work on every part of the project, nor be
aware of the daily activity.


On Mon, Apr 27, 2020 at 10:03 AM Benedict Elliott Smith <be...@apache.org>
wrote:

> +1, this is essentially my position, and I agree with the baseline
> requirements for a merged project.  I'm not trying to rule anything out,
> just wondering what the optimal division is.
>
> I think from the user point of view we can hopefully achieve the same
> appearance with or without the same project governance.  The goal should
> absolutely be to have "official" drivers, and close association.  We can
> link them directly in the Cassandra site either way.  The question is only
> how the projects are best structured.
>
> It seems to me that drivers benefit from an umbrella structure for their
> governance and for discussing their commonalities and direction, but also
> need their own distinct lists and Jira.  So we'd be talking about going
> from a flat hierarchy to perhaps a three-tier structure, something like:
>
>                   PMC
>                 /     \
>         Drivers         Cassandra
>        /   |   \                 \
> Driver1  Dvr.2  Dvr.3 ...         Sidecar?
>
> Since drivers are functionally very different to the database server and
> its accoutrements, there will likely be very different kinds of
> discussions, with completely different release schedules - hopefully mostly
> around programmatic API UX, client-side performance, etc.  It feels to me
> intuitively like there is benefit in keeping distinct the projects with
> different focuses and technical problems, so that discussions more easily
> can happen simultaneously at the design and decision-making levels.
>
> This might not only help avoid fragmentation of the decision-making in
> this community, but also help unify decision-making across the drivers.  By
> having a decision-making body whose purview is only drivers, we might
> better emphasise collaboration between those drivers, since that is the
> body's only function.
>
> I'm not staking this out as a strongly held prior conviction, just that I
> see these problems and think we have to consider this carefully upfront, as
> I don't think this kind of decision is easy to revisit.
>

Re: DataStax Driver Donation to Apache Cassandra Project

Posted by Benedict Elliott Smith <be...@apache.org>.
+1, this is essentially my position, and I agree with the baseline requirements for a merged project.  I'm not trying to rule anything out, just wondering what the optimal division is.

I think from the user point of view we can hopefully achieve the same appearance with or without the same project governance.  The goal should absolutely be to have "official" drivers, and close association.  We can link them directly in the Cassandra site either way.  The question is only how the projects are best structured.

It seems to me that drivers benefit from an umbrella structure for their governance and for discussing their commonalities and direction, but also need their own distinct lists and Jira.  So we'd be talking about going from a flat hierarchy to perhaps a three-tier structure, something like:

                  PMC
                /     \
        Drivers         Cassandra
       /   |   \                 \
Driver1  Dvr.2  Dvr.3 ...         Sidecar?

Since drivers are functionally very different to the database server and its accoutrements, there will likely be very different kinds of discussions, with completely different release schedules - hopefully mostly around programmatic API UX, client-side performance, etc.  It feels to me intuitively like there is benefit in keeping distinct the projects with different focuses and technical problems, so that discussions more easily can happen simultaneously at the design and decision-making levels.  

This might not only help avoid fragmentation of the decision-making in this community, but also help unify decision-making across the drivers.  By having a decision-making body whose purview is only drivers, we might better emphasise collaboration between those drivers, since that is the body's only function. 

I'm not staking this out as a strongly held prior conviction, just that I see these problems and think we have to consider this carefully upfront, as I don't think this kind of decision is easy to revisit.



On 27/04/2020, 10:51, "Sylvain Lebresne" <le...@gmail.com> wrote:

    Fwiw, I agree with the concerns raised by Benedict, and think we should
    carefully think about how this is handled. Which isn't a rejection of
    the donation in any way.

    Drivers are not small projects, and the majority of their day to day
    maintenance is unrelated to the server (and the reverse is true).

    From the user point of view, I think it would be fabulous that Cassandra
    appears like one project with a server and some official drivers, with one
    coherent website and documentation for all. I'm all for striving for that.

    Behind the scenes however, I feel things should be set up so that some
    amount of separation remains between server and whichever drivers are
    donated and accepted, or I'm fairly sure things would get messy very
    quickly[1]. In my mind that means *at a minimum*:
    - separate JIRA projects.
    - dedicated _dev_ (and commits) mailing lists.

    But it's also worth thinking whether a single pool of committers/PMC
    members is desirable.

    Tbc, I'm not sure what is the best way to achieve this within the
    constraints of the Apache Foundation, and maybe I'm just stating the
    obvious here.


    [1] fwiw, I say this as someone that at some points in time was
    simultaneously somewhat actively involved in both Cassandra and the
    DataStax Java driver.

    --
    Sylvain







Re: DataStax Driver Donation to Apache Cassandra Project

Posted by Sylvain Lebresne <le...@gmail.com>.
Fwiw, I agree with the concerns raised by Benedict, and think we should
carefully think about how this is handled. Which isn't a rejection of
the donation in any way.

Drivers are not small projects, and the majority of their day to day
maintenance is unrelated to the server (and the reverse is true).

From the user point of view, I think it would be fabulous that Cassandra
appears like one project with a server and some official drivers, with one
coherent website and documentation for all. I'm all for striving for that.

Behind the scenes however, I feel things should be set up so that some
amount of separation remains between server and whichever drivers are
donated and accepted, or I'm fairly sure things would get messy very
quickly[1]. In my mind that means *at a minimum*:
- separate JIRA projects.
- dedicated _dev_ (and commits) mailing lists.

But it's also worth thinking whether a single pool of committers/PMC
members is desirable.

Tbc, I'm not sure what is the best way to achieve this within the
constraints of the Apache Foundation, and maybe I'm just stating the
obvious here.


[1] fwiw, I say this as someone that at some points in time was
simultaneously somewhat actively involved in both Cassandra and the
DataStax Java driver.

--
Sylvain


On Fri, Apr 24, 2020 at 12:54 AM Benedict Elliott Smith <be...@apache.org>
wrote:

> Do you have some examples of issues?
>
> So, to explain my thinking: I believe there is value in most contributors
> being able to know and understand a majority of what the project
> undertakes.  Many people track a wide variety of activity on the project,
> and whether they express an opinion they probably form one and will involve
> themselves if they consider it important to do so.  I worry that importing
> several distinct and only loosely related projects to the same governance
> and communication structures has a strong potential to undermine that
> capability, as people begin to assume that activity and decision-making is
> unrelated to them - and if that happens I think something important is lost.
>
> The sidecar challenges this already but seems hopefully manageable: it is
> a logical extension of Cassandra, existing primarily to plug gaps in
> Cassandra's own functionality, and features may migrate to Cassandra over
> time.  It is likely to have releases closely tied to Cassandra itself.
> Other subprojects are so far exclusively for consumption by the Cassandra
> project itself, and are all naturally coupled.
>
> Drivers however are inherently arms-length endeavours: we publish a
> protocol specification, and driver maintainers implement it.  They are
> otherwise fairly independent, and while a dialogue is helpful it does not
> need to be controlled by a single entity.  Many drivers will continue to be
> controlled by others, as they have been until now.  We're of course able to
> ensure there's a strong overlap of governance, which I think would be very
> helpful, and something Curator and Zookeeper seem not to have managed.
>
> Looking at the Curator website, it also seems to pitch itself as a
> relatively opinionated product, and much more than a driver.  I hope the
> recipe for conflict in our case is much more limited given the functional
> scope of a driver - and anyway better avoided with more integrated, but
> still distinct governance.
>
> That's not to say I don't see some value in the project controlling the
> driver directly, I just worry about the above.
>

Re: DataStax Driver Donation to Apache Cassandra Project

Posted by Mick Semb Wever <mc...@apache.org>.
> > - How will we run CI for these contributions?
> >
> > ASF Jenkins/CircleCI works? Do the drivers have specific needs beyond this?
> >
>  That will probably work. I asked partially because the driver CI can have
> a fairly extensive matrix of platforms, runtimes, and server versions. I'm
> not sure how much excess capacity the current Jenkins pool has.


Currently there are 36 servers, all Ubuntu. What can't be tested with
Docker (i.e. macOS and Windows) would need additional servers donated.


> How should we proceed deciding sub-project vs. incubator question discussed
> here?


Maybe start gathering and writing up the pros and cons for each
approach in a separate doc.



Re: DataStax Driver Donation to Apache Cassandra Project

Posted by Adam Holmberg <ad...@datastax.com>.
Thanks for the early input here.

> - Which major branch of the Java driver should be chosen for development?
> > -- Server currently uses Java driver 3.x but the latest is 4.x
>
> No opinions here. What are the major differences here? Could you please
> elaborate.
>
4.x is our actively developed branch. It was a major release with some
breaking changes:
https://www.datastax.com/blog/2019/03/introducing-java-driver-4

> - How will we run CI for these contributions?
>
> ASF Jenkins/CircleCI works? Do the drivers have specific needs beyond this?
>
 That will probably work. I asked partially because the driver CI can have
a fairly extensive matrix of platforms, runtimes, and server versions. I'm
not sure how much excess capacity the current Jenkins pool has.

How should we proceed deciding sub-project vs. incubator question discussed
here?

Adam

On Sat, Apr 25, 2020 at 3:42 PM Dinesh Joshi <dj...@apache.org> wrote:

> Benedict,
>
> Your concerns are valid and it's great to think through issues that might
> occur in the future. I personally have never thought that the driver should
> be treated as a separate entity because as a user, Cassandra cannot be used
> _without_ a driver. Drivers are the public interface and are tightly
> coupled with the server. I personally feel that we should take the donation
> as part of the Cassandra project and if we see issues we try to resolve
> them at that point.
>
> Thanks,
>
> Dinesh
>

-- 
Adam Holmberg
e. adam.holmberg@datastax.com
w. www.datastax.com

Re: DataStax Driver Donation to Apache Cassandra Project

Posted by Dinesh Joshi <dj...@apache.org>.
Benedict, 

Your concerns are valid and it's great to think through issues that might occur in the future. I personally have never thought that the driver should be treated as a separate entity because as a user, Cassandra cannot be used _without_ a driver. Drivers are the public interface and are tightly coupled with the server. I personally feel that we should take the donation as part of the Cassandra project and if we see issues we try to resolve them at that point.

Thanks,

Dinesh

> On Apr 23, 2020, at 3:54 PM, Benedict Elliott Smith <be...@apache.org> wrote:
> 
> Do you have some examples of issues?  
> 
> So, to explain my thinking: I believe there is value in most contributors being able to know and understand a majority of what the project undertakes.  Many people track a wide variety of activity on the project, and whether they express an opinion they probably form one and will involve themselves if they consider it important to do so.  I worry that importing several distinct and only loosely related projects to the same governance and communication structures has a strong potential to undermine that capability, as people begin to assume that activity and decision-making is unrelated to them - and if that happens I think something important is lost.
> 
> The sidecar challenges this already but seems hopefully manageable: it is a logical extension of Cassandra, existing primarily to plug gaps in Cassandra's own functionality, and features may migrate to Cassandra over time.  It is likely to have releases closely tied to Cassandra itself.  Other subprojects are so far exclusively for consumption by the Cassandra project itself, and are all naturally coupled.
> 
> Drivers however are inherently arms-length endeavours: we publish a protocol specification, and driver maintainers implement it.  They are otherwise fairly independent, and while a dialogue is helpful it does not need to be controlled by a single entity.  Many drivers will continue to be controlled by others, as they have been until now.  We're of course able to ensure there's a strong overlap of governance, which I think would be very helpful, and something Curator and Zookeeper seem not to have managed.
> 
> Looking at the Curator website, it also seems to pitch itself as a relatively opinionated product, and much more than a driver.  I hope the recipe for conflict in our case is much more limited given the functional scope of a driver - and anyway better avoided with more integrated, but still distinct governance.
> 
> That's not to say I don't see some value in the project controlling the driver directly, I just worry about the above.
> 
> 
> 
> On 22/04/2020, 21:25, "Nate McCall" <zz...@gmail.com> wrote:
> 
>    On Thu, Apr 23, 2020 at 5:37 AM Benedict Elliott Smith <be...@apache.org>
>    wrote:
> 
>> I welcome the donation, and hope we are able to accept all of the
>> drivers.  This is really great news IMO.
>> 
>> I do however wonder if the project may be accumulating too many
>> sub-projects?  I wonder if it's time to think about splitting, and perhaps
>> incubating a project for the drivers?
>> 
> 
>    This is a legit concern and good question, but I think this is more a
>    natural evolution of growing a project. There is precedent for this in
>    Spark, Beam, Hadoop and others who have a number of different repositories
>    under the general project umbrella.
> 
>    What I would like to avoid is a situation like the one with Apache Curator and
>    Apache ZooKeeper. The former is a ZooKeeper client donated by Netflix
>    that came in as a top-level project. From the peanut gallery, it seems like
>    that has been less than ideal a couple of times in the past when coordinating
>    releases, trademarks and such with separate project management.
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: DataStax Driver Donation to Apache Cassandra Project

Posted by Benedict Elliott Smith <be...@apache.org>.
Do you have some examples of issues?  

So, to explain my thinking: I believe there is value in most contributors being able to know and understand a majority of what the project undertakes.  Many people track a wide variety of activity on the project, and whether or not they express an opinion, they probably form one and will involve themselves if they consider it important to do so.  I worry that importing several distinct and only loosely related projects into the same governance and communication structures has a strong potential to undermine that capability, as people begin to assume that activity and decision-making are unrelated to them - and if that happens I think something important is lost.

The sidecar challenges this already but seems hopefully manageable: it is a logical extension of Cassandra, existing primarily to plug gaps in Cassandra's own functionality, and features may migrate to Cassandra over time.  It is likely to have releases closely tied to Cassandra itself.  Other subprojects are so far exclusively for consumption by the Cassandra project itself, and are all naturally coupled.

Drivers however are inherently arm's-length endeavours: we publish a protocol specification, and driver maintainers implement it.  They are otherwise fairly independent, and while a dialogue is helpful, it does not need to be controlled by a single entity.  Many drivers will continue to be controlled by others, as they have been until now.  We're of course able to ensure there's a strong overlap of governance, which I think would be very helpful, and something Curator and ZooKeeper seem not to have managed.

Looking at the Curator website, it also seems to pitch itself as a relatively opinionated product, and much more than a driver.  I hope the recipe for conflict in our case is much more limited given the functional scope of a driver - and in any case better avoided with more integrated, but still distinct, governance.

That's not to say I don't see some value in the project controlling the driver directly; I just worry about the above.



On 22/04/2020, 21:25, "Nate McCall" <zz...@gmail.com> wrote:

    On Thu, Apr 23, 2020 at 5:37 AM Benedict Elliott Smith <be...@apache.org>
    wrote:

    > I welcome the donation, and hope we are able to accept all of the
    > drivers.  This is really great news IMO.
    >
    >  I do however wonder if the project may be accumulating too many
    > sub-projects?  I wonder if it's time to think about splitting, and perhaps
    > incubating a project for the drivers?
    >

    This is a legit concern and good question, but I think this is more a
    natural evolution of growing a project. There is precedent for this in
    Spark, Beam, Hadoop and others who have a number of different repositories
    under the general project umbrella.

    What I would like to avoid is a situation like the one with Apache Curator and
    Apache ZooKeeper. The former is a ZooKeeper client donated by Netflix
    that came in as a top-level project. From the peanut gallery, it seems like
    that has been less than ideal a couple of times in the past when coordinating
    releases, trademarks and such with separate project management.






Re: DataStax Driver Donation to Apache Cassandra Project

Posted by Nate McCall <zz...@gmail.com>.
On Thu, Apr 23, 2020 at 5:37 AM Benedict Elliott Smith <be...@apache.org>
wrote:

> I welcome the donation, and hope we are able to accept all of the
> drivers.  This is really great news IMO.
>
>  I do however wonder if the project may be accumulating too many
> sub-projects?  I wonder if it's time to think about splitting, and perhaps
> incubating a project for the drivers?
>

This is a legit concern and good question, but I think this is more a
natural evolution of growing a project. There is precedent for this in
Spark, Beam, Hadoop and others who have a number of different repositories
under the general project umbrella.

What I would like to avoid is a situation like the one with Apache Curator and
Apache ZooKeeper. The former is a ZooKeeper client donated by Netflix
that came in as a top-level project. From the peanut gallery, it seems like
that has been less than ideal a couple of times in the past when coordinating
releases, trademarks and such with separate project management.

Re: DataStax Driver Donation to Apache Cassandra Project

Posted by Benedict Elliott Smith <be...@apache.org>.
I welcome the donation, and hope we are able to accept all of the drivers.  This is really great news IMO.

I do, however, wonder whether the project may be accumulating too many sub-projects.  I wonder if it's time to think about splitting, and perhaps incubating a project for the drivers?


On 22/04/2020, 18:20, "Dinesh Joshi" <dj...@apache.org> wrote:

    Hi Adam,
    
    Great to hear from you! I personally welcome the driver donation. My views are inline below.
    
    Thanks,
    
    Dinesh
    
    > On Apr 22, 2020, at 10:00 AM, Adam Holmberg <ad...@datastax.com> wrote:
    
    > - Which drivers should be taken into project stewardship?
    > -- The project currently bundles Java and Python; there are five others:
    > C#, Node.js, C++, PHP and Ruby
    
    Java and Python at least.
    
    > - Which major branch of the Java driver should be chosen for development?
    > -- Server currently uses Java driver 3.x but the latest is 4.x
    
    No opinion here. What are the major differences between the two? Could you please elaborate?
    
    > - Who will be the committers that maintain these drivers? Should we
    > nominate new committers (contributors on the current drivers code-bases) so
    > they can keep maintaining them with minimal disruption to the project as a
    > whole?
    
    I generally think people who have built the code base should become committers to avoid disruption and allow continuity.
    
    > - What should the new artifacts be named in package indices (coordinates
    > and artifact names)?
    
    I am not completely sure, but we may need to rename some packages. It would be really great if we could avoid breakages due to naming changes.
    
    > - How will we run CI for these contributions?
    
    ASF Jenkins/CircleCI works? Do the drivers have specific needs beyond this?
    
    > - Do we do in-tree? Sub-projects?
    
    sub-projects like cassandra-diff, sidecar, etc. This way drivers continue to evolve separately.
    
    > 
    > There will surely be even more to figure out as we go. We look forward to
    > discussing this with everyone.
    > 
    > Kind regards,
    > The DS Drivers Team
    
    
    
    





Re: DataStax Driver Donation to Apache Cassandra Project

Posted by Dinesh Joshi <dj...@apache.org>.
Hi Adam,

Great to hear from you! I personally welcome the driver donation. My views are inline below.

Thanks,

Dinesh

> On Apr 22, 2020, at 10:00 AM, Adam Holmberg <ad...@datastax.com> wrote:

> - Which drivers should be taken into project stewardship?
> -- The project currently bundles Java and Python; there are five others:
> C#, Node.js, C++, PHP and Ruby

Java and Python at least.

> - Which major branch of the Java driver should be chosen for development?
> -- Server currently uses Java driver 3.x but the latest is 4.x

No opinion here. What are the major differences between the two? Could you please elaborate?

> - Who will be the committers that maintain these drivers? Should we
> nominate new committers (contributors on the current drivers code-bases) so
> they can keep maintaining them with minimal disruption to the project as a
> whole?

I generally think people who have built the code base should become committers to avoid disruption and allow continuity.

> - What should the new artifacts be named in package indices (coordinates
> and artifact names)?

I am not completely sure, but we may need to rename some packages. It would be really great if we could avoid breakages due to naming changes.

> - How will we run CI for these contributions?

Would ASF Jenkins or CircleCI work? Do the drivers have specific needs beyond this?

> - Do we do in-tree? Sub-projects?

Sub-projects, like cassandra-diff, sidecar, etc. This way the drivers continue to evolve separately.

> 
> There will surely be even more to figure out as we go. We look forward to
> discussing this with everyone.
> 
> Kind regards,
> The DS Drivers Team

