Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2018/06/30 14:57:42 UTC

Recruiting more maintainers for Apache Arrow

hi folks,

Arrow has grown by leaps and bounds over the last 2.5 years. We are
approaching our 2000th patch and on track to surpass 200 unique
contributors by year end.

All this contribution growth is great, but it has a hidden cost:
maintenance. The burden of maintaining the project, particularly
reviewing and merging patches, has fallen on a very small number of
people. From the commit logs, we can see how many patches each
committer has merged:

$ git shortlog -csn d5aa7c46692474376a3c31704cfc4783c86338f2..master
  1289  Wes McKinney
   268  Uwe L. Korn
    74  Korn, Uwe
    54  Antoine Pitrou
    52  Julien Le Dem
    39  Philipp Moritz
    18  Kouhei Sutou
    18  Steven Phillips
    13  Bryan Cutler
    11  Jacques Nadeau
    10  Phillip Cloud
     8  Brian Hulette
     5  Robert Nishihara
     5  adeneche
     4  GitHub
     3  Sidd
     3  siddharth
     1  AbdelHakim Deneche
     1  Your Name Here

So Uwe and I have merged ~84% of the patches in the project so far.
This isn't a completely accurate reflection of the maintainer burden,
since many others contribute to code reviews and other aspects of
patch maintenance, and you have to be a committer to earn a place on
this list.
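
As a rough cross-check, the shortlog counts above can be totalled with a
few lines of Python. This is only a sketch under stated assumptions: the
alias table assumes that "Korn, Uwe", "siddharth" and "adeneche" are the
same people as "Uwe L. Korn", "Sidd" and "AbdelHakim Deneche", and the
helper names merged_counts and share are made up for illustration. The
exact percentage depends on how such duplicate identities are treated.

```python
from collections import Counter

# Counts copied verbatim from the shortlog output above.
SHORTLOG = """
  1289  Wes McKinney
   268  Uwe L. Korn
    74  Korn, Uwe
    54  Antoine Pitrou
    52  Julien Le Dem
    39  Philipp Moritz
    18  Kouhei Sutou
    18  Steven Phillips
    13  Bryan Cutler
    11  Jacques Nadeau
    10  Phillip Cloud
     8  Brian Hulette
     5  Robert Nishihara
     5  adeneche
     4  GitHub
     3  Sidd
     3  siddharth
     1  AbdelHakim Deneche
     1  Your Name Here
"""

# ASSUMPTION (not confirmed in the thread): each key is a second
# identity of the person named in the value.
ALIASES = {
    "Korn, Uwe": "Uwe L. Korn",
    "siddharth": "Sidd",
    "adeneche": "AbdelHakim Deneche",
}


def merged_counts(shortlog_text):
    """Parse '<count>  <name>' lines and fold aliases into one entry."""
    counts = Counter()
    for line in shortlog_text.splitlines():
        if not line.strip():
            continue
        number, name = line.split(None, 1)
        name = name.strip()
        counts[ALIASES.get(name, name)] += int(number)
    return counts


def share(counts, names):
    """Fraction of all merged patches attributed to the given names."""
    return sum(counts[n] for n in names) / sum(counts.values())


counts = merged_counts(SHORTLOG)
for name, merged in counts.most_common(5):
    print(f"{merged:5d}  {name}")
print(f"top two share: {share(counts, ['Wes McKinney', 'Uwe L. Korn']):.1%}")
```

git shortlog also honors a .mailmap file in the repository, which is
another way to fold duplicate identities together before counting.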

I'm not sure what's the best way to address this problem. The quality
of our code review has declined at times as we struggle to keep up
with the flow of patches -- I don't think this is good. Having the
patch queue pile up isn't great either. Personally, I'm having a
difficult time balancing project maintenance and patch authoring,
particularly in the last 6 months.

Unfortunately, many people believe that writing patches is the primary
mode of contribution to an open source project. Apache projects
explicitly state that non-patch contributions are valued in earning
karma (committership and PMC membership). We're starting to have more
corporate contributors come out of the woodwork, and while it's great
for contributors to be paid to write patches for the project, they are
rarely given the time and space to contribute meaningfully to
maintenance.

Any thoughts about how we can grow the maintainership? Somehow we need
to reach ~5-6 core maintainers over the next year.

Thanks,
Wes

Re: Recruiting more maintainers for Apache Arrow

Posted by Wes McKinney <we...@gmail.com>.
hi Antoine,

On Sat, Jun 30, 2018 at 2:35 PM, Antoine Pitrou <an...@python.org> wrote:
>
> Hi Wes,
>
>> I'm not sure what's the best way to address this problem. The quality
>> of our code review has declined at times as we struggle to keep up
>> with the flow of patches -- I don't think this is good. Having the
>> patch queue pile up isn't great either.
>
> I'd like to do more reviews but due to the breadth of topics and
> technologies in our code base I don't feel competent for many of the PRs
> that are being posted.

You are one of the top 3 maintainers (by # of patches merged) in 2018,
and the newest committer, so there is no need to apologize for anything.

>
> For example, on a Rust PR I may do a brief review of concepts, APIs or
> general cleanliness, but not much more.
>
>> Personally, I'm having a
>> difficult time balancing project maintenance and patch authoring,
>> particularly in the last 6 months.
>
> I think it's ok to spend most of your time on reviewing and project
> maintenance.

That's what I will do for a while, but honestly it is creating a lot
of stress for me because we are not progressing very quickly towards a
feature-complete iteration of the columnar format and the ability to
do a 1.0 release. If I were able to spend more time writing patches, I
feel I could put more pressure on the project to reach that point
sooner.

>
>> Any thoughts about how we can grow the maintainership? Somehow we need
>> to reach ~5-6 core maintainers over the next year.
>
> Or more of them, if we want all topics to be covered by at least 1-2
> maintainers.

Agreed. As an example, Kou has done an excellent job maintaining the
C/GLib subproject and has been super responsive dealing with debugging
and packaging / release management issues.

>
> Regards
>
> Antoine.

Re: Recruiting more maintainers for Apache Arrow

Posted by Antoine Pitrou <an...@python.org>.
Hi Wes,

> I'm not sure what's the best way to address this problem. The quality
> of our code review has declined at times as we struggle to keep up
> with the flow of patches -- I don't think this is good. Having the
> patch queue pile up isn't great either.

I'd like to do more reviews but due to the breadth of topics and
technologies in our code base I don't feel competent for many of the PRs
that are being posted.

For example, on a Rust PR I may do a brief review of concepts, APIs or
general cleanliness, but not much more.

> Personally, I'm having a
> difficult time balancing project maintenance and patch authoring,
> particularly in the last 6 months.

I think it's ok to spend most of your time on reviewing and project
maintenance.

> Any thoughts about how we can grow the maintainership? Somehow we need
> to reach ~5-6 core maintainers over the next year.

Or more of them, if we want all topics to be covered by at least 1-2
maintainers.

Regards

Antoine.

Re: Recruiting more maintainers for Apache Arrow

Posted by Holden Karau <ho...@pigscanfly.ca>.
One of the things I’ve started doing in the Spark project is live code
reviews to encourage other folks to get involved in the review process and
help it seem more achievable (see
https://www.youtube.com/playlist?list=PLRLebp9QyZtYF46jlSnIu2x1NDBkKa2uw ).

Another thing that I think has helped us is making it clear that one of the
steps to becoming a committer (something often valued by corporate
employers) is being involved in the review process.

I don’t know how much this applies, but some of the committers have also
found our PR dashboard, which gives a view of PRs that are ready to merge,
organized by area, to be helpful (see
http://spark-prs.appspot.com ).

YMMV of course, but this is a problem I spend a lot of time thinking
about (only sometimes with answers), so I’m really interested to see where
the discussion goes.

I gave a somewhat related talk (Dealing with Contributor Overload) at FOSS
Backstage recently:
https://youtu.be/XS8cTLAuHUw

I’m not really all that involved with the Arrow project but if folks would
be open to it I’d be happy to add it to my list of projects I do livestream
reviews with.

-- 
Twitter: https://twitter.com/holdenkarau

Re: Recruiting more maintainers for Apache Arrow

Posted by Wes McKinney <we...@gmail.com>.
hi Marco,

some comments inline

On Sat, Jun 30, 2018 at 2:15 PM, Marco Neumann
<ma...@crepererum.net.invalid> wrote:
> Hey,
>
> first of all, thanks a lot for your, Uwe's, the mergers' and contributors'
> work. Now, to the maintainer problem:
>
> # Arrow as "a library"
> One thing that makes Arrow special is that it is not a single, but many
> libraries (one for each language) and many of them are not only a
> binding to a C/C++ lib, but partly a complete re-implementation of the
> protocol, e.g.:
>
> - C++: one core, but also contains Python specialties
> - Java: another core
> - Rust: yet another core
> - Python: a binding to C++ but also a lot more stuff because of Pandas
> ...
>
> And you two are maintaining all of them and I doubt that you have the
> capacities and knowledge to do this at the desired level of quality
> (which is natural, not a personal issue or offense). So this I would
> call "pseudo-maintenance", since you're solely the gatekeeper that does
> some shallow reviewing and has the burden to do the housekeeping and
> the merging. So why accept these language bindings in the first
> place without bringing a core maintainer in place? For example, let's
> say someone proposes a binding to Haskell now. That should not be
> accepted as part of the official Apache implementation without a
> dedicated maintainer (ideally the PR-author would be that person, but
> there may be others who step up).

The most development activity, and where we have the most need of
help, is in C++ and Python. The other area is in dev/CI infrastructure
and release management.

We're falling behind on implementation and design work involving
Java-land (I have been trying for about a year to hammer down an
improved Interval type), but that's a separate problem.

We are about to reach a point (particularly if Gandiva becomes part of
Apache Arrow) where more languages will become dependent on the C++
library. This makes the need for more C++ maintainers even more
urgent.

I think the other libraries have done a good job of self-managing
their code (e.g. Java, JavaScript), and I frequently merge patches
when there is a +1 or some other consensus.

>
> Right now, it might be too late to remove some of the incomplete / WIP
> implementations that don't have a core maintainer though.

Honestly, the incomplete/WIP projects are not causing any maintenance
burden. It's the main projects and their development lifecycle that is
creating a lot of work.

>
> # GitHub
> Another special thing to consider is that Arrow is (ab)using GitHub as
> a code hosting platform. Even for a contributor, this has obvious
> uncool consequences:

I think these issues are red herrings. If maintainers are more
motivated by the gamification of their open source contributions
than by the health and success of the project, I really question
how valuable a maintainer they are.

>
> - you have yet another issue hosting system to log in

I strongly dispute the notion that using JIRA is a deterrent to
maintainers. If anything, it's a filter for drive-by contributors and
unserious maintainers. I say this as the project's primary JIRA
gardener.

> - there is yet another information channel to keep track of (this ML
>   for example, which has a semi-informative web interface telling you
>   that you can only log in using Google but does not tell you how to
>   subscribe to the list)
> - links to issues don't work in the known magic way

I think these things might deter passers-by, but I don't see why they
would be a problem for someone who is concerned with the health of the
project. As the primary maintainer of the project, these things don't
impact me in any way.

> - you're merging the PRs by closing them, which is not a very nice
>   way because it does not reflect the contributors' work in the
>   project overview and personal profiles, but exactly this is a
>   large part of the GitHub community (btw: merging PRs without using
>   GitHub's merge button IS possible, as bors/bors-ng prove)

For each patch you contribute, you get one contribution "point" on
GitHub, but it won't show that you have a PR "merged". I don't see why
we should have to comply with GitHub's gamified approach to open
source.

>
> So as a potential maintainer, this is already a bummer, since I know
> that there are things less comfortable than the system I would get from
> any normal GitHub or Gitlab project.
>
> I'm not really sure how to solve this or if it should be solved (read
> about the laziness aspect in "Contribution VS Maintenance" below)

I don't mean to be too dismissive of these concerns (they are common;
people have a difficult time with change) -- I've long been critical
of people concerned with their "GitHub High Score". See some writing
on this from a while ago:
http://wesmckinney.com/blog/github-open-source-contributions/

>
> # Time / Payment
> Yes, this is indeed a big issue. What I can tell from the open
> source projects I was involved in is that for large contributor crowds,
> you normally have full/half-time positions in place for the core
> maintainer (look at the Mozilla projects, the Blender Foundation, Gnome
> / Red Hat). So at some point I think maintaining isn't a part-time /
> hobby thing anymore (w/o downgrading the hard work of hobby
> contributors, on the contrary). I don't have a link at hand, but I recall
> some discussion about GitHub and its importance for hiring (since it
> acts as a CV) after MS bought it, and some of the responses were
> "doing all this work in your free time is a privilege of wealthy,
> mostly-white men", which, without my signing this statement in this really
> bare form, already shows a problem of the open source world.
>
> # Contribution VS Maintenance
> The very "nice" thing about patch/PR contribution is that you do your
> work and then you can walk away and it's the maintainer's problem to
> release the artifact, upgrade/migrate your code and ensure that the
> tests you've written never break. It's comfortable. Being a maintainer
> means all the opposite things. And in the end, you get blamed for not
> supporting certain features (see the open source paragraph here
> https://blog.ghost.org/5/ ) or for security disasters (remember the
> OpenSSL disaster).
>
> I think together with the previous point this means, we have to get
> companies to pay for that work, and not just dump their features to an
> OSS repo.

This is a huge problem. I have recently made some significant personal
financial sacrifices to be able to engineer an arrangement where I can
provide more scalable full-time employment opportunities for Apache
Arrow maintainers. See:
http://wesmckinney.com/blog/announcing-ursalabs/.

Particularly in the United States, full-time employment is very
important to have health care and other benefits, so the best scenario
is for companies to sponsor full-time (100%, not 20%) maintainers.
What I have seen happen all too often is that a person might start out
spending 50-80% of their time doing OSS maintenance, and at some point
they get reassigned to proprietary projects and stop doing
maintenance.

>
> # Path to Maintainership
> So I think (from my narrow point of view!) that many people expect that
> the path from "outsider" to "maintainer" takes the route over "a lot of
> patch/PR contributions". If I'm reading your mail right, that is not
> necessarily the case for Apache projects and I think that's great. The
> "review PRs" path sounds great, but I think GitHub or any platform I'm
> aware of doesn't do a good job in getting people to do so. I mean, I see a
> PR and I can leave a review, but for me it is not really clear what
> consequences this has (naturally, random people don't have a veto on
> changes). So I can jump in when I think something is wrong, but I
> cannot approve a PR. This makes sense, but it poses the question of
> "how?!". I mean, it is pretty clear how to become a patch/PR
> contributor, but it is not clear how to become a maintainer, at
> least not in an easy way. (I'm sure it's written down somewhere).

Since we just started a project wiki
(https://cwiki.apache.org/confluence/display/ARROW), I can write down
a list of all the things that I regularly do as a maintainer.

Being a "maintainer" is a project leadership role; you are a "prime
mover". It means you are doing all of the things that help the project
stay organized, move forward, and periodically make releases. I took
it upon myself to be the Arrow prime mover from the early days of the
project, but we now have a large enough user and contributor base that
it is unfair to me to continue bearing the load that I have in the
past.

>
> So, overall I think a clear Call for Action at the top of the README
> could help. Like "Hey, we're looking for maintainers, you could start
> by reviewing some PRs and after some reviews maintainers will just be
> the last gatekeeper and after some more time, you can even merge PRs on
> your own".
>
> # My personal contribution
> Triggered by this call for help, I'll try to get more involved in
> Python, C++ and Rust reviews.
>
> So, these are some thoughts that I hope may help.
>

Thanks for these comments; I much appreciate your help!

> Thanks again for addressing this issue and your time and passion,
> Marco
>

Re: A renewed plea for help [was Re: Recruiting more maintainers for Apache Arrow]

Posted by Micah Kornfield <em...@gmail.com>.
Hi Wes,
I can take a stab at going through the older C++/Python PRs as a first pass
triage (I also appreciate that I'm just getting back to the project, so if
you prefer I hold off on this I understand).

Is there a good mechanism to have a committer/PMC member look at PRs after
a first pass by a contributor?  (e.g. round-robin or by area of focus)

Thanks,
Micah

On Tue, Jan 22, 2019 at 2:58 PM Wes McKinney <we...@gmail.com> wrote:

> hi folks,
>
> It's been 3 months since I sent this e-mail so I thought I would
> follow up about where things stand. The project continues to grow very
> fast and so there is a ton of pull request and JIRA gardening to do.
> For the 0.12.0 release, my personal burden of patch-merging stood at
> about 50%.
>
> http://arrow.apache.org/release/0.12.0.html#patch-committers
>
> We currently have 93 pull requests open. There are some stale PRs but
> for the most part these are good / mostly-non-stale PRs in occasional
> need of rebasing or following up with contributors about making
> changes.
>
> There were 1540 patches merged into the project in 2018 (excluding the
> Parquet merge) -- that's more than 4 patches per day. Evidence
> suggests that the overall patch count for 2019 will be even higher; if
> I had to guess somewhere well over 2000. Out of last year's patches, I
> merged 1028, i.e. 2 out of every 3. If we are to be able to take on
> 2000 or more patches this year, we'll need more help. If you are
> neither a committer nor a PMC member, you can still help with code
> review and discussions to help contributors get their work into
> merge-ready state.
>
> While I would like the share of patch maintenance to be more distributed,
> I'll do what I have to in order to keep the patches flowing as fast as
> possible into master, but contributors and other maintainers can help
> with the Always Be Closing mindset -- the 80/20 rule or 90/10 rule
> frequently applies. In many cases it is better to merge a patch and
> open up a JIRA for follow up improvements if there is uncertainty
> about whether something is "done". As they say "Done [Merged] is
> better than Perfect" (as long as the build isn't broken)
>
> Additionally, please be proactive about opening JIRA issues. Out of
> the last 1000 issues created (ARROW-4326 to ARROW-3327), 501 of them
> were created by just 5 people (Wes, Antoine, Uwe, Krisztian, Kou). In
> accordance with the "Always Be Closing" mindset, I frequently create
> issues as a way of closing a discussion where there is no urgent next
> step, but some work needs to be done in the future. We need to capture
> information and file it away as efficiently as possible so we can move
> on to other work.
>
> Thank you,
> Wes
>
> On Mon, Oct 15, 2018 at 9:13 AM Wes McKinney <we...@gmail.com> wrote:
> >
> > hi folks,
> >
> > It's been a few months, but as Apache Arrow is rapidly becoming a
> > critical dependency of next-generation data applications (see, for
> > example, RAPIDS just launched by NVIDIA http://rapids.ai/), we are
> > quite seriously in need of more project maintainers, or in lieu of new
> > individual contributors, additional direct funding. We are especially
> > in need of corporations dependent on this software to help carry the
> > load of JIRA gardening, code review, build and CI tooling, packaging
> > automation, developer workflow tools, and so on.
> >
> > One of the casualties of the growing maintenance burden of this
> > project is that it's increasingly difficult for people like me who
> > know the project internals very well to allocate time to working on
> > new functionality. When I talk to people about the project they often
> > ask me things like "When will X, Y, or Z functionality be ready?" and
> > my answer is often "I don't know, it depends on whether more people
> > show up to help with the maintenance workload so people can spend more
> > time building new things". This is coupled with the frustration that
> > newcomers can experience where the learning curve is very steep to be
> > able to contribute significantly to new functionality. The only way
> > out is to recruit more people to help keep things orderly, take out
> > the proverbial garbage, and keep the project healthy.
> >
> > If anyone reading has the bandwidth to help with maintaining the
> > project, or to contribute funds to support maintenance, please let us
> > know.
> >
> > Special thanks to Antoine, Kou, Krisztian, Phillip, and Uwe for their
> > work on tooling, packaging, and other development processes for the
> > 0.10 and 0.11 releases.
> >
> > Thanks,
> > Wes
> >
> > On Mon, Jul 2, 2018 at 11:40 AM Antoine Pitrou <an...@python.org> wrote:
> > >
> > >
> > > Hi,
> > >
> > > Le 02/07/2018 à 15:58, Wes McKinney a écrit :
> > > > * http://ivory.idyll.org/blog/2018-how-open-is-too-open.html
> > > > * http://ivory.idyll.org/blog/2018-oss-framework-cpr.html
> > >
> > > Very good articles, but I would stress that some of the mechanisms
> > > proposed lack metrics in their favour.  Two particular examples that I
> > > know about:
> > >
> > > 1)
> > >
> > > """ I seem to recall Martin van Loewis offering to review one
> > > externally contributed patch for every ten other patches reviewed by
> > > the submitter. (I can’t find the link, sorry!) This imposes work
> > > requirements on would-be contributors that obligate them to contribute
> > > substantively to the project maintenance, before their pet feature
> > > gets implemented. """
> > >
> > > Martin's offer was almost never taken up, although he expressed it many
> > > times during many years.  I think there are two factors to it:
> > >
> > > a) Cost.  As an occasional contributor, I could understand having to do
> > > a review before contributing a patch of mine, but not having to do 5 or
> > > more reviews for each patch I contribute.  The effort asked is much too
> > > high, and you're probably discouraging people who are discovering the
> > > project, even before they could get hooked on it.
> > >
> > > b) Difficult.  It's much more difficult and intimidating to review
> > > someone else's PR, than to propose your own changes knowing that it
> > > will be reviewed by (you are assuming) competent people.  So this
> > > mechanism is excluding first-time contributors, which is probably
> > > *not* what you want.
> > >
> > > 2)
> > >
> > > """ Some projects have excellent incubators, like the Python Core
> > > Mentorship Program, where people who are interested in applying their
> > > effort to recruiting new contributors can do so. """
> > >
> > > Actually, it doesn't seem to me that a significant proportion of
> > > frequent Python contributors have gone through the core mentorship
> > > process.  It probably got us a handful of one-time contributions.
> > > Pointing to the Python core mentorship program as an "excellent
> > > incubator" sounds rather far-fetched to me.
> > >
> > > Generally speaking, there's a limit to the usefulness of hand-holding
> > > contributors, especially if your project is rather complex (as Python
> > > is), because the blocking point for contributors is *not* that the
> > > development mailing-list is a bit intimidating (as was claimed by the
> > > people who founded the Python core mentorship program).
> > >
> > >
> > > PS : as a matter of fact, the general rate of contributions to Python
> > > has been *decreasing* for years.
> > >
> > > Regards
> > >
> > > Antoine.
>

Re: A renewed plea for help [was Re: Recruiting more maintainers for Apache Arrow]

Posted by Wes McKinney <we...@gmail.com>.
hi Antoine,

On Wed, Jan 23, 2019 at 4:35 AM Antoine Pitrou <so...@pitrou.net> wrote:
>
> On Tue, 22 Jan 2019 16:57:42 -0600
> Wes McKinney <we...@gmail.com> wrote:
> >
> > There were 1540 patches merged into the project in 2018 (excluding the
> > Parquet merge) -- that's more than 4 patches per day. Evidence
> > suggests that the overall patch count for 2019 will be even higher; if
> > I had to guess somewhere well over 2000. Out of last year's patches, I
> > merged 1028, i.e. 2 out of every 3. If we are to be able to take on
> > 2000 or more patches this year, we'll need more help. If you are
> > neither a committer nor a PMC member, you can still help with code
> > review and discussions to help contributors get their work into
> > merge-ready state.
>
> I generally try to review as many PRs as I feel competent to.
>
> What should be the guideline when some PRs for other implementations
> (such as C#, Java...) are lingering on?

I have generally taken the approach of merging patches when the builds
are passing and there has been some code review. In the case of C# as
an example, we don't have consistent reviewers so I will generally
glance through the code (5 minutes or less) to make sure I see nothing
terribly concerning, or to catch other problems like accidental
changes to other files in rebase conflicts, or binary files
accidentally checked in.

It is also helpful to ping people about stale PRs to keep them engaged.

>
> > I'll do what I have to in order to keep the patches flowing as fast as
> > possible into master, but contributors and other maintainers can help
> > with the Always Be Closing mindset -- the 80/20 rule or 90/10 rule
> > frequently applies. In many cases it is better to merge a patch and
> > open up a JIRA for follow up improvements if there is uncertainty
> > about whether something is "done".
>
> I'm quite wary of technical debt (which can quickly plague fast-growing
> projects) so I tend to be a bit demanding in my reviews :-)

I'm also wary of technical debt -- I am definitely not suggesting to
merge patches that you are not comfortable with! =)

I have noticed that sometimes patches may get left in a broken state
while also falling short of addressing all of the review comments. I
would prefer to see a 90% finished patch with a passing build than any
kind of broken build. Whether or not that last 10% needs to get done
in that patch or in a follow up patch depends.

I frequently will step in to "carry" patches when there are small
fixes necessary to get a passing build so something can be merged.
What "carry" means can depend a lot; e.g. rebasing or fixing lint
errors is common. Ideally contributors will take responsibility for
getting a patch into a merge-ready state in a timely fashion, but not
always.

- Wes

>
> Regards
>
> Antoine.
>
>

Re: A renewed plea for help [was Re: Recruiting more maintainers for Apache Arrow]

Posted by Antoine Pitrou <so...@pitrou.net>.
On Tue, 22 Jan 2019 16:57:42 -0600
Wes McKinney <we...@gmail.com> wrote:
> 
> There were 1540 patches merged into the project in 2018 (excluding the
> Parquet merge) -- that's more than 4 patches per day. Evidence
> suggests that the overall patch count for 2019 will be even higher; if
> I had to guess somewhere well over 2000. Out of last year's patches, I
> merged 1028, i.e. 2 out of every 3. If we are to be able to take on
> 2000 or more patches this year, we'll need more help. If you are
> neither a committer nor a PMC member, you can still help with code
> review and discussions to help contributors get their work into
> merge-ready state.

I generally try to review as many PRs as I feel competent to.

What should be the guideline when some PRs for other implementations
(such as C#, Java...) are lingering on?

> I'll do what I have to in order to keep the patches flowing as fast as
> possible into master, but contributors and other maintainers can help
> with the Always Be Closing mindset -- the 80/20 rule or 90/10 rule
> frequently applies. In many cases it is better to merge a patch and
> open up a JIRA for follow up improvements if there is uncertainty
> about whether something is "done".

I'm quite wary of technical debt (which can quickly plague fast-growing
projects) so I tend to be a bit demanding in my reviews :-)

Regards

Antoine.



Re: A renewed plea for help [was Re: Recruiting more maintainers for Apache Arrow]

Posted by Wes McKinney <we...@gmail.com>.
hi folks,

It's been 3 months since I sent this e-mail so I thought I would
follow up about where things stand. The project continues to grow very
fast and so there is a ton of pull request and JIRA gardening to do.
For the 0.12.0 release, my personal burden of patch-merging stood at
about 50%.

http://arrow.apache.org/release/0.12.0.html#patch-committers

We currently have 93 pull requests open. There are some stale PRs but
for the most part these are good / mostly-non-stale PRs in occasional
need of rebasing or following up with contributors about making
changes.

There were 1540 patches merged into the project in 2018 (excluding the
Parquet merge) -- that's more than 4 patches per day. Evidence
suggests that the overall patch count for 2019 will be even higher; if
I had to guess somewhere well over 2000. Out of last year's patches, I
merged 1028, i.e. 2 out of every 3. If we are to be able to take on
2000 or more patches this year, we'll need more help. If you are
neither a committer nor a PMC member, you can still help with code
review and discussions to help contributors get their work into
merge-ready state.
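
As a rough check on the figures above, the arithmetic can be sketched in
a few lines of Python. The two counts (1540 merged patches, 1028 of them
merged by one person) come from the paragraph above; the 365-day year and
the variable names are assumptions made for illustration only.

```python
# Figures quoted in the paragraph above.
patches_2018 = 1540
merged_by_one_committer = 1028

# Assumption: a 365-day calendar year.
per_day = patches_2018 / 365
share = merged_by_one_committer / patches_2018

print(f"{per_day:.2f} patches per day")          # a little over 4 per day
print(f"{share:.1%} merged by one committer")    # roughly 2 out of every 3
```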

While I would like to see the share of patch maintenance more distributed,
I'll do what I have to in order to keep the patches flowing as fast as
possible into master, but contributors and other maintainers can help
with the Always Be Closing mindset -- the 80/20 rule or 90/10 rule
frequently applies. In many cases it is better to merge a patch and
open up a JIRA for follow up improvements if there is uncertainty
about whether something is "done". As they say "Done [Merged] is
better than Perfect" (as long as the build isn't broken)

Additionally, please be proactive about opening JIRA issues. Out of
the last 1000 issues created (ARROW-4326 to ARROW-3327), 501 of them
were created by just 5 people (Wes, Antoine, Uwe, Krisztian, Kou). In
accordance with the "Always Be Closing" mindset, I frequently create
issues as a way of closing a discussion where there is no urgent next
step, but some work needs to be done in the future. We need to capture
information and file it away as efficiently as possible so we can move
on to other work.

Thank you,
Wes

On Mon, Oct 15, 2018 at 9:13 AM Wes McKinney <we...@gmail.com> wrote:
>
> hi folks,
>
> It's been a few months, but as Apache Arrow is rapidly becoming a
> critical dependency of next-generation data applications (see, for
> example, RAPIDS just launched by NVIDIA http://rapids.ai/), we are
> quite seriously in need of more project maintainers, or in lieu of new
> individual contributors, additional direct funding. We are especially
> in need of corporations dependent on this software to help carry the
> load of JIRA gardening, code review, build and CI tooling, packaging
> automation, developer workflow tools, and so on.
>
> One of the casualties of the growing maintenance burden of this
> project is that it's increasingly difficult for people like me who
> know the project internals very well to allocate time to working on
> new functionality. When I talk to people about the project they often
> ask me things like "When will X, Y, or Z functionality be ready?" and
> my answer is often "I don't know, it depends on whether more people
> show up to help with the maintenance workload so people can spend more
> time building new things". This is coupled with the frustration that
> newcomers can experience where the learning curve is very steep to be
> able to contribute significantly to new functionality. The only way
> out is to recruit more people to help keep things orderly, take out
> the proverbial garbage, and keep the project healthy.
>
> If anyone reading has the bandwidth to help with maintaining the
> project, or to contribute funds to support maintenance, please let us
> know.
>
> Special thanks to Antoine, Kou, Krisztian, Phillip, and Uwe for their
> work on tooling, packaging, and other development processes for the
> 0.10 and 0.11 releases.
>
> Thanks,
> Wes
>
> On Mon, Jul 2, 2018 at 11:40 AM Antoine Pitrou <an...@python.org> wrote:
> >
> >
> > Hi,
> >
> > Le 02/07/2018 à 15:58, Wes McKinney a écrit :
> > > * http://ivory.idyll.org/blog/2018-how-open-is-too-open.html
> > > * http://ivory.idyll.org/blog/2018-oss-framework-cpr.html
> >
> > Very good articles, but I would stress that some of the mechanisms
> > proposed lack metrics in their favour.  Two particular examples that I
> > know about:
> >
> > 1)
> >
> > """ I seem to recall Martin van Loewis offering to review one externally
> > contributed patch for every ten other patches reviewed by the submitter.
> > (I can’t find the link, sorry!) This imposes work requirements on
> > would-be contributors that obligate them to contribute substantively to
> > the project maintenance, before their pet feature gets implemented. """
> >
> > Martin's offer was almost never taken up, although he expressed it many
> > times during many years.  I think there are two factors to it:
> >
> > a) Cost.  As an occasional contributor, I could understand having to do
> > a review before contributing a patch of mine, but not having to do 5 or
> > more reviews for each patch I contribute.  The effort asked is much too
> > high, and you're probably discouraging people who are discovering the
> > project, even before they could get hooked on it.
> >
> > b) Difficult.  It's much more difficult and intimidating to review
> > someone else's PR, than to propose your own changes knowing that it will
> > be reviewed by (you are assuming) competent people.  So this mechanism
> > is excluding first-time contributors, which is probably *not* what you want.
> >
> > 2)
> >
> > """ Some projects have excellent incubators, like the Python Core
> > Mentorship Program, where people who are interested in applying their
> > effort to recruiting new contributors can do so. """
> >
> > Actually, it doesn't seem to me that a significant proportion of
> > frequent Python contributors have gone through the core mentorship
> > process.  It probably got us a handful of one-time contributions.
> > Pointing to the Python core mentorship program as an "excellent
> > incubator" sounds rather far-fetched to me.
> >
> > Generally speaking, there's a limit to the usefulness of hand-holding
> > contributors, especially if your project is rather complex (as Python
> > is), because the blocking point for contributors is *not* that the
> > development mailing-list is a bit intimidating (as was claimed by the
> > people who founded the Python core mentorship program).
> >
> >
> > PS : as a matter of fact, the general rate of contributions to Python
> > has been *decreasing* for years.
> >
> > Regards
> >
> > Antoine.

A renewed plea for help [was Re: Recruiting more maintainers for Apache Arrow]

Posted by Wes McKinney <we...@gmail.com>.
hi folks,

It's been a few months, but as Apache Arrow is rapidly becoming a
critical dependency of next-generation data applications (see, for
example, RAPIDS just launched by NVIDIA http://rapids.ai/), we are
quite seriously in need of more project maintainers, or in lieu of new
individual contributors, additional direct funding. We are especially
in need of corporations dependent on this software to help carry the
load of JIRA gardening, code review, build and CI tooling, packaging
automation, developer workflow tools, and so on.

One of the casualties of the growing maintenance burden of this
project is that it's increasingly difficult for people like me who
know the project internals very well to allocate time to working on
new functionality. When I talk to people about the project they often
ask me things like "When will X, Y, or Z functionality be ready?" and
my answer is often "I don't know, it depends on whether more people
show up to help with the maintenance workload so people can spend more
time building new things". This is coupled with the frustration that
newcomers can experience where the learning curve is very steep to be
able to contribute significantly to new functionality. The only way
out is to recruit more people to help keep things orderly, take out
the proverbial garbage, and keep the project healthy.

If anyone reading has the bandwidth to help with maintaining the
project, or to contribute funds to support maintenance, please let us
know.

Special thanks to Antoine, Kou, Krisztian, Phillip, and Uwe for their
work on tooling, packaging, and other development processes for the
0.10 and 0.11 releases.

Thanks,
Wes

On Mon, Jul 2, 2018 at 11:40 AM Antoine Pitrou <an...@python.org> wrote:
>
>
> Hi,
>
> Le 02/07/2018 à 15:58, Wes McKinney a écrit :
> > * http://ivory.idyll.org/blog/2018-how-open-is-too-open.html
> > * http://ivory.idyll.org/blog/2018-oss-framework-cpr.html
>
> Very good articles, but I would stress that some of the mechanisms
> proposed lack metrics in their favour.  Two particular examples that I
> know about:
>
> 1)
>
> """ I seem to recall Martin van Loewis offering to review one externally
> contributed patch for every ten other patches reviewed by the submitter.
> (I can’t find the link, sorry!) This imposes work requirements on
> would-be contributors that obligate them to contribute substantively to
> the project maintenance, before their pet feature gets implemented. """
>
> Martin's offer was almost never taken up, although he expressed it many
> times during many years.  I think there are two factors to it:
>
> a) Cost.  As an occasional contributor, I could understand having to do
> a review before contributing a patch of mine, but not having to do 5 or
> more reviews for each patch I contribute.  The effort asked is much too
> high, and you're probably discouraging people who are discovering the
> project, even before they could get hooked on it.
>
> b) Difficult.  It's much more difficult and intimidating to review
> someone else's PR, than to propose your own changes knowing that it will
> be reviewed by (you are assuming) competent people.  So this mechanism
> is excluding first-time contributors, which is probably *not* what you want.
>
> 2)
>
> """ Some projects have excellent incubators, like the Python Core
> Mentorship Program, where people who are interested in applying their
> effort to recruiting new contributors can do so. """
>
> Actually, it doesn't seem to me that a significant proportion of
> frequent Python contributors have gone through the core mentorship
> process.  It probably got us a handful of one-time contributions.
> Pointing to the Python core mentorship program as an "excellent
> incubator" sounds rather far-fetched to me.
>
> Generally speaking, there's a limit to the usefulness of hand-holding
> contributors, especially if your project is rather complex (as Python
> is), because the blocking point for contributors is *not* that the
> development mailing-list is a bit intimidating (as was claimed by the
> people who founded the Python core mentorship program).
>
>
> PS : as a matter of fact, the general rate of contributions to Python
> has been *decreasing* for years.
>
> Regards
>
> Antoine.

Re: Recruiting more maintainers for Apache Arrow

Posted by Antoine Pitrou <an...@python.org>.
Hi,

Le 02/07/2018 à 15:58, Wes McKinney a écrit :
> * http://ivory.idyll.org/blog/2018-how-open-is-too-open.html
> * http://ivory.idyll.org/blog/2018-oss-framework-cpr.html

Very good articles, but I would stress that some of the mechanisms
proposed lack metrics in their favour.  Two particular examples that I
know about:

1)

""" I seem to recall Martin van Loewis offering to review one externally
contributed patch for every ten other patches reviewed by the submitter.
(I can’t find the link, sorry!) This imposes work requirements on
would-be contributors that obligate them to contribute substantively to
the project maintenance, before their pet feature gets implemented. """

Martin's offer was almost never taken up, although he repeated it many
times over many years.  I think there are two factors to it:

a) Cost.  As an occasional contributor, I could understand having to do
a review before contributing a patch of mine, but not having to do 5 or
more reviews for each patch I contribute.  The effort asked is much too
high, and you're probably discouraging people who are discovering the
project, even before they could get hooked on it.

b) Difficult.  It's much more difficult and intimidating to review
someone else's PR, than to propose your own changes knowing that it will
be reviewed by (you are assuming) competent people.  So this mechanism
is excluding first-time contributors, which is probably *not* what you want.

2)

""" Some projects have excellent incubators, like the Python Core
Mentorship Program, where people who are interested in applying their
effort to recruiting new contributors can do so. """

Actually, it doesn't seem to me that a significant proportion of
frequent Python contributors have gone through the core mentorship
process.  It probably got us a handful of one-time contributions.
Pointing to the Python core mentorship program as an "excellent
incubator" sounds rather far-fetched to me.

Generally speaking, there's a limit to the usefulness of hand-holding
contributors, especially if your project is rather complex (as Python
is), because the blocking point for contributors is *not* that the
development mailing-list is a bit intimidating (as was claimed by the
people who founded the Python core mentorship program).


PS : as a matter of fact, the general rate of contributions to Python
has been *decreasing* for years.

Regards

Antoine.

Re: Recruiting more maintainers for Apache Arrow

Posted by Wes McKinney <we...@gmail.com>.
Hi folks,

I would like to highlight that the challenges we are having are
endemic to many parts of the open source world right now. A colleague
of mine in the Python world wrote some pieces about this recently:

* http://ivory.idyll.org/blog/2018-how-open-is-too-open.html
* http://ivory.idyll.org/blog/2018-oss-framework-cpr.html

Here are some quotes from those pieces:

"This need for constant attention to projects, the sprawling ecosystem
of amazing scientific software packages, and the relatively small
community of actual maintainers, when combined, lead to the open
source sustainability problem in science: we do not have the person
power to keep it all running without heroic efforts. And when you
couple this with the lack of clear career paths for software
maintenance in science, it is clear that we cannot ethically and
sustainably recruit more people into open source maintainership."

I would say that "heroics" does describe some of the occasional
behavior of Arrow maintainers. The trouble with "heroics" (which
translates practically speaking to "overwork") is that if sustained
for a long period of time, it surely leads to burnout and depression.
I can speak from personal experience.

On a later point in this quote about "lack of clear career paths for
software maintenance", rather than griping about the problem, I
decided to do something about it. I have recently created a new
organization so that I can

a) enable organizations to directly fund Arrow maintenance and
b) provide secure full-time employment to Arrow maintainers

"Second, the cost of the constant maintenance needs (code,
documentation, installation, etc.) on the pool of available effort
needs to be taken into account. Contributions of new features that do
not come with effort applied to maintenance should be carefully
considered - is this new contributor likely to stick around? Can they
and will they devote some effort to maintenance? If not, maybe those
contributions should be deferred in favor of contributions that add
maintenance effort to the project, e.g. via partnerships."

I see both sides of this argument. I think we need to be more
proactive about requesting maintenance help from "extractive"
contributors who are mostly "taking" from the project and giving
relatively little to support the overall health of the project.

"Fourth, there are some interesting governance implications around
allowing all or most of the resource appropriators to participate in
decision making. I need to dig more into this, but, briefly, I think
projects should formally lay out what level of investment and
contribution is rewarded with what kind of operational, policy making,
and constitutional decision making authority."

Apache governance already provides a framework for obtaining decision
making authority in a project. Suffice it to say, I would be hesitant to
support a new PMC member who has not engaged on project maintenance.

- Wes

On Mon, Jul 2, 2018 at 7:03 AM, Antoine Pitrou <an...@python.org> wrote:
>
> Hi Dimitri,
>
> Le 02/07/2018 à 12:46, Dimitri Vorona a écrit :
>> Hi Wes,
>>
>> to contribute an outsider's POV: while it is clear what's expected if you'd
>> like to make a PR, it's not at all clear to me where I would start if I
>> wanted to help with PR reviews without being heavily involved with the
>> community/being a full maintainer. Should I just grab a PR, test it,
>> comment on changes? I wouldn't be sure if I were stepping on someone's
>> toes, tbh.
>
> You don't have to manually test a PR, unless you want to be sure about
> semantics that are not part of the tests added in the PR (but then it
> would be a good idea to mention that the tests don't exercise the
> semantics enough :-)).
>
> From my point of view (generally as an open source developer and
> maintainer, this isn't specific to Arrow), reviewing is:
>
> * checking for soundness of concepts (if the PR adds any of them)
> * checking for maintainability and readability of code
> * checking for smelly coding patterns, possible sources of bugs etc.
> * depending on the context, checking for possible performance issues
> * any potential problem that your personal expertise may help you detect
>
> If you're not sure about a comment and hesitate posting it, a good
> solution is to phrase it as a question.
>
> Regards
>
> Antoine.

Re: Recruiting more maintainers for Apache Arrow

Posted by Antoine Pitrou <an...@python.org>.
Hi Dimitri,

Le 02/07/2018 à 12:46, Dimitri Vorona a écrit :
> Hi Wes,
> 
> to contribute an outsider's POV: while it is clear what's expected if you'd
> like to make a PR, it's not at all clear to me where I would start if I
> wanted to help with PR reviews without being heavily involved with the
> community/being a full maintainer. Should I just grab a PR, test it,
> comment on changes? I wouldn't be sure if I were stepping on someone's
> toes, tbh.

You don't have to manually test a PR, unless you want to be sure about
semantics that are not part of the tests added in the PR (but then it
would be a good idea to mention that the tests don't exercise the
semantics enough :-)).

From my point of view (generally as an open source developer and
maintainer, this isn't specific to Arrow), reviewing is:

* checking for soundness of concepts (if the PR adds any of them)
* checking for maintainability and readability of code
* checking for smelly coding patterns, possible sources of bugs etc.
* depending on the context, checking for possible performance issues
* any potential problem that your personal expertise may help you detect

If you're not sure about a comment and hesitate posting it, a good
solution is to phrase it as a question.

Regards

Antoine.

Re: Recruiting more maintainers for Apache Arrow

Posted by Dimitri Vorona <al...@googlemail.com.INVALID>.
Hi Wes,

to contribute an outsider's POV: while it is clear what's expected if you'd
like to make a PR, it's not at all clear to me where I would start if I
wanted to help with PR reviews without being heavily involved with the
community/being a full maintainer. Should I just grab a PR, test it,
comment on changes? I wouldn't be sure if I were stepping on someone's
toes, tbh. So, in my view it would help if:

* there were some kind of informal reviewer assignment system, i.e. I say
"I'd like to review this PR", Wes/Uwe/Antoine reply: "sure, give it a
shot". This would be mentioned prominently in the contributor guide

* afterwards there were some kind of feedback-to-feedback arrangement,
although it would increase the work load for the existing maintainers in
the short term, of course

Cheers,
Dimitri.

On Sun, Jul 1, 2018 at 1:09 AM Donald E. Foss <do...@gmail.com> wrote:

> For what it's worth, this email thread and your summary writeup, Wes, are
> a significant call to action on their own.
>
> I've been passive, not by choice, but by policy. Given the significance
> and need of this project, I'll see what I can do on my side. It will be at
> least a week given the US holiday.
>
> Donald E. Foss
>
> > On Jun 30, 2018, at 2:15 PM, Marco Neumann <ma...@crepererum.net.INVALID>
> wrote:
> >
> > Hey,
> >
> > first of all, thanks a lot for your, Uwe's, the mergers' and contributors'
> > work. Now, to the maintainer problem:
> >
> > # Arrow as "a library"
> > One thing that makes Arrow special is that it is not a single, but many
> > libraries (one for each language) and many of them are not only a
> > binding to a C/C++ lib, but partly a complete re-implementation of the
> > protocol, e.g.:
> >
> > - C++: one core, but also contains Python specialties
> > - Java: another core
> > - Rust: yet another core
> > - Python: a binding to C++ but also a lot more stuff because of Pandas
> > ...
> >
> > And you two are maintaining all of them and I doubt that you have the
> > capacities and knowledge to do this at the desired level of quality
> > (which is natural, not a personal issue or offense). So this I would
> > call "pseudo-maintenance", since you're solely the gatekeeper that does
> > some shallow reviewing and has the burden to do the housekeeping and
> > the merging. So why accept these language bindings in the first
> > place without putting a core maintainer in place? For example, let's
> > say someone proposes a binding to Haskell now. That should not be
> > accepted as part of the official Apache implementation without a
> > dedicated maintainer (ideally the PR author would be that person, but
> > there may be others who step up).
> >
> > Right now, it might be too late to remove some of the incomplete / WIP
> > implementations that don't have a core maintainer though.
> >
> > # GitHub
> > Another special thing to consider is that Arrow is (ab)using GitHub as
> > a code hosting platform. Even for a contributor, this has some obviously
> > unpleasant consequences:
> >
> > - you have yet another issue hosting system to log in to
> > - there is yet another information channel to keep track of (this ML
> >  for example, which has a semi-informative web interface telling you
> >  that you can only log in using Google but not how to subscribe to
> >  the list)
> > - links to issues don't work in the known magic way
> > - you're merging the PRs by closing them, which is not a very nice
> >  way of doing it because it does not reflect the contributors' work in
> >  the project overview and personal profiles, and exactly this is a
> >  large part of the GitHub community (btw: merging PRs without using
> >  GitHub's merge button IS possible, as bors/bors-ng prove)
> >
> > So for a potential maintainer, this is already a bummer, since I know
> > that things are less comfortable than the system I would get from
> > any normal GitHub or GitLab project.
> >
> > I'm not really sure how to solve this or if it should be solved (read
> > about the laziness aspect in "Contribution VS Maintenance" below)
> >
> > # Time / Payment
> > Yes, this is indeed a big issue. From what I can tell from the open
> > source projects I was involved in, for large contributor crowds
> > you normally have full/half-time positions in place for the core
> > maintainers (look at the Mozilla projects, the Blender Foundation, Gnome
> > / Red Hat). So at some point I think maintaining isn't a part-time /
> > hobby thing anymore (without belittling the hard work of hobby
> > contributors, quite the contrary). I don't have a link at hand, but I
> > recall some discussion about GitHub and its importance for hiring (since
> > it acts as a CV) after MS bought it, and some of the responses were
> > "doing all this work in your free time is a privilege of wealthy,
> > mostly-white men", which, without my endorsing the statement in this
> > bare form, already shows a problem of the open source world.
> >
> > # Contribution VS Maintenance
> > The very "nice" thing about patch/PR contribution is that you do your
> > work and then you can walk away and it's the maintainer's problem to
> > release the artifact, upgrade/migrate your code and ensure that the
> > tests you've written never break. It's comfortable. Being a maintainer
> > means all the opposite things. And in the end, you get blamed for not
> > supporting certain features (see the open source paragraph here
> > https://blog.ghost.org/5/ ) or for security disasters (remember the
> > OpenSSL disaster).
> >
> > I think together with the previous point this means, we have to get
> > companies to pay for that work, and not just dump their features to an
> > OSS repo.
> >
> > # Path to Maintainership
> > So I think (from my narrow point of view!) that many people expect that
> > the path from "outsider" to "maintainer" takes the route over "a lot of
> > patch/PR contributions". If I'm reading your mail right, that is not
> > necessarily the case for Apache projects and I think that's great. The
> > "review PRs" path sounds great, but I think neither GitHub nor any
> > platform I'm aware of does a good job of getting people to do so. I mean,
> > I see a PR and I can leave a review, but for me it is not really clear
> > what consequences this has (naturally, random people don't have a veto on
> > changes). So I can jump in when I think something is wrong, but I
> > cannot approve a PR. This makes sense, but it poses the question of
> > "how?!". I mean, it is pretty clear how to become a patch/PR
> > contributor, but it is not clear how to become a maintainer, at
> > least not in an easy way. (I'm sure it's written down somewhere).
> >
> > So, overall I think a clear Call for Action at the top of the README
> > could help. Like "Hey, we're looking for maintainers, you could start
> > by reviewing some PRs and after some reviews maintainers will just be
> > the last gatekeeper and after some more time, you can even merge PRs on
> > your own".
> >
> > # My personal contribution
> > Triggered by this call for help, I'll try to get more involved in
> > Python, C++ and Rust reviews.
> >
> > So, these are some thoughts that I hope may help.
> >
> > Thanks again for addressing this issue and your time and passion,
> > Marco
> >
> >> On 2018/06/30 14:57:42, Wes McKinney <w....@gmail.com> wrote:
> >> hi folks,
> >>
> >> Arrow has grown by leaps and bounds over the last 2.5 years. We are
> >> approaching our 2000th patch and on track to surpass 200 unique
> >> contributors by year end.
> >>
> >> All this contribution growth is great, but it has a hidden cost: the
> >> maintenance. The burden of maintaining the project: particularly
> >> reviewing and merging patches, has fallen on a very small number of
> >> people. From the commit logs, we can see how many patches each
> >> committer has merged:
> >>
> >> $ git shortlog -csn d5aa7c46692474376a3c31704cfc4783c86338f2..master
> >>   1289  Wes McKinney
> >>    268  Uwe L. Korn
> >>     74  Korn, Uwe
> >>     54  Antoine Pitrou
> >>     52  Julien Le Dem
> >>     39  Philipp Moritz
> >>     18  Kouhei Sutou
> >>     18  Steven Phillips
> >>     13  Bryan Cutler
> >>     11  Jacques Nadeau
> >>     10  Phillip Cloud
> >>      8  Brian Hulette
> >>      5  Robert Nishihara
> >>      5  adeneche
> >>      4  GitHub
> >>      3  Sidd
> >>      3  siddharth
> >>      1  AbdelHakim Deneche
> >>      1  Your Name Here
> >>
> >> So Uwe and I have merged ~84% of the patches in the project so far.
> >> This isn't a completely accurate reflection of the maintainer burden,
> >> since many others contribute to code reviews and other aspects of
> >> patch maintenance, and you have to be a committer to earn a place on
> >> this list.
> >>
> >> I'm not sure what's the best way to address this problem. The quality
> >> of our code review has declined at times as we struggle to keep up
> >> with the flow of patches -- I don't think this is good. Having the
> >> patch queue pile up isn't great either. Personally, I'm having a
> >> difficult time balancing project maintenance and patch authoring,
> >> particularly in the last 6 months.
> >>
> >> Unfortunately, many people believe that writing patches is the primary
> >> mode of contribution to an open source project. Apache projects
> >> explicitly state that non-patch contributions are valued in earning
> >> karma (committership and PMC membership). We're starting to have more
> >> corporate contributors come out of the woodwork, and while it's great
> >> for contributors to be paid to write patches for the project, they are
> >> rarely given the time and space to contribute meaningfully to
> >> maintenance.
> >>
> >> Any thoughts about how we can grow the maintainership? Somehow we need
> >> to reach ~5-6 core maintainers over the next year.
> >>
> >> Thanks,
> >> Wes
>

Re: Recruiting more maintainers for Apache Arrow

Posted by "Donald E. Foss" <do...@gmail.com>.
For what it's worth, this email thread and your summary writeup, Wes, are a significant call to action on their own. 

I've been passive, not by choice, but by policy. Given the significance and need of this project, I'll see what I can do on my side. It will be at least a week given the US holiday. 

Donald E. Foss

> On Jun 30, 2018, at 2:15 PM, Marco Neumann <ma...@crepererum.net.INVALID> wrote:
> 
> Hey,
> 
> first of all, thanks a lot for your, Uwe's, the mergers' and contributors'
> work. Now, to the maintainer problem:
> 
> # Arrow as "a library"
> One thing that makes Arrow special is that it is not a single, but many
> libraries (one for each language) and many of them are not only a
> binding to a C/C++ lib, but partly a complete re-implementation of the
> protocol, e.g.:
> 
> - C++: one core, but also contains Python specialties
> - Java: another core
> - Rust: yet another core
> - Python: a binding to C++ but also a lot more stuff because of Pandas
> ...
> 
> And you two are maintaining all of them and I doubt that you have the
> capacities and knowledge to do this at the desired level of quality
> (which is natural, not a personal issue or offense). So this I would
> call "pseudo-maintenance", since you're solely the gatekeeper that does
> some shallow reviewing and has the burden to do the housekeeping and
> the merging. So why accept these language bindings in the first
> place without putting a core maintainer in place? For example, let's
> say someone proposes a binding to Haskell now. That should not be
> accepted as part of the official Apache implementation without a
> dedicated maintainer (ideally the PR author would be that person, but
> there may be others who step up).
> 
> Right now, it might be too late to remove some of the incomplete / WIP
> implementations that don't have a core maintainer though.
> 
> # GitHub
> Another special thing to consider is that Arrow is (ab)using GitHub as
> a code hosting platform. Even as a contributor, this has obvious bad
> uncool consequences:
> 
> - you have yet another issue hosting system to log in
> - there is yet another information channel to keep track of (this ML
>  for example, which has a semi-informative web interface telling you
>  can only login using Google but does not tell you how to subscribe to
>  the list)
> - links to issues don't work in the known magic way
> - you're merging the PRs by closing them; which is by all means a not
>  very nice way because it does not reflect the contributors work in
>  the project overview and personal profiles, but exactly this is a
>  large part of the GitHub community (btw: merging PRs without using
>  GitHubs merge button IS possible as bors/bors-ng proof)
> 
> So as a potential maintainer, this is already a bumper, since I know
> that there are things less confortable then the system I would get from
> any normal GitHub or Gitlab project.
> 
> I'm not really sure how to solve this or if it should be solved (read
> about the laziness aspect in "Contribution VS Maintenance" below)
> 
> # Time / Payment
> Yes, this is indeed a big issue. From what I can tell from the open
> source projects I was involved in is that for large contributor crowds,
> you normally have full/half-time positions in place for the core
> maintainer (look at the Mozilla projects, the Blender Foundation, Gnome
> / Red Hat). So at one point I think maintaining isn't a part time /
> hobby thing anymore (w/o downgrading the hard work of Hobby-
> contributors, in contrast). I don't have a link at hand, but I recall
> some discussion about GitHub and it's importance for hiring (since it
> it acts as a CV) after MS bought it, and some of the responses are
> "doing all this work in your free time is a privilege of wealthy,
> mostly-white men", which without signing this statement in this really
> bare form already shows a problem of open source world.
> 
> # Contribution VS Maintenance
> The very "nice" thing about patch/PR contribution is that you do your
> work and then you can walk away and it's the maintainers problem to
> release the artifact, upgrade/migrate your code and ensure that the
> tests you've written never break. It's comfortable. Being a maintainer
> means all the opposite things. And in the end, you get blamed for not
> supporting certain features (see the open source paragraph here https:/
> /blog.ghost.org/5/ ) or for security disasters (remember the OpenSSL
> disaster).
> 
> I think together with the previous point this means, we have to get
> companies to pay for that work, and not just dump their features to an
> OSS repo.
> 
> # Path to Maintainership
> So I think (from my narrow point of view!) that many people expect that
> the path from "outsider" to "maintainer" takes the route over "a lot of
> patch/PR contributions". If I'm reading your mail right, that is not
> necessarily the case for Apache projects and I think that's great. The
> "review PRs" path sounds great, but I think GitHub or any platform I'm
> aware don't do a good job in getting people to do so. I mean, I see a
> PR and a can leave a review, but for me it is not really clear which
> consequences this have (naturally, random people don't have a veto on
> changes). So I can jump in when I think something is wrong, but I
> cannot approve a PR. This makes sense, but it poses the question of
> "how?!". I mean, it is pretty clear on how to become a patch/PR
> contributor, but it is not clear on how to become a maintainer, at
> least not in an easy way. (I'm sure it's written down somewhere).
> 
> So, overall I think a clear Call for Action at the top of the README
> could help. Like "Hey, we're looking for maintainers, you could start
> by reviewing some PRs and after some reviews maintainers will just be
> the last gatekeeper and after some more time, you can even merge PRs on
> your own".
> 
> # My personal contribution
> Triggered by this call for help, I'll try to get more involved in
> Python, C++ and Rust reviews.
> 
> So, these are some thoughts that I hope may help.
> 
> Thanks again for addressing this issue and your time and passion,
> Marco
> 
>> On 2018/06/30 14:57:42, Wes McKinney <w....@gmail.com> wrote: 
>> hi folks,> 
>> 
>> Arrow has grown by leaps and bounds over the last 2.5 years. We are> 
>> approaching our 2000th patch and on track to surpass 200 unique> 
>> contributors by year end.> 
>> 
>> All this contribution growth is great, but it has a hidden cost:
> 
> the> 
>> maintenance. The burden of maintaining the project: particularly> 
>> reviewing and merging patches, has fallen on a very small number of> 
>> people. From the commit logs, we can see how many patches each> 
>> committer has merged:> 
>> 
>> $ git shortlog -csn
> 
> d5aa7c46692474376a3c31704cfc4783c86338f2..master> 
>>  1289  Wes McKinney> 
>>   268  Uwe L. Korn> 
>>    74  Korn, Uwe> 
>>    54  Antoine Pitrou> 
>>    52  Julien Le Dem> 
>>    39  Philipp Moritz> 
>>    18  Kouhei Sutou> 
>>    18  Steven Phillips> 
>>    13  Bryan Cutler> 
>>    11  Jacques Nadeau> 
>>    10  Phillip Cloud> 
>>     8  Brian Hulette> 
>>     5  Robert Nishihara> 
>>     5  adeneche> 
>>     4  GitHub> 
>>     3  Sidd> 
>>     3  siddharth> 
>>     1  AbdelHakim Deneche> 
>>     1  Your Name Here> 
>> 
>> So Uwe and I have merged ~84% of the patches in the project so far.> 
>> This isn't a completely accurate reflection of the maintainer
> 
> burden,> 
>> since many others contribute to code reviews and other aspects of> 
>> patch maintenance, and you have to be a committer to earn a place
> 
> on> 
>> this list.> 
>> 
>> I'm not sure what's the best way to address this problem. The
> 
> quality> 
>> of our code review has declined at times as we struggle to keep up> 
>> with the flow of patches -- I don't think this is good. Having the> 
>> patch queue pile up isn't great either. Personally, I'm having a> 
>> difficult time balancing project maintenance and patch authoring,> 
>> particularly in the last 6 months.> 
>> 
>> Unfortunately, many people believe that writing patches is the
> 
> primary> 
>> mode of contribution to an open source project. Apache projects> 
>> explicitly state that non-patch contributions are valued in earning> 
>> karma (committership and PMC membership). We're starting to have
> 
> more> 
>> corporate contributors come out of the woodwork, and while it's
> 
> great> 
>> for contributors to be paid to write patches for the project, they
> 
> are> 
>> rarely given the time and space to contribute meaningfully to> 
>> maintenance.> 
>> 
>> Any thoughts about how we can grow the maintainership? Somehow we
> 
> need> 
>> to reach ~5-6 core maintainers over the next year.> 
>> 
>> Thanks,> 
>> Wes> 

Re: Recruiting more maintainers for Apache Arrow

Posted by Marco Neumann <ma...@crepererum.net.INVALID>.
Hey,

first of all, thanks a lot to you, Uwe, and all the mergers and
contributors for your work. Now, to the maintainer problem:

# Arrow as "a library"
One thing that makes Arrow special is that it is not a single library
but many (one for each language), and many of them are not just
bindings to a C/C++ library but partly complete re-implementations of
the protocol, e.g.:

- C++: one core, but also contains Python specialties
- Java: another core
- Rust: yet another core
- Python: a binding to C++ but also a lot more stuff because of Pandas
...

And you two are maintaining all of them, and I doubt that you have the
capacity and knowledge to do this at the desired level of quality
(which is natural, not a personal issue or offense). So I would call
this "pseudo-maintenance", since you are merely the gatekeepers who do
some shallow reviewing and carry the burden of the housekeeping and
the merging. So why accept these language bindings in the first place
without having a core maintainer in place? For example, let's say
someone proposes a binding to Haskell now. That should not be accepted
as part of the official Apache implementation without a dedicated
maintainer (ideally the PR author would be that person, but there may
be others who step up).

Right now, it might be too late to remove some of the incomplete / WIP
implementations that don't have a core maintainer though.

# GitHub
Another special thing to consider is that Arrow is (ab)using GitHub as
a code hosting platform. Even for a contributor, this has some
obviously unpleasant consequences:

- you have yet another issue hosting system to log in to
- there is yet another information channel to keep track of (this ML
  for example, which has a semi-informative web interface telling you
  that you can only log in using Google but not how to subscribe to
  the list)
- links to issues don't work in the usual magic way
- you're merging the PRs by closing them, which is not a very nice way
  because it does not reflect the contributors' work in the project
  overview and personal profiles, and exactly this is a large part of
  the GitHub community (btw: merging PRs without using GitHub's merge
  button IS possible, as bors/bors-ng prove)

So as a potential maintainer, this is already a bummer, since I know
that things are less comfortable than the system I would get from
any normal GitHub or GitLab project.

I'm not really sure how to solve this or if it should be solved (read
about the laziness aspect in "Contribution VS Maintenance" below)

# Time / Payment
Yes, this is indeed a big issue. From what I can tell from the open
source projects I was involved in, for large contributor crowds you
normally have full-time or half-time positions in place for the core
maintainers (look at the Mozilla projects, the Blender Foundation, Gnome
/ Red Hat). So at some point I think maintaining isn't a part-time /
hobby thing anymore (without belittling the hard work of hobby
contributors, quite the contrary). I don't have a link at hand, but I
recall some discussion about GitHub and its importance for hiring
(since it acts as a CV) after MS bought it, and some of the responses
were "doing all this work in your free time is a privilege of wealthy,
mostly-white men", which, without my endorsing this statement in this
really bare form, already shows a problem of the open source world.

# Contribution VS Maintenance
The very "nice" thing about patch/PR contribution is that you do your
work and then you can walk away, and it's the maintainer's problem to
release the artifact, upgrade/migrate your code and ensure that the
tests you've written never break. It's comfortable. Being a maintainer
means the opposite of all that. And in the end, you get blamed for not
supporting certain features (see the open source paragraph here:
https://blog.ghost.org/5/ ) or for security disasters (remember the
OpenSSL disaster).

Together with the previous point, I think this means we have to get
companies to pay for that work, and not just dump their features into
an OSS repo.

# Path to Maintainership
So I think (from my narrow point of view!) that many people expect that
the path from "outsider" to "maintainer" leads through "a lot of
patch/PR contributions". If I'm reading your mail right, that is not
necessarily the case for Apache projects, and I think that's great. The
"review PRs" path sounds great, but I don't think GitHub or any other
platform I'm aware of does a good job of getting people to do so. I
mean, I see a PR and I can leave a review, but for me it is not really
clear what consequences this has (naturally, random people don't have a
veto on changes). So I can jump in when I think something is wrong, but
I cannot approve a PR. This makes sense, but it raises the question of
"how?!". I mean, it is pretty clear how to become a patch/PR
contributor, but it is not clear how to become a maintainer, at least
not in an easy way. (I'm sure it's written down somewhere.)

So, overall I think a clear call to action at the top of the README
could help. Like "Hey, we're looking for maintainers; you could start
by reviewing some PRs, and after some reviews the maintainers will just
be the last gatekeepers, and after some more time you can even merge
PRs on your own".

# My personal contribution
Triggered by this call for help, I'll try to get more involved in
Python, C++ and Rust reviews.

So, these are some thoughts that I hope may help.

Thanks again for addressing this issue and your time and passion,
Marco

On 2018/06/30 14:57:42, Wes McKinney <w....@gmail.com> wrote: 
> hi folks,
>
> Arrow has grown by leaps and bounds over the last 2.5 years. We are
> approaching our 2000th patch and on track to surpass 200 unique
> contributors by year end.
>
> All this contribution growth is great, but it has a hidden cost: the
> maintenance. The burden of maintaining the project: particularly
> reviewing and merging patches, has fallen on a very small number of
> people. From the commit logs, we can see how many patches each
> committer has merged:
>
> $ git shortlog -csn d5aa7c46692474376a3c31704cfc4783c86338f2..master
>   1289  Wes McKinney
>    268  Uwe L. Korn
>     74  Korn, Uwe
>     54  Antoine Pitrou
>     52  Julien Le Dem
>     39  Philipp Moritz
>     18  Kouhei Sutou
>     18  Steven Phillips
>     13  Bryan Cutler
>     11  Jacques Nadeau
>     10  Phillip Cloud
>      8  Brian Hulette
>      5  Robert Nishihara
>      5  adeneche
>      4  GitHub
>      3  Sidd
>      3  siddharth
>      1  AbdelHakim Deneche
>      1  Your Name Here
>
> So Uwe and I have merged ~84% of the patches in the project so far.
> This isn't a completely accurate reflection of the maintainer burden,
> since many others contribute to code reviews and other aspects of
> patch maintenance, and you have to be a committer to earn a place on
> this list.
>
> I'm not sure what's the best way to address this problem. The quality
> of our code review has declined at times as we struggle to keep up
> with the flow of patches -- I don't think this is good. Having the
> patch queue pile up isn't great either. Personally, I'm having a
> difficult time balancing project maintenance and patch authoring,
> particularly in the last 6 months.
>
> Unfortunately, many people believe that writing patches is the primary
> mode of contribution to an open source project. Apache projects
> explicitly state that non-patch contributions are valued in earning
> karma (committership and PMC membership). We're starting to have more
> corporate contributors come out of the woodwork, and while it's great
> for contributors to be paid to write patches for the project, they are
> rarely given the time and space to contribute meaningfully to
> maintenance.
>
> Any thoughts about how we can grow the maintainership? Somehow we need
> to reach ~5-6 core maintainers over the next year.
>
> Thanks,
> Wes
