Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2018/10/15 14:13:45 UTC

A renewed plea for help [was Re: Recruiting more maintainers for Apache Arrow]

hi folks,

It's been a few months, but as Apache Arrow is rapidly becoming a
critical dependency of next-generation data applications (see, for
example, RAPIDS just launched by NVIDIA http://rapids.ai/), we are
quite seriously in need of more project maintainers, or in lieu of new
individual contributors, additional direct funding. We are especially
in need of corporations dependent on this software to help carry the
load of JIRA gardening, code review, build and CI tooling, packaging
automation, developer workflow tools, and so on.

One of the casualties of the growing maintenance burden of this
project is that it's increasingly difficult for people like me who
know the project internals very well to allocate time to working on
new functionality. When I talk to people about the project they often
ask me things like "When will X, Y, or Z functionality be ready?" and
my answer is often "I don't know, it depends on whether more people
show up to help with the maintenance workload so people can spend more
time building new things". This is coupled with the frustration that
newcomers can experience where the learning curve is very steep to be
able to contribute significantly to new functionality. The only way
out is to recruit more people to help keep things orderly, take out
the proverbial garbage, and keep the project healthy.

If anyone reading has the bandwidth to help with maintaining the
project, or to contribute funds to support maintenance, please let us
know.

Special thanks to Antoine, Kou, Krisztian, Phillip, and Uwe for their
work on tooling, packaging, and other development processes for the
0.10 and 0.11 releases.

Thanks,
Wes

On Mon, Jul 2, 2018 at 11:40 AM Antoine Pitrou <an...@python.org> wrote:
>
>
> Hi,
>
> On 02/07/2018 at 15:58, Wes McKinney wrote:
> > * http://ivory.idyll.org/blog/2018-how-open-is-too-open.html
> > * http://ivory.idyll.org/blog/2018-oss-framework-cpr.html
>
> Very good articles, but I would stress that some of the mechanisms
> proposed lack metrics in their favour.  Two particular examples that I
> know about:
>
> 1)
>
> """ I seem to recall Martin van Loewis offering to review one externally
> contributed patch for every ten other patches reviewed by the submitter.
> (I can’t find the link, sorry!) This imposes work requirements on
> would-be contributors that obligate them to contribute substantively to
> the project maintenance, before their pet feature gets implemented. """
>
> Martin's offer was almost never taken up, although he expressed it many
> times during many years.  I think there are two factors to it:
>
> a) Cost.  As an occasional contributor, I could understand having to do
> a review before contributing a patch of mine, but not having to do 5 or
> more reviews for each patch I contribute.  The effort asked is much too
> high, and you're probably discouraging people who are discovering the
> project, even before they could get hooked on it.
>
> b) Difficulty.  It's much more difficult and intimidating to review
> someone else's PR, than to propose your own changes knowing that it will
> be reviewed by (you are assuming) competent people.  So this mechanism
> is excluding first-time contributors, which is probably *not* what you want.
>
> 2)
>
> """ Some projects have excellent incubators, like the Python Core
> Mentorship Program, where people who are interested in applying their
> effort to recruiting new contributors can do so. """
>
> Actually, it doesn't seem to me that a significant proportion of
> frequent Python contributors have gone through the core mentorship
> process.  It probably got us a handful of one-time contributions.
> Pointing to the Python core mentorship program as an "excellent
> incubator" sounds rather far-fetched to me.
>
> Generally speaking, there's a limit to the usefulness of hand-holding
> contributors, especially if your project is rather complex (as Python
> is), because the blocking point for contributors is *not* that the
> development mailing-list is a bit intimidating (as was claimed by the
> people who founded the Python core mentorship program).
>
>
> PS : as a matter of fact, the general rate of contributions to Python
> has been *decreasing* for years.
>
> Regards
>
> Antoine.

Re: A renewed plea for help [was Re: Recruiting more maintainers for Apache Arrow]

Posted by Micah Kornfield <em...@gmail.com>.
Hi Wes,
I can take a stab at going through the older C++/Python PRs as a first-pass
triage (I also appreciate that I'm just getting back to the project, so if you
prefer I hold off on this I understand).

Is there a good mechanism to have a committer/PMC member look at PRs after
a first pass by a contributor?  (e.g. round-robin or by area of focus)

Thanks,
Micah

On Tue, Jan 22, 2019 at 2:58 PM Wes McKinney <we...@gmail.com> wrote:

> hi folks,
>
> It's been 3 months since I sent this e-mail so I thought I would
> follow up about where things stand. The project continues to grow very
> fast and so there is a ton of pull request and JIRA gardening to do.
> For the 0.12.0 release, my personal burden of patch-merging stood at
> about 50%.
>
> http://arrow.apache.org/release/0.12.0.html#patch-committers
>
> We currently have 93 pull requests open. There are some stale PRs but
> for the most part these are good / mostly-non-stale PRs in occasional
> need of rebasing or following up with contributors about making
> changes.
>
> There were 1540 patches merged into the project in 2018 (excluding the
> Parquet merge) -- that's more than 4 patches per day. Evidence
> suggests that the overall patch count for 2019 will be even higher; if
> I had to guess somewhere well over 2000. Out of last year's patches, I
> merged 1028, i.e. 2 out of every 3. If we are to be able to take on
> 2000 or more patches this year, we'll need more help. If you are
> neither a committer nor a PMC member, you can still help with code
> review and discussions to help contributors get their work into
> merge-ready state.
>
> While I would like the share of patch maintenance to be more distributed,
> I'll do what I have to in order to keep the patches flowing as fast as
> possible into master, but contributors and other maintainers can help
> with the Always Be Closing mindset -- the 80/20 rule or 90/10 rule
> frequently applies. In many cases it is better to merge a patch and
> open up a JIRA for follow up improvements if there is uncertainty
> about whether something is "done". As they say "Done [Merged] is
> better than Perfect" (as long as the build isn't broken)
>
> Additionally, please be proactive about opening JIRA issues. Out of
> the last 1000 issues created (ARROW-4326 to ARROW-3327), 501 of them
> were created by just 5 people (Wes, Antoine, Uwe, Krisztian, Kou). In
> accordance with the "Always Be Closing" mindset, I frequently create
> issues as a way of closing a discussion where there is no urgent next
> step, but some work needs to be done in the future. We need to capture
> information and file it away as efficiently as possible so we can move
> on to other work.
>
> Thank you,
> Wes

Re: A renewed plea for help [was Re: Recruiting more maintainers for Apache Arrow]

Posted by Wes McKinney <we...@gmail.com>.
hi Antoine,

On Wed, Jan 23, 2019 at 4:35 AM Antoine Pitrou <so...@pitrou.net> wrote:
>
> On Tue, 22 Jan 2019 16:57:42 -0600
> Wes McKinney <we...@gmail.com> wrote:
> >
> > There were 1540 patches merged into the project in 2018 (excluding the
> > Parquet merge) -- that's more than 4 patches per day. Evidence
> > suggests that the overall patch count for 2019 will be even higher; if
> > I had to guess somewhere well over 2000. Out of last year's patches, I
> > merged 1028, i.e. 2 out of every 3. If we are to be able to take on
> > 2000 or more patches this year, we'll need more help. If you are
> > neither a committer nor a PMC member, you can still help with code
> > review and discussions to help contributors get their work into
> > merge-ready state.
>
> I generally try to review as many PRs as I feel competent to.
>
> What should be the guideline when some PRs for other implementations
> (such as C#, Java...) are lingering on?

I have generally taken the approach of merging patches when the builds
are passing and there has been some code review. In the case of C# as
an example, we don't have consistent reviewers so I will generally
glance through the code (5 minutes or less) to make sure I see nothing
terribly concerning, or to catch other problems like accidental
changes to other files in rebase conflicts, or binary files
accidentally checked in.

It is also helpful to ping people about stale PRs to keep them engaged.

>
> > I'll do what I have to in order to keep the patches flowing as fast as
> > possible into master, but contributors and other maintainers can help
> > with the Always Be Closing mindset -- the 80/20 rule or 90/10 rule
> > frequently applies. In many cases it is better to merge a patch and
> > open up a JIRA for follow up improvements if there is uncertainty
> > about whether something is "done".
>
> I'm quite wary of technical debt (which can quickly plague fast-growing
> projects) so I tend to be a bit demanding in my reviews :-)

I'm also wary of technical debt -- I am definitely not suggesting to
merge patches that you are not comfortable with! =)

I have noticed that sometimes patches may get left in a broken state
while also falling short of addressing all of the review comments. I
would prefer to see a 90% finished patch with a passing build than any
kind of broken build. Whether or not that last 10% needs to get done
in that patch or in a follow up patch depends.

I frequently will step in to "carry" patches when there are small
fixes necessary to get a passing build so something can be merged.
What "carry" means can depend a lot; e.g. rebasing or fixing lint
errors is common. Ideally contributors will take responsibility for
getting a patch into a merge-ready state in a timely fashion, but not
always.

- Wes

>
> Regards
>
> Antoine.
>
>

Re: A renewed plea for help [was Re: Recruiting more maintainers for Apache Arrow]

Posted by Antoine Pitrou <so...@pitrou.net>.
On Tue, 22 Jan 2019 16:57:42 -0600
Wes McKinney <we...@gmail.com> wrote:
> 
> There were 1540 patches merged into the project in 2018 (excluding the
> Parquet merge) -- that's more than 4 patches per day. Evidence
> suggests that the overall patch count for 2019 will be even higher; if
> I had to guess somewhere well over 2000. Out of last year's patches, I
> merged 1028, i.e. 2 out of every 3. If we are to be able to take on
> 2000 or more patches this year, we'll need more help. If you are
> neither a committer nor a PMC member, you can still help with code
> review and discussions to help contributors get their work into
> merge-ready state.

I generally try to review as many PRs as I feel competent to.

What should be the guideline when some PRs for other implementations
(such as C#, Java...) are lingering on?

> I'll do what I have to in order to keep the patches flowing as fast as
> possible into master, but contributors and other maintainers can help
> with the Always Be Closing mindset -- the 80/20 rule or 90/10 rule
> frequently applies. In many cases it is better to merge a patch and
> open up a JIRA for follow up improvements if there is uncertainty
> about whether something is "done".

I'm quite wary of technical debt (which can quickly plague fast-growing
projects) so I tend to be a bit demanding in my reviews :-)

Regards

Antoine.



Re: A renewed plea for help [was Re: Recruiting more maintainers for Apache Arrow]

Posted by Wes McKinney <we...@gmail.com>.
hi folks,

It's been 3 months since I sent this e-mail so I thought I would
follow up about where things stand. The project continues to grow very
fast and so there is a ton of pull request and JIRA gardening to do.
For the 0.12.0 release, my personal burden of patch-merging stood at
about 50%.

http://arrow.apache.org/release/0.12.0.html#patch-committers

We currently have 93 pull requests open. There are some stale PRs but
for the most part these are good / mostly-non-stale PRs in occasional
need of rebasing or following up with contributors about making
changes.

There were 1540 patches merged into the project in 2018 (excluding the
Parquet merge) -- that's more than 4 patches per day. Evidence
suggests that the overall patch count for 2019 will be even higher; if
I had to guess, somewhere well over 2000. Out of last year's patches, I
merged 1028, i.e. 2 out of every 3. If we are to be able to take on
2000 or more patches this year, we'll need more help. If you are
neither a committer nor a PMC member, you can still help with code
review and discussions to help contributors get their work into
merge-ready state.

While I would like the share of patch maintenance to be more distributed,
I'll do what I have to in order to keep the patches flowing as fast as
possible into master, but contributors and other maintainers can help
with the Always Be Closing mindset -- the 80/20 rule or 90/10 rule
frequently applies. In many cases it is better to merge a patch and
open up a JIRA for follow up improvements if there is uncertainty
about whether something is "done". As they say "Done [Merged] is
better than Perfect" (as long as the build isn't broken).

Additionally, please be proactive about opening JIRA issues. Out of
the last 1000 issues created (ARROW-4326 to ARROW-3327), 501 of them
were created by just 5 people (Wes, Antoine, Uwe, Krisztian, Kou). In
accordance with the "Always Be Closing" mindset, I frequently create
issues as a way of closing a discussion where there is no urgent next
step, but some work needs to be done in the future. We need to capture
information and file it away as efficiently as possible so we can move
on to other work.
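
The figures quoted in this message can be re-derived with a few lines of
arithmetic. The following is only an illustrative sketch: the variable
names are made up for the example, and the counts are copied from the
paragraphs above.

```python
# Counts quoted in this message (2018, excluding the Parquet merge).
patches_merged = 1540
merged_by_wes = 1028
days_in_2018 = 365  # 2018 was not a leap year

patches_per_day = patches_merged / days_in_2018
wes_share = merged_by_wes / patches_merged

# JIRA range quoted above: ARROW-3327 through ARROW-4326, inclusive.
issues_in_range = 4326 - 3327 + 1
issues_by_top_five = 501
top_five_share = issues_by_top_five / issues_in_range

print(f"patches per day: {patches_per_day:.2f}")
print(f"share merged by one person: {wes_share:.1%}")
print(f"issues in range: {issues_in_range}")
print(f"share of issues from five people: {top_five_share:.1%}")
```

This gives a little over 4 patches per day, about two thirds of them
merged by one person, and roughly half of the last 1000 issues opened by
five people, consistent with the numbers quoted above.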

Thank you,
Wes
