Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2020/06/20 15:56:42 UTC

Helping new contributors get started [was Re: Renaming master branch, removing blacklist/whitelist]

hi Suvayu,

Changing the subject so we can have a discussion about this
separately. It sounds to me a bit like you may be airing grievances
but I will offer my opinion and we can see what other people think.

From a purely factual view, the project is successfully attracting and
supporting contributors. Over 500 different people have contributed to
the project (more than the "420" printed on GitHub because many people
use e-mail addresses not associated with their GitHub user names) and
that number is increasing steadily over time.

We have invested greatly in providing systems to support developers of
the project. We have a large and complex CI setup and nowadays it
works pretty much like clockwork which is a huge change compared with
a year or two ago.

In general, I will say: if you wish for significant volunteer
mentorship especially in an early stage open source project you are
likely going to be disappointed. I associate these patterns with later
stage projects (think: the Python programming language or the Linux
kernel). I personally do not have the time -- I direct a team of
people working on the project toward specific development goals, and
we are in turn accountable to the people who are sponsoring our work.
In addition, I make a large number of individual contributions
myself.

If you are looking for individualized "mentorship and guidance"
_beyond_ pointers toward what part of the project you should be
looking at to solve a problem, feedback on issues about whether
something is deemed useful or high priority, and feedback on your PRs
about whether you are on the right track, I think your
expectations -- at this stage of the project -- may not be reasonable.
The number of regularly active developers in this project for the
parts that you have looked at is actually quite small. So you're
talking about some of the 10 people at the top of the GitHub
contributor list. It would be different if we were talking about an
older project with an order of magnitude more regularly active
developers.

The area where I think we could improve the most is developer
documentation, which in a sense is "self-service guidance" in
understanding the codebases. Antoine and others have taken initiative
on this but it often goes by the wayside since the number of people
with requisite knowledge to write it is small (countable on fingers
and toes if you include all the programming languages) and very short
of free cycles.

Thanks,
Wes

On Fri, Jun 19, 2020 at 9:44 PM Suvayu Ali <fa...@gmail.com> wrote:
>
> Hi all,
>
> (sorry if this is a duplicate post, I always have trouble posting to this list)
>
> On Fri, Jun 19, 2020 at 5:54 PM Todd Hendricks <he...@gmail.com> wrote:
> >
> > I'm a black data scientist. For whatever it's worth, I have never taken
> > offense to the term "Master" branch, as I have never interpreted it to have
> > a derogatory connotation. It's literally never crossed my mind.
>
> As an Indian person, I would concur with what Todd said.
>
> That said, I would like to highlight a few things.  Since the
> community is spending time to discuss how to be more welcoming to a
> diverse group of contributors, instead of default branch names, there
> are many practically relevant issues that could be addressed.
>
> I've been trying to contribute to this project for about 2 yrs, rather
> unsuccessfully.  I come from the perspective of analysis rather than
> engineering.  But I'm no stranger to technical nitty gritties
> (particle physicist at CERN, data scientist at non-technical startups,
> scientific software dev).  I started by filing bug reports for my
> needs (pyarrow and parquet).  Most bug reports are still open; they
> received a bit of discussion, but mostly they have been assigned and
> reassigned to releases for over a year.  On day one I had offered to do
> the work myself, given some guidance, but I didn't receive any.  So I
> gave up.
>
> Some months later, after Gandiva was released, I came back with the
> goal of using it from pyarrow.  While after some help I could do
> simple tests in C++, getting it to work with pyarrow proved difficult.
> I don't remember the exact hurdle, but I decided I would package it
> for my distro (Fedora) for simpler compilation.  So I contributed a
> few patches to the build system to build against system libraries
> instead of the vendored versions, including the ability to switch LLVM
> versions.  I think around this time Kou was overhauling the build
> system. My patches were not accepted, but some of the groundwork I
> did hopefully helped Kou.  Eventually though, I gave up.
>
> Soon after, I tried to build a wheel for ARM; I was gathering some
> data on an RPi.  That didn't go so well either, again, the reason was
> lack of guidance.  At the time, it was also expressed that wheels are
> disfavoured by the community, and not worth maintaining.  I see that
> position has changed now.
>
> There is a clear pattern here: if the community is really serious
> about addressing diversity and being inclusive, time would be better
> spent by addressing issues like contribution guidelines for beginners
> (not saying absolute beginners), mentoring, or triaging of open issues
> in terms of ease of contribution, and other concrete hurdles for
> newcomers.  I realise people's time is scarce, but you have to start
> somewhere.  At the least, if someone guides me, I can pick up these
> tasks and the maintainers can focus on the more involved roles. If the
> issues I have highlighted cannot be prioritised, then wasting time on
> superficial issues like default branch names should also be avoided.
>
> I hope my comments are accepted as constructive criticism.
>
> Cheers,
>
> PS: whitelist/blacklist -> accept/reject seems quite reasonable;
> personally, colour based terminology has always been very unclear to
> me
>
> --
> Suvayu
>
> Open source is the future. It sets us free.

Re: Helping new contributors get started [was Re: Renaming master branch, removing blacklist/whitelist]

Posted by Amol Umbarkar <am...@gmail.com>.
Just wanted to share an experience from another project. I am a big fan of the
Dask developer log <https://blog.dask.org/2016/12/05/dask-dev-1>. It helped me
understand what the project is currently focusing on and gave some pointers on
past decisions.

I understand the current stage and workload may not be compatible with this
idea.

I don't mind compiling/maintaining the blog (weekly or every two weeks) if
everyone shares short notes on pieces that are being worked on.

~Amol


Re: Helping new contributors get started [was Re: Renaming master branch, removing blacklist/whitelist]

Posted by Adam Lippai <ad...@rigo.sk>.
Undoubtedly, you always answer and that is amazing. Now all the help is
core/pro -> beginner, but an average <-> average or average -> beginner
cooperation would be nice. I understand it's not the time to introduce it
yet, we don't have the critical mass. I didn't think of SO before, but
indeed, it serves this purpose, it's a good forum for this.

Thanks for the detailed answer.

Best regards,
Adam Lippai


On Sat, Jun 20, 2020, 22:38 Wes McKinney <we...@gmail.com> wrote:

> On Sat, Jun 20, 2020 at 3:19 PM Adam Lippai <ad...@rigo.sk> wrote:
> >
> > I've seen better and worse examples before.
> > I was an active, beginner Drupal developer ~12 years ago. The Drupal
> > project community was very strong, particularly in Hungary where I live.
> > International and local IRC channels, international and local
> > forums+events, highly customized issue tracker and superb documentation. It
> > was more mature and bigger at that time. On the other hand when I tried to
> > give back to Angular or React... Well... You are already ahead of them.
> > React eventually recognized the problem and they try to solve it, but a
> > large company's bureaucracy doesn't help that.
> >
> > My experience with Arrow is aligned with my expectations of a project of
> > this age or size (and in a few fields you are awesome!). Andy Grove,
> > xhochy, wesm, Joris were welcoming and responsive on Jira, Twitter and this
> > mailing list too. Ofc nobody worked for free on my ideas and I can't
> > develop C++ or Rust alone (yet). What I can do now is tracking the
> > development, the PRs (I've added a few more or less valuable, but not so
> > unique comments) and I'm subscribed to a few Jira issues.
> >
> > At this point I could use a gitter/IRC/slack channel for discussions - with
> > peers instead of core devs - and using mailing list + JIRA doesn't help
> > either. They are simply cumbersome, hard to navigate/search, focus is lost
> > when somebody is not sure what's interesting. A simpler issue tracker (eg
> > GitHub issues) and a super simple forum instead of mailing list would lower
> > the barriers. I don't think this is a priority as this setup certainly
> > serves your current workflows.
>
> On this I will say: we used to have a Slack channel but it didn't work
> well. Only a few core developers ever looked at it and because of the
> general "Slackification" of open source a lot of people would join the
> Slack channel looking for help and be unable to get it. People also
> reported bugs in Slack and we would learn about them weeks after the
> fact, or never. I think if we added a new official communications
> channel for the project right now it would likely suffer the same
> fate. If we had 10x as many core developers then there might be enough
> core devs who are comfortable with the additional modality that it
> might make sense. We still have lots of people reporting bugs on Stack
> Overflow and very few core developers regularly look at the SO
> questions.
>
> By contrast, we nearly unfailingly respond to people on the mailing
> list and JIRA. So if people are looking for help they can certainly
> get it there.
>
> > Keep up the good work, you are amazing! I can't wait for a more complete
> > DataFusion, group by and join for pyarrow and other dozen exciting
> > opportunities and features.
> >
> > tl;dr you are great, not behind, local communities/meetups are a good
> > opportunity (but covid...), I find Jira + mailing list hard to use
> > (mentally, as not core dev)
> >
> > Best regards,
> > Adam Lippai
> >
> >
> >
> > On Sat, Jun 20, 2020, 21:23 Wes McKinney <we...@gmail.com> wrote:
> >
> > > On Sat, Jun 20, 2020 at 1:52 PM Neal Richardson
> > > <ne...@gmail.com> wrote:
> > > >
> > > > Hi Suvayu,
> > > > Thanks for your feedback. I'm sorry to hear that you feel that you haven't
> > > > had the best experiences trying to contribute to the project. For what it's
> > > > worth, I believe that raising concerns like this _is_ itself a valuable
> > > > contribution. So even if you haven't gotten to the point of having a pull
> > > > request merged, I don't think it's accurate to say that you've been trying
> > > > unsuccessfully to contribute--you're contributing right now.
> > > >
> > > > As it turns out, just the other day I opened a JIRA issue about improving
> > > > the contributor guide (https://issues.apache.org/jira/browse/ARROW-9189),
> > > > and I'll be taking that up next week as part of our 1.0 website overhaul. I
> > > > agree that we can do a better job in helping new contributors participate,
> > > > and that many of those forms of contribution need not require lots of time
> > > > from Arrow core developers. Wes's point about the limited bandwidth to
> > > > provide mentorship is valid; that said, I've seen many successful cases of
> > > > first-time contributors getting the support they need. While there's
> > > > certainly room for improvement, I'm optimistic that we're on the right
> > > > track.
> > >
> > > Yes — to be clear, the core developers in my experience (myself
> > > included) are spending a lot of time responding to questions on JIRA,
> > > clarifying issues with issue reporters, and offering advice about how
> > > to proceed. Additionally, we spend a lot of time reviewing code and
> > > helping people get their patches ready to be merged. There's no way we
> > > would have 500+ contributors if we were not doing these things.
> > >
> > > As far as getting the help that's needed from core developers, the
> > > thing that helps someone like me the most is to have the "request" be
> > > as specific and direct as possible. In any given day I might look at
> > > 50-100 different issues and so if it's not clear what I need to do I
> > > will often move on to the next thing. Example direct requests:
> > >
> > > * Do you think $PROPOSED_APPROACH is the right one?
> > > * In which file(s) should I be looking to make changes?
> > > * Is there anything related in the codebase I can look at to learn?
> > >
> > > I'm sure we can put this advice in our contributor guide.
> > >
> > > If you ask these questions and do not get an answer, it is OK to ask
> > > again.
> > >
> > > I see six JIRA issues from Suvayu in the project
> > >
> > > * https://issues.apache.org/jira/browse/ARROW-1956
> > > * https://issues.apache.org/jira/browse/ARROW-3806
> > > * https://issues.apache.org/jira/browse/ARROW-4930
> > > * https://issues.apache.org/jira/browse/ARROW-3792
> > > * https://issues.apache.org/jira/browse/ARROW-3874
> > > * https://issues.apache.org/jira/browse/ARROW-6577
> > >
> > > There are comments in all cases and the issues were resolved in 4 out
> > > of 6 cases. I see one example of you asking for guidance
> > > (https://issues.apache.org/jira/browse/ARROW-1956) on December 29,
> > > 2017 while I (and presumably others) were on vacation for the New
> > > Year. In the future, it is OK to be more persistent.
> > >
> > > Thanks
> > >
> > > > Neal
> > > >
> > > >
> > > > On Sat, Jun 20, 2020 at 11:25 AM Suvayu Ali <fa...@gmail.com> wrote:
> > > >
> > > > > Hi Wes, others,
> > > > >
> > > > > Thank you for taking the time to draft a long response.
> > > > >
> > > > > On Sat, Jun 20, 2020 at 3:57 PM Wes McKinney <we...@gmail.com> wrote:
> > > > > >
> > > > > > From a purely factual view, the project is successfully attracting and
> > > > > > supporting contributors. Over 500 different people have contributed to
> > > > > > the project (more than the "420" printed on GitHub because many people
> > > > > > use e-mail addresses not associated with their GitHub user names) and
> > > > > > that number is increasing steadily over time.
> > > > >
> > > > > This response reinforces one of my points, all this branch name change
> > > > > business then has nothing to do with actually getting new
> > > > > contributors.
> > > > >
> > > > > > We have invested greatly in providing systems to support developers of
> > > > > > the project. We have a large and complex CI setup and nowadays it
> > > > > > works pretty much like clockwork which is a huge change compared with
> > > > > > a year or two ago.
> > > > >
> > > > > Agreed, and I have learned a lot from it just by observing.
> > > > >
> > > > > > If you are looking for individualized "mentorship and guidance"
> > > > > > _beyond_ pointers toward what part of the project you should be
> > > > > > looking at to solve a problem, feedback on issues about whether or not
> > > > > > something is deemed useful or high priority or not, and feedback on
> > > > > > your PRs whether you are on the right track or not, I think your
> > > > > > expectations -- at this stage of the project -- may not be reasonable.
> > > > > > The number of regularly active developers in this project for the
> > > > > > parts that you have looked at is actually quite small. So you're
> > > > > > talking about some of the 10 people at the top of the GitHub
> > > > > > contributor list. It would be different if we were talking about an
> > > > > > older project with an order of magnitude more regularly active
> > > > > > developers.
> > > > >
> > > > > If pointers to you are: look at the serialisation code, then yes, I
> > > > > was hoping for more along the lines of look at class XYZ in file bla.
> > > > > I completely understand if that's not possible.  That is why I never
> > > > > said anything before.  You may not remember, during the "whether to
> > > > > support wheels" discussion, as I was impacted, I offered a compromise
> > > > > of releasing a reduced feature-set wheel with simpler dependencies,
> > > > > which was rejected with this exact argument.  I did not counter,
> > > > > because it is a very reasonable position to take, and I'm in no
> > > > > position to "demand" anything.
> > > > >
> > > > > I only wrote today because I felt maybe now there is a willingness for
> > > > > newer, diverse contributors, because that's how this thread was
> > > > > motivated.  So I stated the hurdles I have faced, and hoped instead of
> > > > > wasting scarce resources on superficial changes the community could
> > > > > address actual hurdles for new contributors like me.  Obviously I
> > > > > misunderstood.
> > > > >
> > > > > > The area where I think we could improve the most is developer
> > > > > > documentation, which in a sense is "self-service guidance" in
> > > > > > understanding the codebases. Antoine and others have taken initiative
> > > > > > on this but it often goes by the wayside since the number of people
> > > > > > with requisite knowledge to write it is small (countable on fingers
> > > > > > and toes if you include all the programming languages) and very short
> > > > > > of free cycles.
> > > > >
> > > > > I'm guessing you mean the Sphinx docs?  Whatever I have managed to use
> > > > > Arrow for, it's thanks to those.  Maybe that is my cue, when hitting a
> > > > > dead-end, "I should ask which source file do I look in?"
> > > > >
> > > > > Anyway, I don't want to waste anyone's time anymore. I felt there's
> > > > > room for feedback, I was wrong, and I withdraw from this discussion.
> > > > > I'll continue to lurk on the mailing list, and try to contribute when
> > > > > I can.
> > > > >
> > > > > Cheers and thanks for your time,
> > > > >
> > > > > --
> > > > > Suvayu
> > > > >
> > > > > Open source is the future. It sets us free.
> > > > >
> > >
>

Re: Helping new contributors get started [was Re: Renaming master branch, removing blacklist/whitelist]

Posted by Wes McKinney <we...@gmail.com>.
On Sat, Jun 20, 2020 at 3:19 PM Adam Lippai <ad...@rigo.sk> wrote:
>
> I've seen better and worse examples before.
> I was an active, beginner Drupal developer ~12 years ago. The Drupal
> project community was very strong, particularly in Hungary where I live.
> International and local IRC channels, international and local
> forums+events, highly customized issue tracker and superb documentation. It
> was more mature and bigger at that time. On the other hand when I tried to
> give back to Angular or React... Well... You are already ahead of them.
> React eventually recognized the problem and they try to solve it, but a
> large company's bureaucracy doesn't help that.
>
> My experience with Arrow is aligned with my expectations of a project of
> this age or size (and in a few fields you are awesome!). Andy Grove,
> xhochy, wesm, Joris were welcoming and responsive on Jira, Twitter and this
> mailing list too. Ofc nobody worked for free on my ideas and I can't
> develop C++ or Rust alone (yet). What I can do now is tracking the
> development, the PRs (I've added a few more or less valuable, but not so
> unique comments) and I'm subscribed to a few Jira issues.
>
> At this point I could use a gitter/IRC/slack channel for discussions - with
> peers instead of core devs - and using mailing list + JIRA doesn't help
> either. They are simply cumbersome, hard to navigate/search, focus is lost
> when somebody is not sure what's interesting. A simpler issue tracker (eg
> GitHub issues) and a super simple forum instead of mailing list would lower
> the barriers. I don't think this is a priority as this setup certainly
> serves your current workflows.

On this I will say: we used to have a Slack channel but it didn't work
well. Only a few core developers ever looked at it and because of the
general "Slackification" of open source a lot of people would join the
Slack channel looking for help and be unable to get it. People also
reported bugs in Slack and we would learn about them weeks after the
fact, or never. I think if we added a new official communications
channel for the project right now it would likely suffer the same
fate. If we had 10x as many core developers then there might be enough
core devs who are comfortable with the additional modality that it
might make sense. We still have lots of people reporting bugs on Stack
Overflow and very few core developers regularly look at the SO
questions.

By contrast, we nearly unfailingly respond to people on the mailing
list and JIRA. So if people are looking for help they can certainly
get it there.

> Keep up the good work, you are amazing! I can't wait for a more complete
> DataFusion, group by and join for pyarrow, and a dozen other exciting
> opportunities and features.
>
> tl;dr you are great, not behind, local communities/meetups are a good
> opportunity (but covid...), I find Jira + mailing list hard to use
> (mentally, as not core dev)
>
> Best regards,
> Adam Lippai
>
>
>
> On Sat, Jun 20, 2020, 21:23 Wes McKinney <we...@gmail.com> wrote:
>
> > On Sat, Jun 20, 2020 at 1:52 PM Neal Richardson
> > <ne...@gmail.com> wrote:
> > >
> > > Hi Suvayu,
> > > Thanks for your feedback. I'm sorry to hear that you feel that you haven't
> > > had the best experiences trying to contribute to the project. For what it's
> > > worth, I believe that raising concerns like this _is_ itself a valuable
> > > contribution. So even if you haven't gotten to the point of having a pull
> > > request merged, I don't think it's accurate to say that you've been trying
> > > unsuccessfully to contribute--you're contributing right now.
> > >
> > > As it turns out, just the other day I opened a JIRA issue about improving
> > > the contributor guide (https://issues.apache.org/jira/browse/ARROW-9189),
> > > and I'll be taking that up next week as part of our 1.0 website overhaul. I
> > > agree that we can do a better job in helping new contributors participate,
> > > and that many of those forms of contribution need not require lots of time
> > > from Arrow core developers. Wes's point about the limited bandwidth to
> > > provide mentorship is valid; that said, I've seen many successful cases of
> > > first-time contributors getting the support they need. While there's
> > > certainly room for improvement, I'm optimistic that we're on the right
> > > track.
> >
> > Yes — to be clear, the core developers in my experience (myself
> > included) are spending a lot of time responding to questions on JIRA,
> > clarifying issues with issue reporters, and offering advice about how
> > to proceed. Additionally, we spend a lot of time reviewing code and
> > helping people get their patches ready to be merged. There's no way we
> > would have 500+ contributors if we were not doing these things.
> >
> > As far as getting the help that's needed from core developers, the
> > thing that helps someone like me the most is to have the "request" be
> > as specific and direct as possible. In any given day I might look at
> > 50-100 different issues and so if it's not clear what I need to do I
> > will often move on to the next thing. Example direct requests:
> >
> > * Do you think $PROPOSED_APPROACH is the right one?
> > * In which file(s) should I be looking to make changes?
> > * Is there anything related in the codebase I can look at to learn?
> >
> > I'm sure we can put this advice in our contributor guide.
> >
> > If you ask these questions and do not get an answer, it is OK to ask again.
> >
> > I see six JIRA issues from Suvayu in the project
> >
> > * https://issues.apache.org/jira/browse/ARROW-1956
> > * https://issues.apache.org/jira/browse/ARROW-3806
> > * https://issues.apache.org/jira/browse/ARROW-4930
> > * https://issues.apache.org/jira/browse/ARROW-3792
> > * https://issues.apache.org/jira/browse/ARROW-3874
> > * https://issues.apache.org/jira/browse/ARROW-6577
> >
> > There are comments in all cases and the issues were resolved in 4 out
> > of 6 cases. I see one example of you asking for guidance
> > (https://issues.apache.org/jira/browse/ARROW-1956) on December 29,
> > 2017 while I (and presumably others) were on vacation for the New
> > Year. In the future, it is OK to be more persistent.
> >
> > Thanks
> >
> > > Neal
> > >
> > >
> > > On Sat, Jun 20, 2020 at 11:25 AM Suvayu Ali <fa...@gmail.com> wrote:
> > >
> > > > Hi Wes, others,
> > > >
> > > > Thank you for taking the time to draft a long response.
> > > >
> > > > On Sat, Jun 20, 2020 at 3:57 PM Wes McKinney <we...@gmail.com> wrote:
> > > > >
> > > > > From a purely factual view, the project is successfully attracting and
> > > > > supporting contributors. Over 500 different people have contributed to
> > > > > the project (more than the "420" printed on GitHub because many people
> > > > > use e-mail addresses not associated with their GitHub user names) and
> > > > > that number is increasing steadily over time.
> > > >
> > > > This response reinforces one of my points, all this branch name change
> > > > business then has nothing to do with actually getting new
> > > > contributors.
> > > >
> > > > > We have invested greatly in providing systems to support developers of
> > > > > the project. We have a large and complex CI setup and nowadays it
> > > > > works pretty much like clockwork which is a huge change compared with
> > > > > a year or two ago.
> > > >
> > > > Agreed, and I have learned a lot from it just by observing.
> > > >
> > > > > If you are looking for individualized "mentorship and guidance"
> > > > > _beyond_ pointers toward what part of the project you should be
> > > > > looking at to solve a problem, feedback on issues about whether or not
> > > > > something is deemed useful or high priority or not, and feedback on
> > > > > your PRs whether you are on the right track or not, I think your
> > > > > expectations -- at this stage of the project -- may not be reasonable.
> > > > > The number of regularly active developers in this project for the
> > > > > parts that you have looked at is actually quite small. So you're
> > > > > talking about some of the 10 people at the top of the GitHub
> > > > > contributor list. It would be different if we were talking about an
> > > > > older project with an order of magnitude more regularly active
> > > > > developers.
> > > >
> > > > If pointers to you are: look at the serialisation code, then yes, I
> > > > was hoping for more along the lines of look at class XYZ in file bla.
> > > > I completely understand if that's not possible.  That is why I never
> > > > said anything before.  You may not remember, during the "whether to
> > > > support wheels" discussion, as I was impacted, I offered a compromise
> > > > of releasing a reduced feature-set wheel with simpler dependencies,
> > > > which was rejected with this exact argument.  I did not counter,
> > > > because it is a very reasonable position to take, and I'm in no
> > > > position to "demand" anything.
> > > >
> > > > I only wrote today because I felt maybe now there is a willingness for
> > > > newer, diverse contributors, because that's how this thread was
> > > > motivated.  So I stated the hurdles I have faced, and hoped instead of
> > > > wasting scarce resources on superficial changes the community could
> > > > address actual hurdles for new contributors like me.  Obviously I
> > > > misunderstood.
> > > >
> > > > > The area where I think we could improve the most is developer
> > > > > documentation, which in a sense is "self-service guidance" in
> > > > > understanding the codebases. Antoine and others have taken initiative
> > > > > on this but it often goes by the wayside since the number of people
> > > > > with requisite knowledge to write it is small (countable on fingers
> > > > > and toes if you include all the programming languages) and very short
> > > > > of free cycles.
> > > >
> > > > I'm guessing you mean the Sphinx docs?  Whatever I have managed to use
> > > > Arrow for, it's thanks to those.  Maybe that is my cue, when hitting a
> > > > dead-end, "I should ask which source file do I look in?"
> > > >
> > > > Anyway, I don't want to waste anyone's time anymore. I felt there's
> > > > room for feedback, I was wrong, and I withdraw from this discussion.
> > > > I'll continue to lurk on the mailing list, and try to contribute when
> > > > I can.
> > > >
> > > > Cheers and thanks for your time,
> > > >
> > > > --
> > > > Suvayu
> > > >
> > > > Open source is the future. It sets us free.
> > > >
> >

Re: Helping new contributors get started [was Re: Renaming master branch, removing blacklist/whitelist]

Posted by Adam Lippai <ad...@rigo.sk>.
I've seen better and worse examples before.
I was an active, beginner Drupal developer ~12 years ago. The Drupal
project community was very strong, particularly in Hungary where I live.
International and local IRC channels, international and local
forums+events, highly customized issue tracker and superb documentation. It
was more mature and bigger at that time. On the other hand, when I tried to
give back to Angular or React... Well... You are already ahead of them.
React eventually recognized the problem and they try to solve it, but a
large company's bureaucracy doesn't help that.

My experience with Arrow is aligned with my expectations of a project of
this age or size (and in a few fields you are awesome!). Andy Grove,
xhochy, wesm, Joris were welcoming and responsive on Jira, Twitter and this
mailing list too. Ofc nobody worked for free on my ideas and I can't
develop C++ or Rust alone (yet). What I can do now is tracking the
development, the PRs (I've added a few more or less valuable, but not so
unique comments) and I'm subscribed to a few Jira issues.

At this point I could use a gitter/IRC/slack channel for discussions - with
peers instead of core devs - and using mailing list + JIRA doesn't help
either. They are simply cumbersome, hard to navigate/search, focus is lost
when somebody is not sure what's interesting. A simpler issue tracker (eg
GitHub issues) and a super simple forum instead of mailing list would lower
the barriers. I don't think this is a priority as this setup certainly
serves your current workflows.

Keep up the good work, you are amazing! I can't wait for a more complete
DataFusion, group by and join for pyarrow, and a dozen other exciting
opportunities and features.

tl;dr you are great, not behind, local communities/meetups are a good
opportunity (but covid...), I find Jira + mailing list hard to use
(mentally, as not core dev)

Best regards,
Adam Lippai



On Sat, Jun 20, 2020, 21:23 Wes McKinney <we...@gmail.com> wrote:

> On Sat, Jun 20, 2020 at 1:52 PM Neal Richardson
> <ne...@gmail.com> wrote:
> >
> > Hi Suvayu,
> > Thanks for your feedback. I'm sorry to hear that you feel that you haven't
> > had the best experiences trying to contribute to the project. For what it's
> > worth, I believe that raising concerns like this _is_ itself a valuable
> > contribution. So even if you haven't gotten to the point of having a pull
> > request merged, I don't think it's accurate to say that you've been trying
> > unsuccessfully to contribute--you're contributing right now.
> >
> > As it turns out, just the other day I opened a JIRA issue about improving
> > the contributor guide (https://issues.apache.org/jira/browse/ARROW-9189),
> > and I'll be taking that up next week as part of our 1.0 website overhaul. I
> > agree that we can do a better job in helping new contributors participate,
> > and that many of those forms of contribution need not require lots of time
> > from Arrow core developers. Wes's point about the limited bandwidth to
> > provide mentorship is valid; that said, I've seen many successful cases of
> > first-time contributors getting the support they need. While there's
> > certainly room for improvement, I'm optimistic that we're on the right
> > track.
>
> Yes — to be clear, the core developers in my experience (myself
> included) are spending a lot of time responding to questions on JIRA,
> clarifying issues with issue reporters, and offering advice about how
> to proceed. Additionally, we spend a lot of time reviewing code and
> helping people get their patches ready to be merged. There's no way we
> would have 500+ contributors if we were not doing these things.
>
> As far as getting the help that's needed from core developers, the
> thing that helps someone like me the most is to have the "request" be
> as specific and direct as possible. In any given day I might look at
> 50-100 different issues and so if it's not clear what I need to do I
> will often move on to the next thing. Example direct requests:
>
> * Do you think $PROPOSED_APPROACH is the right one?
> * In which file(s) should I be looking to make changes?
> * Is there anything related in the codebase I can look at to learn?
>
> I'm sure we can put this advice in our contributor guide.
>
> If you ask these questions and do not get an answer, it is OK to ask again.
>
> I see six JIRA issues from Suvayu in the project
>
> * https://issues.apache.org/jira/browse/ARROW-1956
> * https://issues.apache.org/jira/browse/ARROW-3806
> * https://issues.apache.org/jira/browse/ARROW-4930
> * https://issues.apache.org/jira/browse/ARROW-3792
> * https://issues.apache.org/jira/browse/ARROW-3874
> * https://issues.apache.org/jira/browse/ARROW-6577
>
> There are comments in all cases and the issues were resolved in 4 out
> of 6 cases. I see one example of you asking for guidance
> (https://issues.apache.org/jira/browse/ARROW-1956) on December 29,
> 2017 while I (and presumably others) were on vacation for the New
> Year. In the future, it is OK to be more persistent.
>
> Thanks
>
> > Neal
> >
> >
> > On Sat, Jun 20, 2020 at 11:25 AM Suvayu Ali <fa...@gmail.com> wrote:
> >
> > > Hi Wes, others,
> > >
> > > Thank you for taking the time to draft a long response.
> > >
> > > On Sat, Jun 20, 2020 at 3:57 PM Wes McKinney <we...@gmail.com> wrote:
> > > >
> > > > From a purely factual view, the project is successfully attracting and
> > > > supporting contributors. Over 500 different people have contributed to
> > > > the project (more than the "420" printed on GitHub because many people
> > > > use e-mail addresses not associated with their GitHub user names) and
> > > > that number is increasing steadily over time.
> > >
> > > This response reinforces one of my points, all this branch name change
> > > business then has nothing to do with actually getting new
> > > contributors.
> > >
> > > > We have invested greatly in providing systems to support developers of
> > > > the project. We have a large and complex CI setup and nowadays it
> > > > works pretty much like clockwork which is a huge change compared with
> > > > a year or two ago.
> > >
> > > Agreed, and I have learned a lot from it just by observing.
> > >
> > > > If you are looking for individualized "mentorship and guidance"
> > > > _beyond_ pointers toward what part of the project you should be
> > > > looking at to solve a problem, feedback on issues about whether or not
> > > > something is deemed useful or high priority or not, and feedback on
> > > > your PRs whether you are on the right track or not, I think your
> > > > expectations -- at this stage of the project -- may not be reasonable.
> > > > The number of regularly active developers in this project for the
> > > > parts that you have looked at is actually quite small. So you're
> > > > talking about some of the 10 people at the top of the GitHub
> > > > contributor list. It would be different if we were talking about an
> > > > older project with an order of magnitude more regularly active
> > > > developers.
> > >
> > > If pointers to you are: look at the serialisation code, then yes, I
> > > was hoping for more along the lines of look at class XYZ in file bla.
> > > I completely understand if that's not possible.  That is why I never
> > > said anything before.  You may not remember, during the "whether to
> > > support wheels" discussion, as I was impacted, I offered a compromise
> > > of releasing a reduced feature-set wheel with simpler dependencies,
> > > which was rejected with this exact argument.  I did not counter,
> > > because it is a very reasonable position to take, and I'm in no
> > > position to "demand" anything.
> > >
> > > I only wrote today because I felt maybe now there is a willingness for
> > > newer, diverse contributors, because that's how this thread was
> > > motivated.  So I stated the hurdles I have faced, and hoped instead of
> > > wasting scarce resources on superficial changes the community could
> > > address actual hurdles for new contributors like me.  Obviously I
> > > misunderstood.
> > >
> > > > The area where I think we could improve the most is developer
> > > > documentation, which in a sense is "self-service guidance" in
> > > > understanding the codebases. Antoine and others have taken initiative
> > > > on this but it often goes by the wayside since the number of people
> > > > with requisite knowledge to write it is small (countable on fingers
> > > > and toes if you include all the programming languages) and very short
> > > > of free cycles.
> > >
> > > I'm guessing you mean the Sphinx docs?  Whatever I have managed to use
> > > Arrow for, it's thanks to those.  Maybe that is my cue, when hitting a
> > > dead-end, "I should ask which source file do I look in?"
> > >
> > > Anyway, I don't want to waste anyone's time anymore. I felt there's
> > > room for feedback, I was wrong, and I withdraw from this discussion.
> > > I'll continue to lurk on the mailing list, and try to contribute when
> > > I can.
> > >
> > > Cheers and thanks for your time,
> > >
> > > --
> > > Suvayu
> > >
> > > Open source is the future. It sets us free.
> > >
>

Re: Helping new contributors get started [was Re: Renaming master branch, removing blacklist/whitelist]

Posted by Wes McKinney <we...@gmail.com>.
On Sat, Jun 20, 2020 at 1:52 PM Neal Richardson
<ne...@gmail.com> wrote:
>
> Hi Suvayu,
> Thanks for your feedback. I'm sorry to hear that you feel that you haven't
> had the best experiences trying to contribute to the project. For what it's
> worth, I believe that raising concerns like this _is_ itself a valuable
> contribution. So even if you haven't gotten to the point of having a pull
> request merged, I don't think it's accurate to say that you've been trying
> unsuccessfully to contribute--you're contributing right now.
>
> As it turns out, just the other day I opened a JIRA issue about improving
> the contributor guide (https://issues.apache.org/jira/browse/ARROW-9189),
> and I'll be taking that up next week as part of our 1.0 website overhaul. I
> agree that we can do a better job in helping new contributors participate,
> and that many of those forms of contribution need not require lots of time
> from Arrow core developers. Wes's point about the limited bandwidth to
> provide mentorship is valid; that said, I've seen many successful cases of
> first-time contributors getting the support they need. While there's
> certainly room for improvement, I'm optimistic that we're on the right
> track.

Yes — to be clear, the core developers in my experience (myself
included) are spending a lot of time responding to questions on JIRA,
clarifying issues with issue reporters, and offering advice about how
to proceed. Additionally, we spend a lot of time reviewing code and
helping people get their patches ready to be merged. There's no way we
would have 500+ contributors if we were not doing these things.

As far as getting the help that's needed from core developers, the
thing that helps someone like me the most is to have the "request" be
as specific and direct as possible. In any given day I might look at
50-100 different issues and so if it's not clear what I need to do I
will often move on to the next thing. Example direct requests:

* Do you think $PROPOSED_APPROACH is the right one?
* In which file(s) should I be looking to make changes?
* Is there anything related in the codebase I can look at to learn?

I'm sure we can put this advice in our contributor guide.

If you ask these questions and do not get an answer, it is OK to ask again.

I see six JIRA issues from Suvayu in the project

* https://issues.apache.org/jira/browse/ARROW-1956
* https://issues.apache.org/jira/browse/ARROW-3806
* https://issues.apache.org/jira/browse/ARROW-4930
* https://issues.apache.org/jira/browse/ARROW-3792
* https://issues.apache.org/jira/browse/ARROW-3874
* https://issues.apache.org/jira/browse/ARROW-6577

There are comments in all cases and the issues were resolved in 4 out
of 6 cases. I see one example of you asking for guidance
(https://issues.apache.org/jira/browse/ARROW-1956) on December 29,
2017 while I (and presumably others) were on vacation for the New
Year. In the future, it is OK to be more persistent.

Thanks

> Neal
>
>
> On Sat, Jun 20, 2020 at 11:25 AM Suvayu Ali <fa...@gmail.com> wrote:
>
> > Hi Wes, others,
> >
> > Thank you for taking the time to draft a long response.
> >
> > On Sat, Jun 20, 2020 at 3:57 PM Wes McKinney <we...@gmail.com> wrote:
> > >
> > > From a purely factual view, the project is successfully attracting and
> > > supporting contributors. Over 500 different people have contributed to
> > > the project (more than the "420" printed on GitHub because many people
> > > use e-mail addresses not associated with their GitHub user names) and
> > > that number is increasing steadily over time.
> >
> > This response reinforces one of my points, all this branch name change
> > business then has nothing to do with actually getting new
> > contributors.
> >
> > > We have invested greatly in providing systems to support developers of
> > > the project. We have a large and complex CI setup and nowadays it
> > > works pretty much like clockwork which is a huge change compared with
> > > a year or two ago.
> >
> > Agreed, and I have learned a lot from it just by observing.
> >
> > > If you are looking for individualized "mentorship and guidance"
> > > _beyond_ pointers toward what part of the project you should be
> > > looking at to solve a problem, feedback on issues about whether or not
> > > something is deemed useful or high priority or not, and feedback on
> > > your PRs whether you are on the right track or not, I think your
> > > expectations -- at this stage of the project -- may not be reasonable.
> > > The number of regularly active developers in this project for the
> > > parts that you have looked at is actually quite small. So you're
> > > talking about some of the 10 people at the top of the GitHub
> > > contributor list. It would be different if we were talking about an
> > > older project with an order of magnitude more regularly active
> > > developers.
> >
> > If pointers to you are: look at the serialisation code, then yes, I
> > was hoping for more along the lines of look at class XYZ in file bla.
> > I completely understand if that's not possible.  That is why I never
> > said anything before.  You may not remember, during the "whether to
> > support wheels" discussion, as I was impacted, I offered a compromise
> > of releasing a reduced feature-set wheel with simpler dependencies,
> > which was rejected with this exact argument.  I did not counter,
> > because it is a very reasonable position to take, and I'm in no
> > position to "demand" anything.
> >
> > I only wrote today because I felt maybe now there is a willingness for
> > newer, diverse contributors, because that's how this thread was
> > motivated.  So I stated the hurdles I have faced, and hoped instead of
> > wasting scarce resources on superficial changes the community could
> > address actual hurdles for new contributors like me.  Obviously I
> > misunderstood.
> >
> > > The area where I think we could improve the most is developer
> > > documentation, which in a sense is "self-service guidance" in
> > > understanding the codebases. Antoine and others have taken initiative
> > > on this but it often goes by the wayside since the number of people
> > > with requisite knowledge to write it is small (countable on fingers
> > > and toes if you include all the programming languages) and very short
> > > of free cycles.
> >
> > I'm guessing you mean the Sphinx docs?  Whatever I have managed to use
> > Arrow for, it's thanks to those.  Maybe that is my cue, when hitting a
> > dead-end, "I should ask which source file do I look in?"
> >
> > Anyway, I don't want to waste anyone's time anymore. I felt there's
> > room for feedback, I was wrong, and I withdraw from this discussion.
> > I'll continue to lurk on the mailing list, and try to contribute when
> > I can.
> >
> > Cheers and thanks for your time,
> >
> > --
> > Suvayu
> >
> > Open source is the future. It sets us free.
> >

Re: Helping new contributors get started [was Re: Renaming master branch, removing blacklist/whitelist]

Posted by Neal Richardson <ne...@gmail.com>.
Hi Suvayu,
Thanks for your feedback. I'm sorry to hear that you feel that you haven't
had the best experiences trying to contribute to the project. For what it's
worth, I believe that raising concerns like this _is_ itself a valuable
contribution. So even if you haven't gotten to the point of having a pull
request merged, I don't think it's accurate to say that you've been trying
unsuccessfully to contribute--you're contributing right now.

As it turns out, just the other day I opened a JIRA issue about improving
the contributor guide (https://issues.apache.org/jira/browse/ARROW-9189),
and I'll be taking that up next week as part of our 1.0 website overhaul. I
agree that we can do a better job in helping new contributors participate,
and that many of those forms of contribution need not require lots of time
from Arrow core developers. Wes's point about the limited bandwidth to
provide mentorship is valid; that said, I've seen many successful cases of
first-time contributors getting the support they need. While there's
certainly room for improvement, I'm optimistic that we're on the right
track.

Neal


On Sat, Jun 20, 2020 at 11:25 AM Suvayu Ali <fa...@gmail.com> wrote:

> Hi Wes, others,
>
> Thank you for taking the time to draft a long response.
>
> On Sat, Jun 20, 2020 at 3:57 PM Wes McKinney <we...@gmail.com> wrote:
> >
> > From a purely factual view, the project is successfully attracting and
> > supporting contributors. Over 500 different people have contributed to
> > the project (more than the "420" printed on GitHub because many people
> > use e-mail addresses not associated with their GitHub user names) and
> > that number is increasing steadily over time.
>
> This response reinforces one of my points, all this branch name change
> business then has nothing to do with actually getting new
> contributors.
>
> > We have invested greatly in providing systems to support developers of
> > the project. We have a large and complex CI setup and nowadays it
> > works pretty much like clockwork which is a huge change compared with
> > a year or two ago.
>
> Agreed, and I have learned a lot from it just by observing.
>
> > If you are looking for individualized "mentorship and guidance"
> > _beyond_ pointers toward what part of the project you should be
> > looking at to solve a problem, feedback on issues about whether or not
> > something is deemed useful or high priority or not, and feedback on
> > your PRs whether you are on the right track or not, I think your
> > expectations -- at this stage of the project -- may not be reasonable.
> > The number of regularly active developers in this project for the
> > parts that you have looked at is actually quite small. So you're
> > talking about some of the 10 people at the top of the GitHub
> > contributor list. It would be different if we were talking about an
> > older project with an order of magnitude more regularly active
> > developers.
>
> If pointers to you are: look at the serialisation code, then yes, I
> was hoping for more along the lines of look at class XYZ in file bla.
> I completely understand if that's not possible.  That is why I never
> said anything before.  You may not remember, during the "whether to
> support wheels" discussion, as I was impacted, I offered a compromise
> of releasing a reduced feature-set wheel with simpler dependencies,
> which was rejected with this exact argument.  I did not counter,
> because it is a very reasonable position to take, and I'm in no
> position to "demand" anything.
>
> I only wrote today because I felt maybe now there is a willingness for
> newer, diverse contributors, because that's how this thread was
> motivated.  So I stated the hurdles I have faced, and hoped instead of
> wasting scarce resources on superficial changes the community could
> address actual hurdles for new contributors like me.  Obviously I
> misunderstood.
>
> > The area where I think we could improve the most is developer
> > documentation, which in a sense is "self-service guidance" in
> > understanding the codebases. Antoine and others have taken initiative
> > on this but it often goes by the wayside since the number of people
> > with requisite knowledge to write it is small (countable on fingers
> > and toes if you include all the programming languages) and very short
> > of free cycles.
>
> I'm guessing you mean the Sphinx docs?  Whatever I have managed to use
> Arrow for, it's thanks to those.  Maybe that is my cue, when hitting a
> dead-end, "I should ask which source file do I look in?"
>
> Anyway, I don't want to waste anyone's time anymore. I felt there's
> room for feedback, I was wrong, and I withdraw from this discussion.
> I'll continue to lurk on the mailing list, and try to contribute when
> I can.
>
> Cheers and thanks for your time,
>
> --
> Suvayu
>
> Open source is the future. It sets us free.
>

Re: Helping new contributors get started [was Re: Renaming master branch, removing blacklist/whitelist]

Posted by Suvayu Ali <fa...@gmail.com>.
Hi Wes, others,

Thank you for taking the time to draft a long response.

On Sat, Jun 20, 2020 at 3:57 PM Wes McKinney <we...@gmail.com> wrote:
>
> From a purely factual view, the project is successfully attracting and
> supporting contributors. Over 500 different people have contributed to
> the project (more than the "420" printed on GitHub because many people
> use e-mail addresses not associated with their GitHub user names) and
> that number is increasing steadily over time.

This response reinforces one of my points: all this branch name change
business then has nothing to do with actually getting new
contributors.

> We have invested greatly in providing systems to support developers of
> the project. We have a large and complex CI setup and nowadays it
> works pretty much like clockwork which is a huge change compared with
> a year or two ago.

Agreed, and I have learned a lot from it just by observing.

> If you are looking for individualized "mentorship and guidance"
> _beyond_ pointers toward what part of the project you should be
> looking at to solve a problem, feedback on issues about whether or not
> something is deemed useful or high priority or not, and feedback on
> your PRs whether you are on the right track or not, I think your
> expectations -- at this stage of the project -- may not be reasonable.
> The number of regularly active developers in this project for the
> parts that you have looked at is actually quite small. So you're
> talking about some of the 10 people at the top of the GitHub
> contributor list. It would be different if we were talking about an
> older project with an order of magnitude more regularly active
> developers.

If pointers to you are "look at the serialisation code", then yes, I
was hoping for more along the lines of "look at class XYZ in file bla".
I completely understand if that's not possible.  That is why I never
said anything before.  You may not remember, during the "whether to
support wheels" discussion, as I was impacted, I offered a compromise
of releasing a reduced feature-set wheel with simpler dependencies,
which was rejected with this exact argument.  I did not counter,
because it is a very reasonable position to take, and I'm in no
position to "demand" anything.

I only wrote today because I felt maybe now there is a willingness for
newer, diverse contributors, because that's how this thread was
motivated.  So I stated the hurdles I have faced, and hoped that,
instead of wasting scarce resources on superficial changes, the
community could address actual hurdles for new contributors like me.
Obviously I misunderstood.

> The area where I think we could improve the most is developer
> documentation, which in a sense is "self-service guidance" in
> understanding the codebases. Antoine and others have taken initiative
> on this but it often goes by the wayside since the number of people
> with requisite knowledge to write it is small (countable on fingers
> and toes if you include all the programming languages) and very short
> of free cycles.

I'm guessing you mean the Sphinx docs?  Whatever I have managed to use
Arrow for, it's thanks to those.  Maybe that is my cue, when hitting a
dead-end, "I should ask which source file do I look in?"

Anyway, I don't want to waste anyone's time anymore. I felt there's
room for feedback, I was wrong, and I withdraw from this discussion.
I'll continue to lurk on the mailing list, and try to contribute when
I can.

Cheers and thanks for your time,

-- 
Suvayu

Open source is the future. It sets us free.