Posted to dev@arrow.apache.org by Neal Richardson <ne...@gmail.com> on 2020/06/18 19:56:25 UTC

Renaming master branch, removing blacklist/whitelist

Hi all,
As you're likely aware, there's growing momentum in the developer community
to drop terminology that some find offensive. As a project that takes pride
in being welcoming and inclusive, I think this is something we should get
in front of--particularly as we're approaching a 1.0 release.

Specifically, I am proposing to:

1. rename the "master" branch to something else ("main" seems to be
popular; other version control systems use other words too).

2. replace "whitelist"/"blacklist" in our code with something like
"allowlist"/"blocklist", or otherwise rename them. A quick search of the
code shows that we don't use them much, but there are some places in
Archery that do, as well as some vendored code (for which we could check
whether upstream has already updated it and pull in the changes).

These are unrelated changes and we can address them independently.

Changing the default branch is potentially disruptive, though
https://www.hanselman.com/blog/EasilyRenameYourGitDefaultBranchFromMasterToMain.aspx
doesn't sound so bad: you can run 6 lines to update your local git checkout
to recognize the new default branch. Fresh clones from GitHub will
automatically have the default branch set correctly.
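
For reference, the local update mentioned above can be sketched roughly as
follows. This is only an illustration, not the exact commands from the linked
post: it assumes the remote is called "origin" and the new branch is called
"main", and the helper names local_rename_commands and update_checkout are
made up for this example. By default it only prints the git commands instead
of running them.

```python
import subprocess


def local_rename_commands(old="master", new="main", remote="origin"):
    """Return the git commands that repoint a local checkout at the renamed branch.

    Branch and remote names are assumptions; adjust them to the real setup.
    """
    return [
        ["git", "checkout", old],                    # start from the old local branch
        ["git", "branch", "-m", old, new],           # rename the local branch
        ["git", "fetch", remote],                    # pick up the renamed remote branch
        ["git", "branch", "--unset-upstream"],       # stop tracking <remote>/<old>
        ["git", "branch", "-u", f"{remote}/{new}"],  # track <remote>/<new> instead
        ["git", "symbolic-ref",
         f"refs/remotes/{remote}/HEAD",
         f"refs/remotes/{remote}/{new}"],            # update the remote's default-branch pointer
    ]


def update_checkout(dry_run=True, **names):
    """Print the commands (dry_run=True) or execute them in the current repository."""
    for cmd in local_rename_commands(**names):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing command
```

Calling update_checkout() prints the six commands so they can be reviewed
first; update_checkout(dry_run=False) would run them inside an existing
checkout.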

At least one Apache project has gotten to the point of requesting INFRA to
change the default branch (https://issues.apache.org/jira/browse/INFRA-20403)
and I would expect there are others that are somewhere in the process of
deciding. Many other projects and organizations, including git and GitHub,
are debating this too. I'm not optimistic that we could just wait for ASF
to make some decision and implement this for all projects--they are still
named "Apache" after all--so I think this is on us to do.

Thoughts? I suspect that the default branch naming may elicit more reaction
and require debate (and a vote?); as for the whitelist/blacklist, I'll work
on a patch for that tomorrow unless there's strong objection, and we can
review specific lines on the PR.

Thanks,
Neal

Re: Renaming master branch, removing blacklist/whitelist

Posted by Jacques Nadeau <ja...@apache.org>.
Hi Suvayu, thanks for sharing your experiences. Clearly we have work to do.

Wrt specific name changes, I agree with Wes. If something is negative to
a non-trivial portion of the population, why not use something that avoids
that issue where possible?



On Fri, Jun 19, 2020, 7:44 PM Suvayu Ali <fa...@gmail.com> wrote:

> Hi all,
>
> (sorry if this is a duplicate post, I always have trouble posting to this
> list)
>
> On Fri, Jun 19, 2020 at 5:54 PM Todd Hendricks <he...@gmail.com>
> wrote:
> >
> > I'm a black data scientist. For whatever it's worth, I have never taken
> > offense to the term "Master" branch, as I have never interpreted it to have
> > a derogatory connotation. It's literally never crossed my mind.
>
> As an Indian person, I would concur with what Todd said.
>
> That said, I would like to highlight a few things.  Since the
> community is spending time to discuss how to be more welcoming to a
> diverse group of contributors, instead of default branch names, there
> are many practically relevant issues that could be addressed.
>
> I've been trying to contribute to this project for about 2 yrs, rather
> unsuccessfully.  I come from the perspective of analysis rather than
> engineering.  But I'm no stranger to technical nitty gritties
> (particle physicist at CERN, data scientist at non-technical startups,
> scientific software dev).  I started by filing bug reports for my
> needs (pyarrow and parquet).  Most bug reports are still open; they
> received a bit of discussion, but mostly they have been assigned and
> reassigned to releases for over a yr.  On day one I had offered to do
> the work myself given some guidance, but I didn't receive any.  So I
> gave up.
>
> Some months later, after Gandiva was released, I came back with the
> goal of using it from pyarrow.  While after some help I could do
> simple tests in C++, getting it to work with pyarrow proved difficult.
> I don't remember the exact hurdle, but I decided I would package it
> for my distro (Fedora) for simpler compilation.  So I contributed a
> few patches to the build system to build against system libraries
> instead of the vendored versions, including the ability to switch LLVM
> versions.  I think around this time Kou was overhauling the build
> system. My patches were not accepted, but some of the ground work I
> did hopefully helped Kou.  Eventually though, I gave up.
>
> Soon after, I tried to build a wheel for ARM; I was gathering some
> data on an RPi.  That didn't go so well either, again, the reason was
> lack of guidance.  At the time, it was also expressed that wheels are
> disfavoured by the community, and not worth maintaining.  I see that
> position has changed now.
>
> There is a clear pattern here.  If the community is really serious
> about addressing diversity and being inclusive, time would be better
> spent addressing issues like contribution guidelines for beginners
> (not saying absolute beginners), mentoring, or triaging of open issues
> in terms of ease of contribution, and other concrete hurdles for
> newcomers.  I realise people's time is scarce, but you have to start
> somewhere.  At the least, if someone guides me, I can pick up these
> tasks and the maintainers can focus on the more involved roles. If the
> issues I have highlighted cannot be prioritised, then wasting time on
> superficial issues like default branch names should also be avoided.
>
> I hope my comments are accepted as constructive criticism.
>
> Cheers,
>
> PS: whitelist/blacklist -> accept/reject seems quite reasonable;
> personally, colour based terminology has always been very unclear to
> me
>
> --
> Suvayu
>
> Open source is the future. It sets us free.
>

Re: Helping new contributors get started [was Re: Renaming master branch, removing blacklist/whitelist]

Posted by Amol Umbarkar <am...@gmail.com>.
Just wanted to share an experience from another project. I am a big fan of the
dask developer log <https://blog.dask.org/2016/12/05/dask-dev-1>. It helped me
understand what the project is currently focusing on and gave some pointers on
past decisions.

I understand the current stage and workload may not go in line with this
idea.

I don't mind compiling/maintaining the blog (weekly or every 2 weeks) if
everyone shares short notes on the pieces that are being worked on.

~Amol


Re: Helping new contributors get started [was Re: Renaming master branch, removing blacklist/whitelist]

Posted by Adam Lippai <ad...@rigo.sk>.
Undoubtedly, you always answer and that is amazing. Now all the help is
core/pro -> beginner, but an average <-> average or average -> beginner
cooperation would be nice. I understand it's not the time to introduce it
yet, we don't have the critical mass. I didn't think of SO before, but
indeed, it serves this purpose, it's a good forum for this.

Thanks for the detailed answer.

Best regards,
Adam Lippai


On Sat, Jun 20, 2020, 22:38 Wes McKinney <we...@gmail.com> wrote:

> On Sat, Jun 20, 2020 at 3:19 PM Adam Lippai <ad...@rigo.sk> wrote:
> >
> > I've seen better and worse examples before.
> > I was an active, beginner Drupal developer ~12 years ago. The Drupal
> > project community was very strong, particularly in Hungary where I live.
> > International and local IRC channels, international and local
> > forums+events, highly customized issue tracker and superb documentation. It
> > was more mature and bigger at that time. On the other hand when I tried to
> > give back to Angular or React... Well... You are already ahead of them.
> > React eventually recognized the problem and they try to solve it, but a
> > large company's bureaucracy doesn't help that.
> >
> > My experience with Arrow is aligned with my expectations of a project of
> > this age or size (and in a few fields you are awesome!). Andy Grove,
> > xhochy, wesm, Joris were welcoming and responsive on Jira, Twitter and this
> > mailing list too. Ofc nobody worked for free on my ideas and I can't
> > develop C++ or Rust alone (yet). What I can do now is tracking the
> > development, the PRs (I've added a few more or less valuable, but not so
> > unique comments) and I'm subscribed to a few Jira issues.
> >
> > At this point I could use a gitter/IRC/slack channel for discussions - with
> > peers instead of core devs - and using mailing list + JIRA doesn't help
> > either. They are simply cumbersome, hard to navigate/search, focus is lost
> > when somebody is not sure what's interesting. A simpler issue tracker (eg
> > GitHub issues) and a super simple forum instead of mailing list would lower
> > the barriers. I don't think this is a priority as this setup certainly
> > serves your current workflows.
>
> On this I will say: we used to have a Slack channel but it didn't work
> well. Only a few core developers ever looked at it and because of the
> general "Slackification" of open source a lot of people would join the
> Slack channel looking for help and be unable to get it. People also
> reported bugs in Slack and we would learn about them weeks after the
> fact, or never. I think if we added a new official communications
> channel for the project right now it would likely suffer the same
> fate. If we had 10x as many core developers then there might be enough
> core devs who are comfortable with the additional modality that it
> might make sense. We still have lots of people reporting bugs on Stack
> Overflow and very few core developers regularly look at the SO
> questions.
>
> By contrast, we nearly unfailingly respond to people on the mailing
> list and JIRA. So if people are looking for help they can certainly
> get it there.
>
> > Keep up the good work, you are amazing! I can't wait for a more complete
> > DataFusion, group by and join for pyarrow and a dozen other exciting
> > opportunities and features.
> >
> > tl;dr you are great, not behind, local communities/meetups are a good
> > opportunity (but covid...), I find Jira + mailing list hard to use
> > (mentally, as not core dev)
> >
> > Best regards,
> > Adam Lippai
> >
> >
> >
> > On Sat, Jun 20, 2020, 21:23 Wes McKinney <we...@gmail.com> wrote:
> >
> > > On Sat, Jun 20, 2020 at 1:52 PM Neal Richardson
> > > <ne...@gmail.com> wrote:
> > > >
> > > > Hi Suvayu,
> > > > Thanks for your feedback. I'm sorry to hear that you feel that you haven't
> > > > had the best experiences trying to contribute to the project. For what it's
> > > > worth, I believe that raising concerns like this _is_ itself a valuable
> > > > contribution. So even if you haven't gotten to the point of having a pull
> > > > request merged, I don't think it's accurate to say that you've been trying
> > > > unsuccessfully to contribute--you're contributing right now.
> > > >
> > > > As it turns out, just the other day I opened a JIRA issue about improving
> > > > the contributor guide (https://issues.apache.org/jira/browse/ARROW-9189),
> > > > and I'll be taking that up next week as part of our 1.0 website overhaul. I
> > > > agree that we can do a better job in helping new contributors participate,
> > > > and that many of those forms of contribution need not require lots of time
> > > > from Arrow core developers. Wes's point about the limited bandwidth to
> > > > provide mentorship is valid; that said, I've seen many successful cases of
> > > > first-time contributors getting the support they need. While there's
> > > > certainly room for improvement, I'm optimistic that we're on the right
> > > > track.
> > >
> > > Yes — to be clear, the core developers in my experience (myself
> > > included) are spending a lot of time responding to questions on JIRA,
> > > clarifying issues with issue reporters, and offering advice about how
> > > to proceed. Additionally, we spend a lot of time reviewing code and
> > > helping people get their patches ready to be merged. There's no way we
> > > would have 500+ contributors if we were not doing these things.
> > >
> > > As far as getting the help that's needed from core developers, the
> > > thing that helps someone like me the most is to have the "request" be
> > > as specific and direct as possible. In any given day I might look at
> > > 50-100 different issues and so if it's not clear what I need to do I
> > > will often move on to the next thing. Example direct requests:
> > >
> > > * Do you think $PROPOSED_APPROACH is the right one?
> > > * In which file(s) should I be looking to make changes?
> > > * Is there anything related in the codebase I can look at to learn?
> > >
> > > I'm sure we can put this advice in our contributor guide.
> > >
> > > If you ask these questions and do not get an answer, it is OK to ask again.
> > >
> > > I see six JIRA issues from Suvayu in the project
> > >
> > > * https://issues.apache.org/jira/browse/ARROW-1956
> > > * https://issues.apache.org/jira/browse/ARROW-3806
> > > * https://issues.apache.org/jira/browse/ARROW-4930
> > > * https://issues.apache.org/jira/browse/ARROW-3792
> > > * https://issues.apache.org/jira/browse/ARROW-3874
> > > * https://issues.apache.org/jira/browse/ARROW-6577
> > >
> > > There are comments in all cases and the issues were resolved in 4 out
> > > of 6 cases. I see one example of you asking for guidance
> > > (https://issues.apache.org/jira/browse/ARROW-1956) on December 29,
> > > 2017 while I (and presumably others) were on vacation for the New
> > > Year. In the future, it is OK to be more persistent.
> > >
> > > Thanks
> > >
> > > > Neal
> > > >
> > > >
> > > > On Sat, Jun 20, 2020 at 11:25 AM Suvayu Ali <fa...@gmail.com> wrote:
> > > >
> > > > > Hi Wes, others,
> > > > >
> > > > > Thank you for taking the time to draft a long response.
> > > > >
> > > > > On Sat, Jun 20, 2020 at 3:57 PM Wes McKinney <we...@gmail.com> wrote:
> > > > > >
> > > > > > From a purely factual view, the project is successfully attracting and
> > > > > > supporting contributors. Over 500 different people have contributed to
> > > > > > the project (more than the "420" printed on GitHub because many people
> > > > > > use e-mail addresses not associated with their GitHub user names) and
> > > > > > that number is increasing steadily over time.
> > > > >
> > > > > This response reinforces one of my points, all this branch name change
> > > > > business then has nothing to do with actually getting new
> > > > > contributors.
> > > > >
> > > > > > We have invested greatly in providing systems to support developers of
> > > > > > the project. We have a large and complex CI setup and nowadays it
> > > > > > works pretty much like clockwork which is a huge change compared with
> > > > > > a year or two ago.
> > > > >
> > > > > Agreed, and I have learned a lot from it just by observing.
> > > > >
> > > > > > If you are looking for individualized "mentorship and guidance"
> > > > > > _beyond_ pointers toward what part of the project you should be
> > > > > > looking at to solve a problem, feedback on issues about whether or not
> > > > > > something is deemed useful or high priority or not, and feedback on
> > > > > > your PRs whether you are on the right track or not, I think your
> > > > > > expectations -- at this stage of the project -- may not be reasonable.
> > > > > > The number of regularly active developers in this project for the
> > > > > > parts that you have looked at is actually quite small. So you're
> > > > > > talking about some of the 10 people at the top of the GitHub
> > > > > > contributor list. It would be different if we were talking about an
> > > > > > older project with an order of magnitude more regularly active
> > > > > > developers.
> > > > >
> > > > > If pointers to you are: look at the serialisation code, then yes, I
> > > > > was hoping for more along the lines of look at class XYZ in file bla.
> > > > > I completely understand if that's not possible.  That is why I never
> > > > > said anything before.  You may not remember, during the "whether to
> > > > > support wheels" discussion, as I was impacted, I offered a compromise
> > > > > of releasing a reduced feature-set wheel with simpler dependencies,
> > > > > which was rejected with this exact argument.  I did not counter,
> > > > > because it is a very reasonable position to take, and I'm in no
> > > > > position to "demand" anything.
> > > > >
> > > > > I only wrote today because I felt maybe now there is a willingness for
> > > > > newer, diverse contributors, because that's how this thread was
> > > > > motivated.  So I stated the hurdles I have faced, and hoped instead of
> > > > > wasting scarce resources on superficial changes the community could
> > > > > address actual hurdles for new contributors like me.  Obviously I
> > > > > misunderstood.
> > > > >
> > > > > > The area where I think we could improve the most is developer
> > > > > > documentation, which in a sense is "self-service guidance" in
> > > > > > understanding the codebases. Antoine and others have taken initiative
> > > > > > on this but it often goes by the way side since the number of people
> > > > > > with requisite knowledge to write it is small (countable on fingers
> > > > > > and toes if you include all the programming languages) and very short
> > > > > > of free cycles.
> > > > >
> > > > > I'm guessing you mean the Sphinx docs?  Whatever I have managed to use
> > > > > Arrow for, it's thanks to those.  Maybe that is my cue, when hitting a
> > > > > dead-end, "I should ask which source file do I look in?"
> > > > >
> > > > > Anyway, I don't want to waste anyone's time anymore. I felt there's
> > > > > room for feedback, I was wrong, and I withdraw from this discussion.
> > > > > I'll continue to lurk on the mailing list, and try to contribute when
> > > > > I can.
> > > > >
> > > > > Cheers and thanks for your time,
> > > > >
> > > > > --
> > > > > Suvayu
> > > > >
> > > > > Open source is the future. It sets us free.
> > > > >
> > >
>

Re: Helping new contributors get started [was Re: Renaming master branch, removing blacklist/whitelist]

Posted by Wes McKinney <we...@gmail.com>.
On Sat, Jun 20, 2020 at 3:19 PM Adam Lippai <ad...@rigo.sk> wrote:
>
> I've seen better and worse examples before.
> I was an active, beginner Drupal developer ~12 years ago. The Drupal
> project community was very strong, particularly in Hungary where I live.
> International and local IRC channels, international and local
> forums+events, highly customized issue tracker and superb documentation. It
> was more mature and bigger at that time. On the other hand when I tried to
> give back to Angular or React... Well... You are already ahead of them.
> React eventually recognized the problem and they try to solve it, but a
> large company's bureaucracy doesn't help that.
>
> My experience with Arrow is aligned with my expectations of a project of
> this age or size (and in a few fields you are awesome!). Andy Grove,
> xhochy, wesm, Joris were welcoming and responsive on Jira, Twitter and this
> mailing list too. Ofc nobody worked for free on my ideas and I can't
> develop C++ or Rust alone (yet). What I can do now is tracking the
> development, the PRs (I've added a few more or less valuable, but not so
> unique comments) and I'm subscribed to a few Jira issues.
>
> At this point I could use a gitter/IRC/slack channel for discussions - with
> peers instead of core devs - and using mailing list + JIRA doesn't help
> either. They are simply cumbersome, hard to navigate/search, focus is lost
> when somebody is not sure what's interesting. A simpler issue tracker (eg
> GitHub issues) and a super simple forum instead of mailing list would lower
> the barriers. I don't think this is a priority as this setup certainly
> serves your current workflows.

On this I will say: we used to have a Slack channel but it didn't work
well. Only a few core developers ever looked at it and because of the
general "Slackification" of open source a lot of people would join the
Slack channel looking for help and be unable to get it. People also
reported bugs in Slack and we would learn about them weeks after the
fact, or never. I think if we added a new official communications
channel for the project right now it would likely suffer the same
fate. If we had 10x as many core developers then there might be enough
core devs who are comfortable with the additional modality that it
might make sense. We still have lots of people reporting bugs on Stack
Overflow and very few core developers regularly look at the SO
questions.

By contrast, we nearly unfailingly respond to people on the mailing
list and JIRA. So if people are looking for help they can certainly
get it there.

> Keep up the good work, you are amazing! I can't wait for a more complete
> DataFusion, group by and join for pyarrow, and a dozen other exciting
> opportunities and features.
>
> tl;dr you are great, not behind, local communities/meetups are a good
> opportunity (but covid...), I find Jira + mailing list hard to use
> (mentally, as not core dev)
>
> Best regards,
> Adam Lippai
>
>
>

Re: Helping new contributors get started [was Re: Renaming master branch, removing blacklist/whitelist]

Posted by Adam Lippai <ad...@rigo.sk>.
I've seen better and worse examples before.
I was an active, beginner Drupal developer ~12 years ago. The Drupal
project community was very strong, particularly in Hungary where I live.
International and local IRC channels, international and local
forums+events, highly customized issue tracker and superb documentation. It
was more mature and bigger at that time. On the other hand, when I tried to
give back to Angular or React... Well... You are already ahead of them.
React eventually recognized the problem and is trying to solve it, but a
large company's bureaucracy doesn't help with that.

My experience with Arrow is aligned with my expectations of a project of
this age or size (and in a few fields you are awesome!). Andy Grove,
xhochy, wesm, Joris were welcoming and responsive on Jira, Twitter and this
mailing list too. Ofc nobody worked for free on my ideas and I can't
develop C++ or Rust alone (yet). What I can do now is track the
development, the PRs (I've added a few more or less valuable, but not so
unique comments) and I'm subscribed to a few Jira issues.

At this point I could use a Gitter/IRC/Slack channel for discussions with
peers instead of core devs, and using the mailing list + JIRA doesn't help
either. They are simply cumbersome and hard to navigate/search, and focus is
lost when somebody is not sure what's interesting. A simpler issue tracker (e.g.
GitHub issues) and a super simple forum instead of the mailing list would lower
the barriers. I don't think this is a priority, as this setup certainly
serves your current workflows.

Keep up the good work, you are amazing! I can't wait for a more complete
DataFusion, group by and join for pyarrow, and a dozen other exciting
opportunities and features.

tl;dr you are great, not behind, local communities/meetups are a good
opportunity (but covid...), I find Jira + mailing list hard to use
(mentally, as not core dev)

Best regards,
Adam Lippai




Re: Helping new contributors get started [was Re: Renaming master branch, removing blacklist/whitelist]

Posted by Wes McKinney <we...@gmail.com>.
On Sat, Jun 20, 2020 at 1:52 PM Neal Richardson
<ne...@gmail.com> wrote:
>
> Hi Suvayu,
> Thanks for your feedback. I'm sorry to hear that you feel that you haven't
> had the best experiences trying to contribute to the project. For what it's
> worth, I believe that raising concerns like this _is_ itself a valuable
> contribution. So even if you haven't gotten to the point of having a pull
> request merged, I don't think it's accurate to say that you've been trying
> unsuccessfully to contribute--you're contributing right now.
>
> As it turns out, just the other day I opened a JIRA issue about improving
> the contributor guide (https://issues.apache.org/jira/browse/ARROW-9189),
> and I'll be taking that up next week as part of our 1.0 website overhaul. I
> agree that we can do a better job in helping new contributors participate,
> and that many of those forms of contribution need not require lots of time
> from Arrow core developers. Wes's point about the limited bandwidth to
> provide mentorship is valid; that said, I've seen many successful cases of
> first-time contributors getting the support they need. While there's
> certainly room for improvement, I'm optimistic that we're on the right
> track.

Yes — to be clear, the core developers in my experience (myself
included) are spending a lot of time responding to questions on JIRA,
clarifying issues with issue reporters, and offering advice about how
to proceed. Additionally, we spend a lot of time reviewing code and
helping people get their patches ready to be merged. There's no way we
would have 500+ contributors if we were not doing these things.

As far as getting the help that's needed from core developers, the
thing that helps someone like me the most is to have the "request" be
as specific and direct as possible. In any given day I might look at
50-100 different issues and so if it's not clear what I need to do I
will often move on to the next thing. Example direct requests:

* Do you think $PROPOSED_APPROACH is the right one?
* In which file(s) should I be looking to make changes?
* Is there anything related in the codebase I can look at to learn?

I'm sure we can put this advice in our contributor guide.

If you ask these questions and do not get an answer, it is OK to ask again.

I see six JIRA issues from Suvayu in the project:

* https://issues.apache.org/jira/browse/ARROW-1956
* https://issues.apache.org/jira/browse/ARROW-3806
* https://issues.apache.org/jira/browse/ARROW-4930
* https://issues.apache.org/jira/browse/ARROW-3792
* https://issues.apache.org/jira/browse/ARROW-3874
* https://issues.apache.org/jira/browse/ARROW-6577

There are comments in all cases and the issues were resolved in 4 out
of 6 cases. I see one example of you asking for guidance
(https://issues.apache.org/jira/browse/ARROW-1956) on December 29,
2017 while I (and presumably others) were on vacation for the New
Year. In the future, it is OK to be more persistent.

Thanks

> Neal

Re: Helping new contributors get started [was Re: Renaming master branch, removing blacklist/whitelist]

Posted by Neal Richardson <ne...@gmail.com>.
Hi Suvayu,
Thanks for your feedback. I'm sorry to hear that you feel that you haven't
had the best experiences trying to contribute to the project. For what it's
worth, I believe that raising concerns like this _is_ itself a valuable
contribution. So even if you haven't gotten to the point of having a pull
request merged, I don't think it's accurate to say that you've been trying
unsuccessfully to contribute--you're contributing right now.

As it turns out, just the other day I opened a JIRA issue about improving
the contributor guide (https://issues.apache.org/jira/browse/ARROW-9189),
and I'll be taking that up next week as part of our 1.0 website overhaul. I
agree that we can do a better job in helping new contributors participate,
and that many of those forms of contribution need not require lots of time
from Arrow core developers. Wes's point about the limited bandwidth to
provide mentorship is valid; that said, I've seen many successful cases of
first-time contributors getting the support they need. While there's
certainly room for improvement, I'm optimistic that we're on the right
track.

Neal



Re: Helping new contributors get started [was Re: Renaming master branch, removing blacklist/whitelist]

Posted by Suvayu Ali <fa...@gmail.com>.
Hi Wes, others,

Thank you for taking the time to draft a long response.

On Sat, Jun 20, 2020 at 3:57 PM Wes McKinney <we...@gmail.com> wrote:
>
> From a purely factual view, the project is successfully attracting and
> supporting contributors. Over 500 different people have contributed to
> the project (more than the "420" printed on GitHub because many people
> use e-mail addresses not associated with their GitHub user names) and
> that number is increasing steadily over time.

This response reinforces one of my points: all this branch name change
business then has nothing to do with actually getting new
contributors.

> We have invested greatly in providing systems to support developers of
> the project. We have a large and complex CI setup and nowadays it
> works pretty much like clockwork which is a huge change compared with
> a year or two ago.

Agreed, and I have learned a lot from it just by observing.

> If you are looking for individualized "mentorship and guidance"
> _beyond_ pointers toward what part of the project you should be
> looking at to solve a problem, feedback on issues about whether or not
> something is deemed useful or high priority or not, and feedback on
> your PRs whether you are on the right track or not, I think your
> expectations -- at this stage of the project -- may not be reasonable.
> The number of regularly active developers in this project for the
> parts that you have looked at is actually quite small. So you're
> talking about some of the 10 people at the top of the GitHub
> contributor list. It would be different if we were talking about an
> older project with an order of magnitude more regularly active
> developers.

If pointers to you are "look at the serialisation code", then yes, I
was hoping for more along the lines of "look at class XYZ in file bla".
I completely understand if that's not possible.  That is why I never
said anything before.  You may not remember, during the "whether to
support wheels" discussion, as I was impacted, I offered a compromise
of releasing a reduced feature-set wheel with simpler dependencies,
which was rejected with this exact argument.  I did not counter,
because it is a very reasonable position to take, and I'm in no
position to "demand" anything.

I only wrote today because I felt maybe now there is a willingness for
newer, diverse contributors, because that's how this thread was
motivated.  So I stated the hurdles I have faced, and hoped instead of
wasting scarce resources on superficial changes the community could
address actual hurdles for new contributors like me.  Obviously I
misunderstood.

> The area where I think we could improve the most is developer
> documentation, which in a sense is "self-service guidance" in
> understanding the codebases. Antoine and others have taken initiative
> on this but it often goes by the wayside since the number of people
> with requisite knowledge to write it is small (countable on fingers
> and toes if you include all the programming languages) and very short
> of free cycles.

I'm guessing you mean the Sphinx docs?  Whatever I have managed to use
Arrow for, it's thanks to those.  Maybe that is my cue, when hitting a
dead-end, "I should ask which source file do I look in?"

Anyway, I don't want to waste anyone's time anymore. I felt there's
room for feedback, I was wrong, and I withdraw from this discussion.
I'll continue to lurk on the mailing list, and try to contribute when
I can.

Cheers and thanks for your time,

-- 
Suvayu

Open source is the future. It sets us free.

Helping new contributors get started [was Re: Renaming master branch, removing blacklist/whitelist]

Posted by Wes McKinney <we...@gmail.com>.
hi Suvayu,

Changing the subject so we can have a discussion about this
separately. It sounds to me a bit like you may be airing grievances
but I will offer my opinion and we can see what other people think.

From a purely factual view, the project is successfully attracting and
supporting contributors. Over 500 different people have contributed to
the project (more than the "420" printed on GitHub because many people
use e-mail addresses not associated with their GitHub user names) and
that number is increasing steadily over time.

We have invested greatly in providing systems to support developers of
the project. We have a large and complex CI setup and nowadays it
works pretty much like clockwork which is a huge change compared with
a year or two ago.

In general, I will say: if you wish for significant volunteer
mentorship especially in an early stage open source project you are
likely going to be disappointed. I associate these patterns with later
stage projects (think: the Python programming language or the Linux
kernel). I personally do not have the time -- I direct a team of
people working on the project working towards specific development
goals, and we are in turn accountable to the people who are sponsoring
our work. In addition, I do a large amount of individual
contributions.

If you are looking for individualized "mentorship and guidance"
_beyond_ pointers toward what part of the project you should be
looking at to solve a problem, feedback on issues about whether or not
something is deemed useful or high priority or not, and feedback on
your PRs whether you are on the right track or not, I think your
expectations -- at this stage of the project -- may not be reasonable.
The number of regularly active developers in this project for the
parts that you have looked at is actually quite small. So you're
talking about some of the 10 people at the top of the GitHub
contributor list. It would be different if we were talking about an
older project with an order of magnitude more regularly active
developers.

The area where I think we could improve the most is developer
documentation, which in a sense is "self-service guidance" in
understanding the codebases. Antoine and others have taken initiative
on this but it often goes by the wayside since the number of people
with requisite knowledge to write it is small (countable on fingers
and toes if you include all the programming languages) and very short
of free cycles.

Thanks,
Wes

On Fri, Jun 19, 2020 at 9:44 PM Suvayu Ali <fa...@gmail.com> wrote:
>
> Hi all,
>
> (sorry if this is a duplicate post, I always have trouble posting to this list)
>
> On Fri, Jun 19, 2020 at 5:54 PM Todd Hendricks <he...@gmail.com> wrote:
> >
> > I'm a black data scientist. For whatever it's worth, I have never taken
> > offense to the term "Master" branch, as I have never interpreted it to have
> > a derogatory connotation. It's literally never crossed my mind.
>
> As an Indian person, I would concur with what Todd said.
>
> That said, I would like to highlight a few things.  Since the
> community is spending time to discuss how to be more welcoming to a
> diverse group of contributors, instead of default branch names, there
> are many practically relevant issues that could be addressed.
>
> I've been trying to contribute to this project for about 2 yrs, rather
> unsuccessfully.  I come from the perspective of analysis rather than
> engineering.  But I'm no stranger to technical nitty gritties
> (particle physicist at CERN, data scientist at non-technical startups,
> scientific software dev).  I started by filing bug reports for my
> needs (pyarrow and parquet).  Most bug reports are still open, they
> received a bit of discussion, but mostly they have been assigned and
> reassigned to releases for over a yr.  On day one I had offered to do
> the work myself, but with some guidance, I didn't receive any.  So I
> gave up.
>
> Some months later, after Gandiva was released, I came back with the
> goal of using it from pyarrow.  While after some help I could do
> simple tests in C++, getting it to work with pyarrow proved difficult.
> I don't remember the exact hurdle, but I decided I would package it
> for my distro (Fedora) for simpler compilation.  So I contributed a
> few patches to the build system to build against system libraries
> instead of the vendored versions, including the ability to switch LLVM
> versions.  I think around this time Kou was overhauling the build
> system. My patches were not accepted, but some of the groundwork I
> did hopefully helped Kou.  Eventually though, I gave up.
>
> Soon after, I tried to build a wheel for ARM; I was gathering some
> data on an RPi.  That didn't go so well either, again, the reason was
> lack of guidance.  At the time, it was also expressed that wheels are
> disfavoured by the community, and not worth maintaining.  I see that
> position has changed now.
>
> There is a clear pattern here: if the community is really serious
> about addressing diversity and being inclusive, time would be better
> spent by addressing issues like contribution guidelines for beginners
> (not saying absolute beginners), mentoring, or triaging of open issues
> in terms of ease of contribution, and other concrete hurdles for
> newcomers.  I realise people's time is scarce, but you have to start
> somewhere.  At the least, if someone guides me, I can pick up these
> tasks and the maintainers can focus on the more involved roles. If the
> issues I have highlighted cannot be prioritised, then wasting time on
> superficial issues like default branch names should also be avoided.
>
> I hope my comments are accepted as constructive criticism.
>
> Cheers,
>
> PS: whitelist/blacklist -> accept/reject seems quite reasonable;
> personally, colour based terminology has always been very unclear to
> me
>
> --
> Suvayu
>
> Open source is the future. It sets us free.

Re: Renaming master branch, removing blacklist/whitelist

Posted by Suvayu Ali <fa...@gmail.com>.
Hi all,

(sorry if this is a duplicate post, I always have trouble posting to this list)

On Fri, Jun 19, 2020 at 5:54 PM Todd Hendricks <he...@gmail.com> wrote:
>
> I'm a black data scientist. For whatever it's worth, I have never taken
> offense to the term "Master" branch, as I have never interpreted it to have
> a derogatory connotation. It's literally never crossed my mind.

As an Indian person, I would concur with what Todd said.

That said, I would like to highlight a few things.  Since the
community is spending time to discuss how to be more welcoming to a
diverse group of contributors, instead of default branch names, there
are many practically relevant issues that could be addressed.

I've been trying to contribute to this project for about 2 yrs, rather
unsuccessfully.  I come from the perspective of analysis rather than
engineering.  But I'm no stranger to technical nitty gritties
(particle physicist at CERN, data scientist at non-technical startups,
scientific software dev).  I started by filing bug reports for my
needs (pyarrow and parquet).  Most bug reports are still open, they
received a bit of discussion, but mostly they have been assigned and
reassigned to releases for over a yr.  On day one I had offered to do
the work myself, but with some guidance, I didn't receive any.  So I
gave up.

Some months later, after Gandiva was released, I came back with the
goal of using it from pyarrow.  While after some help I could do
simple tests in C++, getting it to work with pyarrow proved difficult.
I don't remember the exact hurdle, but I decided I would package it
for my distro (Fedora) for simpler compilation.  So I contributed a
few patches to the build system to build against system libraries
instead of the vendored versions, including the ability to switch LLVM
versions.  I think around this time Kou was overhauling the build
system. My patches were not accepted, but some of the groundwork I
did hopefully helped Kou.  Eventually though, I gave up.

Soon after, I tried to build a wheel for ARM; I was gathering some
data on an RPi.  That didn't go so well either, again, the reason was
lack of guidance.  At the time, it was also expressed that wheels are
disfavoured by the community, and not worth maintaining.  I see that
position has changed now.

There is a clear pattern here: if the community is really serious
about addressing diversity and being inclusive, time would be better
spent by addressing issues like contribution guidelines for beginners
(not saying absolute beginners), mentoring, or triaging of open issues
in terms of ease of contribution, and other concrete hurdles for
newcomers.  I realise people's time is scarce, but you have to start
somewhere.  At the least, if someone guides me, I can pick up these
tasks and the maintainers can focus on the more involved roles. If the
issues I have highlighted cannot be prioritised, then wasting time on
superficial issues like default branch names should also be avoided.

I hope my comments are accepted as constructive criticism.

Cheers,

PS: whitelist/blacklist -> accept/reject seems quite reasonable;
personally, colour based terminology has always been very unclear to
me

-- 
Suvayu

Open source is the future. It sets us free.

Re: Renaming master branch, removing blacklist/whitelist

Posted by Todd Hendricks <he...@gmail.com>.
Hi All,

I'm a black data scientist. For whatever it's worth, I have never taken
offense to the term "Master" branch, as I have never interpreted it to have
a derogatory connotation. It's literally never crossed my mind.

That said, I certainly appreciate the sentiment, and the spirit of the
discussion. It's nice to know people are looking for opportunities to move
us in the right direction. I have no investment in the outcome either way.
Just my $.02.



On Fri, Jun 19, 2020 at 10:27 AM Neal Richardson <
neal.p.richardson@gmail.com> wrote:

> Makes sense, I'm happy to monitor the situation and revisit the discussion
> in the coming weeks.
>
> FTR, the whitelist/blacklist language was resolved yesterday in
> https://github.com/apache/arrow/pull/7484.
>
> Neal
>
> On Fri, Jun 19, 2020 at 10:01 AM Micah Kornfield <em...@gmail.com>
> wrote:
>
> > GitHub is apparently looking into it as well:
> >> https://www.bbc.com/news/technology-53050955
> >
> > Yep, it seems like a few places are, that is why I think we should delay
> > any branch renaming until bigger providers can come to a consensus, I
> > don't want to have to make this change twice.
> >
> >
> >> FWIW when you clone (from GitHub at least), you get the default branch,
> >> whether it is named "master" or not.
> >
> > I'm not sure this covers all access paths.  Given the concern on the
> > linked thread from git-core, I really think we should wait until the core
> > git developers/providers can come to a consensus.
> >
> >
> >> Yes, and there are some reasonable arguments in there for why "main" is
> >> a better choice than other alternatives. I was surprised how little
> >> bikeshedding there was.
> >
> > There was also at least one linked thread about how "main" is problematic
> > in non-English speaking languages.  I'd prefer to let others bikeshed the
> > naming for us :)
> >
> > On Fri, Jun 19, 2020 at 9:55 AM Neal Richardson <
> > neal.p.richardson@gmail.com> wrote:
> >
> >> Thanks for the discussion, folks. I'm curious to hear what others think
> >> as well.
> >>
> >> Some responses inline.
> >>
> >> Neal
> >>
> >> On Thu, Jun 18, 2020 at 9:24 PM Micah Kornfield <em...@gmail.com>
> >> wrote:
> >>
> >>> sorry for the multiple posts ... I will also note that there is a lot
> of
> >>> debate on this change on the linked thread as well (and I'm not sure
> the
> >>> actual change will happen soon).
> >>>
> >>> On Thu, Jun 18, 2020 at 9:19 PM Micah Kornfield <emkornfield@gmail.com
> >
> >>> wrote:
> >>>
> >>> > FWIW Discussion on git core on naming [1], seems like it might be
> >>> > coalescing around "main".
> >>> >
> >>> > [1] https://lore.kernel.org/git/20200615205722.GG71506@syl.local/
> >>
> >>
> >> Yes, and there are some reasonable arguments in there for why "main" is
> a
> >> better choice than other alternatives. I was surprised how little
> >> bikeshedding there was.
> >>
> >>
> >>>
> >>> >
> >>> > On Thu, Jun 18, 2020 at 5:27 PM Micah Kornfield <
> emkornfield@gmail.com
> >>> >
> >>> > wrote:
> >>> >
> >>> >> I'm in favor of trying to align on neutral language within the
> >>> codebase.
> >>> >>
> >>> >> On branch naming, I think we should wait a little to see if a
> >>> consensus
> >>> >> converges on a new naming convention at least within Git/Github.
> >>
> >>
> >> GitHub is apparently looking into it as well:
> >> https://www.bbc.com/news/technology-53050955
> >>
> >>
> >>> On a
> >>> >> technical level, I'm not sure if automated tooling (e.g. crawlers)
> >>> outside
> >>> >> of the project might make assumptions about default branch  names or
> >>> what
> >>> >> is available in the github API for this type of metadata retrieval.
> >>>
> >>
> >> "default_branch" is already an attribute of "repository" objects in
> >> GitHub API responses
> >>
> >>
> >>> >>
> >>> >> Thanks,
> >>> >> Micah
> >>> >>
> >>> >> On Thu, Jun 18, 2020 at 1:48 PM Wes McKinney <we...@gmail.com>
> >>> wrote:
> >>> >>
> >>> >>> On Thu, Jun 18, 2020 at 3:33 PM Antoine Pitrou <antoine@python.org
> >
> >>> >>> wrote:
> >>> >>> >
> >>> >>> >
> >>> >>> > Hi,
> >>> >>> >
> >>> >>> > Le 18/06/2020 à 21:56, Neal Richardson a écrit :
> >>> >>> > > Hi all,
> >>> >>> > > As you're likely aware, there's growing momentum in the
> developer
> >>> >>> community
> >>> >>> > > to drop terminology that some find offensive.
> >>> >>> >
> >>> >>> > Yes.  Is it reasonable?  Does it achieve anything?  Is there any
> >>> sense
> >>> >>> > in trying to "drop terminology that some find offensive"?
> >>>
> >> >>>
> >>> >>> We wish to create a community that is open and as inclusive and
> >>> >>> welcoming as possible. So yes, IMHO if there is something that some
> >>> >>> people might find offensive (even if it is not intended that way),
> >>> >>> then there is value in removing that possibility from the equation.
> >>> >>> We're here to build a healthy community that builds software
> together
> >>> >>> and so respecting the perspectives of others (even if we disagree
> >>> with
> >>> >>> them) is a part of having a healthy community.
> >>> >>>
> >>> >>> > >
> >>> >>> >  As a project that takes pride
> >>> >>> > > in being welcoming and inclusive, I think this is something we
> >>> >>> should get
> >>> >>> > > in front of--particularly as we're approaching a 1.0 release.
> >>> >>> >
> >>> >>> > I don't think we would get "in front of".  We would just be
> >>> following
> >>> >>> > the "growing momentum".  In other words, we would do something
> >>> because
> >>> >>> > it's popular.
> >>> >>>
> >>> >>> Repeating sentiments from my response a few minutes ago, I think it
> >>> is
> >>> >>> better for us to avoid even the possibility of these concerns
> arising
> >>> >>> in this project. Let us spend our energy debating technical issues
> >>> >>> rather than social or political ones.
> >>> >>>
> >>> >>> > (I'll note that the urge to follow the "growing momentum" is how
> >>> the
> >>> >>> > developer community standardised on irritating tools like Git)
> >>> >>> >
> >>> >>> > In the long term, and in the face of the problems that it claims
> to
> >>> >>> > address, this seems futile to me.  But it makes some people feel
> >>> good
> >>> >>> > about doing something, and it's (small) PR for the project...
> >>> >>> >
> >>> >>> > Now to the specifics:
> >>> >>> >
> >>> >>> > > Specifically, I am proposing to:
> >>> >>> > >
> >>> >>> > > 1. rename the "master" branch to something else ("main" seems
> to
> >>> be
> >>> >>> > > popular; other version control systems use other words too).
> >>> >>> >
> >>> >>> > I used Mercurial before Git, and Mercurial uses "default".  I
> used
> >>> SVN
> >>> >>> > before Mercurial, and SVN uses "trunk".  I don't remember if CVS
> is
> >>> >>> > sophisticated enough to have any name for this concept :-)
> >>> >>> >
> >>> >>> > The problem, though, is that "master" is the overwhelming
> >>> convention in
> >>> >>> > Git land.  Well-known conventions make a better user experience
> >>> (you
> >>> >>> > clone a git repo, you get the "master" branch and you know it:
> >>> done).
> >>>
> >>
> >> FWIW when you clone (from GitHub at least), you get the default branch,
> >> whether it is named "master" or not.
> >>
> >>
> >>> >>> >
> >>> >>> > If we choose a non-"master" name, we add an additional hoop to
> jump
> >>> >>> > through for users to approach Arrow.  It's a small thing, but
> >>> usability
> >>> >>> > is often about such small things.
> >>> >>>
> >>> >>> I'm not concerned about this, given that Arrow is already on the
> >>> >>> sophisticated end of the spectrum for open source projects.
> >>> >>>
> >>> >>> > > 2. replace "whitelist"/"blacklist" in our code with something
> >>> like
> >>> >>> > > "allowlist"/"blocklist", or otherwise renaming.
> >>> >>> >
> >>> >>> > "allow"/"deny" sounds terser, and also seems more symmetric to
> me.
> >>> >>> > Also, be careful: "block" is very close, unsafely close, to
> >>> "black"...
> >>> >>> >
> >>> >>> > Regards
> >>> >>> >
> >>> >>> > Antoine.
> >>> >>>
> >>> >>
> >>>
> >>
>

Re: Renaming master branch, removing blacklist/whitelist

Posted by Neal Richardson <ne...@gmail.com>.
Makes sense, I'm happy to monitor the situation and revisit the discussion
in the coming weeks.

FTR, the whitelist/blacklist language was resolved yesterday in
https://github.com/apache/arrow/pull/7484.

Neal

On Fri, Jun 19, 2020 at 10:01 AM Micah Kornfield <em...@gmail.com>
wrote:

> GitHub is apparently looking into it as well:
>> https://www.bbc.com/news/technology-53050955
>
> Yep, it seems like a few places are, that is why I think we should delay
> any branch renaming until bigger providers can come to a consensus, I don't
> want to have to make this change twice.
>
>
>> FWIW when you clone (from GitHub at least), you get the default branch,
>> whether it is named "master" or not.
>
> I'm not sure this covers all access paths.  Given the concern on the
> linked thread from git-core, I really think we should wait until the core
> git developers/providers can come to a consensus.
>
>
>> Yes, and there are some reasonable arguments in there for why "main" is a
>> better choice than other alternatives. I was surprised how little
>> bikeshedding there was.
>
> There was also at least one linked thread about how "main" is problematic
> in non-English speaking languages.  I'd prefer to let others bikeshed the
> naming for us :)
>
> On Fri, Jun 19, 2020 at 9:55 AM Neal Richardson <
> neal.p.richardson@gmail.com> wrote:
>
>> Thanks for the discussion, folks. I'm curious to hear what others think
>> as well.
>>
>> Some responses inline.
>>
>> Neal
>>
>> On Thu, Jun 18, 2020 at 9:24 PM Micah Kornfield <em...@gmail.com>
>> wrote:
>>
>>> sorry for the multiple posts ... I will also note that there is a lot of
>>> debate on this change on the linked thread as well (and I'm not sure the
>>> actual change will happen soon).
>>>
>>> On Thu, Jun 18, 2020 at 9:19 PM Micah Kornfield <em...@gmail.com>
>>> wrote:
>>>
>>> > FWIW Discussion on git core on naming [1], seems like it might be
>>> > coalescing around "main".
>>> >
>>> > [1] https://lore.kernel.org/git/20200615205722.GG71506@syl.local/
>>
>>
>> Yes, and there are some reasonable arguments in there for why "main" is a
>> better choice than other alternatives. I was surprised how little
>> bikeshedding there was.
>>
>>
>>>
>>> >
>>> > On Thu, Jun 18, 2020 at 5:27 PM Micah Kornfield <emkornfield@gmail.com
>>> >
>>> > wrote:
>>> >
>>> >> I'm in favor of trying to align on neutral language within the
>>> codebase.
>>> >>
>>> >> On branch naming, I think we should wait a little to see if a
>>> consensus
>>> >> converges on a new naming convention at least within Git/Github.
>>
>>
>> GitHub is apparently looking into it as well:
>> https://www.bbc.com/news/technology-53050955
>>
>>
>>> On a
>>> >> technical level, I'm not sure if automated tooling (e.g. crawlers)
>>> outside
>>> >> of the project might make assumptions about default branch  names or
>>> what
>>> >> is available in the github API for this type of metadata retrieval.
>>>
>>
>> "default_branch" is already an attribute of "repository" objects in
>> GitHub API responses
>>
>>
>>> >>
>>> >> Thanks,
>>> >> Micah
>>> >>
>>> >> On Thu, Jun 18, 2020 at 1:48 PM Wes McKinney <we...@gmail.com>
>>> wrote:
>>> >>
>>> >>> On Thu, Jun 18, 2020 at 3:33 PM Antoine Pitrou <an...@python.org>
>>> >>> wrote:
>>> >>> >
>>> >>> >
>>> >>> > Hi,
>>> >>> >
>>> >>> > Le 18/06/2020 à 21:56, Neal Richardson a écrit :
>>> >>> > > Hi all,
>>> >>> > > As you're likely aware, there's growing momentum in the developer
>>> >>> community
>>> >>> > > to drop terminology that some find offensive.
>>> >>> >
>>> >>> > Yes.  Is it reasonable?  Does it achieve anything?  Is there any
>>> sense
>>> >>> > in trying to "drop terminology that some find offensive"?
>>>
>> >>>
>>> >>> We wish to create a community that is open and as inclusive and
>>> >>> welcoming as possible. So yes, IMHO if there is something that some
>>> >>> people might find offensive (even if it is not intended that way),
>>> >>> then there is value in removing that possibility from the equation.
>>> >>> We're here to build a healthy community that builds software together
>>> >>> and so respecting the perspectives of others (even if we disagree
>>> with
>>> >>> them) is a part of having a healthy community.
>>> >>>
>>> >>> > >
>>> >>> >  As a project that takes pride
>>> >>> > > in being welcoming and inclusive, I think this is something we
>>> >>> should get
>>> >>> > > in front of--particularly as we're approaching a 1.0 release.
>>> >>> >
>>> >>> > I don't think we would get "in front of".  We would just be
>>> following
>>> >>> > the "growing momentum".  In other words, we would do something
>>> because
>>> >>> > it's popular.
>>> >>>
>>> >>> Repeating sentiments from my response a few minutes ago, I think it
>>> is
>>> >>> better for us to avoid even the possibility of these concerns arising
>>> >>> in this project. Let us spend our energy debating technical issues
>>> >>> rather than social or political ones.
>>> >>>
>>> >>> > (I'll note that the urge to follow the "growing momentum" is how
>>> the
>>> >>> > developer community standardised on irritating tools like Git)
>>> >>> >
>>> >>> > In the long term, and in the face of the problems that it claims to
>>> >>> > address, this seems futile to me.  But it makes some people feel
>>> good
>>> >>> > about doing something, and it's (small) PR for the project...
>>> >>> >
>>> >>> > Now to the specifics:
>>> >>> >
>>> >>> > > Specifically, I am proposing to:
>>> >>> > >
>>> >>> > > 1. rename the "master" branch to something else ("main" seems to
>>> be
>>> >>> > > popular; other version control systems use other words too).
>>> >>> >
>>> >>> > I used Mercurial before Git, and Mercurial uses "default".  I used
>>> SVN
>>> >>> > before Mercurial, and SVN uses "trunk".  I don't remember if CVS is
>>> >>> > sophisticated enough to have any name for this concept :-)
>>> >>> >
>>> >>> > The problem, though, is that "master" is the overwhelming
>>> convention in
>>> >>> > Git land.  Well-known conventions make a better user experience
>>> (you
>>> >>> > clone a git repo, you get the "master" branch and you know it:
>>> done).
>>>
>>
>> FWIW when you clone (from GitHub at least), you get the default branch,
>> whether it is named "master" or not.
>>
>>
>>> >>> >
>>> >>> > If we choose a non-"master" name, we add an additional hoop to jump
>>> >>> > through for users to approach Arrow.  It's a small thing, but
>>> usability
>>> >>> > is often about such small things.
>>> >>>
>>> >>> I'm not concerned about this, given that Arrow is already on the
>>> >>> sophisticated end of the spectrum for open source projects.
>>> >>>
>>> >>> > > 2. replace "whitelist"/"blacklist" in our code with something
>>> like
>>> >>> > > "allowlist"/"blocklist", or otherwise renaming.
>>> >>> >
>>> >>> > "allow"/"deny" sounds terser, and also seems more symmetric to me.
>>> >>> > Also, be careful: "block" is very close, unsafely close, to
>>> "black"...
>>> >>> >
>>> >>> > Regards
>>> >>> >
>>> >>> > Antoine.
>>> >>>
>>> >>
>>>
>>

Re: Renaming master branch, removing blacklist/whitelist

Posted by Micah Kornfield <em...@gmail.com>.
>
> GitHub is apparently looking into it as well:
> https://www.bbc.com/news/technology-53050955

Yep, it seems like a few places are, that is why I think we should delay
any branch renaming until bigger providers can come to a consensus, I don't
want to have to make this change twice.


> FWIW when you clone (from GitHub at least), you get the default branch,
> whether it is named "master" or not.

I'm not sure this covers all access paths.  Given the concern on the linked
thread from git-core, I really think we should wait until the core git
developers/providers can come to a consensus.


> Yes, and there are some reasonable arguments in there for why "main" is a
> better choice than other alternatives. I was surprised how little
> bikeshedding there was.

There was also at least one linked thread about how "main" is problematic
in non-English speaking languages.  I'd prefer to let others bikeshed the
naming for us :)

On Fri, Jun 19, 2020 at 9:55 AM Neal Richardson <ne...@gmail.com>
wrote:

> Thanks for the discussion, folks. I'm curious to hear what others think as
> well.
>
> Some responses inline.
>
> Neal
>
> On Thu, Jun 18, 2020 at 9:24 PM Micah Kornfield <em...@gmail.com>
> wrote:
>
>> sorry for the multiple posts ... I will also note that there is a lot of
>> debate on this change on the linked thread as well (and I'm not sure the
>> actual change will happen soon).
>>
>> On Thu, Jun 18, 2020 at 9:19 PM Micah Kornfield <em...@gmail.com>
>> wrote:
>>
>> > FWIW Discussion on git core on naming [1], seems like it might be
>> > coalescing around "main".
>> >
>> > [1] https://lore.kernel.org/git/20200615205722.GG71506@syl.local/
>
>
> Yes, and there are some reasonable arguments in there for why "main" is a
> better choice than other alternatives. I was surprised how little
> bikeshedding there was.
>
>
>>
>> >
>> > On Thu, Jun 18, 2020 at 5:27 PM Micah Kornfield <em...@gmail.com>
>> > wrote:
>> >
>> >> I'm in favor of trying to align on neutral language within the
>> codebase.
>> >>
>> >> On branch naming, I think we should wait a little to see if a consensus
>> >> converges on a new naming convention at least within Git/Github.
>
>
> GitHub is apparently looking into it as well:
> https://www.bbc.com/news/technology-53050955
>
>
>> On a
>> >> technical level, I'm not sure if automated tooling (e.g. crawlers)
>> outside
>> >> of the project might make assumptions about default branch  names or
>> what
>> >> is available in the github API for this type of metadata retrieval.
>>
>
> "default_branch" is already an attribute of "repository" objects in GitHub
> API responses
>
>
>> >>
>> >> Thanks,
>> >> Micah
>> >>
>> >> On Thu, Jun 18, 2020 at 1:48 PM Wes McKinney <we...@gmail.com>
>> wrote:
>> >>
>> >>> On Thu, Jun 18, 2020 at 3:33 PM Antoine Pitrou <an...@python.org>
>> >>> wrote:
>> >>> >
>> >>> >
>> >>> > Hi,
>> >>> >
>> >>> > Le 18/06/2020 à 21:56, Neal Richardson a écrit :
>> >>> > > Hi all,
>> >>> > > As you're likely aware, there's growing momentum in the developer
>> >>> community
>> >>> > > to drop terminology that some find offensive.
>> >>> >
>> >>> > Yes.  Is it reasonable?  Does it achieve anything?  Is there any
>> sense
>> >>> > in trying to "drop terminology that some find offensive"?
>>
> >>>
>> >>> We wish to create a community that is open and as inclusive and
>> >>> welcoming as possible. So yes, IMHO if there is something that some
>> >>> people might find offensive (even if it is not intended that way),
>> >>> then there is value in removing that possibility from the equation.
>> >>> We're here to build a healthy community that builds software together
>> >>> and so respecting the perspectives of others (even if we disagree with
>> >>> them) is a part of having a healthy community.
>> >>>
>> >>> > >
>> >>> >  As a project that takes pride
>> >>> > > in being welcoming and inclusive, I think this is something we
>> >>> should get
>> >>> > > in front of--particularly as we're approaching a 1.0 release.
>> >>> >
>> >>> > I don't think we would get "in front of".  We would just be
>> following
>> >>> > the "growing momentum".  In other words, we would do something
>> because
>> >>> > it's popular.
>> >>>
>> >>> Repeating sentiments from my response a few minutes ago, I think it is
>> >>> better for us to avoid even the possibility of these concerns arising
>> >>> in this project. Let us spend our energy debating technical issues
>> >>> rather than social or political ones.
>> >>>
>> >>> > (I'll note that the urge to follow the "growing momentum" is how the
>> >>> > developer community standardised on irritating tools like Git)
>> >>> >
>> >>> > In the long term, and in the face of the problems that it claims to
>> >>> > address, this seems futile to me.  But it makes some people feel
>> good
>> >>> > about doing something, and it's (small) PR for the project...
>> >>> >
>> >>> > Now to the specifics:
>> >>> >
>> >>> > > Specifically, I am proposing to:
>> >>> > >
>> >>> > > 1. rename the "master" branch to something else ("main" seems to
>> be
>> >>> > > popular; other version control systems use other words too).
>> >>> >
>> >>> > I used Mercurial before Git, and Mercurial uses "default".  I used
>> SVN
>> >>> > before Mercurial, and SVN uses "trunk".  I don't remember if CVS is
>> >>> > sophisticated enough to have any name for this concept :-)
>> >>> >
>> >>> > The problem, though, is that "master" is the overwhelming
>> convention in
>> >>> > Git land.  Well-known conventions make a better user experience (you
>> >>> > clone a git repo, you get the "master" branch and you know it:
>> done).
>>
>
> FWIW when you clone (from GitHub at least), you get the default branch,
> whether it is named "master" or not.
>
>
>> >>> >
>> >>> > If we choose a non-"master" name, we add an additional hoop to jump
>> >>> > through for users to approach Arrow.  It's a small thing, but
>> usability
>> >>> > is often about such small things.
>> >>>
>> >>> I'm not concerned about this, given that Arrow is already on the
>> >>> sophisticated end of the spectrum for open source projects.
>> >>>
>> >>> > > 2. replace "whitelist"/"blacklist" in our code with something like
>> >>> > > "allowlist"/"blocklist", or otherwise renaming.
>> >>> >
>> >>> > "allow"/"deny" sounds terser, and also seems more symmetric to me.
>> >>> > Also, be careful: "block" is very close, unsafely close, to
>> "black"...
>> >>> >
>> >>> > Regards
>> >>> >
>> >>> > Antoine.
>> >>>
>> >>
>>
>

Re: Renaming master branch, removing blacklist/whitelist

Posted by Neal Richardson <ne...@gmail.com>.
Thanks for the discussion, folks. I'm curious to hear what others think as
well.

Some responses inline.

Neal

On Thu, Jun 18, 2020 at 9:24 PM Micah Kornfield <em...@gmail.com>
wrote:

> sorry for the multiple posts ... I will also note that there is a lot of
> debate on this change on the linked thread as well (and I'm not sure the
> actual change will happen soon).
>
> On Thu, Jun 18, 2020 at 9:19 PM Micah Kornfield <em...@gmail.com>
> wrote:
>
> > FWIW Discussion on git core on naming [1], seems like it might be
> > coalescing around "main".
> >
> > [1] https://lore.kernel.org/git/20200615205722.GG71506@syl.local/


Yes, and there are some reasonable arguments in there for why "main" is a
better choice than other alternatives. I was surprised how little
bikeshedding there was.


>
> >
> > On Thu, Jun 18, 2020 at 5:27 PM Micah Kornfield <em...@gmail.com>
> > wrote:
> >
> >> I'm in favor of trying to align on neutral language within the codebase.
> >>
> >> On branch naming, I think we should wait a little to see if a consensus
> >> converges on a new naming convention at least within Git/Github.


GitHub is apparently looking into it as well:
https://www.bbc.com/news/technology-53050955


> On a
> >> technical level, I'm not sure if automated tooling (e.g. crawlers)
> outside
> >> of the project might make assumptions about default branch  names or
> what
> >> is available in the github API for this type of metadata retrieval.
>

"default_branch" is already an attribute of "repository" objects in GitHub
API responses
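
As a rough illustration of how outside tooling could read that attribute
instead of assuming a branch name, here is a minimal sketch. The helper
names and the "master" fallback are assumptions made for this example, not
anything Arrow ships; the lookup uses the GET /repos/{owner}/{repo}
endpoint, which returns a repository object.

```python
import json
import urllib.request


def default_branch_from_payload(payload: dict, fallback: str = "master") -> str:
    """Return the default branch named in a GitHub repository object.

    `fallback` is a hypothetical safety net for payloads lacking the field.
    """
    return payload.get("default_branch") or fallback


def fetch_default_branch(owner: str, repo: str) -> str:
    """Look up the default branch over the network (unauthenticated request)."""
    url = f"https://api.github.com/repos/{owner}/{repo}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return default_branch_from_payload(json.load(response))
```

Calling fetch_default_branch("apache", "arrow") would then return whatever
the repository currently reports, so a crawler written this way should not
need to change if the branch is renamed.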


> >>
> >> Thanks,
> >> Micah
> >>
> >> On Thu, Jun 18, 2020 at 1:48 PM Wes McKinney <we...@gmail.com>
> wrote:
> >>
> >>> On Thu, Jun 18, 2020 at 3:33 PM Antoine Pitrou <an...@python.org>
> >>> wrote:
> >>> >
> >>> >
> >>> > Hi,
> >>> >
> >>> > On 18/06/2020 at 21:56, Neal Richardson wrote:
> >>> > > Hi all,
> >>> > > As you're likely aware, there's growing momentum in the developer
> >>> community
> >>> > > to drop terminology that some find offensive.
> >>> >
> >>> > Yes.  Is it reasonable?  Does it achieve anything?  Is there any
> sense
> >>> > in trying to "drop terminology that some find offensive"?
> >>>
> >>> We wish to create a community that is open and as inclusive and
> >>> welcoming as possible. So yes, IMHO if there is something that some
> >>> people might find offensive (even if it is not intended that way),
> >>> then there is value in removing that possibility from the equation.
> >>> We're here to build a healthy community that builds software together
> >>> and so respecting the perspectives of others (even if we disagree with
> >>> them) is a part of having a healthy community.
> >>>
> >>> > > As a project that takes pride
> >>> > > in being welcoming and inclusive, I think this is something we
> >>> should get
> >>> > > in front of--particularly as we're approaching a 1.0 release.
> >>> >
> >>> > I don't think we would get "in front of".  We would just be following
> >>> > the "growing momentum".  In other words, we would do something
> because
> >>> > it's popular.
> >>>
> >>> Repeating sentiments from my response a few minutes ago, I think it is
> >>> better for us to avoid even the possibility of these concerns arising
> >>> in this project. Let us spend our energy debating technical issues
> >>> rather than social or political ones.
> >>>
> >>> > (I'll note that the urge to follow the "growing momentum" is how the
> >>> > developer community standardised on irritating tools like Git)
> >>> >
> >>> > In the long term, and in the face of the problems that it claims to
> >>> > address, this seems futile to me.  But it makes some people feel good
> >>> > about doing something, and it's (small) PR for the project...
> >>> >
> >>> > Now to the specifics:
> >>> >
> >>> > > Specifically, I am proposing to:
> >>> > >
> >>> > > 1. rename the "master" branch to something else ("main" seems to be
> >>> > > popular; other version control systems use other words too).
> >>> >
> >>> > I used Mercurial before Git, and Mercurial uses "default".  I used
> SVN
> >>> > before Mercurial, and SVN uses "trunk".  I don't remember if CVS is
> >>> > sophisticated enough to have any name for this concept :-)
> >>> >
> >>> > The problem, though, is that "master" is the overwhelming convention
> in
> >>> > Git land.  Well-known conventions make a better user experience (you
> >>> > clone a git repo, you get the "master" branch and you know it: done).
>

FWIW when you clone (from GitHub at least), you get the default branch,
whether it is named "master" or not.


> >>> >
> >>> > If we choose a non-"master" name, we add an additional hoop to jump
> >>> > through for users to approach Arrow.  It's a small thing, but
> usability
> >>> > is often about such small things.
> >>>
> >>> I'm not concerned about this, given that Arrow is already on the
> >>> sophisticated end of the spectrum for open source projects.
> >>>
> >>> > > 2. replace "whitelist"/"blacklist" in our code with something like
> >>> > > "allowlist"/"blocklist", or otherwise renaming.
> >>> >
> >>> > "allow"/"deny" sounds terser, and also seems more symmetric to me.
> >>> > Also, be careful: "block" is very close, unsafely close, to
> "black"...
> >>> >
> >>> > Regards
> >>> >
> >>> > Antoine.
> >>>
> >>
>

Re: Renaming master branch, removing blacklist/whitelist

Posted by Micah Kornfield <em...@gmail.com>.
Sorry for the multiple posts ... I will also note that there is a lot of
debate on this change on the linked thread as well (and I'm not sure the
actual change will happen soon).

On Thu, Jun 18, 2020 at 9:19 PM Micah Kornfield <em...@gmail.com>
wrote:

> FWIW Discussion on git core on naming [1], seems like it might be
> coalescing around "main".
>
> [1] https://lore.kernel.org/git/20200615205722.GG71506@syl.local/
>
> On Thu, Jun 18, 2020 at 5:27 PM Micah Kornfield <em...@gmail.com>
> wrote:
>
>> I'm in favor of trying to align on neutral language within the codebase.
>>
>> On branch naming, I think we should wait a little to see if a consensus
>> converges on a new naming convention at least within Git/Github. On a
>> technical level, I'm not sure if automated tooling (e.g. crawlers) outside
>> of the project might make assumptions about default branch  names or what
>> is available in the github API for this type of metadata retrieval.
>>
>> Thanks,
>> Micah
>>
>> On Thu, Jun 18, 2020 at 1:48 PM Wes McKinney <we...@gmail.com> wrote:
>>
>>> On Thu, Jun 18, 2020 at 3:33 PM Antoine Pitrou <an...@python.org>
>>> wrote:
>>> >
>>> >
>>> > Hi,
>>> >
>>> > On 18/06/2020 at 21:56, Neal Richardson wrote:
>>> > > Hi all,
>>> > > As you're likely aware, there's growing momentum in the developer
>>> community
>>> > > to drop terminology that some find offensive.
>>> >
>>> > Yes.  Is it reasonable?  Does it achieve anything?  Is there any sense
>>> > in trying to "drop terminology that some find offensive"?
>>>
>>> We wish to create a community that is open and as inclusive and
>>> welcoming as possible. So yes, IMHO if there is something that some
>>> people might find offensive (even if it is not intended that way),
>>> then there is value in removing that possibility from the equation.
>>> We're here to build a healthy community that builds software together
>>> and so respecting the perspectives of others (even if we disagree with
>>> them) is a part of having a healthy community.
>>>
>>> > > As a project that takes pride
>>> > > in being welcoming and inclusive, I think this is something we
>>> should get
>>> > > in front of--particularly as we're approaching a 1.0 release.
>>> >
>>> > I don't think we would get "in front of".  We would just be following
>>> > the "growing momentum".  In other words, we would do something because
>>> > it's popular.
>>>
>>> Repeating sentiments from my response a few minutes ago, I think it is
>>> better for us to avoid even the possibility of these concerns arising
>>> in this project. Let us spend our energy debating technical issues
>>> rather than social or political ones.
>>>
>>> > (I'll note that the urge to follow the "growing momentum" is how the
>>> > developer community standardised on irritating tools like Git)
>>> >
>>> > In the long term, and in the face of the problems that it claims to
>>> > address, this seems futile to me.  But it makes some people feel good
>>> > about doing something, and it's (small) PR for the project...
>>> >
>>> > Now to the specifics:
>>> >
>>> > > Specifically, I am proposing to:
>>> > >
>>> > > 1. rename the "master" branch to something else ("main" seems to be
>>> > > popular; other version control systems use other words too).
>>> >
>>> > I used Mercurial before Git, and Mercurial uses "default".  I used SVN
>>> > before Mercurial, and SVN uses "trunk".  I don't remember if CVS is
>>> > sophisticated enough to have any name for this concept :-)
>>> >
>>> > The problem, though, is that "master" is the overwhelming convention in
>>> > Git land.  Well-known conventions make a better user experience (you
>>> > clone a git repo, you get the "master" branch and you know it: done).
>>> >
>>> > If we choose a non-"master" name, we add an additional hoop to jump
>>> > through for users to approach Arrow.  It's a small thing, but usability
>>> > is often about such small things.
>>>
>>> I'm not concerned about this, given that Arrow is already on the
>>> sophisticated end of the spectrum for open source projects.
>>>
>>> > > 2. replace "whitelist"/"blacklist" in our code with something like
>>> > > "allowlist"/"blocklist", or otherwise renaming.
>>> >
>>> > "allow"/"deny" sounds terser, and also seems more symmetric to me.
>>> > Also, be careful: "block" is very close, unsafely close, to "black"...
>>> >
>>> > Regards
>>> >
>>> > Antoine.
>>>
>>

Re: Renaming master branch, removing blacklist/whitelist

Posted by Micah Kornfield <em...@gmail.com>.
FWIW, the discussion on git core about naming [1] seems like it might be
coalescing around "main".

[1] https://lore.kernel.org/git/20200615205722.GG71506@syl.local/

On Thu, Jun 18, 2020 at 5:27 PM Micah Kornfield <em...@gmail.com>
wrote:

> I'm in favor of trying to align on neutral language within the codebase.
>
> On branch naming, I think we should wait a little to see if a consensus
> converges on a new naming convention at least within Git/Github. On a
> technical level, I'm not sure if automated tooling (e.g. crawlers) outside
> of the project might make assumptions about default branch  names or what
> is available in the github API for this type of metadata retrieval.
>
> Thanks,
> Micah
>
> On Thu, Jun 18, 2020 at 1:48 PM Wes McKinney <we...@gmail.com> wrote:
>
>> On Thu, Jun 18, 2020 at 3:33 PM Antoine Pitrou <an...@python.org>
>> wrote:
>> >
>> >
>> > Hi,
>> >
>> > On 18/06/2020 at 21:56, Neal Richardson wrote:
>> > > Hi all,
>> > > As you're likely aware, there's growing momentum in the developer
>> community
>> > > to drop terminology that some find offensive.
>> >
>> > Yes.  Is it reasonable?  Does it achieve anything?  Is there any sense
>> > in trying to "drop terminology that some find offensive"?
>>
>> We wish to create a community that is open and as inclusive and
>> welcoming as possible. So yes, IMHO if there is something that some
>> people might find offensive (even if it is not intended that way),
>> then there is value in removing that possibility from the equation.
>> We're here to build a healthy community that builds software together
>> and so respecting the perspectives of others (even if we disagree with
>> them) is a part of having a healthy community.
>>
>> > > As a project that takes pride
>> > > in being welcoming and inclusive, I think this is something we should
>> get
>> > > in front of--particularly as we're approaching a 1.0 release.
>> >
>> > I don't think we would get "in front of".  We would just be following
>> > the "growing momentum".  In other words, we would do something because
>> > it's popular.
>>
>> Repeating sentiments from my response a few minutes ago, I think it is
>> better for us to avoid even the possibility of these concerns arising
>> in this project. Let us spend our energy debating technical issues
>> rather than social or political ones.
>>
>> > (I'll note that the urge to follow the "growing momentum" is how the
>> > developer community standardised on irritating tools like Git)
>> >
>> > In the long term, and in the face of the problems that it claims to
>> > address, this seems futile to me.  But it makes some people feel good
>> > about doing something, and it's (small) PR for the project...
>> >
>> > Now to the specifics:
>> >
>> > > Specifically, I am proposing to:
>> > >
>> > > 1. rename the "master" branch to something else ("main" seems to be
>> > > popular; other version control systems use other words too).
>> >
>> > I used Mercurial before Git, and Mercurial uses "default".  I used SVN
>> > before Mercurial, and SVN uses "trunk".  I don't remember if CVS is
>> > sophisticated enough to have any name for this concept :-)
>> >
>> > The problem, though, is that "master" is the overwhelming convention in
>> > Git land.  Well-known conventions make a better user experience (you
>> > clone a git repo, you get the "master" branch and you know it: done).
>> >
>> > If we choose a non-"master" name, we add an additional hoop to jump
>> > through for users to approach Arrow.  It's a small thing, but usability
>> > is often about such small things.
>>
>> I'm not concerned about this, given that Arrow is already on the
>> sophisticated end of the spectrum for open source projects.
>>
>> > > 2. replace "whitelist"/"blacklist" in our code with something like
>> > > "allowlist"/"blocklist", or otherwise renaming.
>> >
>> > "allow"/"deny" sounds terser, and also seems more symmetric to me.
>> > Also, be careful: "block" is very close, unsafely close, to "black"...
>> >
>> > Regards
>> >
>> > Antoine.
>>
>

Re: Renaming master branch, removing blacklist/whitelist

Posted by Micah Kornfield <em...@gmail.com>.
I'm in favor of trying to align on neutral language within the codebase.

On branch naming, I think we should wait a little to see if a consensus
converges on a new naming convention at least within Git/GitHub. On a
technical level, I'm not sure if automated tooling (e.g. crawlers) outside
of the project might make assumptions about default branch names or what
is available in the GitHub API for this type of metadata retrieval.

Thanks,
Micah

On Thu, Jun 18, 2020 at 1:48 PM Wes McKinney <we...@gmail.com> wrote:

> On Thu, Jun 18, 2020 at 3:33 PM Antoine Pitrou <an...@python.org> wrote:
> >
> >
> > Hi,
> >
> > On 18/06/2020 at 21:56, Neal Richardson wrote:
> > > Hi all,
> > > As you're likely aware, there's growing momentum in the developer
> community
> > > to drop terminology that some find offensive.
> >
> > Yes.  Is it reasonable?  Does it achieve anything?  Is there any sense
> > in trying to "drop terminology that some find offensive"?
>
> We wish to create a community that is open and as inclusive and
> welcoming as possible. So yes, IMHO if there is something that some
> people might find offensive (even if it is not intended that way),
> then there is value in removing that possibility from the equation.
> We're here to build a healthy community that builds software together
> and so respecting the perspectives of others (even if we disagree with
> them) is a part of having a healthy community.
>
> > > As a project that takes pride
> > > in being welcoming and inclusive, I think this is something we should
> get
> > > in front of--particularly as we're approaching a 1.0 release.
> >
> > I don't think we would get "in front of".  We would just be following
> > the "growing momentum".  In other words, we would do something because
> > it's popular.
>
> Repeating sentiments from my response a few minutes ago, I think it is
> better for us to avoid even the possibility of these concerns arising
> in this project. Let us spend our energy debating technical issues
> rather than social or political ones.
>
> > (I'll note that the urge to follow the "growing momentum" is how the
> > developer community standardised on irritating tools like Git)
> >
> > In the long term, and in the face of the problems that it claims to
> > address, this seems futile to me.  But it makes some people feel good
> > about doing something, and it's (small) PR for the project...
> >
> > Now to the specifics:
> >
> > > Specifically, I am proposing to:
> > >
> > > 1. rename the "master" branch to something else ("main" seems to be
> > > popular; other version control systems use other words too).
> >
> > I used Mercurial before Git, and Mercurial uses "default".  I used SVN
> > before Mercurial, and SVN uses "trunk".  I don't remember if CVS is
> > sophisticated enough to have any name for this concept :-)
> >
> > The problem, though, is that "master" is the overwhelming convention in
> > Git land.  Well-known conventions make a better user experience (you
> > clone a git repo, you get the "master" branch and you know it: done).
> >
> > If we choose a non-"master" name, we add an additional hoop to jump
> > through for users to approach Arrow.  It's a small thing, but usability
> > is often about such small things.
>
> I'm not concerned about this, given that Arrow is already on the
> sophisticated end of the spectrum for open source projects.
>
> > > 2. replace "whitelist"/"blacklist" in our code with something like
> > > "allowlist"/"blocklist", or otherwise renaming.
> >
> > "allow"/"deny" sounds terser, and also seems more symmetric to me.
> > Also, be careful: "block" is very close, unsafely close, to "black"...
> >
> > Regards
> >
> > Antoine.
>

Re: Renaming master branch, removing blacklist/whitelist

Posted by Wes McKinney <we...@gmail.com>.
On Thu, Jun 18, 2020 at 3:33 PM Antoine Pitrou <an...@python.org> wrote:
>
>
> Hi,
>
> On 18/06/2020 at 21:56, Neal Richardson wrote:
> > Hi all,
> > As you're likely aware, there's growing momentum in the developer community
> > to drop terminology that some find offensive.
>
> Yes.  Is it reasonable?  Does it achieve anything?  Is there any sense
> in trying to "drop terminology that some find offensive"?

We wish to create a community that is open and as inclusive and
welcoming as possible. So yes, IMHO if there is something that some
people might find offensive (even if it is not intended that way),
then there is value in removing that possibility from the equation.
We're here to build a healthy community that builds software together
and so respecting the perspectives of others (even if we disagree with
them) is a part of having a healthy community.

> > As a project that takes pride
> > in being welcoming and inclusive, I think this is something we should get
> > in front of--particularly as we're approaching a 1.0 release.
>
> I don't think we would get "in front of".  We would just be following
> the "growing momentum".  In other words, we would do something because
> it's popular.

Repeating sentiments from my response a few minutes ago, I think it is
better for us to avoid even the possibility of these concerns arising
in this project. Let us spend our energy debating technical issues
rather than social or political ones.

> (I'll note that the urge to follow the "growing momentum" is how the
> developer community standardised on irritating tools like Git)
>
> In the long term, and in the face of the problems that it claims to
> address, this seems futile to me.  But it makes some people feel good
> about doing something, and it's (small) PR for the project...
>
> Now to the specifics:
>
> > Specifically, I am proposing to:
> >
> > 1. rename the "master" branch to something else ("main" seems to be
> > popular; other version control systems use other words too).
>
> I used Mercurial before Git, and Mercurial uses "default".  I used SVN
> before Mercurial, and SVN uses "trunk".  I don't remember if CVS is
> sophisticated enough to have any name for this concept :-)
>
> The problem, though, is that "master" is the overwhelming convention in
> Git land.  Well-known conventions make a better user experience (you
> clone a git repo, you get the "master" branch and you know it: done).
>
> If we choose a non-"master" name, we add an additional hoop to jump
> through for users to approach Arrow.  It's a small thing, but usability
> is often about such small things.

I'm not concerned about this, given that Arrow is already on the
sophisticated end of the spectrum for open source projects.

> > 2. replace "whitelist"/"blacklist" in our code with something like
> > "allowlist"/"blocklist", or otherwise renaming.
>
> "allow"/"deny" sounds terser, and also seems more symmetric to me.
> Also, be careful: "block" is very close, unsafely close, to "black"...
>
> Regards
>
> Antoine.

Re: Renaming master branch, removing blacklist/whitelist

Posted by Antoine Pitrou <an...@python.org>.
Hi,

On 18/06/2020 at 21:56, Neal Richardson wrote:
> Hi all,
> As you're likely aware, there's growing momentum in the developer community
> to drop terminology that some find offensive.

Yes.  Is it reasonable?  Does it achieve anything?  Is there any sense
in trying to "drop terminology that some find offensive"?

> As a project that takes pride
> in being welcoming and inclusive, I think this is something we should get
> in front of--particularly as we're approaching a 1.0 release.

I don't think we would get "in front of".  We would just be following
the "growing momentum".  In other words, we would do something because
it's popular.

(I'll note that the urge to follow the "growing momentum" is how the
developer community standardised on irritating tools like Git)

In the long term, and in the face of the problems that it claims to
address, this seems futile to me.  But it makes some people feel good
about doing something, and it's (small) PR for the project...

Now to the specifics:

> Specifically, I am proposing to:
> 
> 1. rename the "master" branch to something else ("main" seems to be
> popular; other version control systems use other words too).

I used Mercurial before Git, and Mercurial uses "default".  I used SVN
before Mercurial, and SVN uses "trunk".  I don't remember if CVS is
sophisticated enough to have any name for this concept :-)

The problem, though, is that "master" is the overwhelming convention in
Git land.  Well-known conventions make a better user experience (you
clone a git repo, you get the "master" branch and you know it: done).

If we choose a non-"master" name, we add an additional hoop to jump
through for users to approach Arrow.  It's a small thing, but usability
is often about such small things.

> 2. replace "whitelist"/"blacklist" in our code with something like
> "allowlist"/"blocklist", or otherwise renaming.

"allow"/"deny" sounds terser, and also seems more symmetric to me.
Also, be careful: "block" is very close, unsafely close, to "black"...

Regards

Antoine.

Re: Renaming master branch, removing blacklist/whitelist

Posted by Wes McKinney <we...@gmail.com>.
Hi Neal,

Thanks for bringing this up. Independent of who is "right" (or simply
"more right") about the merits of these changes, I support the use of
language in the project that is broadly accepted as neutral. If a term
used is neutral and clearly communicates its function, then we should
use that. If there is disagreement about a term's neutrality, then it
would be better to choose an alternative where there is not debate.
Having debates about whether or not something is neutral can become a
political or ideological matter, and I'd prefer to avoid even the
possibility of such issues in this project.

Renaming our default branch to "develop" or similar sounds good to me.
It's true that it would create some disruption of tools that have
"master" hard-coded, but as a one-time change I think it's something
we can handle from a technical standpoint. On whitelist/blacklist,
similarly, I think allow/deny or include/exclude is not only more
neutral but also clearer with regard to function.
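
To make the hard-coding point concrete, here is a minimal sketch of how a
script might ask a local clone for the remote's default branch rather than
assuming "master". The function names are hypothetical, and it assumes the
clone has refs/remotes/origin/HEAD set, which a normal "git clone" does.

```python
import subprocess


def branch_from_symbolic_ref(ref: str, remote: str = "origin") -> str:
    """Strip the refs/remotes/<remote>/ prefix from a symbolic-ref result."""
    prefix = f"refs/remotes/{remote}/"
    ref = ref.strip()
    if not ref.startswith(prefix):
        raise ValueError(f"unexpected ref: {ref!r}")
    return ref[len(prefix):]


def remote_default_branch(repo_dir: str = ".", remote: str = "origin") -> str:
    """Ask git which branch <remote>/HEAD points at in the given checkout."""
    result = subprocess.run(
        ["git", "symbolic-ref", f"refs/remotes/{remote}/HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return branch_from_symbolic_ref(result.stdout, remote)
```

The local origin/HEAD is recorded at clone time, so after a rename on the
server it would likely need refreshing first (for example with
"git remote set-head origin -a") before this returns the new name.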

Thanks,
Wes

On Thu, Jun 18, 2020 at 2:56 PM Neal Richardson
<ne...@gmail.com> wrote:
>
> Hi all,
> As you're likely aware, there's growing momentum in the developer community
> to drop terminology that some find offensive. As a project that takes pride
> in being welcoming and inclusive, I think this is something we should get
> in front of--particularly as we're approaching a 1.0 release.
>
> Specifically, I am proposing to:
>
> 1. rename the "master" branch to something else ("main" seems to be
> popular; other version control systems use other words too).
>
> 2. replace "whitelist"/"blacklist" in our code with something like
> "allowlist"/"blocklist", or otherwise renaming. A quick search of code
> shows that we don't use them much, but there are some places in Archery
> that do, as well as some vendored code (which we could look to see if
> that's been updated upstream and pull in changes).
>
> These are unrelated changes and we can address them independently.
>
> Changing the default branch is potentially disruptive, though
> https://www.hanselman.com/blog/EasilyRenameYourGitDefaultBranchFromMasterToMain.aspx
> doesn't sound so bad: you can run 6 lines to update your local git checkout
> to recognize the new default branch. Fresh clones from GitHub will
> automatically have the default branch set correctly.
>
> At least one Apache project has gotten to the point of requesting INFRA to
> change the default branch (https://issues.apache.org/jira/browse/INFRA-20403)
> and I would expect there are others that are somewhere in the process of
> deciding. Many other projects and organizations, including git and GitHub,
> are debating this too. I'm not optimistic that we could just wait for ASF
> to make some decision and implement this for all projects--they are still
> named "Apache" after all--so I think this is on us to do.
>
> Thoughts? I suspect that the default branch naming may elicit more reaction
> and require debate (and a vote?); as for the whitelist/blacklist, I'll work
> on a patch for that tomorrow unless there's strong objection, and we can
> review specific lines on the PR.
>
> Thanks,
> Neal