Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2018/06/11 22:33:36 UTC

Re: PyCon Arrow Sprint: Notes and suggestions for improvement

hi Alex,

Thank you for this feedback. Arrow is not the easiest project to
contribute to because some tasks (e.g. many involving the Python
bindings) require changes in multiple programming languages, and
the project is much lower-level than what many engineers have had
exposure to.

I agree that our JIRA issues could be much more detailed to make it easier for
new contributors to get involved. These are struggles that many open
source projects face, and it gets easier when there are more core
project maintainers. In enterprise software engineering teams, this
work is frequently managed by a full-time product manager and/or a
tech lead and/or an engineering manager. We don't really have that
here, so it's difficult to maintain the level of JIRA/issue detail
that a project inside a large company might have. Tech leads in large
engineering teams sometimes do not write any code; they spend their
time on code review and on authoring JIRA issues (or equivalent),
doing the work you are describing.

As the most active curator of our JIRA over the project's lifetime and
serial release manager, I have spent a significant fraction of my time
maintaining the Arrow JIRA issues: adding tags, components, organizing
release milestones, fixing titles and descriptions, asking the
reporter to clarify or add examples. It's very time-consuming.

My sense (without crunching the data) is that the lion's share
(probably over 50%) of the project's maintenance (the work that
happens outside of writing code -- JIRA management and code reviews /
merging patches) over the last 2.5 years has been done by 2 people:
Uwe Korn and myself. I don't see how we can do significantly better
than we are right now without more maintainers coming on board --
Antoine Pitrou joined the project earlier this year and has already
made a big impact. I have just created a new organization (Ursa Labs)
with the goal of providing full-time employment for Apache Arrow
maintainers/developers.

Thanks,
Wes

On Fri, May 25, 2018 at 9:08 PM, Alex Hagerman <al...@unexpectedeof.net> wrote:
> I recently went to my first PyCon. As part of that trip I had planned to participate in the sprint days with a focus on Apache Arrow. It's a library that I use daily, so I was hoping to use this time to contribute back while also helping new contributors get on board. While the sprint days were not as productive as I had hoped, I think there were plenty of learning opportunities.
>
> • Because PyArrow binds to the C++ implementation, a lot of tickets that start in the Python project can turn into tickets that require changes in the C++ project. This sometimes made it difficult to suggest code-oriented tickets to somebody who was only comfortable with Python. I don't think this is a problem, but it might be worth calling out on the Development or Getting Involved pages.
> • Related to the above point, I've noticed more tickets being marked C++/Python. This was a big help as people were looking through JIRA.
> • As Aneesh Karve and I tried to help individuals find a ticket to work on, we noticed a lot of variation in tickets labelled beginner. Because of that, we wanted to propose that beginner tickets meet the following criteria:
> o The ticket is labelled beginner
> o The ticket is not obsolete due to a new commit, library change, etc.
> o The ticket clearly describes both the intent and potential implementation. API design or contact details would be helpful.
> o If manipulating pre-existing code it would be helpful to call out the directories and files if possible.
> o All language(s) required are tagged.
> o Consider whether the ticket should be two tickets. For instance, if there is a C++ component and a Python component, could it be split into two tickets, with the Python ticket depending on the C++ one being completed?
> • We also think it would be helpful to have a core contributor at any sprint for Apache Arrow. Due to the nature of the project, and some of its history that may not be readily accessible, there is potential for hidden complexity or for implementation choices that conflict with historical design decisions. It's difficult to catch this up front without a core contributor present.
>
> One thing I want to call out was the interest around Arrow. We probably had 15 individuals over the course of the sprint days who showed up interested in helping the project.
>
> If you have any other questions related to the PyCon sprint I’m happy to provide any additional information I can.
>
> Thanks,
> Alex