Posted to dev@arrow.apache.org by Alex Hagerman <al...@unexpectedeof.net> on 2018/05/26 01:08:04 UTC

PyCon Arrow Sprint: Notes and suggestions for improvement

I recently went to my first PyCon. As part of that trip, I had planned to participate in the sprint days with a focus on Apache Arrow. It's a library I use daily, so I was hoping to use this time to contribute back while also helping new contributors get on board. While the sprint days were not as productive as I had hoped, I think there were plenty of learning opportunities.

• Because PyArrow binds to the C++ implementation, a lot of tickets that start in the Python project can turn into tickets that require changes in the C++ project. This sometimes made it difficult to suggest code-oriented tickets to somebody who was only comfortable with Python. I don't think this is a problem, but it might be worth calling out on the Development or Getting Involved pages.
• Related to the above point, I've noticed more tickets being marked C++/Python. This was a big help as people were looking through JIRA.
• As Aneesh Karve and I tried to help individuals find a ticket to work on, we noticed a lot of variation in the tickets labelled beginner. Because of that, we wanted to propose that beginner tickets meet the following criteria:
o The ticket is labelled beginner
o The ticket is not obsolete due to a new commit, library change, etc.
o The ticket clearly describes both the intent and potential implementation. API design or contact details would be helpful.
o If the ticket involves modifying existing code, it would be helpful to call out the relevant directories and files where possible.
o All required languages are tagged.
o Consider whether the ticket should be two tickets. For instance, if there is a C++ component and a Python component, can the work be split into two tickets, with the Python ticket depending on the C++ ticket being completed?
• We also think it would be helpful to have a core contributor at any sprint for Apache Arrow. Due to the nature of the project, and some history that may not be readily accessible, there is potential for hidden complexity or for implementation choices that conflict with historical design decisions. It's difficult to catch this up front without a core contributor present.

One thing I want to call out is the level of interest in Arrow. Roughly 15 people showed up over the course of the sprint days interested in helping the project.

If you have any other questions about the PyCon sprint, I’m happy to provide any additional information I can.

Thanks,
Alex